Web-based Data Collection – Explorations in Africa, Asia and South America

By Stephanie Steinmetz, University of Amsterdam/AIAS, WageIndicator researcher, and Kea Tijdens, University of Amsterdam/AIAS, research coordinator WageIndicator


Over the last decades, surveys have become part of daily life for people in most developed countries. They are used by so many organizations for so many purposes that one gets the impression that conducting a survey is an easy undertaking. However, the vast literature on survey methodology makes clear how difficult it is to conduct a high-quality survey. In particular, the increased and widespread use of different types of web surveys, and the question of how representative the resulting data are, have continuously fueled the discussion about data quality and reliability for scientific use. Findings for developed countries have consistently shown that non-probability web surveys in particular are affected by sample biases, such as an overrepresentation of the young, the more highly educated, and men (De Pedraza et al., 2007; Steinmetz et al., 2014). It therefore seems logical that no claims of representativeness for the general population can be made on the basis of such data. At the same time, it has also been underlined that ‘[t]here are times when a non-probability online panel is an appropriate choice. […] there may be survey purposes and topics where the generally lower cost and unique properties of Web data collection is an acceptable alternative to traditional probability-based methods.’ (AAPOR, 2010, p. 5)

Potential for Mixed-mode Approaches in Developing Countries

Against this background, we explored in a recent publication (Tijdens and Steinmetz, 2015) to what extent volunteer web surveys might offer a useful tool for, and an alternative means of, data collection in developing countries. Answering this question is particularly urgent for two reasons. First, there is increasing demand for current and detailed data, which are indispensable for economic and social policy analysis and evaluation, as well as for development planning and program management. Second, this increasing demand might also lead to more web surveys being conducted in developing countries. Due to a lack of proper sampling frames, these surveys will often be of a non-probability nature. It is therefore important to include developing countries in the discussion of sample bias and representativeness. Doing so might also shed light on whether the bias is related to Internet usage, and on the potential of using the web in mixed-mode approaches in these countries.

Comparison of Survey Types

Since no researchers have thus far investigated whether web surveys face similar methodological challenges in developing countries, and under what circumstances they might be a useful or complementary tool for surveying understudied populations, we wanted to examine two aspects in more detail: 1) to what extent data from similar probability-based face-to-face surveys and non-probability-based web surveys are comparable with available labor force data, and 2) whether and, if so, how the web samples differ from the face-to-face samples, and how these differences are related to low or high Internet usage rates.

For the analyses we used the pooled WageIndicator data of the comparable web and face-to-face surveys (2009–2013) covering the national labor force in 10 developing countries on three continents (Guatemala, Honduras, and Paraguay in Latin America; Indonesia in Asia; and Ghana, Kenya, Mozambique, Senegal, Uganda, and Zambia in Africa) as well as available labor force data stemming from ILO’s Economically Active Population Estimates and Projections (EAPEP 6th edition, published by the International Labor Organization, 2011).

On the question of whether and how the two types of data collection (face-to-face and web) differ from available labor force data, we could show that 1) face-to-face samples resemble the labor force more closely than web samples; 2) in both samples, younger and older men and women can be classified as ‘hard-to-reach’ groups; and 3) in both samples and for both genders, individuals in their 20s and early 30s represent ‘easy-to-reach’ groups (see Figure 1). Finally, we could also demonstrate that the observed sample differences seem to converge in countries with higher Internet usage rates.

Figure 1: Relative Difference for women (above) and men (below) between the F2F (left), the WEB (right) & the population, across ten age groups & for ten countries
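
The relative differences reported in Figure 1 can be illustrated with a minimal sketch. It assumes that the relative difference for an age group is the sample share minus the labor force share, divided by the labor force share; the exact formula is not spelled out here, so this definition is an assumption. The three age groups and all shares below are invented purely for illustration and are not WageIndicator or EAPEP figures.

```python
def relative_difference(sample_shares, population_shares):
    """Return (sample - population) / population for each group.

    Assumed definition, not taken from the study: positive values mean the
    group is over-represented in the sample ('easy to reach'), negative
    values mean it is under-represented ('hard to reach').
    """
    result = {}
    for group, pop_share in population_shares.items():
        if pop_share <= 0:
            raise ValueError(f"population share for {group!r} must be positive")
        sample_share = sample_shares.get(group, 0.0)
        result[group] = (sample_share - pop_share) / pop_share
    return result


def mean_absolute_difference(differences):
    """Summarize how far a sample is from the population across all groups."""
    return sum(abs(d) for d in differences.values()) / len(differences)


# Hypothetical shares of each age group (each dictionary sums to 1).
labor_force = {"15-24": 0.25, "25-34": 0.25, "35+": 0.50}
f2f_sample = {"15-24": 0.225, "25-34": 0.30, "35+": 0.475}
web_sample = {"15-24": 0.20, "25-34": 0.50, "35+": 0.30}

f2f_diff = relative_difference(f2f_sample, labor_force)
web_diff = relative_difference(web_sample, labor_force)

for group in labor_force:
    print(f"{group:>6}  F2F {f2f_diff[group]:+.2f}  WEB {web_diff[group]:+.2f}")

print("Mean absolute difference, F2F:", round(mean_absolute_difference(f2f_diff), 3))
print("Mean absolute difference, WEB:", round(mean_absolute_difference(web_diff), 3))
```

In this invented example the web sample over-represents the 25-34 group and deviates more from the labor force overall than the face-to-face sample, which mirrors the direction of the findings described above but says nothing about their size.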


Comparison of Results in Developed and Developing Countries

On our second question, whether the web and face-to-face samples for the 10 countries are comparable with respect to selected variables of interest (age, gender, household composition, and education), our results show that they are not. Interestingly, our findings reveal a coherent picture for most of the selected covariates across almost all countries, which is much in line with findings for developed countries. Compared with face-to-face respondents, web respondents are, on average, younger, more often live alone, and are more highly educated. The only exception is gender, for which we could not observe a coherent pattern across the selected developing countries. In addition, we wanted to examine whether the observed sample bias can be related to Internet usage rates. However, the results only partially confirm the assumption that the observed differences between the web and face-to-face samples are smaller in countries with a higher Internet usage rate.

Web Surveys and Sub-populations May Be a Good Match

Finally, when reflecting on the findings and on the question of whether web surveys might be a useful or supplementary tool for data collection in developing countries, two pictures can be painted. On the one hand, our analyses show that volunteer web surveys in developing countries, too, are affected by serious biases when compared with the national labor force or with comparable probability-based face-to-face surveys. This implies that claims of generalizability can hardly be made on the basis of these data. On the other hand, one should not dismiss volunteer web surveys as irrelevant. As the AAPOR statement quoted at the outset suggests, they might be very helpful when it comes to sub-populations, such as specific age or educational groups. Our analyses have shown that in developing countries, too, women and men in their 20s or early 30s and more highly educated respondents are particularly easy to reach via the web.

Given the cost differences between the two modes of data collection, this finding is a reason to investigate further whether mixed-mode surveys can be used to target the groups that are more likely to respond via the Internet. This might become particularly relevant when Internet usage increases further in developing countries, thereby reducing the overall bias of these data. Probably one of the most important points to consider in this context is that volunteer web surveys enable explorative and up-to-date insights into, for instance, the income and labor market situations of people and populations about which we have no or only rudimentary information.


AAPOR. (2010). Report of the AAPOR task force on online panels. Deerfield, IL: American Association for Public Opinion Research.

Babbie, E. (1973). Survey research methods. Belmont, California: Wadsworth Publishing Company.

International Labor Organization (2011). Estimates and projections of the economically active population, 1990–2020 (6th edition). Geneva: International Labor Organization. [http://laborsta.ilo.org/applv8/data/EAPEP/eapep_E.html]

De Pedraza, P., Tijdens, K., & Muñoz de Bustillo, R. (2007). Sample bias, weights and efficiency of weights in a continuous voluntary web survey. AIAS Working Paper 07–58, Amsterdam: University of Amsterdam.

Steinmetz, S., Bianchi, A., Biffignandi, S., & Tijdens, K. (2014). Improving web survey quality – Potentials and constraints of propensity score weighting. In Callegaro, M., Baker, R., Bethlehem, J., Göritz, A., Krosnick, J., & Lavrakas, P. (Eds.). Online panel research: A data quality perspective (pp. 273–298). (Wiley Series in Survey Methodology). Hoboken: Wiley.

Tijdens, K., & Steinmetz, S. (2015). Is the web a promising tool for data collection in developing countries? Comparing data from ten web and face to face surveys from Africa, Asia and South America. International Journal of Social Research Methodology.