Belloni, M., & Tijdens, K.G. (2019). Occupation > industry predictions for measuring industry in surveys. Amsterdam, University of Amsterdam, AIAS-HSI Working Paper 5


Many questionnaires include a question such as “Please write the main business activity of the organisation where you work”. The answer is commonly collected in an open text field, which leaves the survey holder to code the response into an industry classification. Alternatively, in a web survey respondents can self-identify their industry from a database. Task 8.4 in SERISS includes two deliverables, D8.10 and D8.11. For D8.10, a database of 321 industry names was developed and translated for use in 99 countries, with every entry coded at 3 or 4 digits of the NACE Rev. 2 classification. The database lets survey respondents self-identify their industry from this look-up table through either an autosuggest box or a two-level search tree.

For D8.11, the starting point is that the WageIndicator web survey shows respondents skip the industry question more often than other questions, presumably because they find it cognitively too demanding. An occupation>industry prediction has therefore been developed that presents respondents with a limited set of industries, namely those most likely for their occupation. This limited list always includes an option ‘other’, which leads to the full look-up table in the next step.
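The shortlist logic described above can be sketched in a few lines. This is only an illustration under stated assumptions: the table `PREDICTIONS`, the function `shortlist`, the cut-off `k` and the scores are all hypothetical and are not taken from the deliverable's database.

```python
# Illustrative prediction table: ISCO-08 unit group -> (NACE Rev. 2 division, score).
# The scores are invented for this example, not taken from the deliverable.
PREDICTIONS = {
    "2221": [("86", 0.71), ("87", 0.14), ("88", 0.06), ("85", 0.03), ("84", 0.02)],
}

OTHER = "other"  # sentinel that sends the respondent to the full look-up table


def shortlist(isco_code, predictions, k=3):
    """Return the k most likely industries for an occupation, then 'other'."""
    ranked = sorted(predictions.get(isco_code, []), key=lambda p: p[1], reverse=True)
    return [nace for nace, _ in ranked[:k]] + [OTHER]


print(shortlist("2221", PREDICTIONS))  # three divisions followed by 'other'
print(shortlist("9999", PREDICTIONS))  # unknown occupation: only 'other'
```

An occupation without a prediction falls back to ‘other’ alone, so the respondent still reaches the full look-up table.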

A multi-country occupation>industry prediction for 4-digit ISCO-08 occupations requires a dataset large enough to cover as many as possible of the countries included in WP8. No such multi-country dataset exists, so we merged datasets from several sources. We relied on the most recent waves of ESS and EWCS, which use homogeneous classification structures that are currently in place. In addition to these CAPI surveys, we used some web surveys, mostly the WageIndicator database. The initial idea of including auxiliary variables as controls in the prediction equations was dropped in favour of a pooled dataset with valid observations for only two variables: a 4-digit ISCO-08 code and a 2-digit NACE Rev. 2 code. We explored country differences, but the estimated most likely industries turned out to be very similar across country groups.
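The pooling step might look roughly like the sketch below. It assumes each source is a list of records with codes stored as strings; the field names `isco08` and `nace2` and the sample records are hypothetical. The validity check only tests the number of digits, not whether a code exists in the classification.

```python
def is_valid(record):
    """True when a record has a 4-digit ISCO-08 and a 2-digit NACE Rev. 2 code."""
    isco = str(record.get("isco08") or "")
    nace = str(record.get("nace2") or "")
    return (len(isco) == 4 and isco.isdigit()
            and len(nace) == 2 and nace.isdigit())


def pool(*sources):
    """Stack records from several surveys, keeping only the two code variables."""
    pooled = []
    for source in sources:
        for record in source:
            if is_valid(record):
                pooled.append((str(record["isco08"]), str(record["nace2"])))
    return pooled


# Invented sample records; auxiliary variables such as "age" are dropped.
capi_survey = [{"isco08": "2221", "nace2": "86", "age": 41},
               {"isco08": "222", "nace2": "86"}]    # 3-digit ISCO: excluded
web_survey = [{"isco08": "5223", "nace2": "47"},
              {"isco08": "5223", "nace2": None}]    # missing NACE: excluded

print(pool(capi_survey, web_survey))
```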

We then estimated a set of linear probability models (LPM), one for each ISCO code. An LPM is a multiple linear regression model with a binary dependent variable (Wooldridge, 2010); here it equals 1 if the observation reported that specific ISCO unit group and 0 otherwise. The explanatory variables are a full set of dummy variables for the 88 divisions (i.e. 2-digit groups) of NACE Rev. 2. The estimated coefficients represent marginal effects and can be interpreted directly as the probability that each NACE division is associated with that specific ISCO group.
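To make the estimation concrete, here is a minimal sketch of one such LPM. It assumes the model has no intercept, which the text above does not specify. In that case the mutually exclusive division dummies make X'X diagonal, so each OLS coefficient is simply the share of observations in that division reporting the occupation. The function name and the toy observations are invented for illustration.

```python
from collections import Counter


def lpm_coefficients(pooled, isco_code):
    """OLS coefficients of 1[ISCO == isco_code] on NACE dummies (no intercept).

    Because the dummies are mutually exclusive, X'X is diagonal and each
    coefficient equals the mean of the dependent variable within that division.
    """
    n_in_division = Counter(nace for _, nace in pooled)
    n_with_isco = Counter(nace for isco, nace in pooled if isco == isco_code)
    return {nace: n_with_isco[nace] / n for nace, n in n_in_division.items()}


# Invented toy observations: (ISCO-08 unit group, NACE Rev. 2 division).
pooled = [
    ("2221", "86"), ("2221", "86"), ("2211", "86"), ("5223", "86"),
    ("2221", "87"), ("5321", "87"), ("5321", "87"), ("5322", "87"),
    ("5223", "47"), ("5223", "47"),
]

coefs = lpm_coefficients(pooled, "2221")
# Rank divisions by estimated coefficient, highest first.
ranked = sorted(coefs.items(), key=lambda item: item[1], reverse=True)
print(ranked)
```

Repeating the call for every ISCO unit group and keeping the top-ranked divisions would give one ranked industry list per occupation.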

This paper was written as part of the Synergies for Europe's Research Infrastructures in the Social Sciences (SERISS) project, funded under the European Union’s Horizon 2020 research and innovation programme, GA No. 654221. AIAS/HSI is a SERISS partner. This paper is SERISS Deliverable 8.11. The deliverable’s accompanying database, with the results of the occupation>industry predictions for 4-digit occupational units, is downloadable at .