The Database of Occupations Will Jump from 1,700 to 5,000 in 2016

By Kea Tijdens, University of Amsterdam/AIAS, research coordinator WageIndicator

One of the treasures of the WageIndicator is its database of occupational titles. This database is used in three instances:

  1. In the Salary Check, where visitors have to self-identify their occupation before receiving salary advice
  2. In the web survey, for answering the survey question: ‘What is your occupation?’
  3. In all of the approx. 400 Jobs & Salary pages, which serve as landing pages for a very wide range of search terms entered in Google Search, Bing, or other search engines.

Search Tree and Word Matching

In the Salary Check and the web survey, visitors can self-identify their occupation in two ways: through a search tree or by text string matching. The search tree, sometimes called an ‘iPod menu’, lets visitors navigate through the occupation database at three levels. Alternatively, text matching (referred to here as semantic matching) lets visitors type text, with matching entries from the database shown instantly. Visitors can then select the most relevant match.
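The text-matching route can be illustrated with a minimal sketch. It assumes plain case-insensitive substring matching on every word typed so far; the function name, the ranking rule, and the sample titles are hypothetical and are not taken from the WageIndicator implementation.

```python
def match_titles(query, titles, limit=10):
    """Return up to `limit` titles that contain every word typed so far."""
    words = query.lower().split()
    if not words:
        return []
    # Keep a title only if each typed word occurs somewhere in it.
    hits = [t for t in titles if all(w in t.lower() for w in words)]
    # Assumed ranking: titles starting with the query first, then alphabetical.
    prefix = query.lower().strip()
    hits.sort(key=lambda t: (not t.lower().startswith(prefix), t.lower()))
    return hits[:limit]


# Hypothetical sample titles, for illustration only.
TITLES = [
    "Primary school teacher",
    "Secondary school teacher",
    "School bus driver",
    "Truck driver",
]

if __name__ == "__main__":
    for typed in ("teach", "school", "school dr"):
        print(typed, "->", match_titles(typed, TITLES))
```

Calling the function after each keystroke would give the instant list of candidates from which the visitor picks the most relevant title.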

For all WageIndicator countries the occupation database currently holds between 1,600 and 1,700 occupational titles, which are translated into all 43 languages of the WageIndicator websites. However, in some countries certain occupational titles do not exist or cannot be translated, so the number of titles varies slightly across countries.

Occupational Database in Great Detail

Over the years, the current database has grown gradually. It started with a few hundred occupational titles in the Netherlands, which were then translated into the languages of neighboring countries. Thanks to the EurOccupations project (2007-2009), the occupational database could be extended to more than a thousand titles. In addition, the titles could all be classified according to the International Standard Classification of Occupations, version 2008 (ISCO-08). This classification is maintained by the International Labour Organization (ILO). ISCO-08 is a hierarchical classification that, not counting the armed forces, distinguishes nine major groups at the highest level of aggregation and breaks these down stepwise into 433 occupational units at the classification’s lowest, 4-digit level. Our database holds occupational titles in greater detail, namely at the 5-digit level.

In the Salary Check, if the web survey generates insufficient observations to compute a wage for a title, several occupations are taken together and a salary is computed for the 4-digit occupational unit. If observations are still insufficient, the broader group at the 3-digit level is used, or even the group at the 2-digit level. In the Jobs & Salary pages all titles are clustered into the 4-digit categories. Hence, there are 433 web pages, each with a list of nested occupational titles. These occupational titles are also included in the description of the page. In this way, visitors can easily navigate through the WageIndicator websites.
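The fallback from a detailed title to broader groups can be sketched as follows. This is an illustration under stated assumptions, not the Salary Check code: it assumes that a 5-digit title code starts with its 4-digit ISCO-08 unit code, that the median is the wage statistic, and that a fixed minimum number of observations is required. The threshold value and all names are hypothetical.

```python
from statistics import median

MIN_OBS = 30  # assumed threshold; the article does not state the real one


def salary_estimate(code5, observations, min_obs=MIN_OBS):
    """Return (digits_used, median_wage), or None if data are too sparse.

    `observations` is a list of (5-digit code string, wage) pairs.
    """
    # Try the detailed title first, then ever broader groups.
    for digits in (5, 4, 3, 2):
        prefix = code5[:digits]
        wages = [w for code, w in observations if code.startswith(prefix)]
        if len(wages) >= min_obs:
            return digits, median(wages)
    return None


# Hypothetical survey observations, for illustration only.
OBS = [
    ("23411", 10.0), ("23411", 12.0),
    ("23412", 14.0), ("23412", 16.0),
    ("23301", 20.0),
]

if __name__ == "__main__":
    # With a threshold of 3, the title itself has too few observations,
    # so the estimate falls back to the 4-digit unit.
    print(salary_estimate("23411", OBS, min_obs=3))
```

Returning the number of digits used alongside the wage makes it possible to tell the visitor how broad the occupational group behind the estimate is.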

Trebling of the Database over the Coming Year

Although 1,600 occupational titles may seem a large number, the number of occupational titles in any one country can easily exceed ten thousand. Web visitors may use any of these thousands of titles when searching for information on the web. Our database should therefore keep growing, so that it covers more of the titles that visitors actually search for.

Recently the European Union funded a project, SERISS, to enlarge the occupational database to 5,000 titles, all coded according to ISCO-08, and to include more countries and languages. By the end of 2015 or early 2016, the WageIndicator database will be increased to approx. 5,000 occupational titles. Such a large database can no longer be navigated by means of a search tree. Therefore, in the Salary Check and in the web survey we will have to rely solely on semantic matching. For the Jobs & Salary pages, this project means that our websites can match a wider set of search terms than before.