This repository provides a methodology for analyzing the relationship between technological innovation and occupational tasks. Specifically, it measures how exposed occupations are to automation driven by advances in robotics and artificial intelligence (AI). Using patent data and semantic analysis, we construct a framework for evaluating how readily emerging technologies could substitute for occupational tasks.
We use the task descriptions from the UK Standard Occupational Classification 2010 (UK SOC 2010), encompassing 363 occupations (ONS, 2010). These descriptions provide granular insights into the inputs required for various job roles. For illustrative purposes, examples of task descriptions for two occupations are included in Table 1.
We rely on patent data from the Google Patents Public Dataset, focusing on patents granted between 1980 and 2020. The dataset includes approximately 1,300,000 worldwide patents published in English. Patents are classified into two technological categories:
- Robot family: Includes patents with titles containing terms like "robot," "mechatronics," "cyber-physical systems," etc.
- AI family: Includes patents with terms like "artificial intelligence," "machine learning," "neural network," etc.
We adopt Webb’s (2019) approach of quasi-labeling patents using predefined keywords. While the procedure does not ensure unique labels, a qualitative review of verb-noun pairs demonstrates sufficient differentiation between technology categories.
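The keyword-based quasi-labeling described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function name `label_patent` and the constants `ROBOT_TERMS` and `AI_TERMS` are hypothetical, and the term lists contain only the examples quoted above (the full lists are longer). Matching is a simple case-insensitive substring search, which is an assumption.

```python
# Illustrative keyword lists: only the terms quoted in this README.
# The real lists are longer ("etc." above); these are not exhaustive.
ROBOT_TERMS = ("robot", "mechatronics", "cyber-physical system")
AI_TERMS = ("artificial intelligence", "machine learning", "neural network")


def label_patent(text):
    """Return the set of technology families whose keywords occur in `text`.

    Hypothetical helper. Labels are not unique: a patent can fall into
    both families, or into neither.
    """
    lowered = text.lower()
    labels = set()
    if any(term in lowered for term in ROBOT_TERMS):
        labels.add("robot")
    if any(term in lowered for term in AI_TERMS):
        labels.add("ai")
    return labels


# Example usage with made-up patent titles.
for title in ("Neural network controller for a robot arm", "Bicycle frame"):
    print(title, "->", sorted(label_patent(title)))
```

Because the substring `robot` also matches words such as "robotic", a production version would likely need a more careful tokenization step.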
To measure the semantic similarity between patents and job tasks, we use the Bidirectional Encoder Representations from Transformers (BERT) model (Devlin et al., 2018). BERT’s contextual representations let us compute dense vector embeddings for both patent descriptions and job tasks.
We calculate semantic similarity as the cosine similarity between the dense vectors, collected in a matrix with one entry per patent-task pair.
Following Autor et al. (2024), we retain only the top 15% of similarity scores and discard the scores that fall below this threshold.
For each occupation, we compute an exposure measure.
This measure aggregates exposure across all relevant patents and tasks over time.
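The similarity, thresholding, and aggregation steps can be sketched in plain Python as below. This is a hedged illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the threshold is taken as the lowest score within the top 15% of all patent-task pairs, and exposure is assumed to be the sum of retained scores over the tasks of each occupation. The time dimension is omitted, and the exact aggregation used in the paper may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(patent_vecs, task_vecs):
    """Rows index patents, columns index tasks."""
    return [[cosine(p, t) for t in task_vecs] for p in patent_vecs]


def top_share_threshold(sim, share=0.15):
    """Smallest score that still belongs to the top `share` of all scores."""
    flat = sorted((s for row in sim for s in row), reverse=True)
    keep = max(1, math.ceil(share * len(flat)))
    return flat[keep - 1]


def occupation_exposure(sim, task_occupation, share=0.15):
    """Sum retained scores per occupation (assumed aggregation rule).

    `task_occupation[j]` is the occupation code of task column j.
    """
    cutoff = top_share_threshold(sim, share)
    exposure = {occ: 0.0 for occ in set(task_occupation)}
    for row in sim:
        for j, score in enumerate(row):
            if score >= cutoff:
                exposure[task_occupation[j]] += score
    return exposure


# Toy example with 2-dimensional stand-ins for the BERT embeddings.
patents = [[1.0, 0.0], [0.0, 1.0]]
tasks = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [-1.0, 0.0]]
occupations = ["A", "A", "B", "B"]
sim = similarity_matrix(patents, tasks)
print(occupation_exposure(sim, occupations))
```

With roughly 1,300,000 patents, a real pipeline would use vectorized matrix operations rather than nested Python loops; the loops here only keep the logic easy to follow.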
If you use this repository, please cite:
Martin Lábaj, Tomáš Oleš, Gabriel Procházka. "Impact of Robots and Artificial Intelligence on Labor and Skill Demand: Evidence from the UK." Forthcoming in Eurasian Business Review, 2025.
- data/: Processed datasets, including patent data and occupational task descriptions.
- figures/: Visualizations of patent trends, similarity distributions, and exposure scores.
This project is licensed under the MIT License. See LICENSE for details.
We welcome contributions! Please open an issue or submit a pull request for any suggestions or improvements.