This repository hosts the material for four 3-hour lectures given in the Introduction to Natural Language Processing (NLP) class of PSL's Master of Digital Humanities, Fall 2025.
The code and notebooks for the tutorials and hands-on sessions are provided in the code folder. The data used for these sessions is described and stored in the data folder.
- Slides: preview (html, pdf)
- Notebook(s): BERT Discovery, Word Sense Disambiguation, Semantic Shifts
- Key notions: n-gram, transformers, self-attention, context, masked language model / causal language model
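To make the n-gram notion above concrete, here is a minimal sketch of a bigram language model built from raw counts: for each word, it estimates the probability of the next word by maximum likelihood. The toy corpus and all names are illustrative only; transformer LMs replace these count tables with learned, context-sensitive representations, but the "predict the next token" framing is the same one causal language models use.

```python
from collections import Counter, defaultdict

# Toy corpus (illustrative only); any tokenized text works the same way.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count bigrams: for each word, how often each next word follows it.
bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def next_word_probs(word):
    """Maximum-likelihood estimate of P(next | word) from bigram counts."""
    counts = bigram_counts[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_probs("sat"))  # {'on': 1.0}
print(next_word_probs("the"))  # 'cat', 'mat', 'dog', 'rug', each 0.25
```

Sampling from these distributions word by word already generates (very local) text, which is exactly the limitation that larger contexts and self-attention address.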
To go further
- (Alammar, 2018): The Illustrated Transformer, a visual blog post by Jay Alammar.
- (Ghaseminejad Raeini, 2025): The evolution of language models: From N-Grams to LLMs, and beyond.
- (Allen & Hospedales, 2019): Analogies Explained: Towards Understanding Word Embeddings.
Want more hands-on? Check the To go further section in the code folder.
- Slides: preview (html, pdf)
- Notebook(s): Custom BERTopic, Topic Modeling UN General Debates Speeches
- Key notions: document representation, BoW, SentenceTransformer, cosine similarity, topic modeling, BERTopic, LDA
To go further
Dimensionality Reduction:
- (Coenen & Pierce, 2019): Understanding UMAP: explanations and visual demonstration of UMAP (compared with t-SNE).
Topic Modeling:
- (Churchill & Singh, 2021): The Evolution of Topic Modeling.
- (Li et al., 2024): Applying Topic Modeling to Literary Analysis: A Review.
- (Gillings & Hardie, 2022): The interpretation of topic models for scholarly analysis: An evaluation and critique of current practice.
- (Antoniak, 2023): Topic Modeling for the People: an interesting blog post by Maria Antoniak sharing a set of steps you can follow to get coherent topics from most datasets, primarily focusing on LDA. It also provides many additional references to dig deeper.
- (Egger & Yu, 2022): A Topic Modeling Comparison Between LDA, NMF, Top2Vec, and BERTopic to Demystify Twitter Posts.
Evaluation Concerns:
- (Chang et al., 2009): Reading Tea Leaves: How Humans Interpret Topic Models.
- (Hoyle et al., 2021): Is Automated Topic Model Evaluation Broken?: The Incoherence of Coherence.
Want more hands-on? Check the To go further section in the code folder.
- Slides: preview (html, pdf)
- Notebook(s): BERT Fine-Tuning Tutorial, Canonicity Prediction Challenge: Performance and Fairness
- Key notions: classification, supervised fine-tuning, performance metrics, fairness
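The performance-and-fairness pairing in the key notions above can be illustrated with a coarse but common first check: compare a classifier's accuracy overall against its accuracy per demographic group. The labels, predictions, and group assignments below are made up for illustration; real analyses (as in the canonicity challenge) would use held-out data and richer metrics than accuracy.

```python
# Toy ground truth, predictions, and group labels (illustrative only).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 1, 1, 1, 0]
group  = ["A", "A", "A", "A", "B", "B", "B", "B"]

def accuracy(truth, pred):
    """Fraction of predictions matching the ground truth."""
    return sum(int(t == p) for t, p in zip(truth, pred)) / len(truth)

overall = accuracy(y_true, y_pred)

# Per-group accuracy: a first, coarse fairness diagnostic.
per_group = {}
for g in sorted(set(group)):
    idx = [i for i, gg in enumerate(group) if gg == g]
    per_group[g] = accuracy([y_true[i] for i in idx], [y_pred[i] for i in idx])

gap = abs(per_group["A"] - per_group["B"])
print(overall)    # 0.75 overall...
print(per_group)  # ...hiding perfect accuracy on A and 0.5 on B
print(gap)
```

A respectable overall score can mask a large per-group gap, which is the core argument in work like Lassen et al. (2024) on author gender and canonicity classification.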
To go further
Text Classification for DH:
- (Bamman et al., 2024): On Classification with Large Language Models in Cultural Analytics.
- (Lassen et al., 2024): Literary Canonicity and Algorithmic Fairness: The Effect of Author Gender on Classification Models.
Fairness & Bias:
- (Barocas, Hardt & Narayanan, 2023): Fairness and Machine Learning: Limitations and Opportunities. Full book available online with additional resources.
- (Irving & Askell, 2019): AI Safety Needs Social Scientists.
- (Blodgett et al., 2020): Language (Technology) is Power: A Critical Survey of "Bias" in NLP.
- (Hovy & Prabhumoye, 2021): Five sources of bias in natural language processing.
- (Gallegos et al., 2024): Bias and Fairness in Large Language Models: A Survey.
Interpretability:
- Interpretability blog post
- (Olah et al., 2018): The Building Blocks of Interpretability: mainly applied to computer vision, but a very nice, well-illustrated article on interpretability.
Want more hands-on? Check the To go further section in the code folder.
- Slides: preview (html, pdf)
- Notebook(s): Tutorial_4_LLM_Interaction.ipynb, Hands_on_4_EvalLLM.ipynb
- Key notions: LLMs, autoregressive, pre-training, alignment, evaluation
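Two of the notions above, autoregressive generation and sampling, hinge on the softmax function: at each step the model turns raw scores (logits) over the vocabulary into a probability distribution, samples a token, and feeds it back in. The sketch below shows softmax with a temperature parameter on a made-up three-word vocabulary (all values are illustrative, not from any real model): lowering the temperature sharpens the distribution toward the top token, which is why low-temperature generation is more deterministic.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw logits into a probability distribution; lower T sharpens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy vocabulary and logits for the prompt "The capital of France is" (made up).
vocab = ["Paris", "London", "banana"]
logits = [3.0, 2.0, -1.0]

for t in (1.0, 0.2):
    probs = softmax(logits, temperature=t)
    print(t, dict(zip(vocab, (round(p, 3) for p in probs))))
```

At temperature 1.0 "London" keeps a real chance of being sampled; at 0.2 nearly all mass concentrates on "Paris". Repeating sample-then-append step by step is exactly the autoregressive loop of a causal LLM.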
To go further
LLMs:
- (Cho et al., 2024): Interactive visualisation, with explanations, of the inner workings of causal language models. Very nice visualisation and summary of transformer-based LMs! 👀 If you like these visualisations, also check the LLM Visualization by Brendan Bycroft.
- (Zhao et al., 2023): A Survey of Large Language Models. — Comprehensive review of recent advances related to LLMs, background, key findings, mainstream techniques, etc.
LLMs, Biases and Fairness:
- (Gallegos et al., 2025): Bias and Fairness in Large Language Models: A Survey.
Want more hands-on? Check the To go further section in the code folder.