This repository hosts the material for four 3-hour lectures given in the Introduction to Natural Language Processing (NLP) class of PSL's Master of Digital Humanities, Fall 2025.
The code and notebooks for the tutorials and hands-on sessions are provided in the code folder. The data used for these sessions is described and stored in the data folder.
- Slides: preview (html, pdf)
- Notebook(s): BERT Discovery, Word Sense Disambiguation, Semantic Shifts
- Key notions: n-gram, transformers, self-attention, context, masked language model / causal language model
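To make the n-gram notion above concrete, here is a minimal sketch of a bigram language model built from raw counts: for each word, it estimates the probability of the next word by maximum likelihood. The toy corpus and all names are illustrative only; transformer LMs replace these count tables with learned, context-sensitive representations, but the "predict the next token" framing is the same one causal language models use.

```python
from collections import Counter, defaultdict

# Toy corpus (illustrative only); any tokenized text works the same way.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count bigrams: for each word, how often each next word follows it.
bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def next_word_probs(word):
    """Maximum-likelihood estimate of P(next | word) from bigram counts."""
    counts = bigram_counts[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_probs("sat"))  # {'on': 1.0}
print(next_word_probs("the"))  # 'cat', 'mat', 'dog', 'rug', each 0.25
```

Sampling from these distributions word by word already generates (very local) text, which is exactly the limitation that larger contexts and self-attention address.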
To go further
- (Alammar, 2018): The Illustrated Transformer, a visual blog post by Jay Alammar.
- (Ghaseminejad Raeini, 2025): The evolution of language models: From N-Grams to LLMs, and beyond.
- (Allen & Hospedales, 2019): Analogies Explained: Towards Understanding Word Embeddings.
Want more hands-on? Check the To go further section in the code folder.
- Slides: preview (html, pdf)
- Notebook(s): Custom BERTopic, Topic Modeling UN General Debates Speeches
- Key notions: document representation, BoW, SentenceTransformer, cosine similarity, topic modeling, BERTopic, LDA
To go further
Dimensionality Reduction:
- (Coenen & Pierce, 2019): Understanding UMAP: explanations and visual demonstration of UMAP (compared with t-SNE).
Topic Modeling:
- (Churchill & Singh, 2021): The Evolution of Topic Modeling.
- (Li et al., 2024): Applying Topic Modeling to Literary Analysis: A Review.
- (Gillings & Hardie, 2022): The interpretation of topic models for scholarly analysis: An evaluation and critique of current practice.
- (Antoniak, 2023): Topic Modeling for the People: an interesting blog post by Maria Antoniak sharing a set of steps you can follow to get coherent topics from most datasets, primarily focusing on LDA. It also provides many additional references to dig deeper.
- (Egger & Yu, 2022): A Topic Modeling Comparison Between LDA, NMF, Top2Vec, and BERTopic to Demystify Twitter Posts.
Evaluation Concerns:
- (Chang et al., 2009): Reading Tea Leaves: How Humans Interpret Topic Models.
- (Hoyle et al., 2021): Is Automated Topic Model Evaluation Broken?: The Incoherence of Coherence.
Want more hands-on? Check the To go further section in the code folder.
- Slides: preview (html, pdf)
- Notebook(s): BERT Fine-Tuning Tutorial, Canonicity Prediction Challenge: Performance and Fairness
- Key notions: classification, supervised fine-tuning, performance metrics, fairness
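The performance-and-fairness pairing in the key notions above can be illustrated with a coarse but common first check: compare a classifier's accuracy overall against its accuracy per demographic group. The labels, predictions, and group assignments below are made up for illustration; real analyses (as in the canonicity challenge) would use held-out data and richer metrics than accuracy.

```python
# Toy ground truth, predictions, and group labels (illustrative only).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 1, 1, 1, 0]
group  = ["A", "A", "A", "A", "B", "B", "B", "B"]

def accuracy(truth, pred):
    """Fraction of predictions matching the ground truth."""
    return sum(int(t == p) for t, p in zip(truth, pred)) / len(truth)

overall = accuracy(y_true, y_pred)

# Per-group accuracy: a first, coarse fairness diagnostic.
per_group = {}
for g in sorted(set(group)):
    idx = [i for i, gg in enumerate(group) if gg == g]
    per_group[g] = accuracy([y_true[i] for i in idx], [y_pred[i] for i in idx])

gap = abs(per_group["A"] - per_group["B"])
print(overall)    # 0.75 overall...
print(per_group)  # ...hiding perfect accuracy on A and 0.5 on B
print(gap)
```

A respectable overall score can mask a large per-group gap, which is the core argument in work like Lassen et al. (2024) on author gender and canonicity classification.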
To go further
Text Classification for DH:
- (Bamman et al., 2024): On Classification with Large Language Models in Cultural Analytics.
- (Lassen et al., 2024): Literary Canonicity and Algorithmic Fairness: The Effect of Author Gender on Classification Models.
Fairness & Bias:
- (Barocas, Hardt & Narayanan, 2023): Fairness and Machine Learning: Limitations and Opportunities. Full book available online with additional resources.
- (Irving & Askell, 2019): AI Safety Needs Social Scientists.
- (Blodgett et al., 2020): Language (Technology) is Power: A Critical Survey of "Bias" in NLP.
- (Hovy & Prabhumoye, 2021): Five sources of bias in natural language processing.
- (Gallegos et al., 2024): Bias and Fairness in Large Language Models: A Survey.
Interpretability:
- Interpretability blog post
- (Olah et al., 2018): The Building Blocks of Interpretability: mainly applied to computer vision, but a very nice, well-illustrated article on interpretability.
Want more hands-on? Check the To go further section in the code folder.
- Slides: preview (html, pdf)
- Notebook(s): Tutorial_4_LLM_Interaction.ipynb, Hands_on_4_EvalLLM.ipynb
- Key notions: LLMs, autoregressive, pre-training, alignment, evaluation
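Two of the notions above, autoregressive generation and sampling, hinge on the softmax function: at each step the model turns raw scores (logits) over the vocabulary into a probability distribution, samples a token, and feeds it back in. The sketch below shows softmax with a temperature parameter on a made-up three-word vocabulary (all values are illustrative, not from any real model): lowering the temperature sharpens the distribution toward the top token, which is why low-temperature generation is more deterministic.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw logits into a probability distribution; lower T sharpens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy vocabulary and logits for the prompt "The capital of France is" (made up).
vocab = ["Paris", "London", "banana"]
logits = [3.0, 2.0, -1.0]

for t in (1.0, 0.2):
    probs = softmax(logits, temperature=t)
    print(t, dict(zip(vocab, (round(p, 3) for p in probs))))
```

At temperature 1.0 "London" keeps a real chance of being sampled; at 0.2 nearly all mass concentrates on "Paris". Repeating sample-then-append step by step is exactly the autoregressive loop of a causal LLM.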
To go further
LLMs:
- (Cho et al., 2024): Interactive visualisation, with explanations, of the inner workings of causal language models. Very nice visualisation and summary of transformer-based LMs! 👀 If you like these visualisations, also check the LLM Visualization by Brendan Bycroft.
- (Zhao et al., 2023): A Survey of Large Language Models. — Comprehensive review of recent advances related to LLMs, background, key findings, mainstream techniques, etc.
LLMs, Biases and Fairness:
- (Gallegos et al., 2025): Bias and Fairness in Large Language Models: A Survey.
Want more hands-on? Check the To go further section in the code folder.