It's High Time: A Survey of Temporal Question Answering

Bhawna Piryani · Abdelrahman Abdallah · Jamshid Mozafari · Avishek Anand · Adam Jatowt
University of Innsbruck · TU Delft

📄 Read the Paper on arXiv | 🗓️ 2025

📋 Table of Contents

📘 Overview
📊 Datasets
🔧 Methods & Approaches
📖 Temporal Tasks
🏥 Domain-Specific Applications
🛠️ Resources & Tools
🚀 Future Directions
📝 Citation

📘 Overview

This repository provides a comprehensive, curated collection of research papers, datasets, methods, and resources focused on Temporal Question Answering (TQA) and Temporal Information Retrieval (Temporal IR). It accompanies our survey paper on how AI models reason about time, adapt to evolving knowledge, answer temporally constrained questions, and retrieve time-sensitive information.

Key Contributions

✨ Comprehensive Survey: Coverage of 27+ datasets and 50+ methods spanning 2003-2025
📊 Unified Taxonomy: Systematic categorization of tasks, datasets, and approaches
🔍 Critical Analysis: Evaluation of current capabilities and fundamental limitations
🚀 Research Roadmap: 7 critical directions for advancing temporal reasoning in AI

Why Temporal QA Matters

Time shapes how we:

🗞️ Retrieve information: "Latest climate policies" vs. "policies from the 1990s"
🧠 Reason about events: Understanding causality, change, and evolution
💬 Interact with AI: Expecting contextually appropriate temporal grounding
🔄 Adapt to change: Handling evolving facts and knowledge updates

📊 Datasets

Quick Statistics

27+ TQA Datasets covering diverse domains and temporal scopes
2.5M+ Questions spanning historical archives (1367) to real-time web (2025)
Dataset Categories: Diachronic, Synchronic, Web-based, Synthetic, KG-based

Featured Datasets

🗞️ Diachronic Datasets (Time-Stamped Historical Documents)

Dataset	Year	#Questions	Source	Time Coverage	Answer Type	Links
ArchivalQA	2022	532K	NYT Corpus	1987-2007	Extractive	Paper · GitHub
ChroniclingAmericaQA	2024	485K	Historical Newspapers	1800-1920	Extractive	Paper · GitHub
StreamingQA	2022	147K	News Articles	2007-2020	Extractive	Paper · GitHub
NewsQA	2017	119K	CNN/Daily Mail	2007-2015	Freeform	Paper · GitHub
TempLAMA	2022	50K	News	2010-2020	Extractive	Paper · GitHub
TORQUE	2020	21K	News	-	Abstractive	Paper · GitHub
ForecastQA	2021	10.3K	News	2015-2019	Multiple Choice	Paper · Website
TDDiscourse	2019	6.1K	News	Unspecified	Extractive	Paper · GitHub

📖 Synchronic Datasets (Wikipedia Snapshots)

Dataset	Year	#Questions	Time Scope	Answer Type	Multi-Hop	Links
ComplexTempQA	2024	100.2K	1987-2023	Extractive	✓	Paper · GitHub
TEMPREASON	2023	52.8K	634-2023	Abstractive	✗	Paper · GitHub
TimeQA	2021	41.2K	1367-2018	Extractive	✗	Paper · GitHub
TemporalAlignmentQA	2024	20K	2000-2023	Abstractive	✗	Paper · GitHub
SituatedQA	2021	12.2K	≤ 2021	Mixed	✗	Paper · GitHub
TempTabQA	2023	11.4K	Infoboxes	Abstractive	✗	Paper · Website
TiQ	2024	10K	Unspecified	Entities	✗	Paper · GitHub
PAT-Questions	2024	6.1K	Present-anchored	Extractive	✓	Paper · GitHub
TRACIE	2021	5.4K	≤ 2020	Abstractive	✗	Paper · GitHub
MenatQA	2023	2.8K	1367-2018	Extractive	✗	Paper · GitHub

🌐 Web & Real-Time Datasets

Dataset	Year	#Questions	Source	Update Frequency	Links
RealTimeQA	2023	5.1K	Web Search	Weekly (2020-2024)	Paper · Website
FreshQA	2024	600	Google Search	Periodic	Paper · GitHub

🧪 Synthetic & Reasoning-Focused Datasets

Dataset	Year	#Questions	Focus	Links
COTEMPQA	2024	4.7K	Co-temporal reasoning	Paper · GitHub
UnSeenTimeQA	2024	3.6K	Beyond memorization	Paper · GitHub
Test of Time (ToT)	2024	1.8K	Temporal reasoning eval	Paper · GitHub
TIMEDIAL	2021	1.1K	Temporal commonsense	Paper · GitHub

📚 View Complete Dataset Analysis →

🔧 Methods & Approaches

Evolution Timeline

📅 2003-2010: Rule-Based Era
   └─ TimeML, TERSEO, temporal taggers

📅 2011-2019: Statistical & Early Neural
   └─ Language models, temporal embeddings

📅 2020-2022: Transformer Revolution
   └─ Temporal pretraining, time-aware architectures

📅 2023-2025: LLM & RAG Era
   └─ Retrieval-augmented generation, temporal reasoning

Method Categories

🤖 Temporal Language Models

Model	Year	Key Innovation	Architecture	Paper	Code
TempoT5	2022	Temporal conditioning via prefixes	T5 + timestamp prefixes	Paper	GitHub
BiTimeBERT	2023	Dual temporal encoding (timestamp + content)	BERT + bi-temporal module	Paper	GitHub
TempoBERT	2022	Time-aware masking strategy	BERT + temporal masking	Paper	GitHub
TALM	2023	Hierarchical temporal word representations	BERT + temporal adapter	Paper	GitHub
SG-TLM	2023	Syntax-guided + temporal-aware masking	BERT + dual masking	Paper	GitHub
TSM	2023	Temporal span masking	T5 + salient span masking	Paper	Contact authors
Temporal Attention	2022	Time matrix in attention mechanism	Transformer + time matrix	Paper	GitHub
TCQA	2023	Synthetic QA + span selection	T5-based	Paper	GitHub
Time-aware Prompting	2022	Temporal prompts for generation	GPT-2 + temporal prompts	Paper	GitHub
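
The "temporal conditioning via prefixes" idea listed above can be illustrated with a minimal sketch: the timestamp is written into the input text so that a text-to-text model can condition on it. The `year: ... text: ...` layout and the function name `add_time_prefix` are assumptions made for illustration, not the exact input format of any model in the table.

```python
from datetime import date


def add_time_prefix(text: str, timestamp: date) -> str:
    """Prepend a textual year prefix so a text-to-text model can condition on time.

    The "year: ... text: ..." layout is an illustrative assumption.
    """
    return f"year: {timestamp.year} text: {text}"


# The same cloze-style statement, anchored to two different years.
statement = "The president of the United States is _X_."
print(add_time_prefix(statement, date(2015, 6, 1)))
print(add_time_prefix(statement, date(2022, 6, 1)))
```

Because the prefix is plain text, this kind of conditioning needs no architectural change; the model only has to learn that the same statement can have different completions under different prefixes.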

🔍 Temporal RAG Systems

System	Year	Pipeline Architecture	Temporal Signals	Paper	Code
TempRetriever	2025	Fusion-based dense retrieval	Query + doc timestamps	Paper	Contact authors
TimeR4	2024	Retrieve-Rewrite-Retrieve-Rerank	TKG timestamps + constraints	Paper	GitHub
MRAG	2024	Modular multi-hop framework	Symbolic + semantic temporal scoring	Paper	Contact authors
TempRALM	2024	Dense retrieval + temporal proximity	Timestamp-based ranking	Paper	Contact authors
TsContriever	2024	Contrastive time-sensitive retrieval	Time-aware embeddings	Paper	GitHub
FreshLLMs	2024	Search augmentation for recency	Web search integration	Paper	GitHub
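
Several systems above combine semantic relevance with timestamp-based signals. The sketch below shows one simple way to do that: a weighted sum of a retriever score and a temporal proximity score. The `Doc` class, the `temporal_score` decay function, and the `alpha` and `scale_days` parameters are all hypothetical choices for illustration and are not the scoring function of any listed system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    text: str
    timestamp: date
    semantic_score: float  # assumed to come from any dense retriever


def temporal_score(query_date: date, doc_date: date, scale_days: float = 365.0) -> float:
    """Map the gap between query and document dates to (0, 1]; 1.0 means same day."""
    gap = abs((query_date - doc_date).days)
    return 1.0 / (1.0 + gap / scale_days)


def rerank(docs: list[Doc], query_date: date, alpha: float = 0.5) -> list[Doc]:
    """Sort by a weighted sum of semantic relevance and temporal proximity."""
    def combined(d: Doc) -> float:
        return (1 - alpha) * d.semantic_score + alpha * temporal_score(query_date, d.timestamp)
    return sorted(docs, key=combined, reverse=True)


docs = [
    Doc("Older but slightly more similar article", date(2010, 1, 1), 0.80),
    Doc("Recent article", date(2023, 12, 1), 0.75),
]
for d in rerank(docs, query_date=date(2024, 1, 1)):
    print(d.timestamp, d.text)
```

With `alpha=0.0` the ranking falls back to pure semantic similarity, so the weight makes the trade-off between relevance and recency explicit.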

🧠 Temporal Reasoning Methods

Method	Year	Reasoning Type	Key Contribution	Paper	Code
ECONET	2021	Continual adaptation	Event consistency across updates	Paper	GitHub
ConTempo	2024	Contrastive temporal relations	Unified temporal relation extraction	Paper	GitHub
TIMERS	2021	Document-level relations	Structured inference layers	Paper	GitHub
TRAM	2024	Multi-dimensional reasoning	Event frequency, duration, ordering	Paper	GitHub
TODAY	2023	Differential analysis	Temporal robustness testing	Paper	GitHub
Narrative-of-Thought	2024	Narrative-based reasoning	Recounted narratives for coherence	Paper	GitHub

📜 Classical Methods (Rule-Based & Statistical)

Era	Methods	Key Papers
Rule-Based	TimeML, TERSEO, temporal taggers	Harabagiu & Bejan, 2005, Saquete et al., 2004, Saquete et al., 2004
Statistical IR	Time-based language models, temporal ranking	Li & Croft, 2003, Berberich et al., 2010, Arikan et al., 2009, Alonso et al., 2007

📚 Complete historical overview →

🔬 Complete Methods Comparison Tables →

📖 Temporal Tasks

Core temporal prediction tasks supporting TQA systems:

Task	Input	Output	Key Applications	Representative Papers
Event Dating	Event description	Event timestamp	Historical analysis, timeline construction	Das et al., 2017, Wang et al., 2021
Document Dating	Document text	Creation date	Digital preservation, metadata recovery	Kumar et al., 2012, Niculae et al., 2014, Vashishth et al., 2018, Jatowt et al., 2007, SalahEldeen & Nelson, 2013
Focus Time Estimation	Document content	Discussed time period	Historical QA, event-centric retrieval	Jatowt et al., 2013, Jatowt et al., 2013, Shrivastava et al., 2017
Query Time Profiling	Search query	Temporal intent/distribution	Time-aware search, query understanding	Kanhabua & Nørvåg, 2010, Jones & Diaz, 2007, Dakka et al., 2008, Gupta & Berberich, 2014
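
As a rough illustration of query time profiling, the sketch below turns the publication years of a query's top-retrieved documents into a normalized distribution over years. The function name `temporal_profile` and the example years are hypothetical, and real systems use considerably richer signals than a year histogram.

```python
from collections import Counter


def temporal_profile(doc_years: list[int]) -> dict[int, float]:
    """Return a normalized distribution over years for a query's retrieved documents."""
    counts = Counter(doc_years)
    total = sum(counts.values())
    return {year: n / total for year, n in sorted(counts.items())}


# Hypothetical publication years of the top results for some query.
profile = temporal_profile([2004, 2004, 2004, 2005, 2011])
peak_year = max(profile, key=profile.get)
print(profile, peak_year)
```

A sharply peaked profile suggests the query targets a specific period, while a flat one suggests no strong temporal intent.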

📋 Detailed Task Descriptions & Methodologies →

🏥 Domain-Specific Applications

Medical Domain

Challenges: Patient timeline reconstruction, symptom progression, treatment sequencing

System/Dataset	Focus	Key Paper
TimeText	Time-oriented clinical QA	Zhou et al., 2008
Temporal Clinical QA	Semantic web techniques	Tao et al., 2010
Time-aware Health QA	Evidence retrieval with recency	Vladika & Matthes, 2024

Legal Domain

Challenges: Evolving statutes, precedent timelines, jurisdiction-specific temporal expressions

System/Dataset	Focus	Key Paper
ChronosLex	Time-aware incremental training	T.y.s.s et al., 2024

Financial Domain

Challenges: Regulatory changes, market events, time-sensitive numerical reasoning

Dataset	Focus	Key Paper
FinQA	Numerical reasoning over financial data	Chen et al., 2021
FinTextQA	Long-form financial QA	Chen et al., 2024
FinDER	Financial QA with RAG	Choi et al., 2025

🏢 Complete Domain Analysis →

🛠️ Resources & Tools

Temporal Taggers & NLP Tools

Tool	Year	Languages	Type	Features	Link
HeidelTime	2010	200+	Rule-based	High precision, domain adaptation	Paper · GitHub
SUTime	2012	English	Rule-based	Stanford CoreNLP integration	Paper · Website
CogCompTime	2018	English	Neural	Compositional temporal understanding	Paper · GitHub
Temponym Tagger	2016	English	Hybrid	Implicit temporal references	Paper
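
To give a feel for what a rule-based tagger does, here is a deliberately tiny sketch that finds explicit years and decades with one regular expression. The pattern, the `tag_years` function, and the `YEAR`/`DECADE` labels are assumptions for illustration; tools such as HeidelTime and SUTime cover far more expression types and also normalize them.

```python
import re

# Matches four-digit years 1000-2099, optionally followed by "s" for decades ("1990s").
YEAR_PATTERN = re.compile(r"\b(1\d{3}|20\d{2})(s)?\b")


def tag_years(text: str) -> list[tuple[str, str]]:
    """Return (matched text, type) pairs, where type is 'DECADE' or 'YEAR'."""
    return [
        (m.group(0), "DECADE" if m.group(2) else "YEAR")
        for m in YEAR_PATTERN.finditer(text)
    ]


print(tag_years("Policies from the 1990s were revised in 2015."))
```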

Document Collections

Collection	Period	Size	Domain	Access
NYT Annotated Corpus	1987-2007	1.8M articles	News	LDC License
Chronicling America	1800-1920	Historical	Newspapers	Free Access
Newswire Corpus	1878-1977	2.7M articles	News	HuggingFace
Wikipedia Dumps	Various	TB-scale	Encyclopedia	Wikimedia

Evaluation Frameworks

Temporal Robustness Testing: Wallat et al., 2024
TimeBench: Comprehensive temporal reasoning benchmark (Chu et al., 2024)
TRAM: Multi-dimensional temporal reasoning evaluation (Wang & Zhao, 2024)

🚀 Future Directions

Our survey identifies 7 critical research areas requiring immediate attention:

1️⃣ Dynamic Temporal Knowledge Management

Problem: Static corpora can't handle evolving facts
Challenge: Temporal propagation when updating related events
Needed: Real-time knowledge graphs with dependency tracking

2️⃣ Temporally-Aware LLM Agents

Problem: LLMs hallucinate temporal information
Challenge: Resolving "last Tuesday" or "since our last chat"
Needed: Timeline memory, temporal reference resolution
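
A minimal sketch of temporal reference resolution for the "last Tuesday" case, using only the Python standard library. The function name `resolve_last_weekday` is hypothetical, and the sketch assumes "last <weekday>" means the most recent such day strictly before the reference date; some speakers instead mean that weekday in the previous calendar week.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def resolve_last_weekday(expression: str, reference: date) -> date:
    """Resolve 'last <weekday>' to the most recent such day strictly before `reference`."""
    weekday_name = expression.lower().removeprefix("last ").strip()
    target = WEEKDAYS.index(weekday_name)  # raises ValueError for unknown names
    # If the reference day is itself the target weekday, go back a full week.
    days_back = (reference.weekday() - target) % 7 or 7
    return reference - timedelta(days=days_back)


# The reference date would normally be the timestamp of the conversation turn.
print(resolve_last_weekday("last Tuesday", date(2025, 5, 29)))
```

The key point is that the expression cannot be resolved without a reference time, which is why an agent needs to keep timestamps for its own conversation history.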

3️⃣ Diachronic-Synchronic Integration

Problem: Most systems use only one knowledge type
Challenge: Aligning historical trends with current snapshots
Needed: Cross-source temporal alignment algorithms

4️⃣ Temporal Uncertainty & Confidence

Problem: Systems treat all dates as exact
Challenge: "Around 476 AD", "mid-20th century"
Needed: Probabilistic temporal representations
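
One very simple probabilistic representation is an interval with a uniform belief over its years, sketched below. The `UncertainYear` class and the interval bounds chosen for each expression are assumptions for illustration; a realistic system would likely use non-uniform distributions and learned bounds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UncertainYear:
    """A year known only up to an interval [earliest, latest], with a uniform belief."""
    earliest: int
    latest: int

    def probability_in(self, start: int, end: int) -> float:
        """Probability mass falling inside [start, end] under the uniform assumption."""
        overlap = min(self.latest, end) - max(self.earliest, start) + 1
        width = self.latest - self.earliest + 1
        return max(overlap, 0) / width


around_476 = UncertainYear(471, 481)          # "around 476 AD", +/- 5 years assumed
mid_20th_century = UncertainYear(1940, 1960)  # "mid-20th century", bounds assumed

print(around_476.probability_in(400, 476))
print(mid_20th_century.probability_in(1950, 1959))
```

Representing dates this way lets a system answer "did this happen before 476?" with a probability rather than a forced yes or no.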

5️⃣ Multilingual & Multimodal TQA

Problem: Most work is English text-only
Challenge: Lunar calendars, visual time cues, cultural references
Needed: Cross-lingual temporal taggers, vision-language models

6️⃣ Implicit Temporal Intent Understanding

Problem: Many questions hide their time constraints
Challenge: Inferring "now" vs. "historically" from context
Needed: Context-dependent temporal intent detection

7️⃣ Evaluation & Benchmarking

Problem: Standard metrics don't capture temporal coherence
Challenge: Measuring temporal grounding, not just accuracy
Needed: Temporal-aware evaluation protocols

✨ Citation

If you find this work useful, please cite our paper:

Plain

Piryani, B., Abdallah, A., Mozafari, J., Anand, A., & Jatowt, A. (2025). It's High Time: A Survey of Temporal Question Answering. arXiv preprint arXiv:2505.20243.

Bibtex

@article{piryani2025s,
  title={It's High Time: A Survey of Temporal Question Answering},
  author={Piryani, Bhawna and Abdallah, Abdelrahman and Mozafari, Jamshid and Anand, Avishek and Jatowt, Adam},
  journal={arXiv preprint arXiv:2505.20243},
  year={2025}
}

🪪 License

This project is licensed under the MIT License - see the LICENSE file for details.

📝 Contributing

We welcome contributions to keep this survey comprehensive and up-to-date!

Missing a Paper or Dataset?

If we've missed your work or you know of a relevant paper/dataset that should be included, please send us an email at:

📧 bhawna.piryani@uibk.ac.at

Please include:

Paper title and authors
Link to paper and code/data (if available)
Brief description of the contribution

You can also open an issue on GitHub.

DataScienceUIBK/TemporalQA-Survey
