Temporal QA Datasets: Complete Reference

This directory documents the Temporal QA datasets covered by this survey, organized by collection type and characteristics.

📁 Directory Structure

Diachronic Corpora - Time-stamped historical documents
Synchronic Corpora - Wikipedia snapshots at specific points in time
Annotated Temporal Corpora - Explicitly annotated with TimeML/temporal relations
Web & Real-Time - Live search and periodically updated datasets
Synthetic & Specialized - Generated datasets for specific reasoning tasks
Knowledge Graph-Based - Structured temporal KG datasets
Domain-Specific - Medical, Legal, Financial

🗂️ Complete Dataset Catalog

At-a-Glance Comparison Table

Dataset	Year	#Q	Source Type	Time Coverage	Creation	Multi-Hop	Metadata	Paper	Data
DIACHRONIC (Primary Historical Sources)
ArchivalQA	2022	532K	NYT	1987-2007	AG	✗	✓	📄	💾
ChroniclingAmericaQA	2024	485K	Historical News	1800-1920	AG	✗	✓	📄	💾
StreamingQA	2022	147K	News	2007-2020	CS	✓	✓	📄	💾
NewsQA	2017	119K	CNN/DM	2007-2015	CS	✗	✗	📄	💾
TempLAMA	2022	50K	News	2010-2020	CS	✗	✓	📄	💾
TORQUE	2020	21K	News	-	CS	✗	✗	📄	💾
ForecastQA	2021	10.3K	News	2015-2019	CS	✓	✓	📄	💾
TDDiscourse	2019	6.1K	News	Unspecified	CS	✗	✗	📄	💾
TemporalQuestions	2021	1K	NYT	1987-2007	CS	✗	✓	📄	Contact authors
SYNCHRONIC (Wikipedia Snapshots)
ComplexTempQA	2024	100.2K	Wikipedia	1987-2023	AG	✓	✓	📄	💾
TEMPREASON	2023	52.8K	Wiki/Wikidata	634-2023	SC	✗	✗	📄	💾
TimeQA	2021	41.2K	Wikipedia	1367-2018	AG	✗	✗	📄	💾
TemporalAlignmentQA	2024	20K	Wikipedia	2000-2023	AG	✗	✗	📄	Contact authors
SituatedQA	2021	12.2K	Wikipedia	≤ 2021	CS	✗	✗	📄	💾
TempTabQA	2023	11.4K	Wiki Infoboxes	-	CS	✗	✗	📄	💾
TiQ	2024	10K	Wikipedia	Unspecified	AG	✗	✗	📄	Contact authors
PAT-Questions	2024	6.1K	Wikipedia	Present	CS	✓	✗	📄	💾
TRACIE	2021	5.4K	Wikipedia	≤ 2020	CS	✗	✗	📄	💾
MenatQA	2023	2.8K	Wikipedia	1367-2018	AG	✗	✗	📄	💾
WEB & REAL-TIME
ReaLTimeQA	2023	5.1K	Web Search	2020-2024	CS	✗	✗	📄	💾
FreshQA	2024	600	Google Search	Dynamic	CS	✓	✗	📄	💾
SYNTHETIC & SPECIALIZED
COTEMPQA	2024	4.7K	Wikidata	≤ 2023	CS	✓	✗	📄	💾
UnSeenTimeQA	2024	3.6K	Synthetic	-	AG	✓	✗	📄	💾
Test of Time	2024	1.8K	Synthetic	-	AG	✓	✗	📄	💾
TIMEDIAL	2021	1.1K	DailyDialog	-	CS	✗	✗	📄	💾

Legend:

#Q: Number of questions
Creation: AG = Auto-Generated, CS = Crowdsourced, SC = Synthetic
Multi-Hop: ✓ if answering requires reasoning across multiple temporal hops
Metadata: ✓ if explicit temporal metadata is available
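
The legend columns lend themselves to simple filtering. The following is a minimal sketch, assuming the table has been transcribed by hand into Python records; `CatalogRow`, `ROWS`, and `select` are hypothetical names introduced here for illustration, and only a handful of rows from the table are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogRow:
    name: str
    year: int
    creation: str    # "AG", "CS", or "SC", as defined in the legend
    multi_hop: bool  # True for ✓ in the Multi-Hop column
    metadata: bool   # True for ✓ in the Metadata column


# A few rows copied from the comparison table (not the full catalog).
ROWS = [
    CatalogRow("ArchivalQA", 2022, "AG", False, True),
    CatalogRow("StreamingQA", 2022, "CS", True, True),
    CatalogRow("ForecastQA", 2021, "CS", True, True),
    CatalogRow("ComplexTempQA", 2024, "AG", True, True),
    CatalogRow("TimeQA", 2021, "AG", False, False),
    CatalogRow("FreshQA", 2024, "CS", True, False),
]


def select(rows, *, multi_hop=None, metadata=None, creation=None):
    """Return the names of rows matching every filter that is not None."""
    names = []
    for row in rows:
        if multi_hop is not None and row.multi_hop != multi_hop:
            continue
        if metadata is not None and row.metadata != metadata:
            continue
        if creation is not None and row.creation != creation:
            continue
        names.append(row.name)
    return names


if __name__ == "__main__":
    # Multi-hop datasets that also ship explicit temporal metadata.
    print(select(ROWS, multi_hop=True, metadata=True))
```

Keyword-only filters keep calls readable when several columns are combined, for example `select(ROWS, creation="AG", multi_hop=False)`.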

📈 Dataset Characteristics

By Temporal Complexity

Complexity	Datasets	Key Features
Simple	NewsQA, TimeQA, TempLAMA, ArchivalQA	Direct temporal lookups, explicit dates
Complex	ComplexTempQA, TEMPREASON, MenatQA, StreamingQA	Multi-hop reasoning, temporal filtering
Reasoning-Focused	Test of Time, UnSeenTimeQA, COTEMPQA	Synthetic temporal logic, beyond memorization

By Temporal Orientation

Orientation	Datasets	Description
Historical	ChroniclingAmericaQA (1800-1920), TimeQA (1367-2018)	Past events
Recent Past	NewsQA, ArchivalQA, StreamingQA	Modern history (1987-2020)
Present/Future	FreshQA, ReaLTimeQA, ForecastQA	Current/predictive

By Answer Type

Type	Examples
Extractive	ArchivalQA, TimeQA, NewsQA
Abstractive	TEMPREASON, TemporalAlignmentQA, TRACIE
Multiple Choice	ForecastQA, ReaLTimeQA, TIMEDIAL
Entities/Freeform	TiQ, NewsQA

🔧 Common Dataset Issues & Solutions

Issue 1: Temporal Ambiguity

Problem: Questions like "Who is the president?" lack temporal context
Datasets addressing this: SituatedQA, PAT-Questions
Solution: Explicit temporal anchoring or temporal disambiguation
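
As a rough illustration of explicit temporal anchoring, the sketch below appends a reference date to an otherwise ambiguous question. The helper `anchor_question` is hypothetical and is not the method used by SituatedQA or PAT-Questions; it only shows the general idea of making the time context part of the question text.

```python
from datetime import date


def anchor_question(question: str, as_of: date) -> str:
    """Append an explicit 'as of' date to a time-sensitive question."""
    # Drop trailing whitespace and the question mark so the date can be
    # inserted before the final punctuation.
    stem = question.rstrip().rstrip("?").rstrip()
    return f"{stem} as of {as_of.isoformat()}?"


if __name__ == "__main__":
    print(anchor_question("Who is the president?", date(2021, 1, 1)))
```

With an anchor in place, the same underlying question can be asked for several dates and scored against date-specific answers.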

Issue 2: Answer Drift

Problem: Correct answers change over time
Datasets addressing this: FreshQA, ReaLTimeQA
Solution: Periodic dataset updates, versioning
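
One way to picture versioning is to store each answer together with the date from which it is valid and to look up the version in force at evaluation time. This is a minimal sketch under that assumption; `VersionedAnswer` is a hypothetical class, the answers are placeholders, and it does not describe how FreshQA or ReaLTimeQA are actually maintained.

```python
from bisect import bisect_right
from datetime import date


class VersionedAnswer:
    """Answers to one question, each valid from a given date onward."""

    def __init__(self):
        self._starts = []   # sorted list of valid-from dates
        self._answers = []  # answers aligned with self._starts

    def add(self, valid_from: date, answer: str) -> None:
        # Insert while keeping both lists sorted by valid-from date.
        i = bisect_right(self._starts, valid_from)
        self._starts.insert(i, valid_from)
        self._answers.insert(i, answer)

    def at(self, when: date):
        """Return the answer in force on `when`, or None if none applies yet."""
        i = bisect_right(self._starts, when)
        return self._answers[i - 1] if i else None


if __name__ == "__main__":
    versions = VersionedAnswer()
    versions.add(date(2020, 1, 1), "answer-v1")  # placeholder answers
    versions.add(date(2023, 6, 1), "answer-v2")
    print(versions.at(date(2022, 3, 15)))
```

Scoring a model against `versions.at(evaluation_date)` rather than a single fixed string keeps older test items usable after the real-world answer changes.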

Issue 3: Annotation Quality

Problem: Crowdsourced datasets may have inconsistent temporal understanding
Mitigation: Multiple annotators, expert validation (e.g., ArchivalQA with journalistic expertise)
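
The multiple-annotator mitigation can be sketched as a majority vote with an agreement threshold. The function `aggregate` and its default threshold of two thirds are assumptions made for illustration, not a protocol reported by any dataset listed here.

```python
from collections import Counter


def aggregate(labels, min_agreement=2 / 3):
    """Return the majority label, or None if agreement is below the threshold."""
    if not labels:
        return None
    answer, votes = Counter(labels).most_common(1)[0]
    return answer if votes / len(labels) >= min_agreement else None


if __name__ == "__main__":
    # Three annotators label the temporal relation between two events.
    print(aggregate(["before", "before", "after"]))
    print(aggregate(["before", "after", "during"]))
```

Items that return `None` are candidates for expert review rather than silent inclusion.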

Issue 4: Limited Temporal Reasoning Types

Problem: Many datasets focus on simple lookup
Datasets addressing this: ComplexTempQA, TEMPREASON, Complex-TR
Solution: Synthetic generation with explicit reasoning templates
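
Template-based generation can be illustrated with time-scoped facts and two question templates, one for a point-in-time lookup and one for a successor question that needs an extra reasoning step. Everything in this sketch is hypothetical: the `Fact` record, the template wording, and the entities "Org A", "Person X", and "Person Y" are made up, and the code does not reproduce the pipeline of any listed dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str
    role: str
    holder: str
    start: int  # first year the fact holds (inclusive)
    end: int    # last year the fact holds (inclusive)


def point_in_time(facts, year):
    """Template 1: who held the role in a given year?"""
    return [
        (f"Who was the {f.role} of {f.subject} in {year}?", f.holder)
        for f in facts
        if f.start <= year <= f.end
    ]


def successor(facts):
    """Template 2: who held the role directly after a given holder?"""
    # Sorting by (subject, role, start) places consecutive holders of the
    # same role next to each other.
    ordered = sorted(facts, key=lambda f: (f.subject, f.role, f.start))
    pairs = []
    for prev, nxt in zip(ordered, ordered[1:]):
        if (prev.subject, prev.role) == (nxt.subject, nxt.role):
            question = (
                f"Who was the {prev.role} of {prev.subject} "
                f"right after {prev.holder}?"
            )
            pairs.append((question, nxt.holder))
    return pairs


# Made-up facts used only to exercise the templates.
FACTS = [
    Fact("Org A", "director", "Person Y", 2006, 2010),
    Fact("Org A", "director", "Person X", 2001, 2005),
]

if __name__ == "__main__":
    print(point_in_time(FACTS, 2003))
    print(successor(FACTS))
```

Because each template names the reasoning step it exercises, generated questions can be grouped by reasoning type when reporting results.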

📊 Dataset Comparison Framework

When comparing datasets, consider these dimensions:

Temporal Scope: Historical range covered
Temporal Granularity: Day, month, year, era
Question Complexity: Simple lookup vs. multi-hop reasoning
Temporal Explicitness: Explicit dates vs. implicit temporal references
Answer Volatility: How quickly answers become outdated
Evaluation Protocol: Static test set vs. periodic updates
Annotation Quality: Crowdsourced vs. automatic vs. expert
Metadata Richness: Availability of document timestamps, temporal expressions

🔄 Dataset Updates

Dataset	Last Updated	Update Frequency	Notes
ReaLTimeQA	2024-06	Weekly	Continuous updates
FreshQA	2024-03	Periodic	Manual updates
PAT-Questions	2024	Self-updating mechanism	Automated
Others	Static	One-time release	-

📝 Contributing

We welcome contributions to keep this survey comprehensive and up-to-date!

Missing a Paper or Dataset?

If we've missed your work or you know of a relevant paper/dataset that should be included, please send us an email at:

📧 bhawna.piryani@uibk.ac.at

Please include:

Paper title and authors
Link to paper and code/data (if available)
Brief description of the contribution

You can also open an issue on GitHub.

← Back to Main README
