Temporal QA Datasets: Complete Reference

This directory contains comprehensive documentation of all Temporal QA datasets, organized by collection type and characteristics.

📁 Directory Structure

  • Diachronic Corpora - Time-stamped historical documents
  • Synchronic Corpora - Wikipedia snapshots at specific points in time
  • Annotated Temporal Corpora - Explicitly annotated with TimeML/temporal relations
  • Web & Real-Time - Live search and periodically updated datasets
  • Synthetic & Specialized - Generated datasets for specific reasoning tasks
  • Knowledge Graph-Based - Structured temporal KG datasets
  • Domain-Specific - Medical, Legal, Financial

🗂️ Complete Dataset Catalog

At-a-Glance Comparison Table

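| Dataset | Year | #Q | Source Type | Time Coverage | Creation | Multi-Hop | Metadata | Paper | Data |
|---|---|---|---|---|---|---|---|---|---|
| DIACHRONIC (Primary Historical Sources) | | | | | | | | | |
| ArchivalQA | 2022 | 532K | NYT | 1987-2007 | AG | | | 📄 | 💾 |
| ChroniclingAmericaQA | 2024 | 485K | Historical News | 1800-1920 | AG | | | 📄 | 💾 |
| StreamingQA | 2022 | 147K | News | 2007-2020 | CS | | | 📄 | 💾 |
| NewsQA | 2017 | 119K | CNN/DM | 2007-2015 | CS | | | 📄 | 💾 |
| TempLAMA | 2022 | 50K | News | 2010-2020 | CS | | | 📄 | 💾 |
| TORQUE | 2020 | 21K | News | - | CS | | | 📄 | 💾 |
| ForecastQA | 2021 | 10.3K | News | 2015-2019 | CS | | | 📄 | 💾 |
| TDDiscourse | 2019 | 6.1K | News | Unspecified | CS | | | 📄 | 💾 |
| TemporalQuestions | 2021 | 1K | NYT | 1987-2007 | CS | | | 📄 | Contact authors |
| SYNCHRONIC (Wikipedia Snapshots) | | | | | | | | | |
| ComplexTempQA | 2024 | 100.2K | Wikipedia | 1987-2023 | AG | | | 📄 | 💾 |
| TEMPREASON | 2023 | 52.8K | Wiki/Wikidata | 634-2023 | SC | | | 📄 | 💾 |
| TimeQA | 2021 | 41.2K | Wikipedia | 1367-2018 | AG | | | 📄 | 💾 |
| TemporalAlignmentQA | 2024 | 20K | Wikipedia | 2000-2023 | AG | | | 📄 | Contact authors |
| SituatedQA | 2021 | 12.2K | Wikipedia | ≤ 2021 | CS | | | 📄 | 💾 |
| TempTabQA | 2023 | 11.4K | Wiki Infoboxes | - | CS | | | 📄 | 💾 |
| TiQ | 2024 | 10K | Wikipedia | Unspecified | AG | | | 📄 | Contact authors |
| PAT-Questions | 2024 | 6.1K | Wikipedia | Present | CS | | | 📄 | 💾 |
| TRACIE | 2021 | 5.4K | Wikipedia | ≤ 2020 | CS | | | 📄 | 💾 |
| MenatQA | 2023 | 2.8K | Wikipedia | 1367-2018 | AG | | | 📄 | 💾 |
| WEB & REAL-TIME | | | | | | | | | |
| ReaLTimeQA | 2023 | 5.1K | Web Search | 2020-2024 | CS | | | 📄 | 💾 |
| FreshQA | 2024 | 600 | Google Search | Dynamic | CS | | | 📄 | 💾 |
| SYNTHETIC & SPECIALIZED | | | | | | | | | |
| COTEMPQA | 2024 | 4.7K | Wikidata | ≤ 2023 | CS | | | 📄 | 💾 |
| UnSeenTimeQA | 2024 | 3.6K | Synthetic | - | AG | | | 📄 | 💾 |
| Test of Time | 2024 | 1.8K | Synthetic | - | AG | | | 📄 | 💾 |
| TIMEDIAL | 2021 | 1.1K | DailyDialog | - | CS | | | 📄 | 💾 |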

Legend:

  • #Q: Number of questions
  • Creation: AG = Auto-Generated, CS = Crowdsourced, SC = Synthetic
  • Multi-Hop: Requires reasoning across multiple temporal hops
  • Metadata: Explicit temporal metadata available
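When working with the catalog programmatically, the #Q cells (such as "532K" or "600") need converting to integers before they can be sorted or compared. The following is a minimal sketch of that conversion plus a filter on the Creation code. The `CatalogRow` class and `parse_question_count` helper are hypothetical names introduced here for illustration, and only four rows from the table are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogRow:
    """One row of the comparison table (hypothetical container)."""
    name: str
    year: int
    size: str      # the "#Q" cell exactly as printed, e.g. "532K" or "600"
    creation: str  # "AG", "CS", or "SC" as defined in the legend


def parse_question_count(size: str) -> int:
    """Convert a '#Q' cell such as '10.3K' or '600' into an integer."""
    text = size.strip().upper()
    if text.endswith("K"):
        return round(float(text[:-1]) * 1000)
    return int(text)


# A few rows copied from the table above.
rows = [
    CatalogRow("ArchivalQA", 2022, "532K", "AG"),
    CatalogRow("ForecastQA", 2021, "10.3K", "CS"),
    CatalogRow("FreshQA", 2024, "600", "CS"),
    CatalogRow("TEMPREASON", 2023, "52.8K", "SC"),
]

# Crowdsourced datasets, largest first.
crowdsourced = sorted(
    (r for r in rows if r.creation == "CS"),
    key=lambda r: parse_question_count(r.size),
    reverse=True,
)

for row in crowdsourced:
    print(row.name, parse_question_count(row.size))
```

Keeping the printed string in `size` and parsing on demand means the rows stay identical to the table, so they are easy to check against it.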

📈 Dataset Characteristics

By Temporal Complexity

| Complexity | Datasets | Key Features |
|---|---|---|
| Simple | NewsQA, TimeQA, TempLAMA, ArchivalQA | Direct temporal lookups, explicit dates |
| Complex | ComplexTempQA, TEMPREASON, MenatQA, StreamingQA | Multi-hop reasoning, temporal filtering |
| Reasoning-Focused | Test of Time, UnSeenTimeQA, COTEMPQA | Synthetic temporal logic, beyond memorization |

By Temporal Orientation

| Orientation | Datasets | Description |
|---|---|---|
| Historical | ChroniclingAmericaQA (1800-1920), TimeQA (1367-2018) | Past events |
| Recent Past | NewsQA, ArchivalQA, StreamingQA | Modern history (1987-2020) |
| Present/Future | FreshQA, ReaLTimeQA, ForecastQA | Current/predictive |

By Answer Type

| Type | Examples |
|---|---|
| Extractive | ArchivalQA, TimeQA, NewsQA |
| Abstractive | TEMPREASON, TemporalAlignmentQA, TRACIE |
| Multiple Choice | ForecastQA, ReaLTimeQA, TIMEDIAL |
| Entities/Freeform | TiQ, NewsQA |

🔧 Common Dataset Issues & Solutions

Issue 1: Temporal Ambiguity

Problem: Questions like "Who is the president?" do not state which point in time they refer to, so the correct answer depends on when the question is asked
Datasets addressing this: SituatedQA, PAT-Questions
Solution: Explicit temporal anchoring or temporal disambiguation
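
As an illustration of explicit temporal anchoring, the sketch below appends an "as of" date to a question that contains no four-digit year. It is a deliberately naive heuristic and not the procedure used by SituatedQA or PAT-Questions. The `anchor_question` function and the year pattern are assumptions made for this example.

```python
import re
from datetime import date

# Naive check for an explicit year between 1000 and 2099 (an assumption of this sketch).
YEAR_PATTERN = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def anchor_question(question: str, as_of: date) -> str:
    """Append an explicit reference date when the question names no year."""
    if YEAR_PATTERN.search(question):
        return question  # already anchored by an explicit year
    stem = question.rstrip().rstrip("?").rstrip()
    return f"{stem} as of {as_of.isoformat()}?"


print(anchor_question("Who is the president?", date(2021, 1, 1)))
# Who is the president as of 2021-01-01?
print(anchor_question("Who was the president in 1995?", date(2021, 1, 1))) 
# Who was the president in 1995?
```

A real pipeline would also need to handle relative expressions such as "last year" or "currently", which this pattern does not detect.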

Issue 2: Answer Drift

Problem: Correct answers change over time
Datasets addressing this: FreshQA, ReaLTimeQA
Solution: Periodic dataset updates, versioning
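
One way to picture versioning is to store each answer together with the date from which it is valid and to look up the answer in force on a given evaluation date. The sketch below assumes that layout. The `VersionedAnswer` class is hypothetical, the answers are placeholders, and this is not the storage format of FreshQA or ReaLTimeQA.

```python
from bisect import bisect_right
from datetime import date
from typing import List, Optional


class VersionedAnswer:
    """Answers to one question, each valid from a start date until the next one."""

    def __init__(self) -> None:
        self._starts: List[date] = []
        self._answers: List[str] = []

    def add(self, valid_from: date, answer: str) -> None:
        """Insert an answer while keeping start dates sorted."""
        idx = bisect_right(self._starts, valid_from)
        self._starts.insert(idx, valid_from)
        self._answers.insert(idx, answer)

    def as_of(self, when: date) -> Optional[str]:
        """Return the answer in force on `when`, or None if none applies yet."""
        idx = bisect_right(self._starts, when)
        return self._answers[idx - 1] if idx else None


history = VersionedAnswer()
history.add(date(2020, 1, 1), "answer-v1")  # placeholder values
history.add(date(2023, 6, 1), "answer-v2")

print(history.as_of(date(2021, 5, 5)))    # answer-v1
print(history.as_of(date(2023, 6, 1)))    # answer-v2
print(history.as_of(date(2019, 12, 31)))  # None
```

Scoring a model against `as_of(evaluation_date)` rather than a single fixed gold answer keeps older predictions comparable after the dataset is updated.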

Issue 3: Annotation Quality

Problem: Crowdsourced datasets may have inconsistent temporal understanding
Mitigation: Multiple annotators, expert validation (e.g., ArchivalQA with journalistic expertise)

Issue 4: Limited Temporal Reasoning Types

Problem: Many datasets focus on simple lookup
Datasets addressing this: ComplexTempQA, TEMPREASON, Complex-TR
Solution: Synthetic generation with explicit reasoning templates


📊 Dataset Comparison Framework

When comparing datasets, consider these dimensions:

  1. Temporal Scope: Historical range covered
  2. Temporal Granularity: Day, month, year, era
  3. Question Complexity: Simple lookup vs. multi-hop reasoning
  4. Temporal Explicitness: Explicit dates vs. implicit temporal references
  5. Answer Volatility: How quickly answers become outdated
  6. Evaluation Protocol: Static test set vs. periodic updates
  7. Annotation Quality: Crowdsourced vs. automatic vs. expert
  8. Metadata Richness: Availability of document timestamps, temporal expressions
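
The eight dimensions can be recorded as one structured profile per dataset, which makes it straightforward to list where two datasets differ. The following is a rough sketch under that assumption. The `DatasetProfile` class, the `differing_dimensions` helper, and all field values are hypothetical and do not describe any dataset in the catalog.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class DatasetProfile:
    """One field per comparison dimension listed above."""
    temporal_scope: str
    temporal_granularity: str
    question_complexity: str
    temporal_explicitness: str
    answer_volatility: str
    evaluation_protocol: str
    annotation_quality: str
    metadata_richness: str


def differing_dimensions(
    a: DatasetProfile, b: DatasetProfile
) -> Dict[str, Tuple[str, str]]:
    """Map each dimension on which the two profiles disagree to both values."""
    left, right = asdict(a), asdict(b)
    return {key: (left[key], right[key]) for key in left if left[key] != right[key]}


# Hypothetical profiles for two unnamed datasets.
dataset_a = DatasetProfile(
    "1990-2010", "day", "simple lookup", "explicit dates",
    "low", "static test set", "automatic", "document timestamps",
)
dataset_b = DatasetProfile(
    "present", "day", "multi-hop", "implicit references",
    "high", "periodic updates", "crowdsourced", "document timestamps",
)

for dimension, (value_a, value_b) in differing_dimensions(dataset_a, dataset_b).items():
    print(f"{dimension}: {value_a} vs. {value_b}")
```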

🔄 Dataset Updates

| Dataset | Last Updated | Update Frequency | Notes |
|---|---|---|---|
| ReaLTimeQA | 2024-06 | Weekly | Continuous updates |
| FreshQA | 2024-03 | Periodic | Manual updates |
| PAT-Questions | 2024 | Self-updating mechanism | Automated |
| Others | Static | One-time release | - |

📝 Contributing

We welcome contributions to keep this survey comprehensive and up-to-date!

Missing a Paper or Dataset?

If we've missed your work or you know of a relevant paper/dataset that should be included, please send us an email at:

📧 bhawna.piryani@uibk.ac.at

Please include:

  • Paper title and authors
  • Link to paper and code/data (if available)
  • Brief description of the contribution

You can also open an issue on GitHub.


← Back to Main README