- 2025-09-20 🎉🎉🎉 Our paper has been accepted by NeurIPS 2025. We will update our dataset and code for the community as soon as possible~
- 2025-06-27 🎉 We updated the core code for evaluation.
- 2025-05-17 🎉 We have released the label JSON, named `QA.json`.
- 2025-05-04 🎉 We released the paper $\mathcal{RTV}\text{-}Bench$: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video.
- 2025-05-03 🌟 We are happy to release $\mathcal{RTV}\text{-}Bench$. You can find $\mathcal{RTV}\text{-}Bench$ from … or ….
- Release the final label json.
- Release the evaluation code.
- Construct a more comprehensive benchmark for real-time video analysis.
- ···
We introduce $\mathcal{RTV}\text{-}Bench$, which is built on three key principles:
- Multi-Timestamp Question Answering (MTQA), where answers evolve with scene changes (see the illustrative sketch after this list);
- Hierarchical Question Structure, combining basic and advanced queries; and
- Multi-dimensional Evaluation, assessing the ability of continuous perception, understanding, and reasoning.
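As a rough illustration of the MTQA idea, the sketch below shows how a single question can carry several timestamped ground-truth answers that change as the video unfolds. The field names (`video_id`, `answers`, `timestamp_sec`, etc.) are assumptions made for exposition only, not the documented `QA.json` schema.

```python
# Hypothetical sketch of a multi-timestamp QA entry (field names are assumptions,
# not the actual QA.json schema). One question, several ground-truth answers
# that evolve as the scene changes.
mtqa_entry = {
    "video_id": "surveillance_0001",
    "question": "How many people are currently inside the room?",
    "options": ["A. 0", "B. 1", "C. 2", "D. 3"],
    "answers": [
        {"timestamp_sec": 30,  "answer": "B"},   # one person has entered
        {"timestamp_sec": 120, "answer": "C"},   # a second person joins
        {"timestamp_sec": 300, "answer": "A"},   # everyone has left
    ],
}

def answer_at(entry: dict, t: float) -> str:
    """Return the ground-truth option that is valid at time t (seconds)."""
    valid = [a for a in entry["answers"] if a["timestamp_sec"] <= t]
    return valid[-1]["answer"] if valid else "unknown"

print(answer_at(mtqa_entry, 150))  # -> "C"
```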
Video Categories and Distribution of Question Difficulty and Query Characteristics.
(Left) RTV-Bench covers 3 key domains and 16 sub-class video types.
(Center) Distribution of question difficulty levels across eight representative task types, measured by percentage-based performance ranges.
(Right) Distribution of question queries by video length, categorized into Shallow, Moderate, and Deep levels. The bar heights indicate counts, while the line chart overlays query proportions for each duration bucket.
```bash
bash scripts/eval/eval_model.sh
```
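For reference, the sketch below shows one way per-timestamp accuracy could be computed for multi-timestamp questions. It is not the logic of `eval_model.sh`; the layouts assumed for `QA.json` and a hypothetical `predictions.json` are for illustration only.

```python
import json

# Minimal scoring sketch (assumptions: QA.json is a list of entries with a
# "question_id" and timestamped "answers"; predictions.json maps
# "<question_id>@<timestamp>" to a predicted option letter).
# This is NOT the official eval_model.sh logic, just an illustration of
# per-timestamp accuracy for multi-timestamp question answering.
def evaluate(qa_path: str, pred_path: str) -> float:
    with open(qa_path) as f:
        qa_entries = json.load(f)
    with open(pred_path) as f:
        predictions = json.load(f)

    correct, total = 0, 0
    for entry in qa_entries:
        for ans in entry["answers"]:
            key = f'{entry["question_id"]}@{ans["timestamp_sec"]}'
            total += 1
            if predictions.get(key) == ans["answer"]:
                correct += 1
    return correct / max(total, 1)

if __name__ == "__main__":
    print(f"accuracy: {evaluate('QA.json', 'predictions.json'):.3f}")
```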
If you find our work helpful for your research, please consider citing:
```bibtex
@article{xun2025rtv,
  title={RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video},
  author={Xun, Shuhang and Tao, Sicheng and Li, Jungang and Shi, Yibo and Lin, Zhixin and Zhu, Zhanhui and Yan, Yibo and Li, Hanqian and Zhang, Linghao and Wang, Shikang and others},
  journal={arXiv preprint arXiv:2505.02064},
  year={2025}
}
```