[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Papers, code and datasets about deep learning and multi-modal learning for video analysis
[CVPR 2024 Highlight] GenAD: Generalized Predictive Model for Autonomous Driving
[SIGGRAPH2025] Official repo for paper "Any-length Video Inpainting and Editing with Plug-and-Play Context Control"
Generic PyTorch dataset implementation to load and augment videos for deep-learning training loops.
Awesome papers & datasets specifically focused on long-term videos.
The official repository of "Sekai: A Video Dataset towards World Exploration"
500,000 multimodal short videos with baseline models (TensorFlow 2.0).
[ACM MM 2025] HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation
Summary about Video-to-Text datasets. This repository is part of the review paper *Bridging Vision and Language from the Video-to-Text Perspective: A Comprehensive Review*
Surveillance-Perspective Human Action Recognition Dataset: 7,759 videos across 14 action classes, aggregated from multiple sources, all spatio-temporally cropped and filmed from a surveillance-camera-like viewpoint.
SoccerAct10 is a dataset containing 10 different soccer actions, built from YouTube videos.
The Most Comprehensive Survey of Video Quality Assessment to Date.
Tools for loading video datasets and applying video transforms in PyTorch; video files can be loaded directly without preprocessing.
🌱 Starter kit for working with the EPIC-KITCHENS-55 dataset for action recognition or anticipation
[NeurIPS'23] The official implementation of paper "Bitstream-corrupted Video Recovery: A Novel Benchmark Dataset and Method"
Official Code for VideoLT: Large-scale Long-tailed Video Recognition (ICCV 2021)
Keras 3 Implementation of Video Swin Transformers for 3D Video Modeling
Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"
[NeurIPS'22] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
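Several of the repositories above provide PyTorch-style video dataset loaders. The common pattern is a map-style dataset: an object exposing `__len__` and `__getitem__` that decodes a video and samples a fixed-length clip for each index. The sketch below shows that protocol in plain Python, with the decoder stubbed out (the file names, frame count, and `ClipDataset` class are illustrative, not taken from any repo above); a real loader would decode frames with something like `torchvision.io.read_video` and return tensors.

```python
import random

class ClipDataset:
    """Map-style dataset sketch: PyTorch's DataLoader only requires
    __len__ and __getitem__, so the protocol can be shown without torch.
    The decoder is a stub; real code would read actual video frames."""

    def __init__(self, video_paths, clip_len=16):
        self.video_paths = list(video_paths)
        self.clip_len = clip_len

    def __len__(self):
        return len(self.video_paths)

    def _decode(self, path):
        # Stub decoder: pretend every video has 300 frames of 2x2 "pixels".
        return [[[0.0, 0.0], [0.0, 0.0]] for _ in range(300)]

    def __getitem__(self, idx):
        frames = self._decode(self.video_paths[idx])
        # Temporal augmentation: sample a random contiguous clip.
        start = random.randint(0, len(frames) - self.clip_len)
        clip = frames[start:start + self.clip_len]
        return clip, idx  # (clip, label placeholder)

ds = ClipDataset([f"video_{i}.mp4" for i in range(4)], clip_len=16)
clip, label = ds[0]
print(len(ds), len(clip))  # 4 16
```

Because only the two dunder methods are required, the same object can be handed to `torch.utils.data.DataLoader` unchanged, which then adds batching and multi-process decoding on top.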