Yongkang Li1,2*, Kaixin Xiong2*, Xiangyu Guo1,2, Fang Li2, Sixu Yan1, Gangwei Xu1,2,
Lijun Zhou2, Long Chen2, Haiyang Sun2†, Bing Wang2, Kun Ma2, Guang Chen2,
Hangjun Ye2, Wenyu Liu1, Xinggang Wang1✉
1Huazhong University of Science and Technology
2Xiaomi EV
(*) Equal contribution. (†) Project leader. (✉) Corresponding author.
arXiv 2025
- Sept. 30th, 2025: We have updated our latest paper with more model details, experiments, and comprehensive visualizations. Meanwhile, we fixed the unintended NumPy issue 🐛 that previously caused inconsistencies in the training metric cache. The code now ensures reproducible and consistent results. Special thanks to the discussion in issue #10 for bringing this up!
- Aug. 24th, 2025: We have released all driving pretraining QA, including 12 driving datasets and our own annotated NavSim data. We have rewritten the scoring, filtering, and evaluation for the open-source data. If it's helpful to you, feel free to star and cite our work! 🚗💨
- Aug. 21st, 2025: We released the initial version of the code and weights on NAVSIM, along with documentation and training/evaluation scripts. We will also release our revised paper and the pretraining datasets later this month or next month. Please stay tuned! ☕️
- Jun. 11th, 2025: We released our paper on arXiv. Code/Models are coming soon. Please stay tuned! ☕️
- Release Bench2Drive, DriveLM, NAVSIM 2.0, and DriveBench evaluation frameworks
- Release Paper
- Release Full Models and Training/Evaluation Framework
- Release Full Driving QA Datasets
- Release Updated Paper
- News
- Updates
- Table of Contents
- Abstract
- Getting Started
- Checkpoint
- Driving Pretraining Datasets
- Qualitative Results on NAVSIM Navtest
- Qualitative Results on Bench2Drive
- Contact
- Acknowledgement
- Citation
Recent studies have explored leveraging the world knowledge and cognitive capabilities of Vision-Language Models (VLMs) to address the long-tail problem in end-to-end autonomous driving. However, existing methods typically formulate trajectory planning as a language modeling task, where physical actions are output in the language space, potentially leading to issues such as format-violating outputs, infeasible actions, and slow inference speeds. In this paper, we propose ReCogDrive, a novel Reinforced Cognitive framework for end-to-end autonomous Driving, unifying driving understanding and planning by integrating an autoregressive model with a diffusion planner. First, to instill human driving cognition into the VLM, we introduce a hierarchical data pipeline that mimics the sequential cognitive process of human drivers through three stages: generation, refinement, and quality control. Building on this cognitive foundation, we then address the language-action mismatch by injecting the VLM's learned driving priors into a diffusion planner to efficiently generate continuous and stable trajectories. Furthermore, to enhance driving safety and reduce collisions, we introduce a Diffusion Group Relative Policy Optimization (DiffGRPO) stage, reinforcing the planner for enhanced safety and comfort. Extensive experiments on the NAVSIM and Bench2Drive benchmarks demonstrate that ReCogDrive achieves state-of-the-art performance. Additionally, qualitative results across diverse driving scenarios and DriveBench highlight the model's scene comprehension. Code and models are available at ReCogDrive GitHub Repository.
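For readers unfamiliar with group-relative policy optimization, the sketch below illustrates the generic GRPO-style advantage computation: a group of trajectories is sampled for the same scene, each sample is scored (e.g., with a PDMS-like reward), and each advantage is the sample's reward normalized by the group's mean and standard deviation. This is a minimal illustration of the general technique with our own placeholder naming, not the paper's exact DiffGRPO objective or reward; please refer to the paper for the actual formulation.

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """GRPO-style advantages: normalize each sampled trajectory's reward
    against the statistics of its own group (same scene / same prompt)."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Hypothetical usage: score G trajectories sampled by the diffusion planner
# for one scene. The reward values below are placeholders, not real PDMS numbers.
if __name__ == "__main__":
    rewards = np.array([0.81, 0.92, 0.67, 0.88])
    print(group_relative_advantages(rewards))  # higher-reward samples receive positive advantages
```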
- Download NAVSIM datasets following the official instructions
- Preparation of ReCogDrive environment
- ReCogDrive Training and Evaluation
Results on NAVSIM
| Method | Model Size | Training Stage | PDMS | Weight Download |
|---|---|---|---|---|
| ReCogDrive-Base-VLM | 2B | Stage 1 | 84.1 | Model |
| ReCogDrive-Base-IL | 2B + 35M | Stage 1&2 | 86.5 | Model |
| ReCogDrive-Base-RL | 2B + 35M | Stage 1&2&3 | 90.8 | Model |
| ReCogDrive-Large-VLM | 8B | Stage 1 | 86.4 | Model |
| ReCogDrive-Large-IL | 8B + 35M | Stage 1&2 | 86.5 | Model |
| ReCogDrive-Large-RL | 8B + 35M | Stage 1&2&3 | 90.4 | Model |
Results on Bench2Drive
| Method | Efficiency ↑ | Comfort ↑ | Success Rate (%) ↑ | Driving Score ↑ | Merging (%) ↑ | Overtaking (%) ↑ | Emerg. Brake (%) ↑ | Give Way (%) ↑ | Traf. Sign (%) ↑ | Mean (%) ↑ |
|---|---|---|---|---|---|---|---|---|---|---|
| ReCogDrive | 138.18 | 17.45 | 45.45 | 71.36 | 29.73 | 20.00 | 69.09 | 20.00 | 71.34 | 42.03 |

The first four columns are closed-loop metrics; the remaining six are multi-ability test results (%).
Results on DriveLM and DriveBench
| Method | DriveLM (GPT-Score) | LingoQA (Lingo-Judge) | DriveBench Percep. | DriveBench Predict. | DriveBench Plan. | DriveBench Behav. | DriveBench Avg. |
|---|---|---|---|---|---|---|---|
| ReCogDrive | 67.30 | 67.20 | 64.95 | 49.34 | 70.20 | 42.36 | 56.71 |
| Datasets | Source | Rewritten and Filtered Annotations (JSONL) |
|---|---|---|
| NAVSIM-Traj | - | JSONL |
| NAVSIM-ReCogDrive | - | JSONL |
| DriveLM | link | JSONL |
| NuInstruct | link | JSONL |
| NuScenes-QA | link | JSONL |
| OmniDrive | link | JSONL |
| Senna | link | JSONL |
| LingoQA | link | JSONL |
| Drama | link | JSONL |
| MapLM | link | JSONL |
| Talk2Car | link | JSONL |
| DriveGPT4 | link | JSONL |
| CODA-LM | link | JSONL |
| SUTD | link | JSONL |
| Bench2Drive-Traj | - | JSONL |
| Bench2Drive-QA | link | JSONL |
ReCogDrive is pretrained on 12 open-source driving datasets. For most of them, we leveraged Qwen2.5-VL-72B to re-annotate the answers, applied standardized scoring, and filtered the results to obtain 12 high-quality QA datasets. In addition, we built an automated annotation pipeline on NAVSIM, generating 752k QA pairs. These resources enable VLMs to better adapt to driving scenarios. If you only want to train a VLM for planning on a specific dataset, you can use just that dataset's trajectories and QA (for example, NAVSIM-Traj and NAVSIM-ReCogDrive) to train the VLM and then perform planning; this achieves results close to training on the full dataset. We perform large-scale pretraining to improve the VLM's understanding across diverse driving scenarios.
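As a rough illustration of how the released JSONL annotations could be consumed, here is a minimal sketch that streams one file and keeps entries above a quality-score threshold. The file name and the field names ("question", "answer", "score") are assumptions for illustration only; please check the released files for the actual schema.

```python
import json

def load_filtered_qa(jsonl_path: str, min_score: float = 0.0) -> list[dict]:
    """Read driving QA pairs from a JSONL file, keeping entries whose
    quality score is at least `min_score`. Field names are assumed."""
    samples = []
    with open(jsonl_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            item = json.loads(line)
            if item.get("score", 1.0) >= min_score:
                samples.append(item)
    return samples

# Example (hypothetical file name):
# qa = load_filtered_qa("navsim_recogdrive_qa.jsonl", min_score=0.5)
```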
We open-sourced these high-quality driving QA datasets in the hope of supporting research on Vision-Language-Action (VLA) for driving. If the official maintainers of any dataset prefer that we do not release the JSON annotations, we will remove them immediately. Please note that if you use these datasets, you must comply with the original licenses of the respective datasets. We emphasize that our usage of these datasets is solely for academic research purposes, with no commercial applications involved.
In addition, we provide training data for Bench2Drive. We first fine-tune our models on a mixture of driving QA and NAVSIM real-world scenarios, then continue training on Bench2Drive-Traj and Bench2Drive-QA to better adapt to the CARLA driving environment.
We compare ReCogDrive (IL and RL) with TransFuser; the RL-tuned planner yields safer and more reliable trajectories in challenging turning scenarios. More visualizations are in the supplementary material.
This visualization demonstrates the driving capabilities of ReCogDrive across diverse scenarios in both real-world settings and the CARLA-simulated Bench2Drive environment. The results show that our model can handle complex maneuvers such as lane following, turning, and interacting with traffic signs, reflecting strong adaptability to various driving contexts.
If you have any questions, please contact Yongkang Li via email (liyk@hust.edu.cn) or WeChat (liyk_0803).
ReCogDrive is greatly inspired by the following outstanding contributions to the open-source community: NAVSIM, DPPO, LightningDiT, DiffusionDrive, Senna, GR00T.
If you find ReCogDrive useful in your research or applications, please consider giving us a star 🌟 and citing it with the following BibTeX entry.
@article{li2025recogdrive,
title={ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving},
author={Li, Yongkang and Xiong, Kaixin and Guo, Xiangyu and Li, Fang and Yan, Sixu and Xu, Gangwei and Zhou, Lijun and Chen, Long and Sun, Haiyang and Wang, Bing and others},
journal={arXiv preprint arXiv:2506.08052},
year={2025}
}


