📄 Official project page for our CVPR 2025 paper.
🎥 Watch the demo video below to see GenManip in action!
GenManip is a simulation platform designed for large-scale evaluation of generalist robotic manipulation policies under diverse, realistic instruction-following scenarios.
Built on NVIDIA Isaac Sim, GenManip offers:
- 🧠 LLM-driven task generation via a novel Task-oriented Scene Graph (ToSG) (see the sketch after this list)
- 🔬 200 curated scenarios for both modular and end-to-end policy benchmarking
- 🧱 A scalable asset pool of 10,000+ rigid and 100+ articulated objects, all annotated with vision-language labels
- 🧭 Evaluation of spatial, appearance, commonsense, and long-horizon reasoning capabilities
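To make the ToSG idea concrete: the paper describes it as a graph-based semantic representation where nodes are objects and edges encode the relations a task should bring about. As a rough, hedged illustration only (the field names and classes below are our assumptions, not GenManip's released schema or API), a minimal ToSG might look like:

```python
# Minimal sketch of a Task-oriented Scene Graph (ToSG).
# All names here are illustrative assumptions, not GenManip's actual schema.
from dataclasses import dataclass, field

@dataclass
class ToSGNode:
    name: str                                        # unique id, e.g. "red_mug"
    category: str                                    # vision-language label, e.g. "mug"
    attributes: dict = field(default_factory=dict)   # e.g. {"color": "red"}

@dataclass
class ToSGEdge:
    source: str      # node name
    target: str      # node name
    relation: str    # desired relation, e.g. "on_top_of"

@dataclass
class ToSG:
    nodes: list[ToSGNode] = field(default_factory=list)
    edges: list[ToSGEdge] = field(default_factory=list)

# A goal such as "put the red mug on the wooden tray" then becomes a small graph
# that an LLM can emit and a checker can verify against the simulated scene:
goal = ToSG(
    nodes=[
        ToSGNode("red_mug", "mug", {"color": "red"}),
        ToSGNode("wooden_tray", "tray", {"material": "wood"}),
    ],
    edges=[ToSGEdge("red_mug", "wooden_tray", "on_top_of")],
)
```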
| Feature | Description |
|---|---|
| 🎯 ToSG-based Task Synthesis | Graph-based semantic representation for generating complex tasks |
| 🖼️ Photorealistic Simulation | RTX ray-traced rendering with physical accuracy |
| 📊 Benchmark Suite | 200 high-diversity tasks annotated via human-in-the-loop refinement |
| 🧪 Evaluation Tools | Supports SR, SPL, ablations, and generalization diagnostics |
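On the two headline metrics in the last row: SR is the plain episode success rate, and SPL (Success weighted by Path Length) is the standard metric from Anderson et al. (2018). A minimal sketch of both, assuming per-episode success flags and path lengths (function names are ours, not the repo's evaluation API):

```python
# Sketch of SR and SPL; names are illustrative, not GenManip's actual API.

def success_rate(successes: list[bool]) -> float:
    """SR: fraction of episodes in which the goal was achieved."""
    return sum(successes) / len(successes)

def spl(successes: list[bool], shortest: list[float], taken: list[float]) -> float:
    """SPL: success weighted by (shortest path / actual path),
    following the definition of Anderson et al. (2018)."""
    total = 0.0
    for s, l, p in zip(successes, shortest, taken):
        if s:
            total += l / max(p, l)  # failed episodes contribute 0
    return total / len(successes)
```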
Code is released!
You can visit our official website for more information, documentation, and updates.
- GenManip Website for setup, VLM agents, and the leaderboard
- Code for demogen, rendering, and evaluation
- GenManip Bench (20 tasks)
- Full GenManip Bench with evaluation metrics
- GenManip Assets (10K+ objects)
- More models: Seer, ACT, and beyond
- Objaverse scaling pipeline
- etc.
If our work is helpful in your research, please cite:
@inproceedings{gao2025genmanip,
  title={GenManip: LLM-driven Simulation for Generalizable Instruction-Following Manipulation},
  author={Gao, Ning and Chen, Yilun and Yang, Shuai and Chen, Xinyi and Tian, Yang and Li, Hao and Huang, Haifeng and Wang, Hanqing and Wang, Tai and Pang, Jiangmiao},
  booktitle={CVPR},
  year={2025}
}
Have questions or ideas? Reach out via the project page or open an issue. We welcome contributions, collaborations, and feedback from the community!