
gullalc/multimodal_r1_papers


DeepSeek RL (GRPO)-Inspired Research for Vision & Multimodal Reasoning

This repository maintains a curated and updated collection of research papers and repositories focused on Reinforcement Learning (RL)-inspired methods such as Group Relative Policy Optimization (GRPO) applied to Vision and Multimodal Reasoning tasks.
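GRPO, introduced in DeepSeekMath and used to train DeepSeek-R1, drops the learned value critic. Instead, it samples a group of responses for the same prompt and scores each response relative to the rest of its group. Below is a minimal sketch of only that group-relative advantage step. It assumes a binary rule-based accuracy reward (the `accuracy_reward` helper is hypothetical and not taken from any paper listed here), along with population standard deviation plus a small epsilon. Real implementations differ in these details and also include the clipped policy-ratio objective and a KL penalty, both omitted here.

```python
from statistics import mean, pstdev


def accuracy_reward(answer: str, ground_truth: str) -> float:
    # Hypothetical rule-based reward: 1.0 for a case- and whitespace-insensitive
    # exact match with the reference answer, else 0.0.
    return 1.0 if answer.strip().lower() == ground_truth.strip().lower() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    # Normalize each response's reward against the mean and (population) standard
    # deviation of its own group; eps avoids division by zero for uniform groups.
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled answers to one question whose reference answer is "42".
    group = ["42", "41", " 42 ", "7"]
    rewards = [accuracy_reward(a, "42") for a in group]
    print(rewards)                             # [1.0, 0.0, 1.0, 0.0]
    print(group_relative_advantages(rewards))  # roughly [1.0, -1.0, 1.0, -1.0]
```

If every response in a group earns the same reward, all advantages come out as zero. That group then contributes no policy-gradient signal in this sketch.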

Papers

| Date | Paper | Repository |
|---|---|---|
| May 12, 2025 | DanceGRPO: Unleashing GRPO on Visual Generation | GitHub |
| May 6, 2025 | X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains | GitHub |
| May 6, 2025 | Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning | GitHub |
| May 5, 2025 | R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning | GitHub |
| Apr 23, 2025 | Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning | GitHub |
| Apr 21, 2025 | Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models | GitHub |
| Apr 21, 2025 | VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models | GitHub |
| Apr 14, 2025 | InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models | GitHub |
| Apr 13, 2025 | TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning | GitHub |
| Apr 10, 2025 | VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning | GitHub |
| Apr 10, 2025 | SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models | - |
| Apr 10, 2025 | VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model | GitHub |
| Apr 10, 2025 | VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning | GitHub |
| Apr 9, 2025 | VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning | GitHub |
| Apr 8, 2025 | Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought | GitHub |
| Apr 3, 2025 | Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme | GitHub |
| Apr 1, 2025 | Improved Visual-Spatial Reasoning via R1-Zero-Like Training | GitHub |
| Apr 1, 2025 | Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning | GitHub |
| Mar 31, 2025 | Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 | GitHub |
| Mar 28, 2025 | OThink-MR1: Stimulating multimodal generalized reasoning capabilities via dynamic reinforcement learning | - |
| Mar 27, 2025 | Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning | GitHub |
| Mar 27, 2025 | Video-R1: Reinforcing Video Reasoning in MLLMs | GitHub |
| Mar 23, 2025 | Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning | GitHub |
| Mar 21, 2025 | OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement | GitHub |
| Mar 18, 2025 | Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning | GitHub |
| Mar 18, 2025 | DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding | GitHub |
| Mar 17, 2025 | R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization | GitHub |
| Mar 13, 2025 | R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization | GitHub |
| Mar 10, 2025 | LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL | GitHub |
| Mar 10, 2025 | MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning | GitHub |
| Mar 10, 2025 | Boosting the Generalization and Reasoning of Vision Language Models with Curriculum Reinforcement Learning | GitHub |
| Mar 9, 2025 | Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models | GitHub |
| Mar 7, 2025 | R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model | GitHub |
| Mar 4, 2025 | Visual-RFT: Visual Reinforcement Fine-Tuning | GitHub |
| Feb 27, 2025 | MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning | - |
| Feb 27, 2025 | Med-RLVR: Emerging Medical Reasoning from a 3B base model via Reinforcement Learning | - |
| Feb 20, 2025 | AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO | GitHub |

Repositories Only

Surveys
