UC ERIC Lab

36 repositories

iReason
Public
Official code for the paper "Hidden in Plain Sight: Probing Implicit Reasoning in Multimodal Language Models"
Python
•0•0•0•0•Updated Jun 4, 2025
GRIT
Public
Official code for the paper "GRIT: Teaching MLLMs to Think with Images"
reinforcement-learning visual-reasoning visual-grounding multimodal-reasoning grounded-reasoning thinking-with-image
Python
•1•63•1•0•Updated Jun 2, 2025
3dtown.github.io
Public
JavaScript
•0•0•0•0•Updated Jun 1, 2025
MLRM-Halu
Public
More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models
Python
•1•0•0•0•Updated May 31, 2025
Soft-Thinking
Public
Official implementation of the paper "Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space"
soft-reasoning chain-of-thought-reasoning reasoning-models soft-thinking continous-space-reasoning soft-token concept-token
Python
•8•141•2•0•Updated May 30, 2025
SafeKey
Public
Official code for the paper "SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning"
ai-safety llm-safety large-reasoning-models safety-reasoning
Python
•0•7•0•0•Updated May 30, 2025
VLMbench
Public
NeurIPS 2022 Paper "VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation"
language-grounding vision-and-language robotic-manipulation compositionality embodied-ai
Python
•
MIT License
•9•91•5•0•Updated May 8, 2025
MiniGPT-5
Public
Official implementation of the paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"
transformers diffusion-models multimodal-generation multimodal-llm
Python
•
Apache License 2.0
•51•860•6•0•Updated May 8, 2025
edit-room.github.io
Public
JavaScript
•0•0•0•0•Updated Apr 1, 2025
EditRoom
Public
[ICLR 2025] EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing
Python
•
MIT License
•0•14•0•0•Updated Apr 1, 2025
MMIR
Public
[ACL 2025 Findings] "Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models"
Python
•0•7•2•0•Updated Feb 25, 2025
MSSBench
Public
[ICLR 2025] Official codebase for the paper "Multimodal Situational Safety"
safety ai-agents situational-awareness ai-assistant large-language-models multimodal-large-language-models
Python
•
MIT License
•1•16•3•0•Updated Feb 24, 2025
ProbMed
Public
[ACL 2025 Findings] "Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA"
evaluation vision-and-language medical-vqa medical-diagnosis llms large-multimodal-models
Python
•1•18•1•0•Updated Feb 21, 2025
Mojito
Public
Official repo for the paper "Mojito: Motion Trajectory and Intensity Control for Video Generation"
motion-control video-generation diffusion-models controllable-generation text-to-video-generation
0•3•0•0•Updated Feb 10, 2025
Aerial-Vision-and-Dialog-Navigation
Public
Codebase of ACL 2023 Findings "Aerial Vision-and-Dialog Navigation"
navigation aerial-imagery drone-navigation vision-and-language vln
Python
•6•50•3•0•Updated Nov 4, 2024
llm_coordination
Public
Code repository for the NAACL 2025 paper "LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models"
multiagent llms coordination-game agent-coordination
Python
•
MIT License
•3•36•1•0•Updated Oct 13, 2024
swap-anything
Public
Official implementation of the ECCV paper "SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing"
image-editing personalization diffusion-models subject-driven-generation photoswapping swap-anything
Python
•
MIT License
•12•256•5•0•Updated Oct 10, 2024
MMWorld
Public
Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"
evaluation video-understanding video-dataset multi-disciplinary multimodal-large-language-models world-model
Python
•
MIT License
•1•28•0•0•Updated Sep 21, 2024
ComCLIP
Public
Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"
causality clip svo slip vision-and-language compositionality flickr8k-dataset image-text-matching flickr30k image-text-retrieval
Python
•
MIT License
•3•35•0•1•Updated Aug 18, 2024
Screen-Point-and-Read
Public
Code repo for "Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding"
screen-reader ai-agents grounding gui-agents tree-of-lens layout-understanding
Python
•2•28•0•0•Updated Jul 31, 2024
via-video
Public
0•25•1•0•Updated Jun 20, 2024
R2H
Public
Official implementation of the EMNLP 2023 paper "R2H: Building Multimodal Navigation Helpers that Respond to Help Requests"
helper navigation dialogue multimodal embodied-agent response-generation ai-agent
Python
•1•5•0•0•Updated Jun 19, 2024
ViCor
Public
Implementation of the ACL 2024 Findings paper "ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models"
0•3•0•0•Updated Jun 11, 2024
awesome-vision-language-navigation
Public
A curated list for vision-and-language navigation. ACL 2022 paper "Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions"
navigation vision-and-language embodied-agent vision-and-language-navigation
MIT License
•23•495•1•0•Updated May 2, 2024
Discffusion
Public
Official repo for the TMLR paper "Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners"
vision-and-language few-shot-learning discriminative-learning diffusion-models
Python
•
MIT License
•4•29•1•0•Updated Apr 27, 2024
MultipanelVQA
Public
Code for the MultipanelVQA benchmark "Muffin or Chihuahua? Challenging Large Vision-Language Models with Multipanel VQA"
vqa vlm mllm screen-ai multipanel-understanding
Jupyter Notebook
•
MIT License
•0•7•0•0•Updated Apr 11, 2024
Naivgation-as-wish
Public
Official implementation of the NAACL 2024 paper "Navigation as Attackers Wish? Towards Building Robust Embodied Agents under Federated Learning"
robustness attack-defense federated-learning embodied-agent vision-and-language-navigation
Python
•
MIT License
•0•6•0•0•Updated Apr 10, 2024
minigpt-5.github.io
Public
JavaScript
•1•0•0•0•Updated Apr 3, 2024
photoswap
Public
Official implementation of the NeurIPS 2023 paper "Photoswap: Personalized Subject Swapping in Images"
image-editing personalization diffusion-models generative-ai photoswap
Jupyter Notebook
•
MIT License
•23•350•5•0•Updated Feb 28, 2024
PECTVLM
Public
Code for the Findings of EMNLP 2023 paper "Parameter-Efficient Cross-lingual Transfer of Vision and Language Models via Translation-based Alignment"
Smalltalk
•
MIT License
•0•7•0•0•Updated Oct 17, 2023