
OpenVLA policy integration #10


Open · wants to merge 28 commits into base: main
Commits (28)
a8e37cb
openvla intergration
DelinQu Jul 4, 2024
7bed3d4
intergrate openvla policy and PR
DelinQu Jul 5, 2024
f0674e7
openvla policy intergration pull request
DelinQu Jul 6, 2024
2905379
update openvla inference scripts
xuanlinli17 Jul 28, 2024
52727f9
update readme
xuanlinli17 Jul 28, 2024
37d8fa8
Add OpenVLA metrics
xuanlinli17 Aug 18, 2024
3667e65
update readme
xuanlinli17 Aug 18, 2024
2b0ecc8
add openvla simple inference
xuanlinli17 Aug 18, 2024
2cd7aae
minor readme modification
xuanlinli17 Aug 18, 2024
8a08029
fix drawer bugs and auto params
DelinQu Dec 29, 2024
0281e63
Update README.md
DelinQu Jan 7, 2025
0936b9e
add spatialvla support, make sure transformers >= 4.47.0
DelinQu Feb 6, 2025
8e9aaaa
add spatialvla support, transformers >= 4.47.0
DelinQu Feb 6, 2025
a192b89
Update README.md
DelinQu Mar 12, 2025
32143c8
Update README.md
DelinQu Mar 24, 2025
4a5cab5
support openpi and fast
DelinQu May 8, 2025
e755709
testing gr00t
DelinQu May 8, 2025
879f73c
support gr00t🎄
DelinQu May 19, 2025
ec87a8a
Update README.md
DelinQu May 19, 2025
df102d9
add dockerfiler
DelinQu May 23, 2025
bbaa86d
add google sheet
DelinQu May 27, 2025
cdcbbae
Update README.md
DelinQu May 27, 2025
06b0cf2
Update openvla_model.py
huihanl Jun 5, 2025
710a9de
Merge pull request #16 from huihanl/main
DelinQu Jun 7, 2025
e61f531
Update main_inference.py
huihanl Jun 8, 2025
e194828
Merge pull request #18 from huihanl/main
DelinQu Jun 9, 2025
7b6cf5f
support action queue
DelinQu Jun 20, 2025
ccfe380
gr00t support action queue
DelinQu Jun 23, 2025
3 changes: 2 additions & 1 deletion .gitmodules
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
[submodule "ManiSkill2_real2sim"]
path = ManiSkill2_real2sim
url = https://github.com/simpler-env/ManiSkill2_real2sim
url = https://github.com/allenzren/ManiSkill2_real2sim.git
# url = https://github.com/simpler-env/ManiSkill2_real2sim
61 changes: 53 additions & 8 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# SimplerEnv: Simulated Manipulation Policy Evaluation Environments for Real Robot Setups
# SimplerEnv: Simulated Manipulation Policy Evaluation Environments for Real Robot Setups (Multi-model Support 🔥)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/simpler-env/SimplerEnv/blob/main/example.ipynb)

@@ -23,12 +23,45 @@ We hope that our work guides and inspires future real-to-sim evaluation efforts.
- [Code Structure](#code-structure)
- [Adding New Policies](#adding-new-policies)
- [Adding New Real-to-Sim Evaluation Environments and Robots](#adding-new-real-to-sim-evaluation-environments-and-robots)
- [Full Installation (RT-1 and Octo Inference, Env Building)](#full-installation-rt-1-and-octo-inference-env-building)
- [Full Installation (RT-1, Octo, OpenVLA Inference, Env Building)](#full-installation-rt-1-octo-openvla-inference-env-building)
- [RT-1 Inference Setup](#rt-1-inference-setup)
- [Octo Inference Setup](#octo-inference-setup)
- [OpenVLA Inference Setup](#openvla-inference-setup)
- [Troubleshooting](#troubleshooting)
- [Citation](#citation)

## Benchmark @ Google Sheets
> [!TIP]
> We maintain a public Google Sheet documenting the latest SOTA models' performance and fine-tuned weights on SimplerEnv, making community benchmarking more accessible. Contributions and updates are welcome!
>
> [simpler env benchmark @ GoogleSheets 📊](https://docs.google.com/spreadsheets/d/1cLhEW9QnVkP4rqxsFVzdBVRyBWVdSm0d5zp1L_-QJx4/edit?usp=sharing)
<img width="1789" alt="image" src="https://github.com/user-attachments/assets/68e2ad3d-b24f-4562-97d9-434f23cede86" />



## Models
> [!NOTE]
> Hello everyone!
> Issues and discussions are now fully open in this repository. We warmly welcome you to: 🤗
> Discuss any problems you encounter 🙋
> Submit fixes ✅
> Support new models! 🚀
> Given the significant environment differences across models and the specific dependencies required for simulator rendering, we will soon provide a Docker solution and a benchmark performance table, and we will do our best to address any issues you run into.
> Thank you for your support and contributions! 🎉
>
> To support state input, we use the submodule `ManiSkill2_real2sim` from https://github.com/allenzren/ManiSkill2_real2sim

| Model Name | Support | Note |
| ----------- | ----- | ----- |
| Octo | ✅ | |
| RT1 | ✅ | |
| OpenVLA | ✅ | |
| CogACT | ✅ | OpenVLA-based |
| SpatialVLA | ✅ | [transformers == 4.47.0](https://github.com/SpatialVLA/SpatialVLA) |
| Pi0/Pi0-Fast (openpi version) | ✅ | [openpi](https://github.com/Physical-Intelligence/openpi) |
| Pi0/Pi0-Fast (lerobot version) | ✅ | [lerobot](https://github.com/huggingface/lerobot) |
| GR00T | ✅ | [Isaac-GR00T](https://github.com/NVIDIA/Isaac-GR00T) |


## Getting Started

@@ -77,7 +110,7 @@ conda activate simpler_env

Clone this repo:
```
git clone https://github.com/simpler-env/SimplerEnv --recurse-submodules
git clone https://github.com/simpler-env/SimplerEnv --recurse-submodules --depth 1
```

Install numpy<2.0 (otherwise errors in IK might occur in pinocchio):
@@ -97,15 +130,15 @@ cd {this_repo}
pip install -e .
```

**If you'd like to perform evaluations on our provided agents (e.g., RT-1, Octo), or add new robots and environments, please additionally follow the full installation instructions [here](#full-installation-rt-1-and-octo-inference-env-building).**
**If you'd like to perform evaluations on our provided agents (e.g., RT-1, Octo, OpenVLA), or add new robots and environments, please additionally follow the full installation instructions [here](#full-installation-rt-1-octo-openvla-inference-env-building).**


## Examples

- Simple RT-1 and Octo evaluation script on prepackaged environments with visual matching evaluation setup: see [`simpler_env/simple_inference_visual_matching_prepackaged_envs.py`](https://github.com/simpler-env/SimplerEnv/blob/main/simpler_env/simple_inference_visual_matching_prepackaged_envs.py).
- Simple RT-1, Octo, and OpenVLA evaluation script on prepackaged environments with visual matching evaluation setup: see [`simpler_env/simple_inference_visual_matching_prepackaged_envs.py`](https://github.com/simpler-env/SimplerEnv/blob/main/simpler_env/simple_inference_visual_matching_prepackaged_envs.py).
- Colab notebook for RT-1 and Octo inference: see [this link](https://colab.research.google.com/github/simpler-env/SimplerEnv/blob/main/example.ipynb).
- Environment interactive visualization and manual control: see [`ManiSkill2_real2sim/mani_skill2_real2sim/examples/demo_manual_control_custom_envs.py`](https://github.com/simpler-env/ManiSkill2_real2sim/blob/main/mani_skill2_real2sim/examples/demo_manual_control_custom_envs.py)
- Policy inference scripts to reproduce our Google Robot and WidowX real-to-sim evaluation results with sweeps over object / robot poses and advanced loggings. These contain both visual matching and variant aggregation evaluation setups along with RT-1, RT-1-X, and Octo policies. See [`scripts/`](https://github.com/simpler-env/SimplerEnv/tree/main/scripts).
- Policy inference scripts to reproduce our Google Robot and WidowX real-to-sim evaluation results with sweeps over object / robot poses and advanced loggings. These contain both visual matching and variant aggregation evaluation setups along with RT-1, RT-1-X, Octo, and OpenVLA policies. See [`scripts/`](https://github.com/simpler-env/SimplerEnv/tree/main/scripts).
- Real-to-sim evaluation videos from running `scripts/*.sh`: see [this link](https://huggingface.co/datasets/xuanlinli17/simpler-env-eval-example-videos/tree/main).

## Current Environments
@@ -183,6 +216,7 @@ simpler_env/
policies/: policy implementations
rt1/: RT-1 policy implementation
octo/: Octo policy implementation
openvla/: OpenVLA policy implementation
utils/:
env/: environment building and observation utilities
debug/: debugging tools for policies and robots
@@ -205,7 +239,7 @@ scripts/: example bash scripts for policy inference under our variant aggregation

If you want to use existing environments for evaluating new policies, you can keep `./ManiSkill2_real2sim` as is.

1. Implement new policy inference scripts in `simpler_env/policies/{your_new_policy}`, following the examples for RT-1 (`simpler_env/policies/rt1`) and Octo (`simpler_env/policies/octo`) policies.
1. Implement new policy inference scripts in `simpler_env/policies/{your_new_policy}`, following the examples for RT-1 (`simpler_env/policies/rt1`), Octo (`simpler_env/policies/octo`), and OpenVLA (`simpler_env/policies/openvla`) policies.
2. You can now use `simpler_env/simple_inference_visual_matching_prepackaged_envs.py` to perform policy evaluations in simulation.
- If the policy behaviors deviate a lot from those in the real-world, you can write similar scripts as in `simpler_env/utils/debug/{policy_name}_inference_real_video.py` to debug the policy behaviors. The debugging script performs policy inference by feeding real eval video frames into the policy. If the policy behavior still deviates significantly from real, this may suggest that policy actions are processed incorrectly into the simulation environments. Please double check action orderings and action spaces.
3. If you'd like to perform customized evaluations,
@@ -219,7 +253,7 @@ If you want to use existing environments for evaluating new policies, you can keep
We provide a step-by-step guide to add new real-to-sim evaluation environments and robots in [this README](ADDING_NEW_ENVS_ROBOTS.md)


## Full Installation (RT-1 and Octo Inference, Env Building)
## Full Installation (RT-1, Octo, OpenVLA Inference, Env Building)

If you'd like to perform evaluations on our provided agents (e.g., RT-1, Octo, OpenVLA), or add new robots and environments, please follow the full installation instructions below.

@@ -289,6 +323,13 @@ If you are using CUDA 12, then to use GPU for Octo inference, you need CUDA version

`PATH=/usr/local/cuda-12.3/bin:$PATH LD_LIBRARY_PATH=/usr/local/cuda-12.3/lib64:$LD_LIBRARY_PATH bash scripts/octo_xxx_script.sh`

### OpenVLA Inference Setup

```
pip install torch==2.3.1 torchvision==0.18.1 timm==0.9.10 tokenizers==0.15.2 accelerate==0.32.1
pip install flash-attn==2.6.1 --no-build-isolation
```
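OpenVLA predicts actions as discrete tokens: each action dimension is quantized into one of 256 bins, which the policy wrapper must map back to continuous values. A minimal sketch of this de-tokenization step (uniform bins with bin-center decoding is an assumption; the repo's `simpler_env/policies/openvla` implementation is authoritative):

```python
import numpy as np

def decode_action_bins(bin_indices, low, high, n_bins=256):
    """Map per-dimension bin indices back to continuous actions.

    bin_indices: integer index in [0, n_bins) for each action dimension.
    low, high:   per-dimension action bounds used during binning.
    """
    bin_indices = np.asarray(bin_indices)
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    centers = (bin_indices + 0.5) / n_bins   # bin centers in [0, 1]
    return low + centers * (high - low)      # rescale to action bounds

# Example: a 7-DoF action (xyz delta, rpy delta, gripper), bounds [-1, 1].
action = decode_action_bins([0, 64, 127, 128, 191, 255, 255],
                            low=[-1.0] * 7, high=[1.0] * 7)
```

With this scheme, bin 0 decodes to just above `low` and bin 255 to just below `high`, so the continuous action range is covered symmetrically.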

## Troubleshooting

1. If you encounter issues such as
@@ -307,6 +348,10 @@ Follow [this link](https://maniskill.readthedocs.io/en/latest/user_guide/getting
TypeError: 'NoneType' object is not subscriptable
```

3. If you encounter any other problems, please also refer to the original model repository or [vulkan_setup](https://github.com/SpatialVLA/SpatialVLA/issues/3#issuecomment-2641739404).

4. `tensorflow-2.15.0` conflicts with `tensorflow-2.15.1`?
The dlimp library has not been maintained for a long time, so its TensorFlow pin may be out of date. A reliable workaround is to comment out `tensorflow==2.15.0` in the requirements file, install all the other dependencies, and then install `tensorflow==2.15.0` last. So far, using `tensorflow==2.15.0` has not caused any problems.
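The workaround above can be scripted. A minimal sketch (the exact requirements filename varies by setup and is an assumption here):

```python
import re

def comment_out_pin(requirements_text: str, package: str = "tensorflow") -> str:
    """Comment out an exact `package==version` pin so the remaining
    dependencies can be installed first; the pinned package is then
    installed manually afterwards (e.g. `pip install tensorflow==2.15.0`)."""
    out = []
    for line in requirements_text.splitlines():
        # Only match the package itself (e.g. "tensorflow=="), not
        # lookalikes such as "tensorflow-probability==...".
        if re.match(rf"\s*{re.escape(package)}\s*==", line):
            out.append("# " + line)
        else:
            out.append(line)
    return "\n".join(out)

# Example requirements fragment (contents illustrative):
reqs = "numpy<2.0\ntensorflow==2.15.0\ntransformers>=4.47.0"
print(comment_out_pin(reqs))
```

After installing everything else, run `pip install tensorflow==2.15.0` as the final step, as described above.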

## Citation

24 changes: 24 additions & 0 deletions docker/Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
FROM qudelin/simpler-env

# Configure environment variables
ARG PYTHON_VERSION=3.10
ENV DEBIAN_FRONTEND=noninteractive
ENV MUJOCO_GL="egl"
# ENV PATH="/opt/venv/bin:$PATH"

# Install dependencies and set up Python in a single layer
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential cmake git \
libglib2.0-0 libgl1-mesa-glx libegl1-mesa ffmpeg \
&& apt-get clean && rm -rf /var/lib/apt/lists/*
# Optional extras, left disabled:
# speech-dispatcher libgeos-dev \
# python${PYTHON_VERSION}-dev python${PYTHON_VERSION}-venv \
# && ln -s /usr/bin/python${PYTHON_VERSION} /usr/bin/python \
# && python -m venv /opt/venv \
# && echo "source /opt/venv/bin/activate" >> /root/.bashrc

# Clone repository and install LeRobot in a single layer
# COPY . /lerobot
# WORKDIR /lerobot
# RUN /opt/venv/bin/pip install --upgrade --no-cache-dir pip \
# && /opt/venv/bin/pip install --no-cache-dir ".[test, aloha, xarm, pusht, dynamixel]"
52 changes: 52 additions & 0 deletions scripts/bridge.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,52 @@
ckpt_path=$1
policy_model=$2
action_ensemble_temp=$3
logging_dir=$4
gpu_id=$5


scene_name=bridge_table_1_v1
robot=widowx
rgb_overlay_path=ManiSkill2_real2sim/data/real_inpainting/bridge_real_eval_1.png
robot_init_x=0.147
robot_init_y=0.028

CUDA_VISIBLE_DEVICES=${gpu_id} python simpler_env/main_inference.py --policy-model ${policy_model} --ckpt-path ${ckpt_path} --action-ensemble-temp ${action_ensemble_temp} --logging-dir ${logging_dir} \
--robot ${robot} --policy-setup widowx_bridge \
--control-freq 5 --sim-freq 500 --max-episode-steps 60 \
--env-name PutCarrotOnPlateInScene-v0 --scene-name ${scene_name} \
--rgb-overlay-path ${rgb_overlay_path} \
--robot-init-x ${robot_init_x} ${robot_init_x} 1 --robot-init-y ${robot_init_y} ${robot_init_y} 1 --obj-variation-mode episode --obj-episode-range 0 24 \
--robot-init-rot-quat-center 0 0 0 1 --robot-init-rot-rpy-range 0 0 1 0 0 1 0 0 1;

CUDA_VISIBLE_DEVICES=${gpu_id} python simpler_env/main_inference.py --policy-model ${policy_model} --ckpt-path ${ckpt_path} --action-ensemble-temp ${action_ensemble_temp} --logging-dir ${logging_dir} \
--robot ${robot} --policy-setup widowx_bridge \
--control-freq 5 --sim-freq 500 --max-episode-steps 60 \
--env-name StackGreenCubeOnYellowCubeBakedTexInScene-v0 --scene-name ${scene_name} \
--rgb-overlay-path ${rgb_overlay_path} \
--robot-init-x ${robot_init_x} ${robot_init_x} 1 --robot-init-y ${robot_init_y} ${robot_init_y} 1 --obj-variation-mode episode --obj-episode-range 0 24 \
--robot-init-rot-quat-center 0 0 0 1 --robot-init-rot-rpy-range 0 0 1 0 0 1 0 0 1;

CUDA_VISIBLE_DEVICES=${gpu_id} python simpler_env/main_inference.py --policy-model ${policy_model} --ckpt-path ${ckpt_path} --action-ensemble-temp ${action_ensemble_temp} --logging-dir ${logging_dir} \
--robot ${robot} --policy-setup widowx_bridge \
--control-freq 5 --sim-freq 500 --max-episode-steps 60 \
--env-name PutSpoonOnTableClothInScene-v0 --scene-name ${scene_name} \
--rgb-overlay-path ${rgb_overlay_path} \
--robot-init-x ${robot_init_x} ${robot_init_x} 1 --robot-init-y ${robot_init_y} ${robot_init_y} 1 --obj-variation-mode episode --obj-episode-range 0 24 \
--robot-init-rot-quat-center 0 0 0 1 --robot-init-rot-rpy-range 0 0 1 0 0 1 0 0 1;


scene_name=bridge_table_1_v2
robot=widowx_sink_camera_setup
rgb_overlay_path=ManiSkill2_real2sim/data/real_inpainting/bridge_sink.png
robot_init_x=0.127
robot_init_y=0.06

CUDA_VISIBLE_DEVICES=${gpu_id} python simpler_env/main_inference.py --policy-model ${policy_model} --ckpt-path ${ckpt_path} --action-ensemble-temp ${action_ensemble_temp} --logging-dir ${logging_dir} \
--robot ${robot} --policy-setup widowx_bridge \
--control-freq 5 --sim-freq 500 --max-episode-steps 120 \
--env-name PutEggplantInBasketScene-v0 --scene-name ${scene_name} \
--rgb-overlay-path ${rgb_overlay_path} \
--robot-init-x ${robot_init_x} ${robot_init_x} 1 --robot-init-y ${robot_init_y} ${robot_init_y} 1 --obj-variation-mode episode --obj-episode-range 0 24 \
--robot-init-rot-quat-center 0 0 0 1 --robot-init-rot-rpy-range 0 0 1 0 0 1 0 0 1;
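Arguments such as `--robot-init-x ${robot_init_x} ${robot_init_x} 1` pass a `(start, end, count)` triple; with `count == 1` the sweep collapses to a single fixed pose. A sketch of how such triples presumably expand into a sweep (the exact parsing inside `main_inference.py` may differ):

```python
import numpy as np

def expand_sweep(start, end, count):
    # (start, end, count) -> `count` evenly spaced values, inclusive of
    # both endpoints. count == 1 yields just [start], i.e. a fixed value.
    return np.linspace(start, end, int(count))

xs = expand_sweep(0.147, 0.147, 1)  # fixed robot x, as in the script above
ys = expand_sweep(0.0, 0.2, 5)      # a hypothetical 5-point sweep
```

Sweeping over multiple values per axis then amounts to iterating over the Cartesian product of the expanded axes.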

Original file line number Diff line number Diff line change
@@ -1,10 +1,11 @@
# shader_dir=rt means that we turn on ray-tracing rendering; this is quite crucial for the open / close drawer task as policies often rely on shadows to infer depth
ckpt_path=$1
policy_model=$2
action_ensemble_temp=$3
logging_dir=$4
gpu_id=$5



declare -a policy_models=(
"octo-base"
)
declare -a ckpt_paths=(${ckpt_path})

declare -a env_names=(
OpenTopDrawerCustomInScene-v0
@@ -22,9 +23,9 @@ EXTRA_ARGS="--enable-raytracing"
scene_name=frl_apartment_stage_simple

EvalSim() {
echo ${policy_model} ${env_name}
echo ${ckpt_path} ${env_name}

python simpler_env/main_inference.py --policy-model ${policy_model} --ckpt-path None \
CUDA_VISIBLE_DEVICES=${gpu_id} python simpler_env/main_inference.py --policy-model ${policy_model} --ckpt-path ${ckpt_path} --action-ensemble-temp ${action_ensemble_temp} --logging-dir ${logging_dir} \
--robot google_robot_static \
--control-freq 3 --sim-freq 513 --max-episode-steps 113 \
--env-name ${env_name} --scene-name ${scene_name} \
@@ -35,7 +36,7 @@ EvalSim() {
}


for policy_model in "${policy_models[@]}"; do
for ckpt_path in "${ckpt_paths[@]}"; do
for env_name in "${env_names[@]}"; do
EvalSim
done
@@ -50,7 +51,7 @@ declare -a scene_names=(
)

for scene_name in "${scene_names[@]}"; do
for policy_model in "${policy_models[@]}"; do
for ckpt_path in "${ckpt_paths[@]}"; do
for env_name in "${env_names[@]}"; do
EXTRA_ARGS="--additional-env-build-kwargs shader_dir=rt"
EvalSim
@@ -62,7 +63,7 @@ done
# lightings
scene_name=frl_apartment_stage_simple

for policy_model in "${policy_models[@]}"; do
for ckpt_path in "${ckpt_paths[@]}"; do
for env_name in "${env_names[@]}"; do
EXTRA_ARGS="--additional-env-build-kwargs shader_dir=rt light_mode=brighter"
EvalSim
@@ -75,7 +76,7 @@ done
# new cabinets
scene_name=frl_apartment_stage_simple

for policy_model in "${policy_models[@]}"; do
for ckpt_path in "${ckpt_paths[@]}"; do
for env_name in "${env_names[@]}"; do
EXTRA_ARGS="--additional-env-build-kwargs shader_dir=rt station_name=mk_station2"
EvalSim
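The `action_ensemble_temp` argument threaded through these scripts controls action ensembling: successive inference steps each predict an action for the current timestep, and those overlapping predictions are averaged with recency-dependent weights. A minimal sketch under a common exponential-weighting scheme (the repo's exact weighting is an assumption):

```python
import numpy as np

def ensemble_actions(predictions, temp):
    """Combine per-step predictions for the *same* target timestep.

    predictions[0] is the oldest prediction, predictions[-1] the newest.
    Weights follow exp(-temp * age), so temp > 0 favors newer predictions
    and temp == 0 reduces to a plain average. (Weighting scheme assumed.)
    """
    preds = np.asarray(predictions, dtype=float)
    ages = np.arange(len(preds))[::-1]   # newest prediction has age 0
    weights = np.exp(-temp * ages)
    return (weights[:, None] * preds).sum(axis=0) / weights.sum()

# Two 7-DoF predictions for the same timestep; temp=0 gives a plain mean.
blended = ensemble_actions([[0.0] * 7, [1.0] * 7], temp=0.0)
```

This smooths jittery per-step predictions, which is why the scripts expose the temperature as a tunable third positional argument.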