Christopher Agia1, Rohan Sinha1, Jingyun Yang1, Rika Antonova2, Marco Pavone1,3, Haruki Nishimura4, Masha Itkina4, Jeannette Bohg1
1Stanford University, 2University of Cambridge, 3NVIDIA Research, 4Toyota Research Institute
The official code repository for "CUPID: Curating Data your Robot Loves with Influence Functions," accepted to CoRL 2025. For a brief overview of our work, please refer to our project page. Further details can be found in our paper available on arXiv.
This repository implements a four-stage, end-to-end data curation pipeline for robot imitation learning policies, built atop the official diffusion policy codebase:
- Train initial policies on uncurated data
- Evaluate initial policies and store rollout trajectories
- Run data curation with CUPID and analyze curated data quality
- Re-train policies on curated data and analyze performance
- The official implementation of CUPID and core influence function routines for diffusion policies.
- Data curation support (both data filtering and data selection) for PushT and RoboMimic environments.
- A suite of data curation methods spanning offline and online (i.e., requiring policy evaluation) approaches.
This repository is tested on Ubuntu 20.04 with Python 3.9.15. Follow the installation procedure below to get set up.
Conda: Python packages are managed through Conda. We recommend using Miniconda with Mamba for faster and more robust package management (as in diffusion policy), and we provide Mamba setup instructions here.
MuJoCo: Next, please follow these instructions to install the original version of mujoco210 for Linux. If you run into trouble, you can try our provided instructions here.
Virtualenv: Finally, create the Python virtual environment using `mamba` (note: `conda` might halt):
# Clone repository.
git clone https://github.com/agiachris/cupid.git --recursive
cd cupid
# Create and activate virtualenv.
mamba env create -f conda_environment.yaml
conda activate cupid
# Replace free-mujoco-py with mujoco-py.
pip uninstall free-mujoco-py
pip install mujoco-py==2.1.2.14
# Login to wandb.
wandb login
Instructions for downloading the official diffusion policy training datasets can be found here. See an example for downloading Robomimic low-dimensional "state" datasets below:
mkdir data && cd data
wget https://diffusion-policy.cs.columbia.edu/data/training/robomimic_lowdim.zip
unzip robomimic_lowdim.zip && rm -f robomimic_lowdim.zip && cd ..
The training data will be accessible under `data/robomimic/datasets`. The corresponding training configs can be found at `configs/low_dim`.
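To sanity-check the download, a small helper like the one below (a hypothetical utility, not part of the repository) can list the extracted HDF5 dataset files:

```python
from pathlib import Path

def list_datasets(root="data/robomimic/datasets"):
    """Return sorted paths of all HDF5 dataset files under `root` (empty if absent)."""
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(str(p) for p in base.rglob("*.hdf5"))
```

Calling `list_datasets()` after unzipping should print one entry per task/dataset variant.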
Experiments are launched through shell (`.sh`) scripts. These scripts make it easy to parallelize experiments on a SLURM-managed cluster. Before launching an experiment, you'll need to update a few key variables in the script:
- `DEBUG=1`: set to `0` to run the experiment, or `1` to print the Python command without executing it.
- `SLURM_HOSTNAME="<enter_hostname>"`: specify the hostname of your SLURM cluster's submit node.
- `SLURM_SBATCH_FILE="<enter_sbatch_file>"`: specify the path to your SLURM batch submission script.
- Additional variables required for specific experiments are documented in the sections below.

💡 Note: If SLURM is not available, the script will default to running jobs sequentially on the local machine.
To serve as an example, all provided shell scripts are pre-configured to run the RoboMimic Lift MH task using the CNN-based Diffusion Policy, and all experiments are repeated over three random seeds per task. You can use these templates and modify them for other tasks or datasets as needed.
Run the following to train a policy on a random subset of uncurated data (repeated over three random seeds).
See key variables below:
- Set `date="<enter_date>"` to the current "train" date; it is used to name output training directories.
- Option: configure the initial training dataset for demo filtering (Task 1) or demo selection (Task 2) experiments. Please refer to Section 4 of the paper for formal definitions of these two curation settings.
  - Set `train_filter=1` and `train_select=0` to configure the training dataset for demo filtering.
  - Set `train_filter=0` and `train_select=1` to configure the training dataset for demo selection.

💡 Note: All subsequent experiment instructions assume one of the two settings above. To prevent overwriting policy checkpoints, use a different `date` for demo filtering and demo selection experiments.
bash scripts/train/train_policies.sh
Training checkpoints will be saved to `data/outputs/train`.
Run the following to evaluate the policy and save rollout trajectories (repeated over three random seeds).
See key variables below:
- Set `date="<enter_date>"` to the current "eval" date; it is used to name output evaluation directories.
- Set `train_date="<enter_policy_train_date>"` to the "train" date set in Stage 1.
bash scripts/eval/eval_save_episodes.sh
Evaluation results will be saved to `data/outputs/eval_save_episodes`.
First, compute the influence of each training state-action pair on all test state-action pairs observed in rollouts.
See key variables in the script below.
bash scripts/train/train_trak.sh
The resulting action influence scores will be saved to the corresponding policy's evaluation directory.
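At a high level, TRAK-style estimators approximate influence as an inner product between randomly projected per-example gradients. The sketch below is a simplified, self-contained illustration of that idea using made-up gradient vectors; the repository's actual routines (adapted from TRAK) operate on real policy gradients and include additional terms such as a preconditioner:

```python
import random

def make_projection(dim, proj_dim, seed=0):
    """Fixed random projection matrix (proj_dim x dim), TRAK-style."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0 / proj_dim ** 0.5) for _ in range(dim)]
            for _ in range(proj_dim)]

def project(P, g):
    """Project gradient vector g into the low-dimensional space."""
    return [sum(p_i * g_i for p_i, g_i in zip(row, g)) for row in P]

def influence_matrix(train_grads, test_grads, proj_dim=4, seed=0):
    """influence[i][j] ~ <P g_train_i, P g_test_j>: roughly, how much
    training example i moves the loss on test example j."""
    dim = len(train_grads[0])
    P = make_projection(dim, proj_dim, seed)
    tr = [project(P, g) for g in train_grads]
    te = [project(P, g) for g in test_grads]
    return [[sum(a * b for a, b in zip(u, v)) for v in te] for u in tr]
```

Here `train_grads` and `test_grads` stand in for per-state-action-pair gradients; in CUPID, the test pairs come from the rollouts saved in Stage 2.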
Next, compute the performance influence of each training demo by aggregating influences of state-action pairs.
See key variables in the script below.
bash scripts/eval/eval_demonstration_scores.sh
The resulting performance influence scores will be saved to the corresponding policy's evaluation directory.
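One simple way to picture this aggregation (a hypothetical simplification; see the paper for the actual estimator): credit each demo with the influence its state-action pairs exert toward successful rollouts, minus the influence they exert toward failed ones:

```python
def demo_performance_influence(pair_influence, demo_of_pair, rollout_success):
    """pair_influence[i][j]: influence of training pair i on rollout pair j.
    demo_of_pair[i]: index of the demo that training pair i came from.
    rollout_success[j]: True if rollout pair j is from a successful episode.
    Returns {demo_id: score}; higher scores indicate demos that help success."""
    scores = {}
    for i, row in enumerate(pair_influence):
        contrib = sum(v if rollout_success[j] else -v for j, v in enumerate(row))
        d = demo_of_pair[i]
        scores[d] = scores.get(d, 0.0) + contrib
    return scores
```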
Before re-training the policy on curated data, we need to generate a config file that rank-orders training demos based on the scores computed in Stage 3.2. The notebook `notebooks/data_curation.ipynb` implements the logic for doing so. Run the cells in `Sec 1` of the notebook to get started.
- To visualize data quality trends for demo filtering (resp., selection), run cell `Sec 2.1` (resp., `Sec 2.2`).
- To generate re-training configs for demo filtering (resp., selection), run cell `Sec 3.1` (resp., `Sec 3.2`).
Configs for re-training the policy on curated data will be saved to `configs/curation`.
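The rank-ordering step can be pictured with a small sketch (a hypothetical helper, not the notebook's actual code): sort demos by performance influence and drop the lowest-scoring fraction:

```python
def curate_by_ratio(demo_scores, filter_ratio):
    """Rank demos by score (descending) and drop the lowest filter_ratio
    fraction. demo_scores: {demo_id: performance influence}.
    Returns the list of kept demo ids, best first."""
    ranked = sorted(demo_scores, key=demo_scores.get, reverse=True)
    n_keep = len(ranked) - int(round(filter_ratio * len(ranked)))
    return ranked[:n_keep]
```

For demo selection, the same ranking would instead be applied to holdout demos, keeping the highest-scoring fraction to add to the training set.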
Run the following to re-train the policy on curated data, using the re-training config generated in Stage 3.3.
See key variables below:
- Set `date="<enter_date>"` to the current "retrain" date; it is used to name output re-training directories.
- The script is pre-configured to filter 10%-90% of the training data and select 0% of the holdout data. You can adjust `curation_filter_ratios` and `curation_select_ratios` according to your specific curation needs.
bash scripts/train/retrain_policies.sh
Once the policy has finished training, open `notebooks/data_curation.ipynb` and run cell `Sec 4.1` (resp., `Sec 4.2`) to visualize policy performance trends for demo filtering (resp., demo selection).
Important Note: Provided instructions for running baselines assume that policies have been trained and evaluated following Stage 1 and Stage 2 above. See Appendix B.4 in the paper for a description of our baselines.
We use DemInf's official code directly and refer to their usage instructions.
We recommend using Demo-SCORE's official code, which was released after our custom implementation:
bash scripts/eval/eval_save_episodes.sh # Set `train_demoscore=1`.
bash scripts/train/train_demo_score.sh # Set relevant "train" and "eval" `dates`.
bash scripts/eval/eval_demo_score.sh # Set relevant "train" and "eval" `dates`.
bash scripts/eval/eval_demonstration_scores.sh # Set `eval_online_demo_score=1`.
Proceed to Stage 3.3, uncommenting `Demo-SCORE` in `notebooks/data_curation.ipynb` where present. Then, follow Stage 4 instructions, uncommenting `demoscore` in `scripts/train/retrain_policies.sh`.
bash scripts/eval/eval_embeddings.sh # Set relevant "train" and "eval" `dates`.
bash scripts/eval/eval_demonstration_scores.sh # Set `eval_online_state_similarity=1`.
Proceed to Stage 3.3, uncommenting `Success Similarity` in `notebooks/data_curation.ipynb` where present. Then, follow Stage 4 instructions, uncommenting `state_similarity` in `scripts/train/retrain_policies.sh`.
bash scripts/eval/eval_embeddings.sh # Set relevant "train" and "eval" `dates`.
bash scripts/eval/eval_demonstration_scores.sh # Set `eval_offline_state_diversity=1`.
Proceed to Stage 3.3, uncommenting `State Diversity` in `notebooks/data_curation.ipynb` where present. Then, follow Stage 4 instructions, uncommenting `state_diversity` in `scripts/train/retrain_policies.sh`.
bash scripts/eval/eval_action_variance.sh # Set relevant "train" and "eval" `dates`.
bash scripts/eval/eval_demonstration_scores.sh # Set `eval_offline_action_diversity=1`.
Proceed to Stage 3.3, uncommenting `Action Diversity` in `notebooks/data_curation.ipynb` where present. Then, follow Stage 4 instructions, uncommenting `action_diversity` in `scripts/train/retrain_policies.sh`.
bash scripts/eval/eval_policy_loss.sh # Set relevant "train" and "eval" `dates`.
bash scripts/eval/eval_demonstration_scores.sh # Set `eval_offline_policy_loss=1`.
Proceed to Stage 3.3, uncommenting `Policy Loss` in `notebooks/data_curation.ipynb` where present. Then, follow Stage 4 instructions, uncommenting `policy_loss` in `scripts/train/retrain_policies.sh`.
CUPID is offered under the MIT License agreement. If you find CUPID useful, please consider citing our work:
@article{agia2025cupid,
title = {CUPID: Curating Data your Robot Loves with Influence Functions},
author = {Agia, Christopher and Sinha, Rohan and Yang, Jingyun and Antonova, Rika and Pavone, Marco and Nishimura, Haruki and Itkina, Masha and Bohg, Jeannette},
year = 2025,
journal = {arXiv preprint arXiv:2506.19121}
}
- Our repository is built atop the official Diffusion Policy repository.
- Our influence function routines are adapted from TRAK.