This repository contains the code for "Quality-Diversity Actor-Critic: Learning High-Performing and Diverse Behaviors via Value and Successor Features Critics". Quality-Diversity Actor-Critic (QDAC) is a quality-diversity reinforcement learning algorithm that discovers high-performing and diverse skills.
This code is supported on Python 3.10, and the dependencies can be installed with the following commands:

```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

If you also want to run PPGA, you need to install pyribs:

```bash
pip install baselines/PPGA/pyribs
```

The experiments were run using one NVIDIA Quadro RTX 6000 with CUDA 11.
To launch an experiment, you can run the following command:
```bash
python main.py seed=$RANDOM algo=<algo> task=<task> feat=<feat>
```

where:
- `<algo>` can be any of the following algorithms:
  - `qdac`: QDAC
  - `qdac_mb`: QDAC-MB
  - `ppga`: PPGA
  - `dcg_me`: DCG-ME
  - `qd_pg`: QD-PG
  - `domino`: DOMiNO
  - `smerl`: SMERL
  - `smerl_reverse`: Reverse SMERL
  - `qdac_mb_fixed_lambda`: QDAC-MB with fixed lambda; requires an extra parameter `+goal.fixed_lagrangian_coeff=<value>`, where `<value>` is the value of the fixed lambda (between 0 and 1); see the example commands after these lists
  - `qdac_mb_no_sf`: No-SF
  - `uvfa`: UVFA; requires an extra parameter `+goal.fixed_lagrangian_coeff=<value>`, where `<value>` is the value of the fixed lambda (between 0 and 1)
- `<task>` and `<feat>` can be any of the following combinations:
  - `task=humanoid` and `feat=feet_contact`
  - `task=ant` and `feat=feet_contact`
  - `task=walker2d` and `feat=feet_contact`
  - `task=ant` and `feat=velocity`
  - `task=humanoid` and `feat=jump`
  - `task=humanoid` and `feat=angle`
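For example, the following commands launch a QDAC run and a fixed-lambda QDAC-MB run on one of the supported combinations (the coefficient `0.5` is an arbitrary illustrative value in [0, 1]):

```bash
# Train QDAC on humanoid with the feet-contact features
python main.py seed=$RANDOM algo=qdac task=humanoid feat=feet_contact

# Train QDAC-MB with a fixed lambda (0.5 is an arbitrary example value)
python main.py seed=$RANDOM algo=qdac_mb_fixed_lambda task=humanoid feat=feet_contact +goal.fixed_lagrangian_coeff=0.5
```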
The configurations are located in the `configs/` folder, and all results are saved in the `output/` folder. We use WandB for logging.
For `qdac_mb`, `qdac_mb_fixed_lambda`, `qdac_mb_no_sf`, and `uvfa`, you can specify the Brax backend by adding the parameter `+backend=<backend>`, where `<backend>` can be any Brax backend (e.g., `spring` or `generalized`). For the other algorithms, the backend can be specified with `algo.backend=<backend>`. The `spring` backend is used by default.
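For instance, the following commands should select the `generalized` backend for each family of algorithms (a sketch based on the parameter syntax above; any supported task/feat combination can be used):

```bash
# Model-based variants (qdac_mb, qdac_mb_fixed_lambda, qdac_mb_no_sf, uvfa)
python main.py seed=$RANDOM algo=qdac_mb task=ant feat=velocity +backend=generalized

# All other algorithms
python main.py seed=$RANDOM algo=ppga task=ant feat=velocity algo.backend=generalized
```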
To launch a few-shot adaptation experiment, you can run the following command:
```bash
python main_adaptation_<type>.py --algo=<algo> --path=<results_path> --seed=$RANDOM
```

where:
- `<type>` can be any of the following types:
  - `failure`: only works with `task=humanoid` and `feat=feet_contact`
  - `friction`: only works with `task=walker2d` and `feat=feet_contact`
  - `gravity`: only works with `task=ant` and `feat=feet_contact`
  - `hurdle`: only works with `task=humanoid` and `feat=jump`
- `<algo>` can be any of the above algorithms, except for `qdac_mb_no_sf` and `qdac_mb_fixed_lambda`
- `<results_path>` is the path to the results of the quality-diversity experiment
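For example, to evaluate few-shot adaptation to hurdles with a trained QDAC policy (`<results_path>` must point at the output folder of your own quality-diversity experiment):

```bash
# <results_path> must be the output folder of a humanoid/jump experiment
python main_adaptation_hurdle.py --algo=qdac --path=<results_path> --seed=$RANDOM
```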
To launch a hierarchical learning experiment, you can run the following command:
```bash
python main_adaptation_wall.py algo_name=<algo> path=<results_path> seed=$RANDOM
```

where:
- `<algo>` can be any of the above algorithms, except for `qdac_mb_no_sf` and `qdac_mb_fixed_lambda`
- `<results_path>` is the path to the results of the quality-diversity experiment (only works with `task=ant` and `feat=velocity`)
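For example (again, `<results_path>` must point at the output folder of your own experiment, here an `ant`/`velocity` one):

```bash
# <results_path> must be the output folder of an ant/velocity experiment
python main_adaptation_wall.py algo_name=qdac path=<results_path> seed=$RANDOM
```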
The results are saved as a CSV file in the quality-diversity experiment folder.
If you use this code, please cite:

```bibtex
@inproceedings{airl2024qdac,
  title={Quality-Diversity Actor-Critic: Learning High-Performing and Diverse Behaviors via Value and Successor Features Critics},
  author={Grillotti, Luca and Faldor, Maxence and González León, Borja and Cully, Antoine},
  booktitle={International Conference on Machine Learning},
  year={2024},
  organization={PMLR},
}
```