Commit 8c73add (0 parents). Showing 423 changed files with 67,604 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
*.whl | ||
wandb/ | ||
venv/ | ||
baselines/PPGA/pyribs/build/ | ||
|
||
.pytest_cache | ||
dist | ||
__pycache__/ | ||
*.py[cod] | ||
*.egg-info | ||
MUJOCO_LOG.TXT | ||
; | ||
.idea/ | ||
|
||
# Apptainer | ||
apptainer/*.sif | ||
apptainer/tmp/ | ||
apptainer/var_tmp/ | ||
|
||
# Output | ||
output/ | ||
|
||
results* | ||
|
||
*.png | ||
*.jpg | ||
*.jpeg | ||
*.gif | ||
*.svg | ||
*.eps | ||
|
||
*.csv | ||
.DS_Store |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
MIT License | ||
|
||
Copyright (c) 2024 Adaptive and Intelligent Robotics Lab | ||
|
||
Permission is hereby granted, free of charge, to any person obtaining a copy | ||
of this software and associated documentation files (the "Software"), to deal | ||
in the Software without restriction, including without limitation the rights | ||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
copies of the Software, and to permit persons to whom the Software is | ||
furnished to do so, subject to the following conditions: | ||
|
||
The above copyright notice and this permission notice shall be included in all | ||
copies or substantial portions of the Software. | ||
|
||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | ||
SOFTWARE. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,100 @@ | ||
# Quality-Diversity Actor-Critic: Learning High-Performing and Diverse Behaviors via Value and Successor Features Critics | ||
|
||
This repository contains the code for **Quality-Diversity Actor-Critic (QDAC)**, an off-policy actor-critic deep reinforcement learning algorithm that leverages a value function critic and a successor features critic to learn high-performing and diverse behaviors. | ||
|
||
## Installation | ||
|
||
This code is supported on Python 3.10 and dependencies can be installed using the following commands: | ||
|
||
```bash | ||
python -m venv venv | ||
source venv/bin/activate | ||
pip install -r requirements.txt | ||
``` | ||
|
||
To run PPGA as well, you also need to install the bundled pyribs package: | ||
```bash | ||
pip install baselines/PPGA/pyribs | ||
``` | ||
|
||
The experiments were run using Apptainer containers on NVIDIA Quadro RTX 6000 with CUDA 11. | ||
|
||
## Launch Experiments | ||
### Specifying the Backend | ||
|
||
For `qdac_mb`, `qdac_mb_fixed_lambda`, `qdac_mb_no_sf`, and `uvfa`, you can specify the Brax backend by adding the parameter `+backend=<backend>`, | ||
where `<backend>` can be any Brax backend (e.g. `spring` or `generalized`). | ||
|
||
For other algorithms, the backend can be specified by adding the following parameter: `algo.backend=<backend>`. | ||
The `spring` backend is used by default. | ||
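The rule above can be sketched as a small helper. This is only an illustration: the function name `backend_override` and the set name are hypothetical and not part of the repository; the algorithm names and the two override forms are taken from the text above.

```python
# Algorithms that, per the text above, take the backend as `+backend=<backend>`.
TOP_LEVEL_BACKEND_ALGOS = {"qdac_mb", "qdac_mb_fixed_lambda", "qdac_mb_no_sf", "uvfa"}


def backend_override(algo: str, backend: str) -> str:
    """Return the command-line parameter that selects a Brax backend (hypothetical helper)."""
    if algo in TOP_LEVEL_BACKEND_ALGOS:
        return f"+backend={backend}"
    # All other algorithms use the `algo.backend=<backend>` form.
    return f"algo.backend={backend}"


print(backend_override("qdac_mb", "generalized"))  # +backend=generalized
print(backend_override("qdac", "spring"))          # algo.backend=spring
```

The returned string is meant to be appended to the `python main.py ...` command line.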
|
||
### Logging | ||
|
||
The results are all saved in the `output/` folder. | ||
|
||
We use [WandB](https://wandb.ai/site) for logging. | ||
|
||
### Learning Diverse High-Performing Skills | ||
|
||
To launch a quality-diversity experiment, you can run the following command: | ||
```bash | ||
python main.py algo=<algo> task=<task> feat=<feat> seed=$RANDOM | ||
``` | ||
where: | ||
- `<algo>` can be any of the following algorithms: | ||
  - `qdac`: QDAC | ||
  - `qdac_mb`: QDAC-MB | ||
  - `ppga`: PPGA | ||
  - `dcg_me`: DCG-ME | ||
  - `qd_pg`: QD-PG | ||
  - `domino`: DOMiNO | ||
  - `smerl`: SMERL | ||
  - `smerl_reverse`: Reverse SMERL | ||
  - `qdac_mb_fixed_lambda`: QDAC-MB with fixed lambda. It requires an extra parameter `+goal.fixed_lagrangian_coeff=<value>`, where `<value>` is the value of the fixed lambda (between 0 and 1) | ||
  - `qdac_mb_no_sf`: No-SF | ||
  - `uvfa`: UVFA. It requires an extra parameter `+goal.fixed_lagrangian_coeff=<value>`, where `<value>` is the value of the fixed lambda (between 0 and 1) | ||
- `<task>` and `<feat>` can be any of the following combinations: | ||
  - `task=humanoid` and `feat=feet_contact` | ||
  - `task=ant` and `feat=feet_contact` | ||
  - `task=walker2d` and `feat=feet_contact` | ||
  - `task=ant` and `feat=velocity` | ||
  - `task=humanoid` and `feat=jump` | ||
  - `task=humanoid` and `feat=angle` | ||
|
||
The algorithm configuration files are located in the `configs` folder. | ||
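As a rough sketch of how the constraints listed above fit together, the following helper assembles the argument list for `main.py`. The function `build_command` and the constant names are hypothetical (they do not exist in the repository); only the parameter names and the allowed values come from the list above.

```python
# Task/feature combinations listed above.
VALID_TASK_FEAT = {
    ("humanoid", "feet_contact"),
    ("ant", "feet_contact"),
    ("walker2d", "feet_contact"),
    ("ant", "velocity"),
    ("humanoid", "jump"),
    ("humanoid", "angle"),
}
# Algorithms that require `+goal.fixed_lagrangian_coeff=<value>`.
NEEDS_FIXED_LAMBDA = {"qdac_mb_fixed_lambda", "uvfa"}


def build_command(algo, task, feat, seed, fixed_lambda=None):
    """Return the argv list for a quality-diversity run (hypothetical helper)."""
    if (task, feat) not in VALID_TASK_FEAT:
        raise ValueError(f"unsupported combination: task={task}, feat={feat}")
    args = ["python", "main.py", f"algo={algo}", f"task={task}", f"feat={feat}", f"seed={seed}"]
    if algo in NEEDS_FIXED_LAMBDA:
        # The fixed lambda must be supplied and lie between 0 and 1.
        if fixed_lambda is None or not 0.0 <= fixed_lambda <= 1.0:
            raise ValueError(f"{algo} needs a fixed lambda between 0 and 1")
        args.append(f"+goal.fixed_lagrangian_coeff={fixed_lambda}")
    return args


print(" ".join(build_command("qdac", "ant", "velocity", seed=0)))
# python main.py algo=qdac task=ant feat=velocity seed=0
```

The list can be passed to `subprocess.run` from the repository root. In this sketch, `fixed_lambda` is ignored for algorithms that do not need it.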
|
||
### Few-Shot Adaptation and Hierarchical Learning | ||
|
||
To launch a few-shot adaptation experiment, you can run the following command: | ||
```bash | ||
python main_adaptation_<type>.py --algo=<algo> --path=<results_path> --seed=$RANDOM | ||
``` | ||
where: | ||
- `<type>` can be any of the following types: | ||
  - `failure`: Only works with `task=humanoid` and `feat=feet_contact` | ||
  - `friction`: Only works with `task=walker2d` and `feat=feet_contact` | ||
  - `gravity`: Only works with `task=ant` and `feat=feet_contact` | ||
  - `hurdle`: Only works with `task=humanoid` and `feat=jump` | ||
- `<algo>` can be any of the above algorithms, except for `qdac_mb_no_sf` and `qdac_mb_fixed_lambda` | ||
- `<results_path>` is the path to the results of the quality-diversity experiment | ||
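The pairing between adaptation types and the quality-diversity run they accept can be checked before launching anything. The sketch below is illustrative only: `adaptation_command` and the constant names are hypothetical, and the caller is assumed to know which `task` and `feat` produced the results at `results_path`.

```python
# Adaptation type -> (task, feat) of the quality-diversity run it works with.
ADAPTATION_REQUIREMENTS = {
    "failure": ("humanoid", "feet_contact"),
    "friction": ("walker2d", "feet_contact"),
    "gravity": ("ant", "feet_contact"),
    "hurdle": ("humanoid", "jump"),
}
# Algorithms excluded from adaptation experiments.
EXCLUDED_ALGOS = {"qdac_mb_no_sf", "qdac_mb_fixed_lambda"}


def adaptation_command(kind, algo, results_path, seed, task, feat):
    """Return the argv list for a few-shot adaptation run (hypothetical helper)."""
    if kind not in ADAPTATION_REQUIREMENTS:
        raise ValueError(f"unknown adaptation type: {kind}")
    if algo in EXCLUDED_ALGOS:
        raise ValueError(f"{algo} is not supported for adaptation experiments")
    if ADAPTATION_REQUIREMENTS[kind] != (task, feat):
        raise ValueError(f"{kind} only works with {ADAPTATION_REQUIREMENTS[kind]}")
    return [
        "python",
        f"main_adaptation_{kind}.py",
        f"--algo={algo}",
        f"--path={results_path}",
        f"--seed={seed}",
    ]


print(" ".join(adaptation_command("hurdle", "qdac", "output/run", 0, "humanoid", "jump")))
# python main_adaptation_hurdle.py --algo=qdac --path=output/run --seed=0
```

Here `output/run` is a placeholder path, not a directory the repository is known to create.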
|
||
To launch a hierarchical learning experiment, you can run the following command: | ||
```bash | ||
python main_adaptation_wall.py algo_name=<algo> path=<results_path> seed=$RANDOM | ||
``` | ||
where: | ||
- `<algo>` can be any of the above algorithms, except for `qdac_mb_no_sf` and `qdac_mb_fixed_lambda` | ||
- `<results_path>` is the path to the results of the quality-diversity experiment (only works with `task=ant` and `feat=velocity`) | ||
|
||
The results are saved as a CSV file in the quality-diversity experiment folder. | ||
|
||
## Citation | ||
|
||
``` | ||
@article{airl2024qdac, | ||
author = {Grillotti, Luca and Faldor, Maxence and González León, Borja and Cully, Antoine}, | ||
title = {Quality-Diversity Actor-Critic: Learning High-Performing and Diverse Behaviors via Value and Successor Features Critics}, | ||
journal = {ICML}, | ||
year = {2024}, | ||
} | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
MIT License | ||
|
||
Copyright (c) 2023 Sumeet Batra | ||
|
||
Permission is hereby granted, free of charge, to any person obtaining a copy | ||
of this software and associated documentation files (the "Software"), to deal | ||
in the Software without restriction, including without limitation the rights | ||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
copies of the Software, and to permit persons to whom the Software is | ||
furnished to do so, subject to the following conditions: | ||
|
||
The above copyright notice and this permission notice shall be included in all | ||
copies or substantial portions of the Software. | ||
|
||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | ||
SOFTWARE. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,101 @@ | ||
# Proximal Policy Gradient Arborescence | ||
The official repo of PPGA! Implemented in PyTorch and run with [Brax](https://github.com/google/brax), a GPU-accelerated | ||
high-throughput simulator for rigid bodies. This project also contains a modified version of [pyribs](https://github.com/icaros-usc/pyribs), | ||
a QD library, and implements a modified multi-objective, vectorized version of Proximal Policy Optimization (PPO) based on | ||
[cleanrl](https://github.com/vwxyzjn/cleanrl). | ||
|
||
## Requirements | ||
We use Anaconda to manage dependencies. | ||
```bash | ||
conda env create -f environment.yml | ||
conda activate ppga | ||
``` | ||
Then install this project's custom version of pyribs. | ||
```bash | ||
cd pyribs && pip install -e . && cd .. | ||
``` | ||
### CUDA | ||
This project has been tested on Ubuntu 20.04 with an NVIDIA RTX 3090 GPU. To enable GPU acceleration, your machine must support | ||
CUDA 11.X with minimum driver version 450.80.02 (Linux x86_64). See [here](https://docs.nvidia.com/deploy/cuda-compatibility/) | ||
for more details on CUDA compatibility. | ||
|
||
The `environment.yml` file intentionally contains no CUDA dependencies, since these are machine-dependent, so | ||
jax-cuda and related CUDA packages must be installed by the user. We recommend installing one of the following jaxlib-cuda packages: | ||
```bash | ||
# for CUDA 11 and cuDNN 8.2 or newer | ||
wget https://storage.googleapis.com/jax-releases/cuda11/jaxlib-0.3.25+cuda11.cudnn82-cp39-cp39-manylinux2014_x86_64.whl | ||
pip install jaxlib-0.3.25+cuda11.cudnn82-cp39-cp39-manylinux2014_x86_64.whl | ||
|
||
# OR | ||
|
||
# for CUDA 11 and cuDNN 8.0.5 or newer | ||
wget https://storage.googleapis.com/jax-releases/cuda11/jaxlib-0.3.25+cuda11.cudnn805-cp39-cp39-manylinux2014_x86_64.whl | ||
pip install jaxlib-0.3.25+cuda11.cudnn805-cp39-cp39-manylinux2014_x86_64.whl | ||
``` | ||
|
||
If you run into issues getting CUDA-accelerated JAX to work, please see the [JAX GitHub repository](https://github.com/google/jax) for more details. | ||
|
||
We recommend using conda to install cuDNN and cudatoolkit: | ||
```bash | ||
conda install -c anaconda cudnn | ||
conda install -c anaconda cudatoolkit | ||
``` | ||
|
||
### Common gotchas | ||
Most issues arise from having the wrong version of JAX, Flax, Brax, etc. installed. If you followed the steps above and are still | ||
running into issues, please make sure the following packages are at the right versions: | ||
```bash | ||
jax==0.3.25 | ||
jaxlib==0.3.25+cuda11.cudnn82 # or whatever your cuDNN version is | ||
jaxopt==0.5.5 | ||
flax==0.6.1 | ||
brax==0.1.0 | ||
chex==0.1.5 | ||
gym==0.23.1 | ||
``` | ||
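One way to compare an environment against the pins above is to query installed versions with the standard library's `importlib.metadata`. This is a sketch, not a script shipped with PPGA: `find_mismatches` and `PINNED` are hypothetical names, and `jaxlib` is left out because its version string carries a CUDA/cuDNN suffix that depends on the wheel you installed.

```python
from importlib import metadata

# Pins copied from the list above (jaxlib omitted, see the note before this block).
PINNED = {
    "jax": "0.3.25",
    "jaxopt": "0.5.5",
    "flax": "0.6.1",
    "brax": "0.1.0",
    "chex": "0.1.5",
    "gym": "0.23.1",
}


def find_mismatches(pinned, get_version=metadata.version):
    """Map each package whose installed version differs to (wanted, installed)."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


for name, (wanted, installed) in find_mismatches(PINNED).items():
    print(f"{name}: expected {wanted}, found {installed or 'not installed'}")
```

Run it inside the activated `ppga` environment; no output means every listed package matches its pin.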
|
||
### Preflight Checklist | ||
Depending on your machine specs, you may encounter out-of-memory errors due to how JAX preallocates VRAM. | ||
If this happens, you will need to disable memory preallocation. | ||
```bash | ||
export XLA_PYTHON_CLIENT_PREALLOCATE=false | ||
``` | ||
With CUDA enabled, you will also need to add the cublas library to your LD_LIBRARY_PATH like so: | ||
```bash | ||
export LD_LIBRARY_PATH=<PATH_TO_ANACONDA>/envs/ppga/lib/python3.9/site-packages/nvidia/cublas/lib/:$LD_LIBRARY_PATH | ||
``` | ||
For example, if you use miniconda, this would be `/home/{username}/miniconda3/...` | ||
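If you launch training from a Python entry point or a notebook rather than a shell, the preallocation switch can also be set from Python. The sketch below assumes it runs before JAX initializes its GPU backend (placing it before the first `import jax` is the simplest way to guarantee that); `preallocation_disabled` is a hypothetical helper used only to confirm the setting.

```python
import os

# Set the variable before JAX initializes its backend, i.e. before `import jax`.
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"


def preallocation_disabled() -> bool:
    """Return True if the environment asks JAX not to preallocate GPU memory."""
    return os.environ.get("XLA_PYTHON_CLIENT_PREALLOCATE", "").lower() == "false"


print(preallocation_disabled())  # True
```

Only the current process and its children see this change, so each separately launched job needs the same line or the shell `export` shown earlier.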
|
||
## Running Experiments | ||
We provide run scripts to reproduce the paper results for both local machines and Slurm. | ||
|
||
### Local | ||
```bash | ||
# from PPGA root. Ex. to run humanoid | ||
./runners/local/train_ppga_humanoid.sh | ||
``` | ||
|
||
### Slurm | ||
```bash | ||
# from PPGA root. Ex. to run humanoid | ||
sbatch runners/slurm/train_ppga_humanoid.sh | ||
``` | ||
|
||
For a full list of configurable hyperparameters with descriptions: | ||
```bash | ||
python3 -m algorithm.train_ppga --help | ||
``` | ||
|
||
## Evaluating an Archive | ||
See the Jupyter notebook `algorithm/enjoy_ppga.ipynb` for instructions and examples on how to visualize results! | ||
|
||
## Pretrained Archives | ||
Trained archives reported in the paper and scheduler checkpoints are hosted on Google Drive and can be downloaded from [here](https://drive.google.com/drive/folders/1dPV5mJNaalqHdMH87KNvGAqnuHi7ozdw?usp=sharing). | ||
|
||
## Results | ||
| **Environment** | **QD-Score** | **Coverage** | **Best Reward** | **Experiment Command** | | ||
|-----------------|--------------------|--------------|-----------------|---------------------------------------------| | ||
| Humanoid | $7.01 \times 10^6$ | 70.0% | 9755 | `./runners/local/train_ppga_humanoid.sh` | | ||
| Walker2D | $5.82 \times 10^6$ | 67.8% | 4796 | `./runners/local/train_ppga_walker2d.sh` | | ||
| HalfCheetah | $2.94 \times 10^7$ | 98.4% | 9335 | `./runners/local/train_ppga_halfcheetah.sh` | | ||
| Ant | $2.26 \times 10^7$ | 53.1% | 7854 | `./runners/local/train_ppga_ant.sh` | |