Commit

Initial commit
maxencefaldor committed Jun 3, 2024
0 parents commit 8c73add
Showing 423 changed files with 67,604 additions and 0 deletions.
34 changes: 34 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
*.whl
wandb/
venv/
baselines/PPGA/pyribs/build/

.pytest_cache
dist
__pycache__/
*.py[cod]
*.egg-info
MUJOCO_LOG.TXT
.idea/

# Apptainer
apptainer/*.sif
apptainer/tmp/
apptainer/var_tmp/

# Output
output/

results*

*.png
*.jpg
*.jpeg
*.gif
*.pdf
*.svg
*.eps

*.csv
.DS_Store
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 Adaptive and Intelligent Robotics Lab

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
100 changes: 100 additions & 0 deletions README.md
@@ -0,0 +1,100 @@
# Quality-Diversity Actor-Critic: Learning High-Performing and Diverse Behaviors via Value and Successor Features Critics

This repository contains the code for **Quality-Diversity Actor-Critic (QDAC)**, an off-policy actor-critic deep reinforcement learning algorithm that leverages a value function critic and a successor features critic to learn high-performing and diverse behaviors.

## Installation

This code supports Python 3.10. Dependencies can be installed with the following commands:

```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

If you also want to run PPGA, you will need to install pyribs:
```bash
pip install baselines/PPGA/pyribs
```

The experiments were run using Apptainer containers on an NVIDIA Quadro RTX 6000 GPU with CUDA 11.

## Launch Experiments
### Specifying the Backend

For `qdac_mb`, `qdac_mb_fixed_lambda`, `qdac_mb_no_sf`, and `uvfa`, you can specify the Brax backend by adding the following parameter: `+backend=<backend>`,
where `<backend>` can be any Brax backend (e.g., `spring` or `generalized`).

For other algorithms, the backend can be specified by adding the following parameter: `algo.backend=<backend>`.
The `spring` backend is used by default.
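
These `key=value` arguments follow Hydra-style override syntax, where a leading `+` adds a key that is absent from the base config and a dotted key like `algo.backend` sets a nested value. As a toy illustration of those semantics (a hypothetical helper, not the repository's code and not Hydra itself):

```python
# Hypothetical sketch of Hydra-style "key=value" CLI overrides.
# "+key=value" adds a new key; "a.b=value" sets an existing nested key.
def apply_overrides(config: dict, overrides: list[str]) -> dict:
    for override in overrides:
        key, _, value = override.partition("=")
        adding = key.startswith("+")
        key = key.lstrip("+")
        node = config
        *parents, leaf = key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        if not adding and leaf not in node:
            raise KeyError(f"unknown key {key!r}; use '+{key}' to add it")
        node[leaf] = value
    return config

config = {"algo": {"backend": "spring"}}
apply_overrides(config, ["algo.backend=generalized", "+backend=positional"])
print(config)  # {'algo': {'backend': 'generalized'}, 'backend': 'positional'}
```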

### Logging

The results are all saved in the `output/` folder.

We use [WandB](https://wandb.ai/site) for logging.

### Learning Diverse High-Performing Skills

To launch a quality-diversity experiment, you can run the following command:
```bash
python main.py algo=<algo> task=<task> feat=<feat> seed=$RANDOM
```
where:
- `<algo>` can be any of the following algorithms:
  - `qdac`: QDAC
  - `qdac_mb`: QDAC-MB
  - `ppga`: PPGA
  - `dcg_me`: DCG-ME
  - `qd_pg`: QD-PG
  - `domino`: DOMiNO
  - `smerl`: SMERL
  - `smerl_reverse`: Reverse SMERL
  - `qdac_mb_fixed_lambda`: QDAC-MB with a fixed lambda; requires an extra parameter `+goal.fixed_lagrangian_coeff=<value>`, where `<value>` is the value of the fixed lambda (between 0 and 1)
  - `qdac_mb_no_sf`: No-SF
  - `uvfa`: UVFA; requires an extra parameter `+goal.fixed_lagrangian_coeff=<value>`, where `<value>` is the value of the fixed lambda (between 0 and 1)
- `<task>` and `<feat>` can be any of the following combinations:
  - `task=humanoid` and `feat=feet_contact`
  - `task=ant` and `feat=feet_contact`
  - `task=walker2d` and `feat=feet_contact`
  - `task=ant` and `feat=velocity`
  - `task=humanoid` and `feat=jump`
  - `task=humanoid` and `feat=angle`

The algorithm configuration files are located in the `configs` folder.

### Few-Shot Adaptation and Hierarchical Learning

To launch a few-shot adaptation experiment, you can run the following command:
```bash
python main_adaptation_<type>.py --algo=<algo> --path=<results_path> --seed=$RANDOM
```
where:
- `<type>` can be any of the following types:
  - `failure`: only works with `task=humanoid` and `feat=feet_contact`
  - `friction`: only works with `task=walker2d` and `feat=feet_contact`
  - `gravity`: only works with `task=ant` and `feat=feet_contact`
  - `hurdle`: only works with `task=humanoid` and `feat=jump`
- `<algo>` can be any of the above algorithms, except for `qdac_mb_no_sf` and `qdac_mb_fixed_lambda`
- `<results_path>` is the path to the results of the quality-diversity experiment

To launch a hierarchical learning experiment, you can run the following command:
```bash
python main_adaptation_wall.py algo_name=<algo> path=<results_path> seed=$RANDOM
```
where:
- `<algo>` can be any of the above algorithms, except for `qdac_mb_no_sf` and `qdac_mb_fixed_lambda`
- `<results_path>` is the path to the results of the quality-diversity experiment (only works with `task=ant` and `feat=velocity`)

The results are saved as a CSV file in the quality-diversity experiment folder.
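
As a sketch of post-processing such a CSV with the standard library (the column names below are hypothetical, not the repository's actual schema):

```python
import csv
import io

# Hypothetical columns for illustration; the real result schema may differ.
sample = io.StringIO("seed,return\n1,950.0\n2,1010.5\n3,980.2\n")
rows = list(csv.DictReader(sample))
returns = [float(row["return"]) for row in rows]
print(f"runs: {len(returns)}, mean return: {sum(returns) / len(returns):.1f}")
# runs: 3, mean return: 980.2
```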

## Citation

```
@inproceedings{airl2024qdac,
  author = {Grillotti, Luca and Faldor, Maxence and González León, Borja and Cully, Antoine},
  title = {Quality-Diversity Actor-Critic: Learning High-Performing and Diverse Behaviors via Value and Successor Features Critics},
  booktitle = {International Conference on Machine Learning (ICML)},
year = {2024},
}
```
21 changes: 21 additions & 0 deletions baselines/PPGA/LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2023 Sumeet Batra

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
101 changes: 101 additions & 0 deletions baselines/PPGA/README.md
@@ -0,0 +1,101 @@
# Proximal Policy Gradient Arborescence
The official repo of PPGA! Implemented in PyTorch and run with [Brax](https://github.com/google/brax), a GPU-accelerated,
high-throughput simulator for rigid bodies. This project also contains a modified version of [pyribs](https://github.com/icaros-usc/pyribs),
a QD library, and implements a modified multi-objective, vectorized version of Proximal Policy Optimization (PPO) based on
[cleanrl](https://github.com/vwxyzjn/cleanrl).

## Requirements
We use Anaconda to manage dependencies.
```bash
conda env create -f environment.yml
conda activate ppga
```
Then install this project's custom version of pyribs.
```bash
cd pyribs && pip install -e . && cd ..
```
### CUDA
This project has been tested on Ubuntu 20.04 with an NVIDIA RTX 3090 GPU. To enable GPU acceleration, your machine must support
CUDA 11.x with a minimum driver version of 450.80.02 (Linux x86_64). See [here](https://docs.nvidia.com/deploy/cuda-compatibility/)
for more details on CUDA compatibility.

The environment.yml file intentionally contains no CUDA dependencies, since these are machine-dependent, so
jax-cuda and related CUDA packages must be installed by the user. We recommend installing one of the following jaxlib-cuda packages:
```bash
# for CUDA 11 and cuDNN 8.2 or newer
wget https://storage.googleapis.com/jax-releases/cuda11/jaxlib-0.3.25+cuda11.cudnn82-cp39-cp39-manylinux2014_x86_64.whl
pip install jaxlib-0.3.25+cuda11.cudnn82-cp39-cp39-manylinux2014_x86_64.whl

# OR

# for CUDA 11 and cuDNN 8.0.5 or newer
wget https://storage.googleapis.com/jax-releases/cuda11/jaxlib-0.3.25+cuda11.cudnn805-cp39-cp39-manylinux2014_x86_64.whl
pip install jaxlib-0.3.25+cuda11.cudnn805-cp39-cp39-manylinux2014_x86_64.whl
```

If you run into issues getting cuda-accelerated jax to work, please see the [jax github](https://github.com/google/jax) for more details.

We recommend using conda to install cuDNN and cudatoolkit:
```bash
conda install -c anaconda cudnn
conda install -c anaconda cudatoolkit
```

### Common gotchas
Most issues arise from having the wrong version of Jax, Flax, Brax, etc. installed. If you followed the steps above and are still
running into issues, please make sure the following packages are of the right versions:
```bash
jax==0.3.25
jaxlib==0.3.25+cuda11.cudnn82 # or whatever your cuDNN version is
jaxopt==0.5.5
flax==0.6.1
brax==0.1.0
chex==0.1.5
gym==0.23.1
```
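
If version drift is suspected, the pins above can be checked mechanically. A minimal stdlib-only sketch (not part of this repository; `jaxlib` is omitted because its local `+cuda...` suffix varies by machine):

```python
# Sketch: compare the pinned versions above against what is installed.
# Uses only the standard library; absent packages are reported, not raised.
from importlib import metadata

PINNED = {
    "jax": "0.3.25",
    "jaxopt": "0.5.5",
    "flax": "0.6.1",
    "brax": "0.1.0",
    "chex": "0.1.5",
    "gym": "0.23.1",
}

def check_pins(pins: dict) -> list[str]:
    problems = []
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (want {wanted})")
            continue
        if installed != wanted:
            problems.append(f"{name}: {installed} installed, want {wanted}")
    return problems

for line in check_pins(PINNED):
    print(line)
```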

### Preflight Checklist
Depending on your machine specs, you may encounter out-of-memory errors due to how Jax preallocates VRAM.
If so, you will need to disable memory preallocation.
```bash
export XLA_PYTHON_CLIENT_PREALLOCATE=false
```
With CUDA enabled, you will also need to add the cuBLAS library to your `LD_LIBRARY_PATH` like so:
```bash
export LD_LIBRARY_PATH=<PATH_TO_ANACONDA>/envs/ppga/lib/python3.9/site-packages/nvidia/cublas/lib/:$LD_LIBRARY_PATH
```
For example, if you use Miniconda, this would be `/home/{username}/miniconda3/...`.
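
The preallocation flag can also be set from inside a Python entry point, as long as it happens before `jax` initializes its backend (conventionally, set it before the import). A minimal sketch:

```python
import os

# Set before "import jax": XLA reads this flag when the backend starts up.
# setdefault keeps any value already exported in the shell.
os.environ.setdefault("XLA_PYTHON_CLIENT_PREALLOCATE", "false")

print(os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"])
```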

## Running Experiments
We provide run scripts to reproduce the paper results for both local machines and slurm.

### local
```bash
# from the PPGA root, e.g. to run humanoid
./runners/local/train_ppga_humanoid.sh
```

### slurm
```bash
# from the PPGA root, e.g. to run humanoid
sbatch runners/slurm/train_ppga_humanoid.sh
```

For a full list of configurable hyperparameters with descriptions:
```bash
python3 -m algorithm.train_ppga --help
```

## Evaluating an Archive
See the jupyter notebook `algorithm/enjoy_ppga.ipynb` for instructions and examples on how to visualize results!

## Pretrained Archives
Trained archives reported in the paper and scheduler checkpoints are hosted on Google Drive and can be downloaded from [here](https://drive.google.com/drive/folders/1dPV5mJNaalqHdMH87KNvGAqnuHi7ozdw?usp=sharing).

## Results
| **Environment** | **QD-Score** | **Coverage** | **Best Reward** | **Experiment Command** |
|-----------------|--------------------|--------------|-----------------|---------------------------------------------|
| Humanoid | $7.01 \times 10^6$ | 70.0% | 9755 | `./runners/local/train_ppga_humanoid.sh` |
| Walker2D | $5.82 \times 10^6$ | 67.8% | 4796 | `./runners/local/train_ppga_walker2d.sh` |
| HalfCheetah | $2.94 \times 10^7$ | 98.4% | 9335 | `./runners/local/train_ppga_halfcheetah.sh` |
| Ant | $2.26 \times 10^7$ | 53.1% | 7854 | `./runners/local/train_ppga_ant.sh` |
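
QD-Score is conventionally the sum of objective values over all filled archive cells, and Coverage is the fraction of cells filled. A toy sketch of both metrics (not the pyribs implementation):

```python
# Toy archive: cell index -> best objective value found in that cell.
# Conventional QD metrics:
#   QD-Score = sum of objective values over filled cells
#   Coverage = filled cells / total cells
def qd_metrics(archive: dict, num_cells: int) -> tuple[float, float]:
    qd_score = sum(archive.values())
    coverage = len(archive) / num_cells
    return qd_score, coverage

archive = {(0, 0): 120.0, (0, 1): 80.0, (2, 3): 300.0}
score, cov = qd_metrics(archive, num_cells=10)
print(score, cov)  # 500.0 0.3
```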