Commit

Initial commit
maxencefaldor committed Jun 3, 2024
0 parents commit 8c73add
Showing 423 changed files with 67,604 additions and 0 deletions.
34 changes: 34 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
*.whl
wandb/
venv/
baselines/PPGA/pyribs/build/

.pytest_cache
dist
__pycache__/
*.py[cod]
*.egg-info
MUJOCO_LOG.TXT
.idea/

# Apptainer
apptainer/*.sif
apptainer/tmp/
apptainer/var_tmp/

# Output
output/

results*

*.png
*.jpg
*.jpeg
*.gif
*.pdf
*.svg
*.eps

*.csv
.DS_Store
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 Adaptive and Intelligent Robotics Lab

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
100 changes: 100 additions & 0 deletions README.md
@@ -0,0 +1,100 @@
# Quality-Diversity Actor-Critic: Learning High-Performing and Diverse Behaviors via Value and Successor Features Critics

This repository contains the code for **Quality-Diversity Actor-Critic (QDAC)**, an off-policy actor-critic deep reinforcement learning algorithm that leverages a value function critic and a successor features critic to learn high-performing and diverse behaviors.

## Installation

This code supports Python 3.10. Dependencies can be installed with the following commands:

```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

If you also want to run PPGA, you will need to install pyribs:
```bash
pip install baselines/PPGA/pyribs
```

The experiments were run using Apptainer containers on an NVIDIA Quadro RTX 6000 GPU with CUDA 11.

## Launch Experiments
### Specifying the Backend

For `qdac_mb`, `qdac_mb_fixed_lambda`, `qdac_mb_no_sf`, and `uvfa`, you can specify the Brax backend by adding the following parameter: `+backend=<backend>`,
where `<backend>` can be any Brax backend (e.g., `spring` or `generalized`).

For other algorithms, the backend can be specified by adding the following parameter: `algo.backend=<backend>`.
The `spring` backend is used by default.
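
These `key=value` arguments follow Hydra-style override syntax, where a leading `+` adds a key that is absent from the base config and a dotted key like `algo.backend` sets a nested value. As a toy illustration of those semantics (a hypothetical helper, not the repository's code and not Hydra itself):

```python
# Hypothetical sketch of Hydra-style "key=value" CLI overrides.
# "+key=value" adds a new key; "a.b=value" sets an existing nested key.
def apply_overrides(config: dict, overrides: list[str]) -> dict:
    for override in overrides:
        key, _, value = override.partition("=")
        adding = key.startswith("+")
        key = key.lstrip("+")
        node = config
        *parents, leaf = key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        if not adding and leaf not in node:
            raise KeyError(f"unknown key {key!r}; use '+{key}' to add it")
        node[leaf] = value
    return config

config = {"algo": {"backend": "spring"}}
apply_overrides(config, ["algo.backend=generalized", "+backend=positional"])
print(config)  # {'algo': {'backend': 'generalized'}, 'backend': 'positional'}
```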

### Logging

The results are all saved in the `output/` folder.

We use [WandB](https://wandb.ai/site) for logging.

### Learning Diverse High-Performing Skills

To launch a quality-diversity experiment, you can run the following command:
```bash
python main.py algo=<algo> task=<task> feat=<feat> seed=$RANDOM
```
where:
- `<algo>` can be any of the following algorithms:
  - `qdac`: QDAC
  - `qdac_mb`: QDAC-MB
  - `ppga`: PPGA
  - `dcg_me`: DCG-ME
  - `qd_pg`: QD-PG
  - `domino`: DOMiNO
  - `smerl`: SMERL
  - `smerl_reverse`: Reverse SMERL
  - `qdac_mb_fixed_lambda`: QDAC-MB with a fixed lambda; requires an extra parameter `+goal.fixed_lagrangian_coeff=<value>`, where `<value>` is the value of the fixed lambda (between 0 and 1)
  - `qdac_mb_no_sf`: No-SF
  - `uvfa`: UVFA; requires an extra parameter `+goal.fixed_lagrangian_coeff=<value>`, where `<value>` is the value of the fixed lambda (between 0 and 1)
- `<task>` and `<feat>` can be any of the following combinations:
  - `task=humanoid` and `feat=feet_contact`
  - `task=ant` and `feat=feet_contact`
  - `task=walker2d` and `feat=feet_contact`
  - `task=ant` and `feat=velocity`
  - `task=humanoid` and `feat=jump`
  - `task=humanoid` and `feat=angle`

The algorithm configuration files are located in the `configs` folder.

### Few-Shot Adaptation and Hierarchical Learning

To launch a few-shot adaptation experiment, you can run the following command:
```bash
python main_adaptation_<type>.py --algo=<algo> --path=<results_path> --seed=$RANDOM
```
where:
- `<type>` can be any of the following types:
  - `failure`: only works with `task=humanoid` and `feat=feet_contact`
  - `friction`: only works with `task=walker2d` and `feat=feet_contact`
  - `gravity`: only works with `task=ant` and `feat=feet_contact`
  - `hurdle`: only works with `task=humanoid` and `feat=jump`
- `<algo>` can be any of the above algorithms, except for `qdac_mb_no_sf` and `qdac_mb_fixed_lambda`
- `<results_path>` is the path to the results of the quality-diversity experiment

To launch a hierarchical learning experiment, you can run the following command:
```bash
python main_adaptation_wall.py algo_name=<algo> path=<results_path> seed=$RANDOM
```
where:
- `<algo>` can be any of the above algorithms, except for `qdac_mb_no_sf` and `qdac_mb_fixed_lambda`
- `<results_path>` is the path to the results of the quality-diversity experiment (only works with `task=ant` and `feat=velocity`)

The results are saved as a CSV file in the quality-diversity experiment folder.
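
As a sketch of post-processing such a CSV with the standard library (the column names below are hypothetical, not the repository's actual schema):

```python
import csv
import io

# Hypothetical columns for illustration; the real result schema may differ.
sample = io.StringIO("seed,return\n1,950.0\n2,1010.5\n3,980.2\n")
rows = list(csv.DictReader(sample))
returns = [float(row["return"]) for row in rows]
print(f"runs: {len(returns)}, mean return: {sum(returns) / len(returns):.1f}")
# runs: 3, mean return: 980.2
```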

## Citation

```
@inproceedings{airl2024qdac,
  author = {Grillotti, Luca and Faldor, Maxence and González León, Borja and Cully, Antoine},
  title = {Quality-Diversity Actor-Critic: Learning High-Performing and Diverse Behaviors via Value and Successor Features Critics},
  booktitle = {International Conference on Machine Learning (ICML)},
year = {2024},
}
```
21 changes: 21 additions & 0 deletions baselines/PPGA/LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2023 Sumeet Batra

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
101 changes: 101 additions & 0 deletions baselines/PPGA/README.md
@@ -0,0 +1,101 @@
# Proximal Policy Gradient Arborescence
The official repo of PPGA! Implemented in PyTorch and run with [Brax](https://github.com/google/brax), a GPU-accelerated,
high-throughput simulator for rigid bodies. This project also contains a modified version of [pyribs](https://github.com/icaros-usc/pyribs),
a QD library, and implements a modified multi-objective, vectorized version of Proximal Policy Optimization (PPO) based on
[cleanrl](https://github.com/vwxyzjn/cleanrl).

## Requirements
We use Anaconda to manage dependencies.
```bash
conda env create -f environment.yml
conda activate ppga
```
Then install this project's custom version of pyribs.
```bash
cd pyribs && pip install -e . && cd ..
```
### CUDA
This project has been tested on Ubuntu 20.04 with an NVIDIA RTX 3090 GPU. To enable GPU acceleration, your machine must support
CUDA 11.x with a minimum driver version of 450.80.02 (Linux x86_64). See [here](https://docs.nvidia.com/deploy/cuda-compatibility/)
for more details on CUDA compatibility.

The environment.yml file intentionally contains no CUDA dependencies, since these are machine-dependent, so
jax-cuda and related CUDA packages must be installed by the user. We recommend installing one of the following jaxlib-cuda packages:
```bash
# for CUDA 11 and cuDNN 8.2 or newer
wget https://storage.googleapis.com/jax-releases/cuda11/jaxlib-0.3.25+cuda11.cudnn82-cp39-cp39-manylinux2014_x86_64.whl
pip install jaxlib-0.3.25+cuda11.cudnn82-cp39-cp39-manylinux2014_x86_64.whl

# OR

# for CUDA 11 and cuDNN 8.0.5 or newer
wget https://storage.googleapis.com/jax-releases/cuda11/jaxlib-0.3.25+cuda11.cudnn805-cp39-cp39-manylinux2014_x86_64.whl
pip install jaxlib-0.3.25+cuda11.cudnn805-cp39-cp39-manylinux2014_x86_64.whl
```

If you run into issues getting cuda-accelerated jax to work, please see the [jax github](https://github.com/google/jax) for more details.

We recommend using conda to install cuDNN and cudatoolkit:
```bash
conda install -c anaconda cudnn
conda install -c anaconda cudatoolkit
```

### Common gotchas
Most issues arise from having the wrong version of Jax, Flax, Brax, etc. installed. If you followed the steps above and are still
running into issues, please make sure the following packages are of the right versions:
```bash
jax==0.3.25
jaxlib==0.3.25+cuda11.cudnn82 # or whatever your cuDNN version is
jaxopt==0.5.5
flax==0.6.1
brax==0.1.0
chex==0.1.5
gym==0.23.1
```
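
If version drift is suspected, the pins above can be checked mechanically. A minimal stdlib-only sketch (not part of this repository; `jaxlib` is omitted because its local `+cuda...` suffix varies by machine):

```python
# Sketch: compare the pinned versions above against what is installed.
# Uses only the standard library; absent packages are reported, not raised.
from importlib import metadata

PINNED = {
    "jax": "0.3.25",
    "jaxopt": "0.5.5",
    "flax": "0.6.1",
    "brax": "0.1.0",
    "chex": "0.1.5",
    "gym": "0.23.1",
}

def check_pins(pins: dict) -> list[str]:
    problems = []
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (want {wanted})")
            continue
        if installed != wanted:
            problems.append(f"{name}: {installed} installed, want {wanted}")
    return problems

for line in check_pins(PINNED):
    print(line)
```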

### Preflight Checklist
Depending on your machine specs, you may encounter out-of-memory errors due to how Jax preallocates VRAM.
If so, you will need to disable memory preallocation.
```bash
export XLA_PYTHON_CLIENT_PREALLOCATE=false
```
With CUDA enabled, you will also need to add the cuBLAS library to your `LD_LIBRARY_PATH` like so:
```bash
export LD_LIBRARY_PATH=<PATH_TO_ANACONDA>/envs/ppga/lib/python3.9/site-packages/nvidia/cublas/lib/:$LD_LIBRARY_PATH
```
For example, if you use Miniconda, this would be `/home/{username}/miniconda3/...`.
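
The preallocation flag can also be set from inside a Python entry point, as long as it happens before `jax` initializes its backend (conventionally, set it before the import). A minimal sketch:

```python
import os

# Set before "import jax": XLA reads this flag when the backend starts up.
# setdefault keeps any value already exported in the shell.
os.environ.setdefault("XLA_PYTHON_CLIENT_PREALLOCATE", "false")

print(os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"])
```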

## Running Experiments
We provide run scripts to reproduce the paper results for both local machines and slurm.

### local
```bash
# from the PPGA root, e.g. to run humanoid
./runners/local/train_ppga_humanoid.sh
```

### slurm
```bash
# from the PPGA root, e.g. to run humanoid
sbatch runners/slurm/train_ppga_humanoid.sh
```

For a full list of configurable hyperparameters with descriptions:
```bash
python3 -m algorithm.train_ppga --help
```

## Evaluating an Archive
See the jupyter notebook `algorithm/enjoy_ppga.ipynb` for instructions and examples on how to visualize results!

## Pretrained Archives
Trained archives reported in the paper and scheduler checkpoints are hosted on Google Drive and can be downloaded from [here](https://drive.google.com/drive/folders/1dPV5mJNaalqHdMH87KNvGAqnuHi7ozdw?usp=sharing).

## Results
| **Environment** | **QD-Score** | **Coverage** | **Best Reward** | **Experiment Command** |
|-----------------|--------------------|--------------|-----------------|---------------------------------------------|
| Humanoid | $7.01 \times 10^6$ | 70.0% | 9755 | `./runners/local/train_ppga_humanoid.sh` |
| Walker2D | $5.82 \times 10^6$ | 67.8% | 4796 | `./runners/local/train_ppga_walker2d.sh` |
| HalfCheetah | $2.94 \times 10^7$ | 98.4% | 9335 | `./runners/local/train_ppga_halfcheetah.sh` |
| Ant | $2.26 \times 10^7$ | 53.1% | 7854 | `./runners/local/train_ppga_ant.sh` |
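
QD-Score is conventionally the sum of objective values over all filled archive cells, and Coverage is the fraction of cells filled. A toy sketch of both metrics (not the pyribs implementation):

```python
# Toy archive: cell index -> best objective value found in that cell.
# Conventional QD metrics:
#   QD-Score = sum of objective values over filled cells
#   Coverage = filled cells / total cells
def qd_metrics(archive: dict, num_cells: int) -> tuple[float, float]:
    qd_score = sum(archive.values())
    coverage = len(archive) / num_cells
    return qd_score, coverage

archive = {(0, 0): 120.0, (0, 1): 80.0, (2, 3): 300.0}
score, cov = qd_metrics(archive, num_cells=10)
print(score, cov)  # 500.0 0.3
```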