This repository is the official implementation of the paper "Safety-Guided Deep Reinforcement Learning via Online Gaussian Process Estimation".
Our implementation is based on GPflow and OpenAI Baselines.
We use TensorFlow 1.13.1 and GPflow 1.3.0.
- Install TensorFlow via

      pip install tensorflow-gpu==1.13.1  # if you have a CUDA-compatible GPU and proper drivers
      pip install numpy==1.17.5

  or

      pip install tensorflow==1.13.1
      pip install numpy==1.17.5
- Install GPflow via

      pip install gpflow==1.3.0
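To confirm that the expected versions are picked up (an optional sanity check, not part of the original instructions):

    python -c "import tensorflow as tf, gpflow; print(tf.__version__, gpflow.__version__)"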
You can find detailed instructions for installing OpenAI Baselines here.
Our implementation is based on commit c57528573ea695b19cd03e98dae48f0082fb2b5e of OpenAI Baselines.
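One way to pin Baselines to that commit is to install it from source, e.g. (a sketch assuming the GitHub repository at github.com/openai/baselines):

    git clone https://github.com/openai/baselines.git
    cd baselines
    git checkout c57528573ea695b19cd03e98dae48f0082fb2b5e
    pip install -e .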
Instructions on setting up MuJoCo can be found here.
The MuJoCo environments used in our paper depend on OpenAI Gym as well.
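A typical mujoco-py layout from that period looks roughly like the following (a sketch assuming MuJoCo 1.50 on Linux; adjust paths, versions, and license location to your installation):

    mkdir -p ~/.mujoco
    unzip mjpro150_linux.zip -d ~/.mujoco    # MuJoCo 1.50 binaries
    cp mjkey.txt ~/.mujoco/                  # MuJoCo license key
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mjpro150/bin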
- Install Gym from source via

      pip install -e ".[classic_control]"
      pip install -e ".[mujoco]"
Run the following command from the project directory:

    pip install -e .
Our implementation includes two methods: vanilla DDPG and DDPG with online GP estimation.
To train a vanilla DDPG policy, use the code in ddpg_baseline.
For DDPG using online GP, use the code in safe_ddpg.
By default, training results will be saved to data_ddpg.
We also provide sample console outputs in outputs.
An example of training a DDPG with online GP policy on Pendulum can be found in train:

    ./train/pendulum_0.1M_safe_ddpg.sh

To train with vanilla DDPG:

    ./train/pendulum_1M_ddpg.sh

An example for HalfCheetah with online GP:

    ./train/half_cheetah_0.1M_init_safe_ddpg.sh
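The scripts are run from the project root; for example (a usage sketch; tee is used here only to keep a console log similar to the samples in outputs):

    bash ./train/pendulum_0.1M_safe_ddpg.sh 2>&1 | tee pendulum_safe_ddpg.log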