- The stochastic DHPG algorithm has been added to this repository. The original NeurIPS 2022 paper proposes deterministic DHPG, while our extended JMLR paper (cited at the bottom of this page) introduces stochastic DHPG and compares it against the deterministic variant.
- To run the new agents with the stochastic policy, follow the instructions below and simply use `agent=stochastic_hpg` for stochastic DHPG, or `agent=stochastic_hpg_aug` for stochastic DHPG with image augmentation; an example command is shown after this list.
- The novel symmetric environments (Section 7.2) are available in the `symmetry_RL` and `mountain_car_3D` repositories.
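For example, a stochastic DHPG agent can be launched with the same entry point described below, using `pendulum_swingup` as a representative task:

```
python train.py task=pendulum_swingup agent=stochastic_hpg
```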
- Authors' PyTorch implementation of Deep Homomorphic Policy Gradients (DHPG). If you use our code, please cite our NeurIPS 2022 paper (see the citations at the bottom of this page).
- DHPG simultaneously learns the MDP homomorphism map and the optimal policy using the homomorphic policy gradient theorem for continuous control problems.
- Install the following libraries needed for MuJoCo and the DeepMind Control Suite:

```
sudo apt update
sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3
```
- Install MuJoCo and the DeepMind Control Suite following the official instructions.
- We recommend using a conda virtual environment to run the code. Create and activate the virtual environment:

```
conda create -n hpg_env python=3.8
conda activate hpg_env
pip install --upgrade pip
```
- Install the dependencies of this package:

```
pip install -r requirements.txt
```
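To verify the simulator setup, a quick sanity check like the following can be run (this assumes `dm_control` is installed by `requirements.txt`):

```python
# Load a DeepMind Control Suite task to confirm MuJoCo and dm_control work.
from dm_control import suite

env = suite.load(domain_name="pendulum", task_name="swingup")
timestep = env.reset()
print(timestep.observation)
```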
- This code includes our Python implementation of DHPG and all of the baseline algorithms used in the paper:
  - Pixel observations: Deterministic DHPG, Stochastic DHPG, DBC, DeepMDP, SAC-AE, DrQ-v2.
  - State observations: Deterministic DHPG, Stochastic DHPG, TD3, DDPG, SAC.
- Results were obtained with Python v3.8.10, CUDA v11.4, and PyTorch v1.10.0, over 10 seeds.
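To reproduce a multi-seed run, a simple sweep can be scripted as below; the `seed` override is an assumption about the Hydra config (it is standard in DrQ-v2-style codebases) and should be checked against the repository's config files:

```
for seed in 0 1 2 3 4 5 6 7 8 9; do
    python train.py task=pendulum_swingup agent=hpg seed=$seed
done
```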
- To train agents on pixel observations:

```
python train.py task=pendulum_swingup agent=hpg
```
- Available DHPG agents are: `hpg`, `hpg_aug`, `stochastic_hpg`, `stochastic_hpg_aug`, `hpg_ind`, `hpg_ind_aug`:
  - `hpg` is the deterministic DHPG variant in which the gradients of HPG and DPG are summed together for a single actor update (`hpg_aug` is `hpg` with image augmentation).
  - `stochastic_hpg` is stochastic DHPG (`stochastic_hpg_aug` is `stochastic_hpg` with image augmentation).
  - `hpg_ind` is the deterministic DHPG variant in which the gradients of HPG and DPG are used to independently update the actor (`hpg_ind_aug` is `hpg_ind` with image augmentation); the two update rules are sketched after this list.
  - See Appendix D.5 for more information on these variants.
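For intuition, here is a minimal, self-contained PyTorch sketch of the difference between the summed (`hpg`) and independent (`hpg_ind`) actor updates. The actor and the two losses are placeholders, not the repository's actual HPG/DPG objectives:

```python
import torch

# Toy actor and placeholder losses, for illustration only; the real HPG and
# DPG losses are computed from the critics in the agent implementations.
actor = torch.nn.Linear(4, 2)
optimizer = torch.optim.Adam(actor.parameters(), lr=1e-3)

def hpg_loss(actor):
    # Stand-in for the homomorphic policy gradient loss.
    return actor(torch.randn(8, 4)).pow(2).mean()

def dpg_loss(actor):
    # Stand-in for the deterministic policy gradient loss.
    return actor(torch.randn(8, 4)).abs().mean()

# `hpg`: the two gradients are summed for a single actor update.
optimizer.zero_grad()
(hpg_loss(actor) + dpg_loss(actor)).backward()
optimizer.step()

# `hpg_ind`: each gradient independently updates the actor.
for loss_fn in (hpg_loss, dpg_loss):
    optimizer.zero_grad()
    loss_fn(actor).backward()
    optimizer.step()
```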
- Available baseline agents are: `drqv2`, `dbc`, `deepmdp`, `sacae`.
- You can run each baseline with image augmentation by simply adding `_aug` to the end of its name. For example, `dbc_aug` runs `dbc` with image augmentation.
- If you do not have a CUDA device, use `device=cpu`.
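For example:

```
python train.py task=pendulum_swingup agent=hpg device=cpu
```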
- To train agents on state observations:

```
python train.py pixel_obs=false action_repeat=1 frame_stack=1 task=pendulum_swingup agent=hpg
```
- Available DHPG agents are: `hpg`, `hpg_ind`, `stochastic_hpg`.
- Available baseline agents are: `td3`, `sac`, `ddpg_original`, `ddpg_ours`.
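For example, to train SAC on state observations:

```
python train.py pixel_obs=false action_repeat=1 frame_stack=1 task=pendulum_swingup agent=sac
```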
- To run the transfer experiments, use `python transfer.py` with the same configurations discussed above for pixel observations, but use `cartpole_transfer`, `quadruped_transfer`, `walker_transfer`, or `hopper_transfer` as the `task` argument.
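For example, assuming `transfer.py` accepts the same overrides as `train.py`:

```
python transfer.py task=walker_transfer agent=hpg
```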
- To monitor results, use:

```
tensorboard --logdir exp
```
If you use our code, please cite our NeurIPS 2022 paper:
```
@article{rezaei2022continuous,
  title={Continuous {MDP} Homomorphisms and Homomorphic Policy Gradient},
  author={Rezaei-Shoshtari, Sahand and Zhao, Rosie and Panangaden, Prakash and Meger, David and Precup, Doina},
  journal={Advances in Neural Information Processing Systems},
  volume={35},
  pages={20189--20204},
  year={2022}
}
```
And our extended JMLR paper, which contains the theoretical and empirical results for stochastic policies:
```
@article{panangaden2024policy,
  title={Policy Gradient Methods in the Presence of Symmetries and State Abstractions},
  author={Panangaden, Prakash and Rezaei-Shoshtari, Sahand and Zhao, Rosie and Meger, David and Precup, Doina},
  journal={Journal of Machine Learning Research},
  volume={25},
  number={71},
  pages={1--57},
  year={2024},
  url={http://jmlr.org/papers/v25/23-1415.html}
}
```