Attention-based Partially Decoupled Actor-Critic (APDAC)

This repository contains the code for the following paper presented at the Deep RL Workshop, NeurIPS 2021:
Attention-based Partial Decoupling of Policy and Value for Generalization in Reinforcement Learning.

Citation

If you use this code, please cite our paper:

Nafi, N.M., Glasscock, C. and Hsu, W. (2021). Attention-based Partial Decoupling of Policy and Value for Generalization in Reinforcement Learning. In Deep Reinforcement Learning Workshop, NeurIPS 2021.

Our code is largely based on this implementation, and the corresponding paper is available here. That implementation, in turn, uses an open-source PyTorch implementation of PPO.

Dependencies

Run the following to create the environment and install the required dependencies:

conda create -n apdac python=3.7
conda activate apdac

cd apdac
pip install -r requirements.txt

pip install procgen

pip install protobuf==3.20.0

git clone https://github.com/openai/baselines.git
cd baselines
python setup.py install
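
After the steps above, a quick import check can confirm that the environment is usable before starting a long training run. The following is a minimal sketch, not part of the repository: the helper name `find_missing` is hypothetical, and the module list is an assumption (`torch` is inferred from the PyTorch-based PPO implementation mentioned above, since the contents of requirements.txt are not shown here).

```python
import importlib.util

# Module names assumed to match the packages installed above.
# "torch" is an assumption; adjust the tuple to your requirements.txt.
REQUIRED_MODULES = ("torch", "procgen", "baselines")


def find_missing(modules=REQUIRED_MODULES):
    """Return the names in `modules` that cannot be located for import."""
    missing = []
    for name in modules:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised for broken parent packages or modules without a spec.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All checked modules are importable.")
```

Run it inside the activated `apdac` environment. The check only locates the modules and does not import them, so it will not catch version mismatches.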

Instructions

To train APDAC on CoinRun

python train.py --env_name coinrun --algo apdac

To train IDAAC on CoinRun

python train.py --env_name coinrun --algo idaac

To train PPO on CoinRun

python train.py --env_name coinrun --algo ppo --ppo_epoch 3
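
To compare the three methods on the same environment, the commands above can be assembled programmatically. This is a minimal sketch under stated assumptions: the helper names `build_train_command` and `run_all` are hypothetical, only the flags shown above (`--env_name`, `--algo`, `--ppo_epoch`) are used, and passing an environment name other than `coinrun` is an assumption not confirmed here.

```python
import subprocess
import sys

# Algorithms and extra flags exactly as they appear in the commands above.
ALGOS = ("apdac", "idaac", "ppo")
EXTRA_ARGS = {"ppo": ["--ppo_epoch", "3"]}


def build_train_command(algo, env_name="coinrun"):
    """Build the argument list for one training run (hypothetical helper)."""
    if algo not in ALGOS:
        raise ValueError(f"unknown algorithm: {algo!r}")
    cmd = [sys.executable, "train.py", "--env_name", env_name, "--algo", algo]
    return cmd + EXTRA_ARGS.get(algo, [])


def run_all(env_name="coinrun", dry_run=True):
    """Print each command and, unless dry_run is set, run them in sequence."""
    for algo in ALGOS:
        cmd = build_train_command(algo, env_name)
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the loop if a training run exits with an error.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default: only prints the commands.
    run_all(dry_run=True)
```

Call `run_all(dry_run=False)` from the repository root (the directory containing train.py) to launch the runs one after another.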

APDAC uses the same set of hyperparameters for all environments. Please refer to the paper for details and experimental results. On the challenging RL generalization benchmark Procgen, APDAC significantly outperforms the PPO baseline and performs comparably to the recent state-of-the-art method IDAAC. APDAC thus offers generalization benefits similar to those of a fully decoupled approach while reducing the overall number of parameters and the computational cost.