This is the official code base for our AAAI'21 paper, "Distributional Reinforcement Learning via Moment Matching" (arXiv; AAAI Proceedings).
- tensorflow==1.15
- tensorflow-probability==0.8.0
- atari-py
- gym==0.12.1
- gin-config
- opencv-python (imported as `cv2`)
- Dopamine framework (already integrated into this code base)
- To train and evaluate `MMDQN` in an Atari game with the default settings, use the following example command (from within the main directory `mmdrl`):

  ```shell
  python main.py --env Breakout --agent_id mmd \
      --agent_name mmd_dqn_1 --gin_files ./configs/mmd_atari.gin
  ```

  where `env` is an Atari game name, `agent_id` is a registered agent id (`mmd` for `MMDQN`), `agent_name` is the directory name under which the agent's training and evaluation results are saved, and `gin_files` is the path to the hyperparameter configuration (in `gin` format).
- For convenience, you can directly modify the bash script `run_mmdqn.sh` for various hyperparameter settings and run it via:

  ```shell
  chmod +x ./run_mmdqn.sh
  ./run_mmdqn.sh
  ```
- `env`: one of the 57 Atari games
- `agent_id`: one of `['mmd', 'quantile', 'iqn']`, the agent id (for `MMDQN`, `QR-DQN` and `IQN`, respectively)
- `agent_name`: str, the experiment log is saved to `./results/<env>/<agent_name>`
- `policy`: one of `['eps_greedy', 'ucb', 'ps']`, the policy used by the agent (epsilon-greedy, UCB, or Thompson sampling)
- `num_atoms`: int, the number of particles N
- `bandwidth_selection_type`: `'mixture'`, the method for kernel bandwidth selection
- `gin_files`: str, the path to a gin file containing the hyperparameters of the agent
- `gin_bindings`: str, overwrites hyperparameters in a gin file
- `mmd_agent.py`: an implementation of the `MMDQN` agent
- `quantile_agent.py`: an implementation of `QR-DQN`
- `main.py`: the main file to train and evaluate an agent
- `run_mmdqn.sh`: a bash script to train and evaluate the `MMDQN` agent
- `configs/mmd_atari.gin`: hyperparameters of the `MMDQN` agent
- `dopamine/`: the code base of the Dopamine framework
For ease of presenting our experimental results, I have uploaded the raw result data of our algorithm `MMDQN` (and `QR-DQN`) to `/raw_result_data`:

- `/raw_result_data/mmdqn_train_episode_return.csv`: the raw scores of `MMDQN` during training for the Atari games
- `/raw_result_data/mmdqn_eval_episode_return.csv`: the raw scores of `MMDQN` during evaluation for the Atari games
- `/raw_result_data/qr_train_episode_return.csv`: the raw scores of `QR-DQN` during training for the Atari games
- `/raw_result_data/qr_eval_episode_return.csv`: the raw scores of `QR-DQN` during evaluation for the Atari games
`MMDQN` is trained in each of the 55 Atari games three independent times (three random seeds). Each line of each of the `csv` files above contains the name of the game and a series of 200 numbers representing the score that `MMDQN` obtains after each iteration. I have also uploaded the raw result data of `QR-DQN` in `/raw_result_data/qr_train_episode_return.csv` and `/raw_result_data/qr_eval_episode_return.csv`.
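The raw result CSVs described above can be loaded with a few lines of Python. This is a minimal sketch, not part of the repo: it assumes each row is the game name followed by its per-iteration scores, and `load_episode_returns` is a hypothetical helper name.

```python
import csv


def load_episode_returns(path):
    """Parse a raw result CSV into {game_name: [per-iteration scores]}.

    Assumes each row is: game name, then one score per training iteration
    (200 numbers in the files above).
    """
    scores = {}
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if not row:  # skip blank lines
                continue
            scores[row[0]] = [float(v) for v in row[1:]]
    return scores


# Illustrative usage (path is one of the files listed above):
# returns = load_episode_returns("raw_result_data/mmdqn_train_episode_return.csv")
# final_scores = {game: s[-1] for game, s in returns.items()}
```

From the returned dict you can, for example, take the last entry of each series as the final training score, or average across the three seeds if you group runs of the same game.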
```
@article{Nguyen-Tang_Gupta_Venkatesh_2021,
  title={Distributional Reinforcement Learning via Moment Matching},
  volume={35},
  url={https://ojs.aaai.org/index.php/AAAI/article/view/17104},
  number={10},
  journal={Proceedings of the AAAI Conference on Artificial Intelligence},
  author={Nguyen-Tang, Thanh and Gupta, Sunil and Venkatesh, Svetha},
  year={2021},
  month={May},
  pages={9144-9152}
}
```