This repository contains the code for the AAAI'24 paper Learning Uncertainty-Aware Temporally-Extended Actions.
This code was developed with Python 3.6.13 and PyTorch 1.8.1.
conda create -n ute python=3.6
conda activate ute
conda install pytorch==1.8.1 torchvision==0.9.1 torchaudio==0.8.1 cudatoolkit=11.3 -c pytorch -c conda-forge
To install requirements:
pip install -r requirements.txt
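After installation, it can be useful to confirm that the active environment matches the versions this code was developed with. The following is a minimal sketch, not part of the repository: the function name `check_environment` and its messages are hypothetical, and the expected versions are taken from the text above.

```python
import importlib
import sys

# Versions stated in this README; adjust if you deliberately use others.
EXPECTED_PYTHON = (3, 6)
EXPECTED_TORCH = "1.8.1"


def check_environment(expected_python=EXPECTED_PYTHON, expected_torch=EXPECTED_TORCH):
    """Return a list of human-readable mismatches (empty if everything matches)."""
    problems = []

    # Compare only major.minor, since the README pins python=3.6 in conda.
    found_python = tuple(sys.version_info[:2])
    if found_python != tuple(expected_python):
        problems.append(
            "python {}.{} found, expected {}.{}".format(
                found_python[0], found_python[1],
                expected_python[0], expected_python[1],
            )
        )

    try:
        torch = importlib.import_module("torch")
    except ImportError:
        problems.append("torch is not installed")
    else:
        # Drop a local build suffix such as "+cu111" before comparing.
        found_torch = str(torch.__version__).split("+")[0]
        if found_torch != expected_torch:
            problems.append(
                "torch {} found, expected {}".format(found_torch, expected_torch)
            )

    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("Environment matches the README.")
```

Saved under any name (for example a hypothetical `check_env.py`), the script can be run inside the `ute` environment before training. A mismatch only produces a warning, since other versions may also work.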
To train the model(s) in the paper, run this command:
cd chain_mdp
$\epsilon$z-greedy (from the paper Temporally Extended $\epsilon$-Greedy Exploration):
python --agent ez_greedy --cuda 0 --input-dim 50 --max-episodes=1000
python --agent tdqn --cuda 0 --input-dim 50 --max-episodes=1000 --skip-net-max-skips=10
python --agent ute --cuda 0 --input-dim 50 --max-episodes=1000 --skip-net-max-skips=10 --uncertainty-factor=2.0
cd grid_atari
python --agent q --env lava
python --agent sq --env lava --max-skips 7
python --agent ute --env lava --max-skips 7 --uncertainty-factor -1.5
cd grid_atari
python --env qbert --env-max-steps 10000 --agent dqn --out-dir ./ --episodes 20000 --training-steps 2500000 --eval-after-n-steps 10000 --seed 12345 --84x84 --eval-n-episodes 3
python --env qbert --env-max-steps 10000 --agent tdqn --out-dir ./ --episodes 20000 --training-steps 2500000 --eval-after-n-steps 10000 --seed 12345 --84x84 --eval-n-episodes 3
python --env qbert --env-max-steps 10000 --agent ute --out-dir ./ --episodes 20000 --training-steps 2500000 --eval-after-n-steps 10000 --seed 12345 --84x84 --eval-n-episodes 3 --uncertainty-factor -0.5
python --env qbert --env-max-steps 10000 --agent ute_bandit --out-dir ./ --episodes 20000 --training-steps 2500000 --eval-after-n-steps 10000 --seed 12345 --84x84 --eval-n-episodes 3 --uncertainty-factor -0.5
The Chain MDP environment is based on the code for the paper: Randomized Value functions via Multiplicative Normalizing Flows. Ahmed Touati, Harsh Satija, Joshua Romoff, Joelle Pineau, Pascal Vincent. UAI 2019.
title={Randomized value functions via multiplicative normalizing flows},
author={Touati, Ahmed and Satija, Harsh and Romoff, Joshua and Pineau, Joelle and Vincent, Pascal},
journal={arXiv preprint arXiv:1806.02315},
The gridworlds/Atari environment and the TempoRL agent are based on the code for the paper: TempoRL: Learning When to Act
author = {André Biedenkapp and Raghu Rajan and Frank Hutter and Marius Lindauer},
title = {{T}empo{RL}: Learning When to Act},
booktitle = {Proceedings of the 38th International Conference on Machine Learning (ICML 2021)},
year = {2021},
month = jul,