This repository contains the code for the AAAI'24 paper *Learning Uncertainty-Aware Temporally-Extended Actions* (UTE).
## Requirements

This code was developed with Python 3.6.13 and PyTorch 1.8.1.

```bash
conda create -n ute python=3.6
conda activate ute
conda install pytorch==1.8.1 torchvision==0.9.1 torchaudio==0.8.1 cudatoolkit=11.3 -c pytorch -c conda-forge
```
To install the remaining requirements:

```bash
pip install -r requirements.txt
```
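As an optional sanity check, you can confirm that the pinned build is importable and sees your GPU (assumes the `ute` environment is active):

```python
# Optional sanity check for the pinned PyTorch build.
import torch

print(torch.__version__)           # expected: 1.8.1
print(torch.cuda.is_available())   # True if the CUDA 11.3 build matches your driver
```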
## Training

To train the models in the paper, run the commands below.

### Chain MDP

```bash
cd chain_mdp
```

$\epsilon$z-greedy baseline, from *Temporally-Extended $\epsilon$-Greedy Exploration* (Dabney et al., ICLR 2021):

```bash
python main_DQN.py --agent ez_greedy --cuda 0 --input-dim 50 --max-episodes=1000
```

TempoRL baseline:

```bash
python main_UTE.py --agent tdqn --cuda 0 --input-dim 50 --max-episodes=1000 --skip-net-max-skips=10
```

UTE (ours):

```bash
python main_UTE.py --agent ute --cuda 0 --input-dim 50 --max-episodes=1000 --skip-net-max-skips=10 --uncertainty-factor=2.0
```
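For intuition, the `ez_greedy` baseline occasionally commits to a random action for a duration drawn from a heavy-tailed zeta distribution. Below is a minimal sketch of that selection rule; the function name and the `option` dict are illustrative, not this repo's API:

```python
import numpy as np

rng = np.random.default_rng(0)

def ez_greedy_action(q_values, option, epsilon=0.1, mu=2.0):
    """One step of temporally-extended epsilon-greedy action selection.

    `option` is a mutable dict {"a": action, "n": repeats left} carried
    across environment steps (illustrative names).
    """
    if option["n"] > 0:                 # still committed to a repeated action
        option["n"] -= 1
        return option["a"]
    if rng.random() < epsilon:          # explore: random action, zeta-distributed duration
        option["a"] = int(rng.integers(len(q_values)))
        option["n"] = int(rng.zipf(mu)) - 1
        return option["a"]
    return int(np.argmax(q_values))     # exploit greedily for a single step
```

Initialize `option = {"a": 0, "n": 0}` at the start of each episode so no stale commitment carries over.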
### Gridworlds

```bash
cd grid_atari

# tabular Q-learning baseline
python gridworlds.py --agent q --env lava

# TempoRL skip Q-learning baseline
python gridworlds.py --agent sq --env lava --max-skips 7

# UTE (ours)
python gridworlds.py --agent ute --env lava --max-skips 7 --uncertainty-factor -1.5
```
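The `--uncertainty-factor` $\lambda$ controls how ensemble disagreement enters the choice of extension length: positive values reward uncertain skip lengths (optimism), negative values penalize them (pessimism). A minimal sketch of one way such a choice can be made from an ensemble of skip-Q heads; the names and array layout here are illustrative assumptions, not the repo's code:

```python
import numpy as np

def choose_skip(skip_q_ensemble, lam):
    """Pick an action-repetition length from an ensemble of skip-Q estimates.

    skip_q_ensemble: shape (n_heads, max_skips); entry [e, j] is head e's
    value for repeating the chosen action j+1 times (illustrative layout).
    lam: the --uncertainty-factor; lam > 0 adds an optimism bonus for
    uncertain skip lengths, lam < 0 penalizes them.
    """
    mean = skip_q_ensemble.mean(axis=0)
    std = skip_q_ensemble.std(axis=0)
    return int(np.argmax(mean + lam * std)) + 1   # number of repeats
```

With `--max-skips 7`, the ensemble would score seven candidate lengths per step.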
### Atari

```bash
cd grid_atari

# DQN baseline
python atari.py --env qbert --env-max-steps 10000 --agent dqn --out-dir ./ --episodes 20000 --training-steps 2500000 --eval-after-n-steps 10000 --seed 12345 --84x84 --eval-n-episodes 3

# TempoRL DQN baseline
python atari.py --env qbert --env-max-steps 10000 --agent tdqn --out-dir ./ --episodes 20000 --training-steps 2500000 --eval-after-n-steps 10000 --seed 12345 --84x84 --eval-n-episodes 3

# UTE (ours)
python atari.py --env qbert --env-max-steps 10000 --agent ute --out-dir ./ --episodes 20000 --training-steps 2500000 --eval-after-n-steps 10000 --seed 12345 --84x84 --eval-n-episodes 3 --uncertainty-factor -0.5

# UTE with a bandit-adapted uncertainty factor
python atari.py --env qbert --env-max-steps 10000 --agent ute_bandit --out-dir ./ --episodes 20000 --training-steps 2500000 --eval-after-n-steps 10000 --seed 12345 --84x84 --eval-n-episodes 3 --uncertainty-factor -0.5
```
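The `ute_bandit` variant treats the uncertainty factor as something to adapt during training rather than fix. As a rough, illustrative sketch of one such scheme (a UCB bandit over candidate $\lambda$ values rewarded with episodic return; this is an assumption for illustration, not the repo's implementation):

```python
import numpy as np

class LambdaBandit:
    """Illustrative UCB bandit over candidate uncertainty factors.

    Arms are lambda values; the reward fed back after each episode is the
    episodic return. Sketch only, not the repo's code.
    """

    def __init__(self, lambdas=(-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0), c=1.0):
        self.lambdas = np.asarray(lambdas)
        self.c = c
        self.counts = np.zeros(len(self.lambdas))
        self.means = np.zeros(len(self.lambdas))
        self.arm = 0

    def select(self):
        untried = np.flatnonzero(self.counts == 0)
        if untried.size:                      # try every arm once first
            self.arm = int(untried[0])
        else:                                 # then pick by UCB score
            bonus = self.c * np.sqrt(np.log(self.counts.sum()) / self.counts)
            self.arm = int(np.argmax(self.means + bonus))
        return float(self.lambdas[self.arm])

    def update(self, episode_return):
        """Update the selected arm's running-mean reward."""
        self.counts[self.arm] += 1
        self.means[self.arm] += (episode_return - self.means[self.arm]) / self.counts[self.arm]
```

In a training loop, one would call `lam = bandit.select()` at the start of each episode and `bandit.update(episode_return)` at its end.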
## Acknowledgements

The Chain MDP environment is based on the code for the paper *Randomized Value Functions via Multiplicative Normalizing Flows* (Ahmed Touati, Harsh Satija, Joshua Romoff, Joelle Pineau, Pascal Vincent; UAI 2019):

```bibtex
@article{touati2018randomized,
  title={Randomized value functions via multiplicative normalizing flows},
  author={Touati, Ahmed and Satija, Harsh and Romoff, Joshua and Pineau, Joelle and Vincent, Pascal},
  journal={arXiv preprint arXiv:1806.02315},
  year={2018}
}
```
The gridworld/Atari environments and the TempoRL agent are based on the code for the paper *TempoRL: Learning When to Act*:

```bibtex
@inproceedings{biedenkapp2021temporl,
  author    = {André Biedenkapp and Raghu Rajan and Frank Hutter and Marius Lindauer},
  title     = {{T}empo{RL}: Learning When to Act},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning (ICML 2021)},
  year      = {2021},
  month     = jul,
}
```