Experiments accompanying the paper

All experiments are configured with Hydra and monitored with Weights & Biases.

Experiment configuration with Hydra

All experiments use the base Hydra config in ./configs/main.yaml but override it with the experiment-specific configs in ./configs/experiment. An experiment's config can be viewed with:

python train.py +experiment=INSERT_EXPERIMENT_NAME --cfg job

where INSERT_EXPERIMENT_NAME is the filename of an experiment's yaml config in ./configs/experiment. The base config can be displayed with:

python train.py --cfg job
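
Each file in ./configs/experiment is a standard Hydra override of the base config. As a rough sketch only (the config groups and fields below are illustrative, not the repo's actual schema), such a file might look like:

```yaml
# configs/experiment/my_experiment.yaml -- hypothetical example
# @package _global_
defaults:
  - override /explorer: moderl  # illustrative config-group override

training:
  random_seed: 42  # illustrative overrides of values in ./configs/main.yaml
  num_episodes: 30
```

Selecting it with `+experiment=my_experiment` merges these values over ./configs/main.yaml.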

The experiments are as follows:

| Experiment | Description |
| --- | --- |
| <b>Greedy exploitation WITHOUT mode constraint</b> `greedy_no_constraint` | The greedy exploitation strategy cannot solve our δ-mode-constrained navigation problem because it leaves the desired dynamics mode. |
| <b>Greedy exploitation WITH mode constraint</b> `greedy_with_constraint` | Adding the δ-mode constraint to the greedy exploitation strategy still does not solve our δ-mode-constrained navigation problem, because the optimisation gets stuck at a local optimum induced by the constraint. |
| <b>ModeRL (ours)</b> `moderl` | Our strategy successfully solves our δ-mode-constrained navigation problem by augmenting the greedy exploitation objective with an intrinsic motivation term. The intrinsic motivation uses the epistemic uncertainty associated with the learned mode constraint to escape local optima induced by the constraint. |
| <b>Aleatoric uncertainty (ablation)</b> `aleatoric_unc_ablation` | Shows the importance of using only the epistemic uncertainty for exploration. This experiment augments the greedy objective with the entropy of the mode indicator variable. It cannot escape the local optimum induced by the mode constraint because the mode indicator variable's entropy is always high at the mode boundary. This motivated formulating a dynamics model which can disentangle the sources of uncertainty in the mode constraint. |
| <b>Myopic intrinsic exploration (ablation)</b> `myopic_ablation` | Motivates why our intrinsic motivation term considers the joint entropy over a trajectory instead of summing the entropy at each time step (as is often seen in the literature). This experiment formulates the intrinsic motivation term as the sum of the gating function entropy at each time step, i.e. it assumes the time steps are independent and ignores the information gain over an entire trajectory, making the exploration myopic (shortsighted); see the sketch after this table. |
| <b>Constraint level comparison (ablation)</b> `compare_constraint_levels` | Compares different constraint levels $\delta \in \{0.1, 0.2, 0.3, 0.4, 0.5\}$ to see how the constraint level influences training. |
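
To make the myopic ablation concrete: writing $T$ for the planning horizon, $\mathrm{H}[\cdot]$ for entropy, and $h_t$ for the gating function at time step $t$ (notation illustrative, not necessarily the paper's), the two intrinsic motivation terms differ as

$$
\underbrace{\mathrm{H}[h_1, \ldots, h_T]}_{\text{ModeRL: joint entropy}} \quad \text{vs.} \quad \underbrace{\sum_{t=1}^{T} \mathrm{H}[h_t]}_{\text{myopic ablation}} .
$$

The joint entropy accounts for correlations between time steps, so it measures the information gained over an entire trajectory rather than greedily at each step.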

Running experiments

An individual experiment can be run with:

python train.py +experiment=INSERT_EXPERIMENT_NAME

All experiments can be run with:

python train.py --multirun '+experiment=glob(*)'

or

python train.py --multirun +experiment=greedy_no_constraint,greedy_with_constraint,moderl,aleatoric_unc_ablation,myopic_ablation
python train.py --multirun +experiment=constraint_schedule ++training.random_seed=1,42,69,100,50
python train.py --multirun +experiment=compare_constraint_levels ++training.random_seed=1,42,69,100,50

Plotting figures

Recreate the figures in the paper with:

python plot/plot_all_figures.py --wandb_dir=wandb --saved_runs=saved_runs.yaml

This uses the runs stored in saved_runs.yaml (a sketch of its structure is given after this list), which can be reproduced as follows:

  • Figure 1
    python train.py +experiment=moderl
  • Figure 2
    python train.py +experiment=constraint_schedule
  • Figure 3 - greedy plots (left)
    python train.py --multirun +experiment=greedy_no_constraint,greedy_with_constraint
  • Figure 3 - myopic ablation plots (right)
    python train.py --multirun +experiment=myopic_ablation
  • Figure 5
    python train.py --multirun +experiment=aleatoric_unc_ablation
  • Figures 6 & 7
    python train.py --multirun +experiment=compare_constraint_levels ++training.random_seed=1,42,69,100,50
    python train.py --multirun +experiment=constraint_schedule ++training.random_seed=1,42,69,100,50
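
The exact schema of saved_runs.yaml is determined by plot/plot_all_figures.py; as a purely hypothetical sketch (run IDs are placeholders), an entry might look like:

```yaml
# saved_runs.yaml -- hypothetical structure with placeholder W&B run IDs
moderl:
  - wandb_run_id: abc123xy
compare_constraint_levels:
  - wandb_run_id: def456zw
  - wandb_run_id: ghi789uv
```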

Running experiments on Triton (Aalto's cluster)

Clone the repo with:

git clone https://github.com/aidanscannell/ModeRL.git ~/python-projects/moderl

Create a virtual environment:

ml python/3.8.7
ml py-virtualenv
python -m venv .venv

Install dependencies with:

cd /path/to/project/
source .venv/bin/activate
pip install -e ".[experiments]"

Run multiple experiments in parallel whilst using Hydra's sweep:

python train.py --multirun +experiment=moderl ++training.random_seed=42,1,69,22,4
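
Triton uses SLURM, so sweeps are typically submitted as batch jobs. Below is a minimal sketch of such a script; the filename, resource requests, and output path are illustrative, and note that Hydra's default launcher runs the sweep's jobs one after another inside the allocation:

```bash
#!/bin/bash
# train_moderl.sh -- hypothetical batch script; resource requests are illustrative
#SBATCH --time=24:00:00
#SBATCH --mem=8G
#SBATCH --cpus-per-task=4
#SBATCH --output=moderl_%j.out

# Load the same Python module used to create the virtual environment
ml python/3.8.7
source .venv/bin/activate

# Hydra sweeps over the comma-separated seeds
python train.py --multirun +experiment=moderl ++training.random_seed=42,1,69,22,4
```

Submit it with `sbatch train_moderl.sh`.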

Copy the wandb results from Triton with:

rsync -avz -e "ssh" [email protected]:/home/scannea1/python-projects/moderl/experiments/wandb ./wandb