Experiments accompanying the paper

All experiments are configured with Hydra and monitored with Weights & Biases.

Experiment configuration with Hydra

All experiments use the base Hydra config in ./configs/main.yaml but override it with the experiment-specific configs in ./configs/experiment. An experiment's config can be viewed with:

python train.py +experiment=INSERT_EXPERIMENT_NAME --cfg job

where INSERT_EXPERIMENT_NAME is the filename of an experiment's yaml config in ./configs/experiment. The base config can be displayed with:

python train.py --cfg job
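
Each file in ./configs/experiment is a standard Hydra override of the base config. As a rough sketch only (the config groups and fields below are illustrative, not the repo's actual schema), such a file might look like:

```yaml
# configs/experiment/my_experiment.yaml -- hypothetical example
# @package _global_
defaults:
  - override /explorer: moderl  # illustrative config-group override

training:
  random_seed: 42  # illustrative overrides of values in ./configs/main.yaml
  num_episodes: 30
```

Selecting it with `+experiment=my_experiment` merges these values over ./configs/main.yaml.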

The experiments are as follows:

| Experiment | Description |
| --- | --- |
| <b>Greedy exploitation WITHOUT mode constraint</b> `greedy_no_constraint` | The greedy exploitation strategy cannot solve our δ-mode-constrained navigation problem because it leaves the desired dynamics mode. |
| <b>Greedy exploitation WITH mode constraint</b> `greedy_with_constraint` | Adding the δ-mode constraint to the greedy exploitation strategy still does not solve our δ-mode-constrained navigation problem, because the optimisation gets stuck at a local optimum induced by the constraint. |
| <b>ModeRL (ours)</b> `moderl` | Our strategy successfully solves our δ-mode-constrained navigation problem by augmenting the greedy exploitation objective with an intrinsic motivation term. The intrinsic motivation uses the epistemic uncertainty associated with the learned mode constraint to escape local optima induced by the constraint. |
| <b>Aleatoric uncertainty (ablation)</b> `aleatoric_unc_ablation` | Shows the importance of using only the epistemic uncertainty for exploration. This experiment augments the greedy objective with the entropy of the mode indicator variable. It cannot escape the local optimum induced by the mode constraint because the mode indicator variable's entropy is always high at the mode boundary. This motivated formulating a dynamics model which can disentangle the sources of uncertainty in the mode constraint. |
| <b>Myopic intrinsic exploration (ablation)</b> `myopic_ablation` | Motivates why our intrinsic motivation term considers the joint entropy over a trajectory instead of summing the entropy at each time step (as is often seen in the literature). This experiment formulates the intrinsic motivation term as the sum of the gating function entropy at each time step, i.e. it assumes the time steps are independent and ignores the information gain over an entire trajectory, making the exploration myopic (shortsighted); see the sketch after this table. |
| <b>Constraint level comparison (ablation)</b> `compare_constraint_levels` | Compares different constraint levels $\delta \in \{0.1, 0.2, 0.3, 0.4, 0.5\}$ to see how the constraint level influences training. |
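
To make the myopic ablation concrete: writing $T$ for the planning horizon, $\mathrm{H}[\cdot]$ for entropy, and $h_t$ for the gating function at time step $t$ (notation illustrative, not necessarily the paper's), the two intrinsic motivation terms differ as

$$
\underbrace{\mathrm{H}[h_1, \ldots, h_T]}_{\text{ModeRL: joint entropy}} \quad \text{vs.} \quad \underbrace{\sum_{t=1}^{T} \mathrm{H}[h_t]}_{\text{myopic ablation}} .
$$

The joint entropy accounts for correlations between time steps, so it measures the information gained over an entire trajectory rather than greedily at each step.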

Running experiments

An individual experiment can be run with:

python train.py +experiment=INSERT_EXPERIMENT_NAME

All experiments can be run with:

python train.py --multirun '+experiment=glob(*)'

or

python train.py --multirun +experiment=greedy_no_constraint,greedy_with_constraint,moderl,aleatoric_unc_ablation,myopic_ablation
python train.py --multirun +experiment=constraint_schedule ++training.random_seed=1,42,69,100,50
python train.py --multirun +experiment=compare_constraint_levels ++training.random_seed=1,42,69,100,50

Plotting figures

Recreate the figures in the paper with:

python plot/plot_all_figures.py --wandb_dir=wandb --saved_runs=saved_runs.yaml

This uses the runs stored in saved_runs.yaml (a sketch of its structure is given after this list), which can be reproduced as follows:

  • Figure 1
    python train.py +experiment=moderl
  • Figure 2
    python train.py +experiment=constraint_schedule
  • Figure 3 - greedy plots (left)
    python train.py --multirun +experiment=greedy_no_constraint,greedy_with_constraint
  • Figure 3 - myopic ablation plots (right)
    python train.py --multirun +experiment=myopic_ablation
  • Figure 5
    python train.py --multirun +experiment=aleatoric_unc_ablation
  • Figures 6 & 7
    python train.py --multirun +experiment=compare_constraint_levels ++training.random_seed=1,42,69,100,50
    python train.py --multirun +experiment=constraint_schedule ++training.random_seed=1,42,69,100,50
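
The exact schema of saved_runs.yaml is determined by plot/plot_all_figures.py; as a purely hypothetical sketch (run IDs are placeholders), an entry might look like:

```yaml
# saved_runs.yaml -- hypothetical structure with placeholder W&B run IDs
moderl:
  - wandb_run_id: abc123xy
compare_constraint_levels:
  - wandb_run_id: def456zw
  - wandb_run_id: ghi789uv
```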

Running experiments on Triton (Aalto's cluster)

Clone the repo with:

git clone https://github.com/aidanscannell/ModeRL.git ~/python-projects/moderl

Create a virtual environment:

ml python/3.8.7
ml py-virtualenv
python -m venv .venv

Install dependencies with:

cd /path/to/project/
source .venv/bin/activate
pip install -e ".[experiments]"

Run multiple experiments in parallel whilst using Hydra's sweep:

python train.py --multirun +experiment=moderl ++training.random_seed=42,1,69,22,4
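
Triton uses SLURM, so sweeps are typically submitted as batch jobs. Below is a minimal sketch of such a script; the filename, resource requests, and output path are illustrative, and note that Hydra's default launcher runs the sweep's jobs one after another inside the allocation:

```bash
#!/bin/bash
# train_moderl.sh -- hypothetical batch script; resource requests are illustrative
#SBATCH --time=24:00:00
#SBATCH --mem=8G
#SBATCH --cpus-per-task=4
#SBATCH --output=moderl_%j.out

# Load the same Python module used to create the virtual environment
ml python/3.8.7
source .venv/bin/activate

# Hydra sweeps over the comma-separated seeds
python train.py --multirun +experiment=moderl ++training.random_seed=42,1,69,22,4
```

Submit it with `sbatch train_moderl.sh`.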

Copy the wandb results from Triton with:

rsync -avz -e "ssh" [email protected]:/home/scannea1/python-projects/moderl/experiments/wandb ./wandb