This codebase was used to generate the results reported in the paper "Improving Exploration in Soft-Actor-Critic with Normalizing Flows Policies" by Patrick Nadeem Ward* (1, 2), Ariella Smofsky* (1, 2), Avishek Joey Bose (1, 2). INNF Workshop, ICML 2019.
- * Equal contribution, 1 McGill University, 2 Mila
- Correspondence to:
- Patrick Nadeem Ward <Github: NadeemWard, [email protected]>
- Ariella Smofsky <Github: asmoog, [email protected]>
Gaussian policy on Dense Gridworld environment with REINFORCE:
TODO
Gaussian policy on Sparse Gridworld environment with REINFORCE:
TODO
Gaussian policy on Dense Gridworld environment with reparametrization:
python main.py --namestr=G-S-DG-CG --make_cont_grid --batch_size=128 --replay_size=100000 --hidden_size=64 --num_steps=100000 --policy=Gaussian --smol --comet --dense_goals --silent
Gaussian policy on Sparse Gridworld environment with reparametrization:
python main.py --namestr=G-S-CG --make_cont_grid --batch_size=128 --replay_size=100000 --hidden_size=64 --num_steps=100000 --policy=Gaussian --smol --comet --silent
Normalizing Flow policy on Dense Gridworld environment:
TODO
Normalizing Flow policy on Sparse Gridworld environment:
TODO
To run an experiment with a different policy distribution, change the value of the --policy
flag (the commands above use --policy=Gaussian).
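The two reparametrization commands above differ only in their `--namestr` value and in whether `--dense_goals` is passed. As a rough sketch, a small helper could assemble the same command line so that the policy and environment variant are switched in one place. The helper name `build_command` and its parameters are hypothetical and not part of this repository. `Gaussian` is the only `--policy` value confirmed by the commands above, so any other value you pass is an assumption to check against `main.py`.

```python
def build_command(namestr, policy="Gaussian", dense_goals=False):
    """Return the argument list for one main.py run.

    Hypothetical helper: it only mirrors the flags shown in the
    commands above and does not inspect main.py itself.
    """
    args = [
        "python", "main.py",
        f"--namestr={namestr}",
        "--make_cont_grid",
        "--batch_size=128",
        "--replay_size=100000",
        "--hidden_size=64",
        "--num_steps=100000",
        f"--policy={policy}",   # only "Gaussian" is confirmed by the README
        "--smol",
        "--comet",
    ]
    if dense_goals:
        # Dense Gridworld variant; omit the flag for the sparse variant.
        args.append("--dense_goals")
    args.append("--silent")
    return args


# Rebuild the two documented commands.
dense_cmd = build_command("G-S-DG-CG", dense_goals=True)
sparse_cmd = build_command("G-S-CG")

print(" ".join(dense_cmd))
print(" ".join(sparse_cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` from the repository root, which avoids shell quoting issues.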
- The SAC implementation is based on PyTorch SAC.