Multi-Agent Reinforcement Learning with Reward Machines and Soft Actor-Critic
References
SAC: https://arxiv.org/pdf/1812.05905.pdf
RL with Reward Machines: https://arxiv.org/abs/2007.01962
SAC Discrete: https://arxiv.org/pdf/1910.07207.pdf
Phase 1 Results (Implementing SAC-Discrete on a gridworld)
The best model achieved an average reward of 0.9, compared with the benchmark reward of 0.8.
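For readers new to SAC-Discrete: the paper linked under References replaces SAC's sampled-action estimates with exact sums over the finite action set, since the policy outputs a full probability vector. The NumPy sketch below illustrates the soft state value and the policy objective in that form. It is not this repository's code; the function names, the 4-action gridworld shape, and the `alpha` value are assumptions made only for illustration.

```python
import numpy as np

EPS = 1e-8  # keeps log() finite when a probability is exactly zero


def softmax(logits):
    """Turn per-action logits of shape (batch, n_actions) into probabilities."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def soft_state_value(probs, q_values, alpha):
    """V(s) = sum_a pi(a|s) * (Q(s, a) - alpha * log pi(a|s))."""
    log_probs = np.log(probs + EPS)
    return (probs * (q_values - alpha * log_probs)).sum(axis=-1)


def policy_loss(probs, q_values, alpha):
    """Batch mean of sum_a pi(a|s) * (alpha * log pi(a|s) - Q(s, a))."""
    log_probs = np.log(probs + EPS)
    return (probs * (alpha * log_probs - q_values)).sum(axis=-1).mean()


# Hypothetical batch: 2 gridworld states, 4 actions (up, down, left, right).
logits = np.array([[2.0, 0.5, 0.1, 0.1],
                   [0.0, 0.0, 0.0, 0.0]])
q_values = np.array([[1.0, 0.2, 0.0, 0.0],
                     [0.5, 0.5, 0.5, 0.5]])
alpha = 0.2  # assumed entropy temperature

probs = softmax(logits)
print("soft values:", soft_state_value(probs, q_values, alpha))
print("policy loss:", policy_loss(probs, q_values, alpha))
```

In a real training loop these quantities would be computed on network outputs with an autodiff framework, and `alpha` is often tuned automatically rather than fixed.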