A minimal implementation of the Proximal Policy Optimization (PPO) algorithm, written in Julia. The actor and critic networks are both simple multilayer perceptrons (MLPs). The critic learns the value function with a bootstrapped temporal-difference (TD) update.
PPO paper: https://arxiv.org/abs/1707.06347
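
The two losses mentioned above can be sketched in plain Julia as follows. This is an illustrative sketch, not the code from this repository: the function names (`td_target`, `critic_loss`, `ppo_clip_loss`), the one-step form of the TD target, and the default values `γ = 0.99` and `ϵ = 0.2` are all assumptions chosen for the example.

```julia
# Hypothetical helpers for illustration; names and defaults are assumptions,
# not taken from this repository.

# One-step bootstrapped TD target: r + γ * V(s').
# The bootstrap term is dropped when the episode ended at this step.
td_target(r, v_next, done; γ=0.99) = r + γ * v_next * (1 - done)

# Mean squared TD error over a batch, usable as a critic loss.
# In training, the targets are normally treated as constants
# (no gradient flows through v_next).
function critic_loss(v, r, v_next, done; γ=0.99)
    targets = td_target.(r, v_next, done; γ=γ)
    return sum((targets .- v) .^ 2) / length(v)
end

# PPO clipped surrogate objective from the paper linked above,
# negated so that it can be minimized as an actor loss.
function ppo_clip_loss(logp_new, logp_old, adv; ϵ=0.2)
    ratio = exp.(logp_new .- logp_old)            # π_new(a|s) / π_old(a|s)
    unclipped = ratio .* adv
    clipped = clamp.(ratio, 1 - ϵ, 1 + ϵ) .* adv
    return -sum(min.(unclipped, clipped)) / length(adv)
end
```

Both functions take plain vectors, for example `critic_loss([0.5, 1.0], [1.0, 0.0], [1.0, 2.0], [false, true]; γ=0.9)`. In an actual training loop, the value estimates and log-probabilities would come from the critic and actor MLPs, and the advantages from whatever estimator the implementation uses.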