This repository has been archived by the owner on Jul 21, 2020. It is now read-only.

week06_policy_based

Materials

More materials

  • Actually proving the policy gradient for discounted rewards - article

  • On variance of policy gradient and optimal baselines: article, another article

  • Learn Advantage Actor Critic with a comic

  • Generalizing the log-derivative trick - url

  • Combining policy gradient and q-learning - arxiv

  • Variational perspective on reinforcement learning (from DeepBayes) - pdf

  • Adversarial review of policy gradient - blog

Run the seminar notebook in Colab: Open In Colab

Run the optional homework notebook in Colab: Open In Colab