The following is a list of recent reinforcement learning papers that we studied as part of this course. I apologize for not including detailed attribution to the authors of these papers.

Title of the paper | Category |
---|---|
Trust Region Policy Optimization | Policy optimization |
Proximal Policy Optimization Algorithms | Policy optimization |
Asynchronous Methods for Deep Reinforcement Learning | Policy optimization, actor-critic |
Playing Atari with Deep Reinforcement Learning | Application - games, Deep Q-learning |
Deep Reinforcement Learning with Double Q-learning | Deep Q-learning |
Prioritized Experience Replay | Deep Q-learning |
Deep Exploration via Bootstrapped DQN | Deep Q-learning, exploration |
Noisy Nets for Exploration | Deep Q-learning, exploration |
Hindsight Experience Replay | Deep Q-learning |
Learning with Opponent-Learning Awareness | Multi-agent |
Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments | Multi-agent, policy optimization |
Value Iteration Networks | Q-learning, approximate DP |
Learning Tetris Using the Noisy Cross-Entropy Method | Cross-entropy |
Algorithms for Inverse Reinforcement Learning | Inverse RL |
Dueling Network Architectures for Deep Reinforcement Learning | Deep Q-learning |
Learning Features of Music from Scratch | Application - music |
Generating Music by Fine-Tuning Recurrent Neural Networks with Reinforcement Learning | Application - music |
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm | Application - games |
StarCraft II: A New Challenge for Reinforcement Learning | Application - games |
Mastering the game of Go with deep neural networks and tree search | Application - games |
PAC Model-Free Reinforcement Learning | Theory |
UCB Exploration via Q-Ensembles | Theory |
Minimax Regret Bounds for Reinforcement Learning | Theory |
Efficient Reinforcement Learning via Posterior Sampling | Theory |
Deep Exploration via Randomized Value Functions | Theory, exploration |
Neural Combinatorial Optimization with Reinforcement Learning | Application - combinatorial opt |
Deep Direct Reinforcement Learning for Financial Signal Representation and Trading | Application - finance |
Learning to Optimize | Application - combinatorial opt |
End-to-End Offline Goal-Oriented Dialog Policy Learning via Policy Gradient | Application - NLP |
Deep Reinforcement Learning for Dialogue Generation | Application - NLP |