Master's Thesis Project: Fall 2020
Thesis Title: Bike Control and Navigation using Reinforcement Learning
Paper referenced for the DDPG algorithm: "Continuous Control with Deep Reinforcement Learning" (Lillicrap et al.), https://arxiv.org/abs/1509.02971
Following the DDPG algorithm procedure, every environment step must produce the old state, the action taken, the reward received, and the new state as output.
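For illustration, one way to represent the output of a single step is a named transition tuple (the field names here are illustrative and not taken from the thesis code):

```python
from collections import namedtuple

# One environment step yields exactly one transition with these fields;
# the "done" flag marks whether the episode terminated on this step.
Transition = namedtuple("Transition", ["state", "action", "reward", "new_state", "done"])
```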
- Class for storing the previous values (states, actions, rewards, new states), i.e. a replay buffer (see the ReplayBuffer sketch after this list)
- Class for the Actor DNN: class Actor(object) (see the Actor/Critic sketch after this list)
- Class for the Critic DNN: class Critic(object)
- Class for Ornstein-Uhlenbeck noise (a class defined for exploration noise) - Ornstein-Uhlenbeck process (Uhlenbeck & Ornstein, 1930), page 4 of https://arxiv.org/abs/1509.02971 (see the OUNoise sketch after this list)
- Class or function for the memory (replay buffer) size and/or the terminal (done) flags
- Loading the environment and deploying the algorithm (see the training-loop sketch after this list)
- Class for the Agent that does the learning and makes use of the classes above: class Agent(object) (see the Agent sketch after this list)
- The deterministic policy is action-based (i.e. its output is an action, not a probability value)
- Limiting the constraints in the designed environment
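A minimal sketch of the transition-storage class (the replay buffer), assuming NumPy arrays and a fixed capacity; the class and field names are placeholders rather than the thesis's actual code:

```python
import numpy as np

class ReplayBuffer:
    """Fixed-size memory of (state, action, reward, new_state, done) transitions."""

    def __init__(self, max_size, state_dim, action_dim):
        self.max_size = max_size
        self.counter = 0
        self.states = np.zeros((max_size, state_dim), dtype=np.float32)
        self.actions = np.zeros((max_size, action_dim), dtype=np.float32)
        self.rewards = np.zeros(max_size, dtype=np.float32)
        self.new_states = np.zeros((max_size, state_dim), dtype=np.float32)
        self.terminals = np.zeros(max_size, dtype=np.float32)  # 1.0 when the episode ended

    def store(self, state, action, reward, new_state, done):
        idx = self.counter % self.max_size  # overwrite the oldest entry once full
        self.states[idx] = state
        self.actions[idx] = action
        self.rewards[idx] = reward
        self.new_states[idx] = new_state
        self.terminals[idx] = float(done)
        self.counter += 1

    def sample(self, batch_size):
        size = min(self.counter, self.max_size)
        idx = np.random.choice(size, batch_size)
        return (self.states[idx], self.actions[idx], self.rewards[idx],
                self.new_states[idx], self.terminals[idx])
```

This also covers the memory-size and terminal-flag bullet above: max_size bounds the buffer, and terminals stores the done flags used to stop bootstrapping at episode ends.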
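A minimal sketch of the Actor and Critic DNNs, assuming PyTorch (the thesis's framework is not stated here) and the 400/300 hidden sizes from the DDPG paper; for simplicity the critic takes the action at its input layer, whereas the paper injects it at the second layer:

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy network: maps a state to one concrete action."""

    def __init__(self, state_dim, action_dim, max_action=1.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 400), nn.ReLU(),
            nn.Linear(400, 300), nn.ReLU(),
            nn.Linear(300, action_dim), nn.Tanh(),  # squash into [-1, 1]
        )
        self.max_action = max_action

    def forward(self, state):
        return self.max_action * self.net(state)


class Critic(nn.Module):
    """Q-value network: scores a (state, action) pair with a single scalar."""

    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 400), nn.ReLU(),
            nn.Linear(400, 300), nn.ReLU(),
            nn.Linear(300, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=1))
```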
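A minimal sketch of the Ornstein-Uhlenbeck noise class; theta = 0.15 and sigma = 0.2 follow the DDPG paper, while the time step dt is an assumption:

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated exploration noise."""

    def __init__(self, action_dim, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2):
        self.mu = mu * np.ones(action_dim)
        self.theta = theta
        self.sigma = sigma
        self.dt = dt
        self.reset()

    def reset(self):
        # Restart the process at its long-run mean at the start of each episode.
        self.x = np.copy(self.mu)

    def sample(self):
        # dx = theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)
        dx = (self.theta * (self.mu - self.x) * self.dt
              + self.sigma * np.sqrt(self.dt) * np.random.randn(len(self.x)))
        self.x = self.x + dx
        return self.x
```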
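A minimal sketch of the Agent class that ties the pieces above together (actor, critic, their target copies, the replay buffer, and the noise process). Gamma, tau, and the learning rates follow the DDPG paper; the method names and the assumption that actions are normalised to [-1, 1] are placeholders:

```python
import copy
import numpy as np
import torch
import torch.nn.functional as F

class Agent:
    """DDPG agent: deterministic actor, Q-critic, target networks, replay memory, OU noise."""

    def __init__(self, state_dim, action_dim, buffer_size=1_000_000,
                 gamma=0.99, tau=0.001, batch_size=64):
        self.actor = Actor(state_dim, action_dim)
        self.critic = Critic(state_dim, action_dim)
        self.target_actor = copy.deepcopy(self.actor)    # slowly tracking copies
        self.target_critic = copy.deepcopy(self.critic)
        self.actor_opt = torch.optim.Adam(self.actor.parameters(), lr=1e-4)
        self.critic_opt = torch.optim.Adam(self.critic.parameters(), lr=1e-3)
        self.memory = ReplayBuffer(buffer_size, state_dim, action_dim)
        self.noise = OUNoise(action_dim)
        self.gamma, self.tau, self.batch_size = gamma, tau, batch_size

    def choose_action(self, state):
        state = torch.as_tensor(state, dtype=torch.float32).unsqueeze(0)
        with torch.no_grad():
            action = self.actor(state).squeeze(0).numpy()
        # Add exploration noise and keep the action inside the assumed [-1, 1] bounds.
        return np.clip(action + self.noise.sample(), -1.0, 1.0)

    def learn(self):
        if self.memory.counter < self.batch_size:
            return  # not enough stored transitions yet
        s, a, r, s2, d = (torch.as_tensor(x) for x in self.memory.sample(self.batch_size))

        # Critic update: regress Q(s, a) toward the bootstrapped target.
        with torch.no_grad():
            target_q = r.unsqueeze(1) + self.gamma * (1 - d.unsqueeze(1)) \
                       * self.target_critic(s2, self.target_actor(s2))
        critic_loss = F.mse_loss(self.critic(s, a), target_q)
        self.critic_opt.zero_grad()
        critic_loss.backward()
        self.critic_opt.step()

        # Actor update: ascend the critic's estimate of Q(s, pi(s)).
        actor_loss = -self.critic(s, self.actor(s)).mean()
        self.actor_opt.zero_grad()
        actor_loss.backward()
        self.actor_opt.step()

        # Soft-update the target networks toward the online networks.
        for target, online in ((self.target_actor, self.actor),
                               (self.target_critic, self.critic)):
            for tp, p in zip(target.parameters(), online.parameters()):
                tp.data.mul_(1 - self.tau).add_(self.tau * p.data)
```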
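A minimal sketch of loading an environment and deploying the algorithm as an episode loop. Pendulum-v0 is only a stand-in continuous-control task, since the custom bike environment is not part of this note; the Gym reset/step API shown is the pre-v0.26 one:

```python
import gym

env = gym.make("Pendulum-v0")  # placeholder; substitute the designed bike environment
agent = Agent(env.observation_space.shape[0], env.action_space.shape[0])

for episode in range(1000):
    state = env.reset()
    agent.noise.reset()          # restart the OU process each episode
    episode_return, done = 0.0, False
    while not done:
        action = agent.choose_action(state)
        new_state, reward, done, _ = env.step(action)
        # Store the (old state, action, reward, new state, done) transition and learn.
        agent.memory.store(state, action, reward, new_state, done)
        agent.learn()
        state = new_state
        episode_return += reward
    print(f"episode {episode}: return {episode_return:.1f}")
```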