Reinforcement learning

Reinforcement learning practice, lab 1

Basic maze vs. Minotaur problem
Creating an MDP and running value iteration until it converges to an optimal policy.

Police vs. bank robber
Q-learning and SARSA algorithms for the robber to maximize its reward in the game.
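
As a rough illustration of how the Q-learning and SARSA updates differ, here is a minimal tabular sketch in Python. It is not the lab's actual code: the function names, the state labels ("s0", "s1"), the action names, and the values of alpha and gamma are all hypothetical, and the police vs. bank robber environment itself (grid, moves, rewards) is not modelled here. It also assumes a continuing game, so terminal states are not handled.

```python
import random
from collections import defaultdict


def epsilon_greedy(Q, state, actions, epsilon, rng):
    """With probability epsilon pick a random action, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


def q_learning_update(Q, s, a, r, s_next, actions, alpha, gamma):
    """Off-policy update: bootstrap from the best action in the next state."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy update: bootstrap from the action actually chosen next."""
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])


# Hypothetical two-action example: same transition, two different targets.
actions = ["left", "right"]

Q_ql = defaultdict(float)   # unseen (state, action) pairs default to 0.0
Q_ql[("s1", "left")] = 1.0
Q_ql[("s1", "right")] = 2.0
q_learning_update(Q_ql, "s0", "right", 1.0, "s1", actions, alpha=0.5, gamma=0.5)

Q_sarsa = defaultdict(float)
Q_sarsa[("s1", "left")] = 1.0
Q_sarsa[("s1", "right")] = 2.0
# Suppose the behaviour policy happened to pick "left" in s1.
sarsa_update(Q_sarsa, "s0", "right", 1.0, "s1", "left", alpha=0.5, gamma=0.5)

print(Q_ql[("s0", "right")], Q_sarsa[("s0", "right")])
```

The only difference between the two updates is the bootstrap term: Q-learning uses the maximum Q-value in the next state regardless of what the robber does next, while SARSA uses the Q-value of the action the robber actually takes, so exploration moves affect what SARSA learns.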