- RL occupies a unique position among approaches to building intelligent agents
- Point to a review of RL (Lil'Log)
- Summarize: an agent interacting with an environment; rewards, environment dynamics, ...
- Summarize: the goal is to find a strategy (a policy) that maximizes the expected sum of rewards (the return)
- Summarize: methods range from value-function-based methods to direct policy search methods
- Summarize: how the formulation is modified to account for multiple agents
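The interaction loop and return from the summary bullets above could be illustrated with a minimal sketch; the two-state environment and random policy here are invented toy examples, not from any library:

```python
import random

def toy_env_step(state, action):
    """Hypothetical two-state environment: taking action 1 in state 0 yields reward."""
    reward = 1.0 if (state == 0 and action == 1) else 0.0
    next_state = (state + action) % 2  # environment dynamics
    return next_state, reward

def random_policy(state):
    """A uniform random policy: the 'strategy' whose return we measure."""
    return random.choice([0, 1])

def rollout(policy, horizon=10, gamma=0.99):
    """One episode of agent-environment interaction; returns the discounted return."""
    state, ret = 0, 0.0
    for t in range(horizon):
        action = policy(state)
        state, reward = toy_env_step(state, action)
        ret += (gamma ** t) * reward  # accumulate the return
    return ret

random.seed(0)
print(rollout(random_policy))
```

Value-function methods would estimate the expected return from each state under the policy; direct policy search would adjust the policy's parameters to increase this return.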
- The only constant across these methods is the formulation of an agent interacting with its environment
- This is formalized by the abstract notion of a Markov Decision Process (MDP)
- Explain the MDP formulation
- What's the rationale behind the MDP?
- Explain, from primary sources, how the MDP formulation draws inspiration from the cognitive sciences
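A standard statement of the MDP formulation that the bullets above plan to explain (notation follows common usage, e.g. Sutton & Barto):

```latex
An MDP is a tuple $(\mathcal{S}, \mathcal{A}, P, R, \gamma)$: a state space
$\mathcal{S}$, an action space $\mathcal{A}$, a transition kernel
$P(s' \mid s, a)$, a reward function $R(s, a)$, and a discount factor
$\gamma \in [0, 1)$. A policy $\pi(a \mid s)$ induces the objective
\[
  J(\pi) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t}\, R(s_t, a_t)\right],
\]
and the goal is to find $\pi^{*} = \arg\max_{\pi} J(\pi)$.
```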
- Suppose we drew inspiration instead from basal cognition
- How would that change the structure of our abstraction?
- Would that be a welcome change? In what ways would it be a plus; in what ways a minus?
- How does that change affect downstream algorithms?