- RL occupies a unique position among approaches to building intelligent agents
- Point to a review of RL (Lil'Log)
- Summarize: an agent interacting with an environment; rewards, environment dynamics, ...
- Summarize: the goal is to find a strategy (a policy) that maximizes the expected sum of rewards (the return)
- Summarize: methods range from value-function-based methods to direct policy search methods
- Summarize: how the formulation is modified to account for multiple agents
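The interaction loop and return from the summary bullets above could be illustrated with a minimal sketch; the two-state environment and random policy here are invented toy examples, not from any library:

```python
import random

def toy_env_step(state, action):
    """Hypothetical two-state environment: taking action 1 in state 0 yields reward."""
    reward = 1.0 if (state == 0 and action == 1) else 0.0
    next_state = (state + action) % 2  # environment dynamics
    return next_state, reward

def random_policy(state):
    """A uniform random policy: the 'strategy' whose return we measure."""
    return random.choice([0, 1])

def rollout(policy, horizon=10, gamma=0.99):
    """One episode of agent-environment interaction; returns the discounted return."""
    state, ret = 0, 0.0
    for t in range(horizon):
        action = policy(state)
        state, reward = toy_env_step(state, action)
        ret += (gamma ** t) * reward  # accumulate the return
    return ret

random.seed(0)
print(rollout(random_policy))
```

Value-function methods would estimate the expected return from each state under the policy; direct policy search would adjust the policy's parameters to increase this return.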
- The only constant across these methods is the formulation of an agent interacting with its environment
- This is formalized by the abstract notion of a Markov Decision Process (MDP)
- Explain the MDP formulation
- What's the rationale behind the MDP?
- Explain, from primary sources, how the MDP formulation draws inspiration from the cognitive sciences
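A standard statement of the MDP formulation that the bullets above plan to explain (notation follows common usage, e.g. Sutton & Barto):

```latex
An MDP is a tuple $(\mathcal{S}, \mathcal{A}, P, R, \gamma)$: a state space
$\mathcal{S}$, an action space $\mathcal{A}$, a transition kernel
$P(s' \mid s, a)$, a reward function $R(s, a)$, and a discount factor
$\gamma \in [0, 1)$. A policy $\pi(a \mid s)$ induces the objective
\[
  J(\pi) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t}\, R(s_t, a_t)\right],
\]
and the goal is to find $\pi^{*} = \arg\max_{\pi} J(\pi)$.
```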
- Suppose we drew inspiration instead from basal cognition
- How would that change the structure of our abstraction?
- Would that be a welcome change? In what ways would it be a plus; in what ways a minus?
- How does that change affect downstream algorithms?