Reinforcement learning curriculum #61

Closed
wants to merge 8 commits into from

Contributor

@zslrmhb zslrmhb commented Aug 1, 2023

Added Lesson 3 (Lesson 4 is in progress) and restructured the previous lessons (for example, moving all the solutions to a separate file and reorganizing the exercises).

As for Lesson 3, should I go more in-depth when discussing the code (and the math), or are the comments in the code sufficient (apart from adding some references)?

@zslrmhb zslrmhb requested a review from krmiddlebrook August 1, 2023 00:05
@review-notebook-app

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

@@ -0,0 +1,1158 @@
{
Collaborator

@krmiddlebrook krmiddlebrook Aug 10, 2023

Minor rephrasing for readability: "Let us start with the fun example of estimating π..."
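
For readers without the notebook open, here is a minimal sketch of the π-estimation idea the lesson opens with. It assumes the usual dartboard formulation (sample points uniformly in the unit square and count those that land inside the quarter circle). The function name `estimate_pi` and its parameters are illustrative and are not taken from the notebook.

```python
import random
from typing import Optional


def estimate_pi(num_samples: int, seed: Optional[int] = None) -> float:
    """Estimate pi by Monte Carlo sampling in the unit square.

    Hypothetical helper for illustration; not the notebook's code.
    """
    rng = random.Random(seed)  # local generator so runs are reproducible
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        # The point lies inside the quarter circle of radius 1
        if x * x + y * y <= 1.0:
            inside += 1
    # (quarter-circle area) / (square area) = pi / 4
    return 4.0 * inside / num_samples


# More samples should generally give an estimate closer to pi
for n in (100, 10_000, 100_000):
    print(n, estimate_pi(n, seed=0))
```

Running it with increasing sample counts is one way to let students answer the "What have you noticed?" question themselves: the estimate tends to settle down as the number of samples grows.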



@@ -0,0 +1,1158 @@
{
Collaborator

@krmiddlebrook krmiddlebrook Aug 10, 2023

Minor typo. Corrected: "What have you noticed?"



@@ -0,0 +1,1158 @@
{
Collaborator

@krmiddlebrook krmiddlebrook Aug 10, 2023

Minor typos. Corrected: "How can Monte Carlo Method be applied..." and "...think of a game that you have played before"



@@ -0,0 +1,1158 @@
{
Collaborator

@krmiddlebrook krmiddlebrook Aug 10, 2023

  • Is this a typo: "There is on 1 optimal path (fewest blocks to take)"? I'm guessing it's supposed to say, "There is only 1 optimal path (fewest blocks to take)"



@@ -0,0 +1,1158 @@
{
Collaborator

@krmiddlebrook krmiddlebrook Aug 10, 2023

Corrected typo: "I will get a..."



@@ -0,0 +1,403 @@
{
Collaborator

@krmiddlebrook krmiddlebrook Aug 10, 2023

Rephrase: "...we have talk(ed)..." to "...we talked..."

Corrected typos: "...value-based methods...", "...policy-based methods.", and "...difference between value-based and policy-based methods."



@@ -0,0 +1,403 @@
{
Collaborator

@krmiddlebrook krmiddlebrook Aug 10, 2023

Corrected typo: "...which will lead to an optimal policy..."



@@ -0,0 +1,403 @@
{
Collaborator

@krmiddlebrook krmiddlebrook Aug 10, 2023

It might be useful to add a note after the equation image telling the kids that it's OK if they don't recognize or understand all the symbols in the equation. Emphasize that the important part is recognizing that we use a deep learning approach to learn the optimal policy via gradient descent.
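
To make the suggested note concrete, here is a small sketch of what "learning a policy with gradients" can look like. It is not the lesson's equation or code: it assumes a softmax policy on a hypothetical two-armed bandit and applies a REINFORCE-style update, written as gradient ascent on expected reward (equivalent to gradient descent on its negative). The names `softmax`, `train_bandit_policy`, and `reward_probs` are made up for illustration.

```python
import math
import random


def softmax(prefs):
    """Turn a list of preferences into action probabilities."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_bandit_policy(reward_probs, steps=2000, lr=0.1, seed=0):
    """REINFORCE-style training on a toy bandit (illustrative assumption).

    reward_probs[k] is the chance that arm k pays a reward of 1.
    Returns the learned action probabilities.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(reward_probs)  # one preference per arm
    for _ in range(steps):
        probs = softmax(theta)
        # Sample an action from the current policy
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        # For a softmax policy, d/d theta_k of log pi(action)
        # equals 1[k == action] - pi(k)
        for k in range(len(theta)):
            grad_log = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += lr * reward * grad_log  # ascent step
    return softmax(theta)


learned = train_bandit_policy([0.2, 0.8])
print("learned action probabilities:", learned)
```

The point students can take away without reading the symbols: actions that earn reward become more likely, and the policy itself is what gets adjusted, rather than a table of values.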



@@ -0,0 +1,403 @@
{
Collaborator

@krmiddlebrook krmiddlebrook Aug 10, 2023

Rephrase to be supportive: instead of "should be trivial...", try something like "Let's see how well you've been following along: is the action space here discrete or continuous?"



@@ -0,0 +1,301 @@
{
Collaborator

@krmiddlebrook krmiddlebrook Aug 10, 2023

Update the Mountain Car link to point to the new Gymnasium website: https://gymnasium.farama.org/environments/classic_control/mountain_car/



@krmiddlebrook
Collaborator

A few suggested fixes, mostly correcting typos and minor rephrasing stuff. The only section I felt could overwhelm students was the equation in lesson 4 on policy-based methods. The equation doesn't need to be removed, but should include a note below it to help relax the students (see my review comment for suggestions).

Overall the content looks great! It's a good balance between depth and usefulness, plus hands-on interactivity. I'm loving what you're creating! Keep it up, Hongbin!

Collaborator

@krmiddlebrook krmiddlebrook left a comment

A few suggested fixes, mostly correcting typos and minor rephrasing stuff. The only section I felt could overwhelm students was the equation in lesson 4 on policy-based methods. The equation doesn't need to be removed but should include a note below it to help relax students (see: #61 (comment)).

Overall the content looks great! It's a good balance between depth and usefulness, plus hands-on interactivity. I'm loving what you're creating! Keep it up, Hongbin!

@krmiddlebrook krmiddlebrook added the enhancement New feature or request label Aug 10, 2023
@zslrmhb zslrmhb closed this Aug 12, 2023