Reinforcement learning curriculum #61
Check out this pull request on ReviewNB. See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB
@@ -0,0 +1,1158 @@
{
Minor rephrasing for readability: "Let us start with the fun example of estimating π..."
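For anyone reading this thread without the notebook open: the π example mentioned here is presumably the classic "throw random darts at a square" estimate. The sketch below is only an illustration of that idea, not the notebook's actual code; the function name `estimate_pi` and its parameters are assumptions made for this example.

```python
import random


def estimate_pi(num_samples, seed=None):
    """Estimate pi by sampling points uniformly in the unit square.

    The fraction of points that land inside the quarter circle of
    radius 1 approximates pi / 4, so we multiply that fraction by 4.
    """
    rng = random.Random(seed)  # local generator so results are reproducible
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # point falls inside the quarter circle
            inside += 1
    return 4.0 * inside / num_samples


if __name__ == "__main__":
    # More samples give a closer (but still noisy) estimate.
    for n in (100, 10_000, 1_000_000):
        print(n, estimate_pi(n, seed=0))
```

The estimate improves slowly as the sample count grows, which is a handy talking point for students before moving on to Monte Carlo methods in games.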
@@ -0,0 +1,1158 @@
{
Minor typos. Corrected: "How can Monte Carlo Method be applied..." and "...think of a game that you have played before"
@@ -0,0 +1,1158 @@
{
Is this a typo: "There is on 1 optimal path (fewest blocks to take)"? I'm guessing it's supposed to say, "There is only 1 optimal path (fewest blocks to take)".
@@ -0,0 +1,403 @@
{
Rephrase: "...we have talk(ed)..." to "...we talked..."
Corrected typos: "...value-based methods...", "...policy-based methods.", and "...difference between value-based and policy-based methods."
@@ -0,0 +1,403 @@
{
Might be useful to add a note after the equation picture explaining to the kids that it's ok if they don't recognize or understand what all the symbols are in the equation. Emphasize that the important part is that they recognize that we use an intelligent deep learning approach to learn the optimal policy via gradient descent.
@@ -0,0 +1,403 @@
{
Rephrase to be supportive: instead of "should be trivial...", try something like "Let's see how well you've been following along: is the action space here discrete or continuous?"
@@ -0,0 +1,301 @@
{
Update the Mountain Car link to use the new gymnasium website: https://gymnasium.farama.org/environments/classic_control/mountain_car/.
A few suggested fixes, mostly correcting typos and minor rephrasing stuff. The only section I felt could overwhelm students was the equation in lesson 4 on policy-based methods. The equation doesn't need to be removed, but should include a note below it to help relax the students (see my review comment for suggestions). Overall the content looks great! It's a good balance between depth and usefulness, plus hands-on interactivity. I'm loving what you're creating! Keep it up, Hongbin!
Added Lesson 3 (Lesson 4 is in progress) and restructured parts of the previous lessons (such as moving all the solutions to a separate file and reorganizing the exercises).
As for Lesson 3, should I go into more depth on the code (and the math), or are the comments in the code sufficient (apart from adding some references)?