Reinforcement learning curriculum #61

Closed

zslrmhb wants to merge 8 commits into main from Reinforcement-Learning-Curriculum

Contributor

zslrmhb commented Aug 1, 2023

Added Lesson 3 (Lesson 4 is in progress) and restructured the previous lessons, for example by moving all the solutions to a separate file and reorganizing the exercises.

For Lesson 3, should I go into more depth on the code (and the math), or are the comments in the code sufficient once I add some references?

zslrmhb added 5 commits

July 12, 2023 09:10


          added the challenge for lesson 1

8ecfc9a


          added lesson 1

235b810


          added draft for lesson 2

1218a38


          add mdp challenge

eb74fc2


          updated lesson 1 and 2, added lesson 3 and moved all the solutions to a separate file.

40a28b3

zslrmhb requested a review from krmiddlebrook

August 1, 2023 00:05

review-notebook-app bot commented Aug 1, 2023

Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks.

This was linked to issues Aug 1, 2023

Lesson 1: Introduction to Reinforcement Learning #48

Open

Lesson 2: Markov Decision Process #49

Open

Lesson 3: Valued-Based Learning Method #50

Open

Lesson 1 challenge #56

Open

Lesson 2 challenge #60

Open

Lesson 3 challenge #62

Open


          added lesson-4 content

b04cadc

krmiddlebrook reviewed

reinforcement_learning/Lesson-3-Value-Based-Learning-Method-draft.ipynb

@@ -0,0 +1,1158 @@

Collaborator

krmiddlebrook Aug 10, 2023 (edited)

Minor rephrasing for readability: "Let us start with the fun example of estimating π..."

reinforcement_learning/Lesson-3-Value-Based-Learning-Method-draft.ipynb

@@ -0,0 +1,1158 @@

Collaborator

krmiddlebrook Aug 10, 2023 (edited)

Minor typo. Corrected: "What have you noticed?"

reinforcement_learning/Lesson-3-Value-Based-Learning-Method-draft.ipynb

@@ -0,0 +1,1158 @@

Collaborator

krmiddlebrook Aug 10, 2023 (edited)

Minor typos. Corrected: "How can Monte Carlo Method be applied..." and "...think of a game that you have played before"

reinforcement_learning/Lesson-3-Value-Based-Learning-Method-draft.ipynb

@@ -0,0 +1,1158 @@

Collaborator

krmiddlebrook Aug 10, 2023 (edited)

Is this a typo: "There is on 1 optimal path (fewest blocks to take)"? I'm guessing it's supposed to say, "There is only 1 optimal path (fewest blocks to take)".

reinforcement_learning/Lesson-3-Value-Based-Learning-Method-draft.ipynb

@@ -0,0 +1,1158 @@

Collaborator

krmiddlebrook Aug 10, 2023 (edited)

Corrected typo: "I will get a..."

reinforcement_learning/Lesson-4-Policy-Based-Learning-Method-draft.ipynb

@@ -0,0 +1,403 @@

Collaborator

krmiddlebrook Aug 10, 2023 (edited)

Rephrase: "...we have talk(ed)..." to "...we talked..."

Corrected typos: "...value-based methods...", "...policy-based methods.", and "...difference between value-based and policy-based methods."

reinforcement_learning/Lesson-4-Policy-Based-Learning-Method-draft.ipynb

@@ -0,0 +1,403 @@

Collaborator

krmiddlebrook Aug 10, 2023 (edited)

Corrected typo: "...which will lead to an optimal policy..."

reinforcement_learning/Lesson-4-Policy-Based-Learning-Method-draft.ipynb

@@ -0,0 +1,403 @@

Collaborator

krmiddlebrook Aug 10, 2023 (edited)

It might be useful to add a note after the equation picture telling the kids that it's OK if they don't recognize or understand all the symbols in the equation. Emphasize that the important part is recognizing that we use an intelligent deep learning approach to learn the optimal policy via gradient descent.

reinforcement_learning/Lesson-4-Policy-Based-Learning-Method-draft.ipynb

@@ -0,0 +1,403 @@

Collaborator

krmiddlebrook Aug 10, 2023 (edited)

Rephrase to be supportive: instead of "should be trivial...", try something like "Let's see how well you've been following along: is the action space here discrete or continuous?"

reinforcement_learning/Lesson-4-challenge-draft.ipynb

@@ -0,0 +1,301 @@

Collaborator

krmiddlebrook Aug 10, 2023 (edited)

Update the Mountain Car link to point to the new Gymnasium website: https://gymnasium.farama.org/environments/classic_control/mountain_car/

Collaborator

krmiddlebrook commented Aug 10, 2023

A few suggested fixes, mostly typo corrections and minor rephrasing. The only section I felt could overwhelm students was the equation in Lesson 4 on policy-based methods. The equation doesn't need to be removed, but it should include a note below it to help put the students at ease (see my review comment for suggestions).

Overall the content looks great! It's a good balance between depth and usefulness, plus hands-on interactivity. I'm loving what you're creating! Keep it up, Hongbin!

krmiddlebrook requested changes

krmiddlebrook assigned zslrmhb

krmiddlebrook added the enhancement label

zslrmhb added 2 commits

August 11, 2023 16:46


          added Lesson 5 and Lesson 6 contents

246fcee


          fix typos in lesson 3 and 4

c185a8c

zslrmhb closed this
