Q-Learning pseudocode | Mathematical notation #432

Open
wants to merge 1 commit into
base: main


fardinafdideh
Contributor

Hi,
My remark concerns the mathematical notation of the Q-Learning pseudocode in unit2.ipynb.
I found the following notation a little confusing:
Q(s,a) + lr [R(s,a) + gamma * max Q(s',a') - Q(s,a)]
The maximization should be taken over all possible values of the action, i.e., the second argument of the two-variable function Q. As written, however, max Q(s',a') has no variable under the max, so it reads as Q evaluated at the specified point (s', a') rather than maximized over the action. This becomes clearer if general variables are written in lowercase and specified points in uppercase, e.g., the function Q(s, a) evaluated at s=S and a=A is written Q(S, A).
So:

  • Current version: max Q(s',a') suggests evaluating the two-variable function Q at the specified point (s', a'), since s' has already been defined as a specified point.
  • Suggested version: max_a Q(S',a) makes explicit that the first argument is fixed at the specific point S' and that the maximization runs over the second argument, a.
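
To make the distinction concrete, here is a minimal sketch of one tabular Q-learning update in plain Python. It is not the code from unit2.ipynb; the function name `q_learning_update`, the table layout (one row per state, one column per action), and the default values of `lr` and `gamma` are assumptions chosen for illustration. The point is that the next state is held fixed while the max runs over every action in that state's row, which is what max_a Q(S',a) expresses.

```python
def q_learning_update(q_table, state, action, reward, next_state, lr=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the updated entry.

    q_table is a list of rows, one row per state and one column per action
    (a hypothetical layout chosen for this sketch).
    """
    # max_a Q(S', a): next_state is a fixed point, the max ranges over all actions.
    best_next_value = max(q_table[next_state])
    # TD target: R(s, a) + gamma * max_a Q(S', a)
    td_target = reward + gamma * best_next_value
    # Q(s, a) <- Q(s, a) + lr * [TD target - Q(s, a)]
    q_table[state][action] += lr * (td_target - q_table[state][action])
    return q_table[state][action]


# Toy table with 3 states and 2 actions; the values are made up for the example.
q_table = [[0.0, 0.0], [0.5, 2.0], [0.0, 0.0]]
new_value = q_learning_update(q_table, state=0, action=1, reward=1.0, next_state=1)
```

With these made-up numbers the target uses the larger entry of row 1 (2.0), regardless of which action is actually taken next, so the updated entry is about 0.1 * (1.0 + 0.9 * 2.0) = 0.28.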

@simoninithomas mentioned this pull request Dec 12, 2023
11 tasks