I think there's a small bug in many of your scripts: you update the return for the last step using a post-terminal step. As a result, your value (and policy) functions wind up growing (unboundedly?) near the terminal state. For example, in rl2/mountaincar you have a "train" boolean, but it is never set to false for the last step.
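To illustrate the failure mode being described, here is a hypothetical TD(0) sketch (not the repo's actual code): if the update at a terminal transition bootstraps from a nonzero estimate of the post-terminal state, the value estimate drifts toward reward / (1 - gamma) instead of staying near the true return.

```python
gamma, alpha, reward = 0.99, 0.5, 1.0

# Buggy update: bootstraps from the (nonzero) estimated value of the
# state after the terminal transition, so V chases reward / (1 - gamma).
V_bug = 0.0
for _ in range(2000):
    V_bug += alpha * (reward + gamma * V_bug - V_bug)

# Correct update: the terminal state's value is 0, so the target at the
# last step is just the reward.
V_fix = 0.0
for _ in range(2000):
    V_fix += alpha * (reward + gamma * 0.0 - V_fix)

print(V_bug, V_fix)  # V_bug approaches 100.0; V_fix converges to 1.0
```

Here the true return of the last step is 1.0, yet the buggy estimate grows to roughly 100x the maximum possible reward, which matches the blow-up described above.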
Hmm... I only found this train flag in one file (pg_theano), where it was just a remnant of an old version (it isn't actually used). Could you elaborate on what you were referring to?
Actually, there is an issue I did find: most scripts don't treat the value of the terminal state as 0 (in which case the return for the last step is just the reward), but that doesn't sound like what you're referring to.
It's been a while since I thought about this, but I believe my kludge of not updating on the last step is effectively (though not precisely) setting the value of the terminal state to zero. Setting the terminal state's value to zero fixes the fundamental issue that the value function grows to extremely large values (i.e., much larger than the maximum possible return).
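The fix discussed above is usually expressed by masking the bootstrap term with the environment's done flag, so that the target at a terminal transition reduces to the reward alone. A minimal sketch (the function name `td_target` is illustrative, not from the repo):

```python
def td_target(reward, next_value, done, gamma=0.99):
    # When done=1 the episode has ended: the terminal state's value is 0,
    # so the target is just the reward. Otherwise bootstrap as usual.
    return reward + gamma * next_value * (1.0 - done)

# Non-terminal step: bootstrap from the next state's estimated value.
print(td_target(1.0, 10.0, done=0.0))  # 1.0 + 0.99 * 10.0 = 10.9
# Terminal step: the bootstrap term is masked out entirely.
print(td_target(1.0, 10.0, done=1.0))  # 1.0
```

Compared with skipping the last update entirely, this masking still lets the last transition contribute its reward, which is why it is the precise version of the kludge.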