I think there's a small bug in many of your scripts: you update the return for the last step using a post-terminal step. As a result, your value (and policy) functions wind up growing (unboundedly?) near the terminal state. For example, in rl2/mountaincar you have a "train" boolean, but it is never set to false for the last step.
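To illustrate the failure mode being described, here is a hypothetical TD(0) sketch (not the repo's actual code): if the update at a terminal transition bootstraps from a nonzero estimate of the post-terminal state, the value estimate drifts toward reward / (1 - gamma) instead of staying near the true return.

```python
gamma, alpha, reward = 0.99, 0.5, 1.0

# Buggy update: bootstraps from the (nonzero) estimated value of the
# state after the terminal transition, so V chases reward / (1 - gamma).
V_bug = 0.0
for _ in range(2000):
    V_bug += alpha * (reward + gamma * V_bug - V_bug)

# Correct update: the terminal state's value is 0, so the target at the
# last step is just the reward.
V_fix = 0.0
for _ in range(2000):
    V_fix += alpha * (reward + gamma * 0.0 - V_fix)

print(V_bug, V_fix)  # V_bug approaches 100.0; V_fix converges to 1.0
```

Here the true return of the last step is 1.0, yet the buggy estimate grows to roughly 100x the maximum possible reward, which matches the blow-up described above.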
Hmm... I only found this train flag in one file (pg_theano), where it was just a remnant of an old version (it isn't actually used). Could you elaborate on what you were referring to?
Actually, there is an issue I did find: most scripts don't treat the value of the terminal state as 0 (in which case the return for the last step is just the reward), but that doesn't sound like what you're referring to.
It's been a while since I thought about this, but I believe my kludge of not updating on the last step is effectively (though not precisely) setting the value of the terminal state to zero. Setting the terminal state's value to zero fixes the fundamental issue that the value function grows to extremely large values (i.e., much larger than the maximum possible return).
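The fix discussed above is usually expressed by masking the bootstrap term with the environment's done flag, so that the target at a terminal transition reduces to the reward alone. A minimal sketch (the function name `td_target` is illustrative, not from the repo):

```python
def td_target(reward, next_value, done, gamma=0.99):
    # When done=1 the episode has ended: the terminal state's value is 0,
    # so the target is just the reward. Otherwise bootstrap as usual.
    return reward + gamma * next_value * (1.0 - done)

# Non-terminal step: bootstrap from the next state's estimated value.
print(td_target(1.0, 10.0, done=0.0))  # 1.0 + 0.99 * 10.0 = 10.9
# Terminal step: the bootstrap term is masked out entirely.
print(td_target(1.0, 10.0, done=1.0))  # 1.0
```

Compared with skipping the last update entirely, this masking still lets the last transition contribute its reward, which is why it is the precise version of the kludge.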