Chapter 4: Longer episode duration leads to worse performance with the policy gradient method! #24

Open
DemonsHunter opened this issue Feb 19, 2021 · 0 comments

According to the authors in Chapter 4, a longer episode duration allows the model to keep the game going longer.

I downloaded the Chapter 4 code and ran it locally with MAX_EPISODES = 250.

Surprisingly, this made the model worse at the task: only 22 episodes exceeded 180s, whereas the original model managed 90.
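
For anyone reproducing the comparison, the count above can be computed along these lines. This is a minimal sketch, not the book's code: `count_above` and the `durations` list are hypothetical names, and it assumes each episode's duration was appended to a list during training.

```python
def count_above(durations, threshold=180):
    """Return how many recorded episode durations strictly exceed `threshold`."""
    return sum(1 for d in durations if d > threshold)


# Made-up durations purely to illustrate the call; these are not
# results from the Chapter 4 code.
example_durations = [200, 150, 181, 180]
print(count_above(example_durations))
```

Applying the same count to several fresh training runs per MAX_EPISODES value would show whether the gap between 22 and 90 is larger than ordinary run-to-run variation.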

I also reset the model and tried higher values of MAX_EPISODES, but none of them beat the original setting.

What might be causing this?
