New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Chapter4: More episode duration leads to a decrease in policy gradient method! #24

Open

DemonsHunter opened this issue Feb 19, 2021 · 0 comments

DemonsHunter commented Feb 19, 2021

Accoring to what authors say in chapter 4, more episode duration will allow the model to hold the game longer.

Then I download the code of chapter 4, run it locally with MAX_EPISODES = 250.

Surprisingly, this makes the model be bad at the task, only 22 times exceed 180s while the original model can make it by 90 times.

And I also reset the model, try with higher MAX_EPISODES, but all of them fail to beat the beginning set.

What may contribute to this phenomenon?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment