reward shaping in Atari #3
Comments
Hm, that is strange. The only place that
As you say, in Pong the rewards are already in this set, so it should have no effect. I'd guess that maybe there is just some random variation in the runs that caused the behavior you see? I would try running them again.
Actually that isn't strictly true,
Thanks for your response.
Yeah, it's possible that the hyperparameters I tested with aren't great (didn't tune them at all), or maybe it would work more reliably with frame stacking (#2). But RL does generally have a good deal of variation even in the more stable algorithms, so who knows.
Hello. I have run your algorithm on the Pong game twice, for about 3k steps each: once with clip_rewards=True and once with clip_rewards=False.
However, with clip_rewards=False it did not progress much, while with clip_rewards=True the results are like yours.
I thought that setting clip_rewards should have no effect in the Pong environment, because the rewards are already 0, +1, or -1.
Do you have any idea what the cause is?
Thanks
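For reference, here is a minimal sketch of sign-based reward clipping, which is how clip_rewards is commonly implemented in Atari preprocessing (an assumption; this repo's actual implementation may differ). It shows why clipping should be a no-op on Pong's native rewards:

```python
import numpy as np

def clip_reward(reward):
    # Sign-based clipping, as commonly done in Atari wrappers
    # (hypothetical helper; not taken from this repository).
    # Maps any reward to -1, 0, or +1.
    return float(np.sign(reward))

# Pong's native rewards are already in {-1, 0, +1}, so sign-clipping
# leaves them unchanged:
for r in (-1.0, 0.0, 1.0):
    assert clip_reward(r) == r

# In games with larger per-step scores, clipping does change values:
print(clip_reward(4.0))   # 1.0
print(clip_reward(-7.0))  # -1.0
```

If the implementation clips this way, identical learning curves would be expected on Pong with clip_rewards on or off, which is why run-to-run variance is a plausible explanation for the difference observed.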