
reward shaping in Atari #3

Open
merv801 opened this issue Dec 14, 2019 · 4 comments


merv801 commented Dec 14, 2019

Hello. I have run your algorithm on Pong twice, for about 3k steps each: once with clip_rewards=True and once with clip_rewards=False.
With clip_rewards=False it did not make much progress, but with clip_rewards=True the results look like yours.
I thought that in the Pong environment clip_rewards should have no effect, because the rewards are already 0, +1, or -1.
Do you have any idea what the cause is?
Thanks


zplizzi commented Dec 14, 2019

Hm, that is strange. The only place that clip_rewards is applied is here:

if self.args.clip_rewards:
    # clip reward to one of {-1, 0, 1}
    step_reward = np.sign(step_reward)

As you say, in Pong the rewards are already in this set, so it should have no effect. My guess is that there is just some random variation between runs that caused the behavior you see. I would try running them again.
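
To make the no-op concrete, here is a minimal standalone sketch (not code from the repo): np.sign leaves values that are already -1, 0, or +1 unchanged, so the clip can only matter when a raw reward falls outside that set.

import numpy as np

# Pong-style rewards are already in {-1, 0, 1}; np.sign returns them unchanged.
pong_rewards = np.array([-1.0, 0.0, 1.0])
print(np.sign(pong_rewards))   # [-1.  0.  1.]

# Rewards with larger magnitude (as in many other Atari games) would actually be clipped.
big_rewards = np.array([-200.0, 0.0, 10.0])
print(np.sign(big_rewards))    # [-1.  0.  1.]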


zplizzi commented Dec 14, 2019

Actually, that isn't strictly true: step_reward is the sum of rewards over steps_to_skip timesteps. But I don't think it's possible to get multiple rewards within a few steps in Pong, so this still shouldn't matter.
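
For context, the frame-skip accumulation looks roughly like the sketch below. This is a hypothetical illustration based on the description above, not the repo's actual code; env, steps_to_skip, and the clip_rewards flag are assumed names, and env.step follows the old gym (obs, reward, done, info) convention.

import numpy as np

def skipped_step(env, action, steps_to_skip, clip_rewards):
    # Repeat one action for steps_to_skip frames and sum the rewards.
    # Hypothetical sketch; the real implementation in this repo may differ.
    step_reward = 0.0
    obs, done, info = None, False, {}
    for _ in range(steps_to_skip):
        obs, reward, done, info = env.step(action)
        step_reward += reward  # two scoring events inside one skip would sum to +/-2
        if done:
            break
    if clip_rewards:
        # Clipping the *summed* reward is the only place the two settings could differ.
        step_reward = np.sign(step_reward)
    return obs, step_reward, done, info

As noted above, scoring events in Pong are far enough apart that the summed reward over one skip should still be in {-1, 0, 1}, so the clip remains a no-op there.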


merv801 commented Dec 15, 2019

Thanks for your response.
I ran it again with clip_rewards=False and this time it is working well, so the first run was indeed random variation.
Still, it seems quite strange; I didn't expect to see such a difference between two runs, since I have heard that PPO is relatively stable. (The first time, the agent got stuck in the -8 to 2 reward range.)


zplizzi commented Dec 15, 2019

Yeah, it's possible that the hyperparameters I tested with aren't great (I didn't tune them at all), or maybe it would work more reliably with frame stacking (#2). But RL generally has a good deal of variation even with the more stable algorithms, so who knows.
