The test score is different from the DeepMind paper #32
Comments
AFAIK, the RMSProp optimizer implementation and the screen preprocessing are different from the paper. In the code, RMSProp is implemented here, which is similar to the RMSProp by A. Graves (see page 23, eq. 40; the parameters are not the same as in the DQN paper), while simple_dqn uses Neon's RMSProp. The difference in screen preprocessing is mentioned here: simple_dqn uses the frame averaged over the skipped frames (ALE's built-in functionality) instead of the pixel-wise max over two successive frames as in the paper. Correct me if I am wrong. |
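For readers comparing the two, here is a minimal NumPy sketch of the centered ("Graves-style") RMSProp variant the DQN paper follows versus a plain RMSProp step, plus the max-over-two-frames preprocessing mentioned above. This is an illustration only: the function names are made up, the default hyperparameters are indicative, and neither block is taken from simple_dqn or DeepMind's code.

```python
import numpy as np

def rmsprop_graves_step(w, g, state, lr=0.00025, rho=0.95, eps=0.01):
    """Centered RMSProp in the style of Graves (2013), which the DQN paper follows.
    `state` must hold zero-initialized arrays 'g_avg' and 'g2_avg' of w's shape."""
    state['g_avg'] = rho * state['g_avg'] + (1 - rho) * g
    state['g2_avg'] = rho * state['g2_avg'] + (1 - rho) * g * g
    return w - lr * g / np.sqrt(state['g2_avg'] - state['g_avg'] ** 2 + eps)

def rmsprop_plain_step(w, g, state, lr=0.00025, rho=0.95, eps=1e-6):
    """Plain (uncentered) RMSProp, the more common textbook/library variant."""
    state['g2_avg'] = rho * state['g2_avg'] + (1 - rho) * g * g
    return w - lr * g / (np.sqrt(state['g2_avg']) + eps)

def preprocess_max(frame_prev, frame_curr):
    """DeepMind-style preprocessing: pixel-wise max over two successive raw frames
    (handles sprite flicker), as opposed to averaging over the skipped frames."""
    return np.maximum(frame_prev, frame_curr)
```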
Yes @mthrok, these are some of the differences. But I'm not sure how much they matter; for example, you can easily switch from RMSProp to Adam, which seems to be the preferred optimization method these days. Another important difference is that DeepMind considers loss of life as episode end, something I don't do yet. I would expect more substantial differences from that, but who knows. There is also a discussion about matching DeepMind's results on the deep-q-learning list. Keeping this issue open till we figure out the differences. |
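Since switching to Adam is mentioned as an easy change, here is a minimal NumPy sketch of a single Adam update (Kingma & Ba, 2015) in the same style as the RMSProp sketch above; the learning rate shown is a generic default, not a tuned value from any DQN codebase.

```python
import numpy as np

def adam_step(w, g, state, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. `state` holds zero-initialized arrays 'm' and 'v'
    of w's shape; `t` is the 1-based count of updates taken so far."""
    state['m'] = beta1 * state['m'] + (1 - beta1) * g
    state['v'] = beta2 * state['v'] + (1 - beta2) * g * g
    m_hat = state['m'] / (1 - beta1 ** t)  # bias correction
    v_hat = state['v'] / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)
```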
I got a score similar to the DeepMind paper's in kung_fu_master after treating loss of life as the terminal state. I regarded loss of life as terminal but didn't reset the game, which is different from DeepMind. |
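A minimal sketch of that idea, treating loss of life as a terminal signal for training without resetting the emulator. It assumes a classic ALEInterface-style object exposing act(), lives(), game_over() and reset_game(); the function name is made up and this is not the actual simple_dqn or DeepRL code.

```python
def step_with_life_loss_terminal(ale, action):
    """Take one action; report terminal=True when a life is lost, but only
    reset the underlying game when it is really over."""
    lives_before = ale.lives()
    reward = ale.act(action)
    life_lost = ale.lives() < lives_before
    game_over = ale.game_over()

    # Store (s, a, r, terminal) with terminal = life_lost or game_over,
    # so bootstrapping stops at the end of a life during training...
    terminal_for_training = life_lost or game_over

    # ...but only reset the emulator when the real episode ends.
    if game_over:
        ale.reset_game()

    return reward, terminal_for_training, game_over
```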
Just to clarify, the current implementation does not store test experience, right? |
Two more things to get close to the DeepMind paper: 1) the target network update interval and 2) Xavier initialization, in addition to the previous fix of taking loss of life as game over for training. |
@only4hj Good job! Just to clarify 1): 10,000 env steps is equivalent to 2,500 observations from the agent's point of view when the action repeat is 4.
BTW, what value of repeat_action_probability do you use?
@only4hj |
Thanks @only4hj and @mthrok for the wonderful analysis; I included bits of it in the README. I would be happy to merge any pull requests regarding this. In particular, the target network interval and Xavier initialization seem like trivial fixes. |
Regarding repeat_action_probability, https://github.com/only4hj/DeepRL/blob/master/deep_rl_player.py#L165 |
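For context, this is roughly how those ALE options are configured with the classic ale_python_interface bindings; a sketch only, the linked code may configure them differently, and newer ALE builds may expect byte-string keys and paths.

```python
from ale_python_interface import ALEInterface

ale = ALEInterface()
ale.setInt('random_seed', 123)
# Handle frame skipping on the agent side (as DeepMind's wrapper does)
# rather than relying on ALE's built-in skipping / color averaging.
ale.setInt('frame_skip', 1)
ale.setBool('color_averaging', False)
# DeepMind's published results use deterministic action repeats (no sticky actions).
ale.setFloat('repeat_action_probability', 0.0)
ale.loadROM('breakout.bin')  # illustrative ROM path
```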
@only4hj Thank you for clarifying. Now I understand that frame skip is processed in their ALE wrapper and is invisible to the agent; I was having trouble setting it via ALE. @tambetm |
Thanks @mthrok, merged the PR! Keep me posted if you figure out the network update interval. |
From the Nature paper's hyperparameter table: action repeat = 4, update frequency = 4, target network update frequency = 10,000.
Since the agent sees the image and makes a prediction only once every 4th frame (action repeat = 4), and it updates its online network only once every 4th prediction (update frequency = 4), with target network update frequency = 10,000, doesn't that mean the target network should be updated once every 10,000 updates, i.e. once every 40,000 predictions, which is once every 160,000 frames? |
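To spell out that arithmetic explicitly (under the reading that the 10,000 counts SGD updates):

```python
action_repeat = 4              # agent observes/acts once every 4 emulator frames
update_frequency = 4           # one SGD update every 4 agent observations
target_update_frequency = 10000  # read here as counted in SGD updates

observations_between_target_updates = target_update_frequency * update_frequency
frames_between_target_updates = observations_between_target_updates * action_repeat
print(observations_between_target_updates, frames_between_target_updates)  # 40000 160000
```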
@kerawits your reasoning seems valid. Because we see only every 4th frame, I think |
I ran the code from here and the mean score seems to be able to reach 400. BTW, I changed the network architecture to the original DQN; the original code has that part commented out. |
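If "original DQN" here refers to the Nature-2015 network, its layer specification from the paper is roughly the following; this is only an illustrative spec written as plain Python data, not the project's Neon code.

```python
# Nature-2015 DQN network; input is 4 stacked 84x84 grayscale frames.
nature_dqn_layers = [
    {'type': 'conv',  'filters': 32, 'kernel': (8, 8), 'stride': 4, 'activation': 'relu'},
    {'type': 'conv',  'filters': 64, 'kernel': (4, 4), 'stride': 2, 'activation': 'relu'},
    {'type': 'conv',  'filters': 64, 'kernel': (3, 3), 'stride': 1, 'activation': 'relu'},
    {'type': 'dense', 'units': 512, 'activation': 'relu'},
    {'type': 'dense', 'units': 'num_actions', 'activation': 'linear'},  # one Q-value per action
]
```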
Hi,
Thank you for the great project.
While testing simple_dqn I found that its test score is different from the DeepMind paper.
The DeepMind paper 'Prioritized Experience Replay' (http://arxiv.org/pdf/1511.05952v3.pdf)
shows learning curves of DQN in Figure 7.
The gray curve is the original DQN, according to the paper.
When comparing the curves in the paper with the attached png files in the simple_dqn/results folder,
the test scores are somewhat different.
In Breakout the paper says the original DQN reaches a score of around 320, but simple_dqn doesn't.
Also, in Seaquest the paper says the original DQN reaches a score of more than 3000, but simple_dqn doesn't.
The paper doesn't say much about the test code or environment used for the original DQN result.
I'm also not sure whether the paper's result was produced with the following DeepMind code:
https://sites.google.com/a/deepmind.com/dqn/
Do you have any idea why there are score differences between simple_dqn and the DeepMind paper?
Thank you