
How to run Atari environments? #1

Open

ghost opened this issue Apr 27, 2017 · 6 comments

ghost commented Apr 27, 2017

Hi @EndingCredits,

it's really cool that you got NEC working 👍

Have you tried running your code on the Atari environments in OpenAI Gym?

I tried to train on Pong, but I got this error:

(tf_py2_tf_1_0) ajay@ajay-h8-1170uk:~/PythonProjects/nn_q_learning_tensorflow-master$ python main.py --env PongDeterministic-v3
Namespace(EWC=0.0, EWC_decay=0.999, batch_size=4, beta=0, chk_dir=None, chk_name='model', discount=0.9, display_step=2500, double_q=1, env='PongDeterministic-v3', epsilon=0.1, epsilon_anneal=500000, epsilon_final=0.1, layer_sizes=[20], learning_rate=0.001, memory_size=1000, play_from=None, reg=0, resume_from=None, target_step=1000, training_iters=500000, use_target=True)
2017-04-27 10:54:06.210986: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-04-27 10:54:06.211009: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-04-27 10:54:06.211018: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
[2017-04-27 10:54:06,211] Making new env: PongDeterministic-v3
WARNING:tensorflow:From main.py:314: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.global_variables_initializer` instead.
[2017-04-27 10:54:06,571] From main.py:314: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.global_variables_initializer` instead.
  0%|                                      | 0/500000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "main.py", line 470, in <module>
    tf.app.run()
  File "/home/ajay/anaconda3/envs/tf_py2_tf_1_0/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "main.py", line 326, in main
    act, q_ = agent.predict(state)
  File "main.py", line 101, in predict
    q = self.session.run(self.pred_q, feed_dict={self.state: [state]})
  File "/home/ajay/anaconda3/envs/tf_py2_tf_1_0/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 778, in run
    run_metadata_ptr)
  File "/home/ajay/anaconda3/envs/tf_py2_tf_1_0/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 961, in _run
    % (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (1, 210, 160, 3) for Tensor u'Placeholder:0', which has shape '(?, 210)'

I guess it might be related to TF v1.0; does this repo use an earlier version?

Thanks a lot for your help,

Aj

EndingCredits (Owner) commented Apr 27, 2017

Hi Aj,
Unfortunately this code is only set up for environments where the observation is a 1-dimensional vector, which is why the (1, 210, 160, 3) Pong frame can't be fed into the (?, 210) placeholder. It isn't too hard to adapt it to image observations, although setting it up so the input is the last 4 frames is a bit of a pain. (Yes, I use an earlier version of TF, but that isn't the issue here, although it may cause problems elsewhere.)
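A minimal sketch (not the repo's code) of one way to do the last-4-frames stacking with a small Gym wrapper; the grayscale conversion and the absence of resizing/cropping are assumptions made for brevity:

import collections
import numpy as np
import gym

class FrameStack(object):
    """Keeps a rolling window of the last k observations and stacks them."""
    def __init__(self, env, k=4):
        self.env = env
        self.k = k
        self.frames = collections.deque(maxlen=k)

    def _to_gray(self, obs):
        # Luminance conversion: (210, 160, 3) uint8 -> (210, 160) float32
        return np.dot(obs[..., :3], [0.299, 0.587, 0.114]).astype(np.float32)

    def reset(self):
        obs = self._to_gray(self.env.reset())
        for _ in range(self.k):
            self.frames.append(obs)
        return np.stack(self.frames, axis=-1)  # shape (210, 160, 4)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.frames.append(self._to_gray(obs))
        return np.stack(self.frames, axis=-1), reward, done, info

# env = FrameStack(gym.make('PongDeterministic-v3'))
# The agent's state placeholder would then need shape (None, 210, 160, 4),
# and a convolutional network instead of the current fully-connected one.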

Do the implementations run for you on CartPole-v0? It would be nice to know that this works on other machines.

Also, main.py runs DQN (with some extras), a2c.py runs an advantage actor-critic algorithm (using a replay memory rather than distributed rollouts), and NEC.py runs the NEC agent (see the usage sketch below).

-Will

EDIT: Just remembered that a2c.py should be set up to work with Atari envs. Have a look at that if you want to look at adapting the others.
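For reference, a rough usage sketch: only the --env flag for main.py is confirmed by the log above; the a2c.py and NEC.py invocations are assumptions and may need different arguments.

# DQN with extras, as in the original log
python main.py --env CartPole-v0
# advantage actor-critic (assumed to accept the same --env flag)
python a2c.py --env PongDeterministic-v3
# NEC agent (assumed to accept the same --env flag)
python NEC.py --env CartPole-v0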

ghost (Author) commented Apr 27, 2017

Hi Will,

thanks a lot for the help 👍 - I'm just working through your code and the paper now.

CartPole seems to work well. I haven't checked against my A3C implementations, but from memory I think it looks better:

11:28:21,  450000/500000it | avg_r: 1.000, avg_q: 9.169, avr_ep_r: 113.7, max_ep_r: 127.0, num_eps: 22, epsilon: 0.100, ewc:  0.0
11:28:28,  452500/500000it | avg_r: 1.000, avg_q: 9.258, avr_ep_r: 104.6, max_ep_r: 137.0, num_eps: 24, epsilon: 0.100, ewc:  0.0
11:28:34,  455000/500000it | avg_r: 1.000, avg_q: 9.130, avr_ep_r: 118.7, max_ep_r: 200.0, num_eps: 21, epsilon: 0.100, ewc:  0.0
11:28:40,  457500/500000it | avg_r: 1.000, avg_q: 9.561, avr_ep_r: 110.0, max_ep_r: 200.0, num_eps: 23, epsilon: 0.100, ewc:  0.0
11:28:46,  460000/500000it | avg_r: 1.000, avg_q: 9.550, avr_ep_r: 89.7, max_ep_r: 200.0, num_eps: 28, epsilon: 0.100, ewc:  0.0
11:28:53,  462500/500000it | avg_r: 1.000, avg_q: 9.625, avr_ep_r: 133.0, max_ep_r: 200.0, num_eps: 19, epsilon: 0.100, ewc:  0.0
11:28:59,  465000/500000it | avg_r: 1.000, avg_q: 9.554, avr_ep_r: 113.6, max_ep_r: 149.0, num_eps: 22, epsilon: 0.100, ewc:  0.0
11:29:05,  467500/500000it | avg_r: 1.000, avg_q: 9.576, avr_ep_r: 130.9, max_ep_r: 200.0, num_eps: 19, epsilon: 0.100, ewc:  0.0
11:29:12,  470000/500000it | avg_r: 1.000, avg_q: 9.381, avr_ep_r: 126.8, max_ep_r: 169.0, num_eps: 19, epsilon: 0.100, ewc:  0.0
11:29:18,  472500/500000it | avg_r: 1.000, avg_q: 9.605, avr_ep_r: 137.9, max_ep_r: 200.0, num_eps: 18, epsilon: 0.100, ewc:  0.0
11:29:24,  475000/500000it | avg_r: 1.000, avg_q: 9.462, avr_ep_r: 136.7, max_ep_r: 200.0, num_eps: 19, epsilon: 0.100, ewc:  0.0
11:29:31,  477500/500000it | avg_r: 1.000, avg_q: 9.304, avr_ep_r: 118.1, max_ep_r: 142.0, num_eps: 21, epsilon: 0.100, ewc:  0.0
11:29:37,  480000/500000it | avg_r: 1.000, avg_q: 9.319, avr_ep_r: 98.4, max_ep_r: 121.0, num_eps: 26, epsilon: 0.100, ewc:  0.0
11:29:43,  482500/500000it | avg_r: 1.000, avg_q: 9.044, avr_ep_r: 119.0, max_ep_r: 200.0, num_eps: 21, epsilon: 0.100, ewc:  0.0
11:29:50,  485000/500000it | avg_r: 1.000, avg_q: 9.231, avr_ep_r: 109.2, max_ep_r: 158.0, num_eps: 22, epsilon: 0.100, ewc:  0.0
11:29:56,  487500/500000it | avg_r: 1.000, avg_q: 9.153, avr_ep_r: 113.5, max_ep_r: 188.0, num_eps: 22, epsilon: 0.100, ewc:  0.0
11:30:02,  490000/500000it | avg_r: 1.000, avg_q: 9.372, avr_ep_r: 124.8, max_ep_r: 200.0, num_eps: 20, epsilon: 0.100, ewc:  0.0
11:30:08,  492500/500000it | avg_r: 1.000, avg_q: 9.031, avr_ep_r: 144.8, max_ep_r: 200.0, num_eps: 17, epsilon: 0.100, ewc:  0.0
11:30:15,  495000/500000it | avg_r: 1.000, avg_q: 9.136, avr_ep_r: 98.4, max_ep_r: 200.0, num_eps: 26, epsilon: 0.100, ewc:  0.0
11:30:21,  497500/500000it | avg_r: 1.000, avg_q: 9.100, avr_ep_r: 149.2, max_ep_r: 200.0, num_eps: 17, epsilon: 0.100, ewc:  0.0
100%|████████████████████████| 500000/500000 [20:46<00:00, 401.28it/s]

I'll try to get it working for the Atari envs too :) If you're interested, there's a fairly clean implementation in PyTorch.

Looks like a fun project 👍

All the best - Aj

PS - I've only read the paper quickly, but it seems there's no need for the actor-critic stuff in a2c?

ghost (Author) commented May 20, 2017

Hi Will,

I was wondering whether you got this working for 2D pixel inputs, i.e. Atari.

If so, did you manage to get anywhere close to DeepMind's published results? (I guess they do a lot of model search / hyper-parameter tuning.)

All the best, Aj

EndingCredits (Owner) commented:

Hi Aj,
I did get it working for an Atari setting (ALE), but I haven't managed to get any good results yet.

The code is a bit of a mess, so I'll probably tidy it up before sharing.
-Will

EndingCredits (Owner) commented:

Update:
You can find the repo here: https://github.com/EndingCredits/Neural-Episodic-Control. The only extra thing you'll need to install is ALE, I think.

ghost (Author) commented May 26, 2017

Great, thanks very much for your work on it :)

I guess if it doesn't perform at SOTA level on Atari (or you can't tune it as well as DeepMind), you'll find some environments where it is strong; you know, the Wolpert and Macready NFL theorem:

We have dubbed the associated results NFL theorems because they demonstrate that if an algorithm performs well on a certain class of problems then it necessarily pays for that with degraded performance on the set of all remaining problems.[1]
