
How to run Atari environments? #1

Open

ghost opened this issue Apr 27, 2017 · 6 comments

ghost commented Apr 27, 2017

Hi @EndingCredits,

it's really cool that you got NEC working 👍

Have you tried running your code on the Atari environments in OpenAI Gym?

I tried to train on Pong, but I got this error:

(tf_py2_tf_1_0) ajay@ajay-h8-1170uk:~/PythonProjects/nn_q_learning_tensorflow-master$ python main.py --env PongDeterministic-v3
Namespace(EWC=0.0, EWC_decay=0.999, batch_size=4, beta=0, chk_dir=None, chk_name='model', discount=0.9, display_step=2500, double_q=1, env='PongDeterministic-v3', epsilon=0.1, epsilon_anneal=500000, epsilon_final=0.1, layer_sizes=[20], learning_rate=0.001, memory_size=1000, play_from=None, reg=0, resume_from=None, target_step=1000, training_iters=500000, use_target=True)
2017-04-27 10:54:06.210986: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-04-27 10:54:06.211009: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-04-27 10:54:06.211018: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
[2017-04-27 10:54:06,211] Making new env: PongDeterministic-v3
WARNING:tensorflow:From main.py:314: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.global_variables_initializer` instead.
[2017-04-27 10:54:06,571] From main.py:314: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.global_variables_initializer` instead.
  0%|                                      | 0/500000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "main.py", line 470, in <module>
    tf.app.run()
  File "/home/ajay/anaconda3/envs/tf_py2_tf_1_0/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "main.py", line 326, in main
    act, q_ = agent.predict(state)
  File "main.py", line 101, in predict
    q = self.session.run(self.pred_q, feed_dict={self.state: [state]})
  File "/home/ajay/anaconda3/envs/tf_py2_tf_1_0/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 778, in run
    run_metadata_ptr)
  File "/home/ajay/anaconda3/envs/tf_py2_tf_1_0/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 961, in _run
    % (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (1, 210, 160, 3) for Tensor u'Placeholder:0', which has shape '(?, 210)'

I guess it might be related to TF v1.0; does this repo use an earlier version?

Thanks a lot for your help,

Aj

EndingCredits (Owner) commented Apr 27, 2017

Hi Aj,
Unfortunately this code is only set up for environments where the observation is a 1-dimensional vector, which is why the (1, 210, 160, 3) Pong frame can't be fed into the (?, 210) placeholder. It isn't too hard to adapt it to image observations, although setting it up so the input is the last 4 frames is a bit of a pain. (Yes, I use an earlier version of TF, but that isn't the issue here, although it may cause problems elsewhere.)
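A minimal sketch (not the repo's code) of one way to do the last-4-frames stacking with a small Gym wrapper; the grayscale conversion and the absence of resizing/cropping are assumptions made for brevity:

import collections
import numpy as np
import gym

class FrameStack(object):
    """Keeps a rolling window of the last k observations and stacks them."""
    def __init__(self, env, k=4):
        self.env = env
        self.k = k
        self.frames = collections.deque(maxlen=k)

    def _to_gray(self, obs):
        # Luminance conversion: (210, 160, 3) uint8 -> (210, 160) float32
        return np.dot(obs[..., :3], [0.299, 0.587, 0.114]).astype(np.float32)

    def reset(self):
        obs = self._to_gray(self.env.reset())
        for _ in range(self.k):
            self.frames.append(obs)
        return np.stack(self.frames, axis=-1)  # shape (210, 160, 4)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.frames.append(self._to_gray(obs))
        return np.stack(self.frames, axis=-1), reward, done, info

# env = FrameStack(gym.make('PongDeterministic-v3'))
# The agent's state placeholder would then need shape (None, 210, 160, 4),
# and a convolutional network instead of the current fully-connected one.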

Do the implementations run for you on CartPole-v0? It would be nice to know that this works on other machines.

Also, main.py runs DQN (with some extras), a2c.py runs an advantage actor-critic algorithm (using a replay memory rather than distributed rollouts), and NEC.py runs the NEC agent (see the usage sketch below).

-Will

EDIT: Just remembered that a2c.py should be set up to work with Atari envs. Have a look at that if you want to look at adapting the others.
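For reference, a rough usage sketch: only the --env flag for main.py is confirmed by the log above; the a2c.py and NEC.py invocations are assumptions and may need different arguments.

# DQN with extras, as in the original log
python main.py --env CartPole-v0
# advantage actor-critic (assumed to accept the same --env flag)
python a2c.py --env PongDeterministic-v3
# NEC agent (assumed to accept the same --env flag)
python NEC.py --env CartPole-v0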

ghost (Author) commented Apr 27, 2017

Hi Will,

thanks a lot for the help 👍 - I'm just working through your code and the paper now.

CartPole seems to work well. I haven't checked against my A3C implementations, but from memory I think it looks better:

11:28:21,  450000/500000it | avg_r: 1.000, avg_q: 9.169, avr_ep_r: 113.7, max_ep_r: 127.0, num_eps: 22, epsilon: 0.100, ewc:  0.0
11:28:28,  452500/500000it | avg_r: 1.000, avg_q: 9.258, avr_ep_r: 104.6, max_ep_r: 137.0, num_eps: 24, epsilon: 0.100, ewc:  0.0
11:28:34,  455000/500000it | avg_r: 1.000, avg_q: 9.130, avr_ep_r: 118.7, max_ep_r: 200.0, num_eps: 21, epsilon: 0.100, ewc:  0.0
11:28:40,  457500/500000it | avg_r: 1.000, avg_q: 9.561, avr_ep_r: 110.0, max_ep_r: 200.0, num_eps: 23, epsilon: 0.100, ewc:  0.0
11:28:46,  460000/500000it | avg_r: 1.000, avg_q: 9.550, avr_ep_r: 89.7, max_ep_r: 200.0, num_eps: 28, epsilon: 0.100, ewc:  0.0
11:28:53,  462500/500000it | avg_r: 1.000, avg_q: 9.625, avr_ep_r: 133.0, max_ep_r: 200.0, num_eps: 19, epsilon: 0.100, ewc:  0.0
11:28:59,  465000/500000it | avg_r: 1.000, avg_q: 9.554, avr_ep_r: 113.6, max_ep_r: 149.0, num_eps: 22, epsilon: 0.100, ewc:  0.0
11:29:05,  467500/500000it | avg_r: 1.000, avg_q: 9.576, avr_ep_r: 130.9, max_ep_r: 200.0, num_eps: 19, epsilon: 0.100, ewc:  0.0
11:29:12,  470000/500000it | avg_r: 1.000, avg_q: 9.381, avr_ep_r: 126.8, max_ep_r: 169.0, num_eps: 19, epsilon: 0.100, ewc:  0.0
11:29:18,  472500/500000it | avg_r: 1.000, avg_q: 9.605, avr_ep_r: 137.9, max_ep_r: 200.0, num_eps: 18, epsilon: 0.100, ewc:  0.0
11:29:24,  475000/500000it | avg_r: 1.000, avg_q: 9.462, avr_ep_r: 136.7, max_ep_r: 200.0, num_eps: 19, epsilon: 0.100, ewc:  0.0
11:29:31,  477500/500000it | avg_r: 1.000, avg_q: 9.304, avr_ep_r: 118.1, max_ep_r: 142.0, num_eps: 21, epsilon: 0.100, ewc:  0.0
11:29:37,  480000/500000it | avg_r: 1.000, avg_q: 9.319, avr_ep_r: 98.4, max_ep_r: 121.0, num_eps: 26, epsilon: 0.100, ewc:  0.0
11:29:43,  482500/500000it | avg_r: 1.000, avg_q: 9.044, avr_ep_r: 119.0, max_ep_r: 200.0, num_eps: 21, epsilon: 0.100, ewc:  0.0
11:29:50,  485000/500000it | avg_r: 1.000, avg_q: 9.231, avr_ep_r: 109.2, max_ep_r: 158.0, num_eps: 22, epsilon: 0.100, ewc:  0.0
11:29:56,  487500/500000it | avg_r: 1.000, avg_q: 9.153, avr_ep_r: 113.5, max_ep_r: 188.0, num_eps: 22, epsilon: 0.100, ewc:  0.0
11:30:02,  490000/500000it | avg_r: 1.000, avg_q: 9.372, avr_ep_r: 124.8, max_ep_r: 200.0, num_eps: 20, epsilon: 0.100, ewc:  0.0
11:30:08,  492500/500000it | avg_r: 1.000, avg_q: 9.031, avr_ep_r: 144.8, max_ep_r: 200.0, num_eps: 17, epsilon: 0.100, ewc:  0.0
11:30:15,  495000/500000it | avg_r: 1.000, avg_q: 9.136, avr_ep_r: 98.4, max_ep_r: 200.0, num_eps: 26, epsilon: 0.100, ewc:  0.0
11:30:21,  497500/500000it | avg_r: 1.000, avg_q: 9.100, avr_ep_r: 149.2, max_ep_r: 200.0, num_eps: 17, epsilon: 0.100, ewc:  0.0
100%|████████████████████████| 500000/500000 [20:46<00:00, 401.28it/s]

I'll try to get it working for the Atari envs too :) If you're interested, there's a fairly clean implementation in PyTorch.

Looks like a fun project 👍

All the best - Aj

PS - I've only read the paper quickly, but it seems there's no need for the actor-critic stuff in a2c?

ghost (Author) commented May 20, 2017

Hi Will,

I was wondering whether you got this working for 2D pixel inputs, i.e. Atari.

If so, did you manage to get anywhere close to DeepMind's published results? (I guess they do a lot of model search / hyper-parameter tuning.)

All the best, Aj

EndingCredits (Owner) commented:

Hi Aj,
I did get it working for an Atari setting (ALE), but I haven't managed to get any good results yet.

The code is a bit of a mess, so I'll probably tidy it up before sharing.
-Will

EndingCredits (Owner) commented:

Update:
You can find the repo here: https://github.com/EndingCredits/Neural-Episodic-Control. The only extra thing you'll need to install is ALE, I think.

ghost (Author) commented May 26, 2017

Great, thanks very much for your work on it :)

I guess if it doesn't perform at SOTA level on Atari (or you can't tune it as well as DeepMind), you'll find some environments where it is strong; you know, the Wolpert and Macready NFL theorem:

We have dubbed the associated results NFL theorems because they demonstrate that if an algorithm performs well on a certain class of problems then it necessarily pays for that with degraded performance on the set of all remaining problems.[1]
