
Reacher-v1 not training #7

Open
amolchanov86 opened this issue Dec 15, 2016 · 4 comments
Comments

@amolchanov86

Hi, I just tried running Reacher-v1 for 1,000,000 timesteps with the default settings and it didn't learn anything (it just gets stuck at a test reward of about -12). It looks like you got it running with certain settings, though. What were they?

@rmst
Owner

rmst commented Dec 22, 2016

Hey,

sorry for the late reply! The most important setting, reward normalization, is actually hardcoded into filter_env.py for Reacher-v1. The other hyperparameters etc. should be fine. Have you tried multiple times? Do at least the two pendulum tasks work?
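
Conceptually it just means scaling the environment's reward by a fixed constant before the agent sees it. A minimal sketch of that idea (the wrapper name and the scale factor here are illustrative; the actual value hardcoded in filter_env.py may differ):

```python
import gym

REWARD_SCALE = 0.1  # illustrative constant, not taken from filter_env.py

class RewardScaleWrapper(object):
    """Forward everything to the wrapped env, but scale rewards by a constant."""
    def __init__(self, env, scale):
        self.env = env
        self.scale = scale

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return obs, reward * self.scale, done, info

env = RewardScaleWrapper(gym.make('Reacher-v1'), REWARD_SCALE)
```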

Cheers
Simon

@amolchanov86
Author

Hi, thanks for the reply!

  • I only tried once; OK, I will rerun it. The thing is, I am seeing the same problem with my own implementation, even though all the balancing environments and the hopper trained fine.
  • Another question: did you try to learn any high-dimensional tasks with DDPG?
  • Last but not least: correct me if I am wrong, but you haven't tried prioritized experience replay yet? It is a bit confusing that PER is listed under "Improvements beyond the original paper", while from replay_memory.py it looks like the replay buffer is just sampled uniformly at random (see the sketch after this list).
    Thanks a lot!
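
For reference, this is roughly what I mean by plain uniform sampling, which is what replay_memory.py appears to do (class and method names here are illustrative, not taken from the repo):

```python
import random
from collections import deque

class UniformReplayBuffer(object):
    """Fixed-capacity transition store with uniform random sampling (no priorities)."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Every stored transition is equally likely to be drawn.
        return random.sample(self.buffer, batch_size)
```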

@rmst
Owner

rmst commented Jan 6, 2017

Hey, sorry for the late reply.

I never got Reacher-v1 to "solve", but it came close (as you can see in the GIF in the README). For my evaluations I used the commit before "fixes in replay memory", though I don't believe performance got worse after that commit. I don't use prioritized experience replay; the list of improvements is only a roadmap. I haven't had time to work on it so far, and by now it doesn't seem like such a big improvement compared to other things like auxiliary tasks in A3C. I may release a new TensorFlow deep RL repo, though, where we could include it.
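
For anyone curious, proportional prioritized replay (Schaul et al., 2015) would roughly replace uniform sampling with something like the sketch below. This is illustrative only: the names are made up, the sum-tree normally used for efficiency is skipped, and the importance-sampling weights are omitted.

```python
import numpy as np

class PrioritizedReplay(object):
    """Toy proportional PER: sample transitions with probability ~ |TD error|^alpha."""
    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha
        self.transitions = []
        self.priorities = []

    def add(self, transition, td_error=1.0):
        if len(self.transitions) >= self.capacity:
            self.transitions.pop(0)
            self.priorities.pop(0)
        self.transitions.append(transition)
        self.priorities.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, batch_size):
        probs = np.array(self.priorities) / sum(self.priorities)
        idx = np.random.choice(len(self.transitions), batch_size, p=probs)
        return [self.transitions[i] for i in idx], idx

    def update_priorities(self, idx, td_errors):
        # Refresh priorities after the learner recomputes TD errors for the batch.
        for i, err in zip(idx, td_errors):
            self.priorities[i] = (abs(err) + 1e-6) ** self.alpha
```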

Ah, and no, I haven't used it with convolutional nets on pixels yet. That should also come soon (in the new repo, though).

Cheers

@amolchanov86
Author

Hi, thanks for the help!
