
Reacher-v1 not training #7

Open
amolchanov86 opened this issue Dec 15, 2016 · 4 comments
Comments

@amolchanov86

Hi, I just tried running Reacher-v1 for 1,000,000 timesteps with the default settings and it didn't learn anything (it just gets stuck at a test reward of about -12). It looks like you got it running with certain settings, though. What were they?

@rmst
Owner

rmst commented Dec 22, 2016

Hey,

sorry for the late reply! The most important setting, reward normalization, is actually hardcoded into filter_env.py for Reacher-v1. The other hyperparameters etc. should be fine. Have you tried multiple times? Do at least the two pendulum tasks work?
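
Conceptually it just means scaling the environment's reward by a fixed constant before the agent sees it. A minimal sketch of that idea (the wrapper name and the scale factor here are illustrative; the actual value hardcoded in filter_env.py may differ):

```python
import gym

REWARD_SCALE = 0.1  # illustrative constant, not taken from filter_env.py

class RewardScaleWrapper(object):
    """Forward everything to the wrapped env, but scale rewards by a constant."""
    def __init__(self, env, scale):
        self.env = env
        self.scale = scale

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return obs, reward * self.scale, done, info

env = RewardScaleWrapper(gym.make('Reacher-v1'), REWARD_SCALE)
```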

Cheers
Simon

@amolchanov86
Author

Hi, thanks for the reply!

  • I only tried once; OK, I will rerun it. The thing is, I am seeing the same problem with my own implementation, even though all the balancing environments and the hopper trained fine.
  • Another question: did you try to learn any high-dimensional tasks with DDPG?
  • Last but not least: correct me if I am wrong, but you haven't tried prioritized experience replay yet? It is a bit confusing that PER is listed under "Improvements beyond the original paper", while from replay_memory.py it looks like the replay buffer is just sampled uniformly at random (see the sketch after this list).
    Thanks a lot!
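
For reference, this is roughly what I mean by plain uniform sampling, which is what replay_memory.py appears to do (class and method names here are illustrative, not taken from the repo):

```python
import random
from collections import deque

class UniformReplayBuffer(object):
    """Fixed-capacity transition store with uniform random sampling (no priorities)."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Every stored transition is equally likely to be drawn.
        return random.sample(self.buffer, batch_size)
```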

@rmst
Owner

rmst commented Jan 6, 2017

Hey, sorry for the late reply.

I never got Reacher-v1 to "solve", but it came close (as you can see in the GIF in the README). For my evaluations I used the commit before "fixes in replay memory", though I don't believe performance got worse after that commit. I don't use prioritized experience replay; the list of improvements is only a roadmap. I haven't had time to work on it so far, and by now it doesn't seem like such a big improvement compared to other things like auxiliary tasks in A3C. I may release a new TensorFlow deep RL repo, though, where we could include it.
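
For anyone curious, proportional prioritized replay (Schaul et al., 2015) would roughly replace uniform sampling with something like the sketch below. This is illustrative only: the names are made up, the sum-tree normally used for efficiency is skipped, and the importance-sampling weights are omitted.

```python
import numpy as np

class PrioritizedReplay(object):
    """Toy proportional PER: sample transitions with probability ~ |TD error|^alpha."""
    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha
        self.transitions = []
        self.priorities = []

    def add(self, transition, td_error=1.0):
        if len(self.transitions) >= self.capacity:
            self.transitions.pop(0)
            self.priorities.pop(0)
        self.transitions.append(transition)
        self.priorities.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, batch_size):
        probs = np.array(self.priorities) / sum(self.priorities)
        idx = np.random.choice(len(self.transitions), batch_size, p=probs)
        return [self.transitions[i] for i in idx], idx

    def update_priorities(self, idx, td_errors):
        # Refresh priorities after the learner recomputes TD errors for the batch.
        for i, err in zip(idx, td_errors):
            self.priorities[i] = (abs(err) + 1e-6) ** self.alpha
```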

Ah, and no, I haven't used it with convolutional nets on pixels yet. That should also come soon (in the new repo, though).

Cheers

@amolchanov86
Author

Hi, thanks for the help!
