
Overlap between first evaluation episode and "min_steps_before_learning" in SAC #59

Open
pvdsp opened this issue Sep 2, 2020 · 0 comments

pvdsp commented Sep 2, 2020

This might be a minor issue, but I think there is a conflict between treating the first global episode as an evaluation episode and the min_steps_before_learning hyperparameter in SAC. While the global step count is smaller than min_steps_before_learning, the agent should take random actions to improve exploration at the start of training. Because the first episode is an evaluation episode, however, the first batch of global steps is spent evaluating the model instead of exploring. As a result, the number of initial random steps actually used for exploration can be drastically reduced: it becomes min_steps_before_learning minus the number of steps in the first evaluation episode (or zero, if that episode is longer than min_steps_before_learning).
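
To make the overlap concrete, here is a minimal sketch that counts how many random exploration steps actually happen. It assumes the training loop uses one global step counter for both evaluation and training, and that the first episode is an evaluation episode of fixed length; the function name `effective_random_steps` and the step counts are hypothetical, not taken from the repository.

```python
def effective_random_steps(min_steps_before_learning: int,
                           first_eval_episode_len: int,
                           total_steps: int) -> int:
    """Hypothetical model of the current loop: evaluation and training share
    one global step counter, and the first episode is an evaluation episode."""
    random_steps = 0
    for global_step in range(total_steps):
        if global_step < first_eval_episode_len:
            # First (evaluation) episode: policy actions, but the shared
            # global counter still advances toward min_steps_before_learning.
            continue
        if global_step < min_steps_before_learning:
            random_steps += 1  # uniform random action for exploration
    return random_steps


# With 10,000 intended random steps and a 1,000-step first evaluation
# episode, only 9,000 random steps are taken.
print(effective_random_steps(10_000, 1_000, 20_000))  # 9000
```

If the evaluation episode is at least as long as min_steps_before_learning, the sketch returns 0, i.e. no random exploration happens at all.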

This can easily be solved in one of two ways: (1) keep the first episode as an evaluation episode and delay the start of the random steps until the next episode begins, or (2) use the first min_steps_before_learning steps for exploration and delay the evaluation episode until the first episode that starts after step min_steps_before_learning. I would suggest the first option. If you agree that this overlap is an issue, I can open a pull request that separates exploration from evaluation, so that the first min_steps_before_learning steps of training are guaranteed to be random steps.
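
The first (suggested) option could look roughly like the sketch below: evaluation steps are still taken first, but a separate counter that only non-evaluation steps advance decides when random exploration ends. This is an illustration under the same assumptions as above (fixed-length first evaluation episode); `action_schedule` and `train_steps` are hypothetical names, not part of the existing SAC code.

```python
from collections import Counter
from typing import List


def action_schedule(min_steps_before_learning: int,
                    first_eval_episode_len: int,
                    total_steps: int) -> List[str]:
    """Sketch of option 1: keep the first episode for evaluation, but count
    random exploration steps with a counter that evaluation does not advance."""
    schedule = []
    train_steps = 0  # hypothetical counter: non-evaluation steps only
    for global_step in range(total_steps):
        if global_step < first_eval_episode_len:
            schedule.append("eval")  # first episode: evaluate the untrained policy
            continue
        if train_steps < min_steps_before_learning:
            schedule.append("random")  # uniform exploration, full budget preserved
        else:
            schedule.append("policy")  # actions sampled from the SAC policy
        train_steps += 1
    return schedule


counts = Counter(action_schedule(10_000, 1_000, 20_000))
print(counts["eval"], counts["random"], counts["policy"])  # 1000 10000 9000
```

Using a dedicated counter rather than shifting the global step keeps the evaluation schedule unchanged while guaranteeing the full min_steps_before_learning random steps.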
