[Question] Continued Training in Stable Baseline 3 #597
Comments
Hello, the learning in your case fails for a very special reason. For this special case, a simple fix is to replace your `learn()` call with:

```python
model.learn(total_timesteps=200, log_interval=4, reset_num_timesteps=False)
```

EDIT: there is probably a simpler fix:

```python
model._last_obs = None
```

I think we can do that internally in SB3 when the user calls `learn()` again.
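Put together, a minimal sketch of how these two fixes slot into a save/load/resume loop (DQN, CartPole-v0, and the `learn()` arguments come from this thread; the filename and loop count are illustrative assumptions):

```python
import gym
from stable_baselines3 import DQN

env = gym.make("CartPole-v0")
model = DQN("MlpPolicy", env, verbose=1)

for _ in range(10):  # illustrative number of resume cycles
    # Keep the timestep counter (and anything scheduled on it) running
    # across calls instead of restarting from zero.
    model.learn(total_timesteps=200, log_interval=4, reset_num_timesteps=False)
    model.save("dqn_cartpole")  # illustrative filename
    del model
    model = DQN.load("dqn_cartpole", env=env)
    # Force a fresh env.reset() on the next learn() call.
    model._last_obs = None
```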
An additional note for DQN. EDIT: periodically setting the seed is also different from training in one long run, since the number of steps per episode varies (there is a max limit, but the step count is not constant).
Thanks @araffin, sorry for the delay in replying; I've been working on other projects for the past few days. I tested the following change, as recommended:

```python
model._last_obs = None
model.learn(total_timesteps=200, log_interval=4, reset_num_timesteps=False)
```

It works (tested on the CartPole-v0 environment). In my own scenario I haven't been successful yet (the agent still behaves randomly when performing continued training). My environment differs from CartPole-v0 in that every episode has a fixed length of 200 steps, i.e. the environment only returns `done` after 200 steps. I will continue exploring!
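To illustrate the fixed-length behaviour described above, here is a minimal sketch of such an environment under the classic Gym API (the observation/action spaces and the reward are placeholders, not the poster's actual simulator):

```python
import gym
import numpy as np
from gym import spaces


class FixedLengthEnv(gym.Env):
    """Toy env whose episodes always last exactly 200 steps."""

    def __init__(self, episode_length=200):
        super().__init__()
        self.episode_length = episode_length
        self.observation_space = spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)
        self.action_space = spaces.Discrete(2)
        self.steps = 0

    def reset(self):
        self.steps = 0
        return self.observation_space.sample()

    def step(self, action):
        self.steps += 1
        done = self.steps >= self.episode_length  # done only after exactly 200 steps
        reward = 0.0  # placeholder reward
        return self.observation_space.sample(), reward, done, {}
```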
Question
The agent does not appear to learn over time when trained with this continued-training scheme. Is this a problem with this specific training scheme, or with SB3?
More details
I have a custom env that connects to a network simulator, where the length of each simulation is fixed at 500 timesteps. After that many steps I must stop training, save the model so the simulator can be restarted, and then resume training. Using DQN in my env, I noticed that the agent was not learning over time, so I tested the same training scheme in a Gym environment to validate the idea of continued training, and I ran into the same problem.
Code
Continued training:
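A minimal sketch of the chunked loop being described, assuming DQN on CartPole-v0 with the `learn()` arguments quoted earlier in the thread (the filename and the loop count of 50 are illustrative):

```python
import gym
from stable_baselines3 import DQN

env = gym.make("CartPole-v0")
model = DQN("MlpPolicy", env, verbose=1)

# Train in short chunks, saving and reloading between them, as when a
# simulator must be restarted every few hundred steps. Note: this is the
# pattern that fails to learn, per the discussion above (no
# reset_num_timesteps=False, no clearing of model._last_obs).
for _ in range(50):
    model.learn(total_timesteps=200, log_interval=4)
    model.save("dqn_cartpole")  # illustrative filename
    del model
    model = DQN.load("dqn_cartpole", env=env)
```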
Non-Continued training:
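For comparison, a sketch of the uninterrupted baseline (again assuming DQN on CartPole-v0; 10,000 total timesteps is an illustrative stand-in for 50 chunks of 200 steps):

```python
import gym
from stable_baselines3 import DQN

env = gym.make("CartPole-v0")
model = DQN("MlpPolicy", env, verbose=1)

# One uninterrupted run with the same total budget as the chunked loop above.
model.learn(total_timesteps=10000, log_interval=4)
model.save("dqn_cartpole")
```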
Checklist
I checked the following related issues, but they did not answer my question:
- [TO INVESTIGATE] Potential Performance Drop after loading SAC/TQC model #435
- [Question] Resume of training from saved model does not give similar result #326
- [question] retrain model after loading checkpoint #51
- Does using model.save() and then using model.load() resume training exactly from the point where it was left? #29