Possibility to resume training #692
I am not sure if I follow here. Yes, you can save models at any point of training (via callbacks), load models and resume training, as shown in this example.
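Roughly, the pattern looks like this (a minimal sketch; the environment, file name, and save interval are placeholders, not from the original example):

```python
import gym
from stable_baselines import PPO2

calls = {"n": 0}

def save_callback(_locals, _globals):
    # Called by the algorithm during learn(); checkpoint the model periodically.
    calls["n"] += 1
    if calls["n"] % 100 == 0:
        _locals["self"].save("ppo2_checkpoint")
    return True  # returning False would stop training early

env = gym.make("CartPole-v1")
model = PPO2("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100000, callback=save_callback)
```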
So I figured out that with the callback function we can save the model parameters; I was just not completely sure whether training would resume or continue exactly as if it had not been stopped in the first place. I resume training a model that was saved with the "save" function, like this:
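(A minimal sketch of what I mean; the environment and file name are just examples:)

```python
import gym
from stable_baselines import PPO2

env = gym.make("CartPole-v1")
# Load a model previously stored with model.save("ppo2_checkpoint")
model = PPO2.load("ppo2_checkpoint", env=env)
model.learn(total_timesteps=100000)  # call learn() again on the loaded model
```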
Does this continue the training exactly as if it had not been stopped in the first place, from the point where it was saved? Thanks.
Ah yes, this is a valid question. The answer is no, it does not continue exactly as if there had been no saving and loading. Most notably, optimizer parameters are not stored along with the model, and schedulers for learning rates and such start from zero again upon a new call to "learn". As for Tensorboard being updated: I have not tried this, but others seem to have issues with it (e.g. #599). I am not sure how the code is supposed to behave in this case when you re-use the same log name.
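(For what it's worth, a rough workaround sketch, not the library's documented save format: dump the whole TensorFlow graph state, including optimizer slot variables, with a plain `tf.train.Saver`. This assumes the model exposes its session and graph as `model.sess` / `model.graph`, as the TF1 stable-baselines models do, and it still does not restore schedule counters:)

```python
import gym
import tensorflow as tf
from stable_baselines import PPO2

model = PPO2("MlpPolicy", gym.make("CartPole-v1"))
model.learn(total_timesteps=10000)

with model.graph.as_default():
    saver = tf.train.Saver()  # collects all graph variables, including Adam moment slots
saver.save(model.sess, "./ppo2_full_state.ckpt")

# Later: rebuild an identical model, then restore the full graph state into it.
model2 = PPO2("MlpPolicy", gym.make("CartPole-v1"))
with model2.graph.as_default():
    saver2 = tf.train.Saver()
saver2.restore(model2.sess, "./ppo2_full_state.ckpt")
model2.learn(total_timesteps=10000)
```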
I think the only way is saving the complete model object, right? Then, probably with some changes in the "learn" function, one could resume a previous learning process. Do you think this could be a pull request worth making?
It would not be as easy, as models contain a bunch of un-picklable objects, and Tensorflow variables are not included in the pickling process by default. Also, as mentioned earlier, the optimizer parameters and schedulers are not part of what gets saved. We could design the next version of stable-baselines to support this "continue as if it was never stopped" behavior, where it should be easier with eager-type computations and graphs. Edit: Yup, pickling or serialization in general picks specific variables to store for this reason.
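(A toy illustration of that last point, not stable-baselines' actual code: `__getstate__`/`__setstate__` choose which attributes get pickled, and runtime state such as a TF session has to be rebuilt instead of loaded:)

```python
import pickle
import numpy as np

class ToyModel:
    """Stand-in for an RL model mixing plain data with un-picklable runtime state."""

    def __init__(self):
        self.weights = np.zeros(4)   # plain data: picklable
        self.session = object()      # stand-in for a TF session / other runtime state

    def __getstate__(self):
        # Only the explicitly chosen variables are stored.
        return {"weights": self.weights}

    def __setstate__(self, state):
        self.weights = state["weights"]
        self.session = object()      # runtime state is rebuilt, not unpickled

blob = pickle.dumps(ToyModel())      # works because __getstate__ drops the session
restored = pickle.loads(blob)
```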
Related: #301
This example may help you.
closing this one in favor of #301 |
Is there a way to resume training, for example if our PC crashes or we face memory issues?
Could saving the "model" object as a pickle file at every step and then using the "learn" function be a way to resume training? (If so, can I make a pull request for it?)