Right now it seems like generate.py is using a lot of CUDA memory during inference.
For example, I trained a small 2-layer GRU network with a hidden size of 150 on the Shakespeare corpus.
When it comes time to generate text, I pass a largish --predict_len and my GTX 1070 (8 GB) ends up running out of memory (for example, --predict_len 5000 dies due to lack of GPU memory).
I would think it should be possible to run inference on this model by feeding it one character at a time (basically the last predicted character), so that it only ever does a single forward pass through the network per character. As it is now, each forward pass during inference seems to allocate some GPU memory that is never freed or reused, so a prediction run of 5000 characters ends up allocating something close to the size of the model (or perhaps 2x the model size) for every character of inference.
Question: is this a bug, or a necessary condition of this algorithm? (I don't recall having this same problem with the original Torch/Lua implementation.)
Is there a way to adjust this code to do essentially infinite inference (i.e., just keep generating text until told to stop)?
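
For what it's worth, the symptom sounds like the autograd graph being kept alive during generation: if the hidden state carried between steps still references the graph from previous forward passes, PyTorch holds onto all of the intermediate buffers and memory grows with every generated character. Below is a minimal sketch of character-at-a-time, run-until-stopped generation with gradient tracking disabled. The names `decoder`, `char_to_index`, and `all_characters` are hypothetical stand-ins for whatever generate.py actually defines; in older PyTorch versions the equivalent of `torch.no_grad()` was building the input `Variable` with `volatile=True`.

```python
import torch

def generate_stream(decoder, prime_str, char_to_index, all_characters,
                    temperature=0.8):
    """Yield one generated character at a time, indefinitely, with flat memory use."""
    hidden = decoder.init_hidden(1)   # hypothetical helper on the model

    # Prime the hidden state with all but the last character of the prompt.
    with torch.no_grad():             # no autograd graph is built or retained
        for ch in prime_str[:-1]:
            _, hidden = decoder(torch.tensor([[char_to_index[ch]]]), hidden)

    inp = torch.tensor([[char_to_index[prime_str[-1]]]])
    while True:                       # "infinite inference": the caller decides when to stop
        with torch.no_grad():
            output, hidden = decoder(inp, hidden)
        # Sample the next character from the temperature-scaled softmax.
        probs = torch.softmax(output.view(-1) / temperature, dim=0)
        idx = torch.multinomial(probs, 1).item()
        yield all_characters[idx]
        inp = torch.tensor([[idx]])   # feed the prediction back in as the next input
```

Used as a generator, e.g. `for ch in generate_stream(decoder, "Wh", char_to_index, all_characters): print(ch, end='')`, it keeps producing text until interrupted. The only state carried between steps is the (graph-free) hidden tensor, so memory should stay constant no matter how much text is generated; the same holds on the GPU if the model and inputs live there.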