[BUG] Recurrently running deepspeed ends up overflowing GPU memory #4222
Comments
@shankarp8 If you run this loop without DeepSpeed, do you not see the issue?
Well, I can't fine-tune Llama-7B on my GPUs (A40s with 48GB of memory) without DeepSpeed, so I replaced it with GPT2-XL for now. Without DeepSpeed, the memory at the start of each loop (printed by the print statement in my code) is 6GB -> 12GB -> 12GB -> 12GB -> 12GB ... -> 12GB. By contrast, the GPU memory for each of the ten loops when using DeepSpeed with GPT2-XL is 6GB -> 12GB -> 15GB -> 18GB -> 21GB -> 24GB -> 27GB -> 30GB -> 33GB. Notably, with DeepSpeed an extra 3GB (or, rather, 1/2 of the model size; for Llama-7B it would be 14GB) is left on the GPUs each time.
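The per-loop growth can be read directly off the figures quoted above. A minimal Python sketch, using only those numbers (the helper name `per_step_growth` is made up for illustration, and the non-DeepSpeed list is cut to its first five readings):

```python
# Readings in GB at the start of each loop, copied from the comment above.
without_deepspeed = [6, 12, 12, 12, 12]          # first five readings only
with_deepspeed = [6, 12, 15, 18, 21, 24, 27, 30, 33]

def per_step_growth(readings):
    """Return the difference between each reading and the one before it."""
    return [later - earlier for earlier, later in zip(readings, readings[1:])]

print(per_step_growth(without_deepspeed))  # [6, 0, 0, 0]
print(per_step_growth(with_deepspeed))     # [6, 3, 3, 3, 3, 3, 3, 3]
```

After the first loop the non-DeepSpeed run is flat, while the DeepSpeed run gains a constant 3GB per loop, which is what points to a fixed-size allocation being left behind each time.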
@shankarp8 Can you try with this branch #4383 ?
Hi, I tried using that branch and model_engine.destroy() at the end of every loop (let me know if that is what I was supposed to do), and unfortunately it still seems to be having the same issue.
After further investigation, it looks like we won't be able to clear everything off the GPU by destroying the ZeRO optimizers, but that is the best we can do at the moment.
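For readers who want to try the same per-loop teardown, here is a rough Python sketch of the steps mentioned in this thread: calling `destroy()` on the engine, dropping references, running the garbage collector, and emptying PyTorch's CUDA cache. The helper names `destroy_engine` and `release_cached_memory` and the stand-in `FakeEngine` class are hypothetical, and `destroy()` is guarded because it may not exist in every DeepSpeed version. As noted above, this may still not free everything.

```python
import gc

def destroy_engine(engine):
    """Call engine.destroy() if it exists; return True if it was called."""
    destroy = getattr(engine, "destroy", None)
    if callable(destroy):
        destroy()
        return True
    return False

def release_cached_memory():
    """Collect garbage, then release PyTorch's cached CUDA blocks if possible."""
    gc.collect()  # free unreachable Python objects (and the tensors they hold)
    try:
        import torch  # optional, so the sketch also runs without PyTorch
    except ImportError:
        return
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # return unused cached blocks to the driver

class FakeEngine:
    """Stand-in for a DeepSpeed engine, for illustration only."""
    def __init__(self):
        self.destroyed = False
    def destroy(self):
        self.destroyed = True

model_engine = FakeEngine()
destroy_engine(model_engine)
del model_engine          # drop the caller's reference before collecting
release_cached_memory()
```

The order matters: tensors that are still referenced cannot be freed, so the caller's own references are deleted before the collector and cache are touched.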
I am conducting research on model editing. Basically, I apply different editing methods to edit a transformer model once on one sample in my dataset, then revert it to the original (using a deepcopy of the original model) and edit it again on a different sample. Each time, I train the model using DeepSpeed's ZeRO stage 2 optimizer. This differs from most uses of DeepSpeed, which only need to train the model once and perform inference once in a given process.
The issue appears to be that DeepSpeed leaves some residual memory on the GPUs, so every time I edit the model again, more memory is occupied until the GPU runs out of memory. I have tried deleting the model_engine each time, clearing torch's CUDA cache, and running Python's garbage collector, but none of these work.
To Reproduce
Any simple training loop using deepspeed should be sufficient to reproduce this error. For me, I used Llama-7B:
Expected behavior
GPU memory should return to the same level at the start of every loop. Instead, some memory is left on the GPU after each loop, eventually causing it to run out of memory. The 'GPU MEMORY USAGE AT STEP i' print statement should make this clear.
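The reporter's actual print statement is not shown here. As an assumption about what such a line could look like, this Python sketch formats PyTorch's own allocator counters for every visible GPU. The function name `gpu_memory_line` is hypothetical, and these counters only cover memory managed by PyTorch in the current process, so they can differ from what `nvidia-smi` reports.

```python
def gpu_memory_line(step):
    """Format a 'GPU MEMORY USAGE AT STEP i' line for every visible GPU."""
    prefix = f"GPU MEMORY USAGE AT STEP {step}: "
    try:
        import torch  # optional, so the sketch also runs without PyTorch
    except ImportError:
        return prefix + "torch not installed"
    if not torch.cuda.is_available():
        return prefix + "no CUDA device"
    parts = []
    for i in range(torch.cuda.device_count()):
        # Bytes held by live tensors vs. bytes held by the caching allocator.
        allocated = torch.cuda.memory_allocated(i) / 1024**3
        reserved = torch.cuda.memory_reserved(i) / 1024**3
        parts.append(f"cuda:{i} allocated={allocated:.2f}GiB reserved={reserved:.2f}GiB")
    return prefix + ", ".join(parts)

for step in range(3):
    print(gpu_memory_line(step))
```

Logging both numbers helps separate memory that is still referenced (allocated) from memory that is merely cached (reserved) and could be released by emptying the cache.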
System info (please complete the following information):
Launcher context
I run with the Python launcher (python program.py). The deepspeed launcher appears to allocate memory on all GPUs automatically and does not let me keep a few visible GPUs free for other parts of my script, which I need to do.