Thanks for creating this implementation. I've tried running it in a Google Colab Pro notebook, but the session keeps crashing due to maxing out the RAM. Do you have any sense of how much RAM is needed to run the model? Thanks!
~40 GB of VRAM to load the model, and ~45 GB to run inference at the full 2048-token context length. You also need roughly twice that in CPU RAM (~81 GB), because the model is currently instantiated on the CPU in fp32, then converted to fp16 and uploaded to VRAM. Not sure if that's intended behaviour; it looks like it's supposed to create meta tensors instead, but that's not actually working.
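For reference, here's a minimal sketch of the meta-tensor approach being described, using plain PyTorch. This is not the repo's actual loading code, and the module here is just a stand-in; the point is that parameters created on the `meta` device carry only shape/dtype metadata, so the fp32 copy never occupies CPU RAM:

```python
import torch

# Construct the model skeleton on the meta device: no real storage is
# allocated, so CPU RAM usage stays flat regardless of model size.
with torch.device("meta"):
    model = torch.nn.Linear(4096, 4096)  # stand-in for the real model

# Materialize uninitialized storage directly on the GPU and cast to fp16.
# Real weights would then be loaded straight into these tensors from a
# checkpoint (e.g. load_state_dict(..., assign=True) in recent PyTorch),
# skipping the intermediate fp32 copy on the CPU entirely.
model = model.to_empty(device="cuda").half()
```

If the loader instead builds the model eagerly on the CPU and only then calls `.half().cuda()`, you get exactly the ~2x CPU RAM spike described above.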