Model is twice as large on first load #1787

Open
winstxnhdw opened this issue Sep 23, 2024 · 0 comments

Comments

winstxnhdw commented Sep 23, 2024

Hey, I am trying to load an 8B int8 model on a device with 16 GB of RAM. The model should only take slightly over 8 GB of memory, but it seems that during the loading process the model is being copied, which doubles the memory usage to over 16 GB and causes an OOM.

Is it not possible to stream the model instead?
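For context, here is a minimal sketch (not this project's loader; the file name is hypothetical and it assumes a flat int8 weight file) showing how the transient duplicate can be confirmed with psutil, and how a memory-mapped read avoids keeping two full-size copies alive at once:

```python
# Sketch: compare peak RSS of an eager load (raw buffer + copy coexist)
# against a memory-mapped load (pages are faulted in on demand).
import numpy as np
import psutil

proc = psutil.Process()
rss_mib = lambda: proc.memory_info().rss / 2**20

print(f"before load:      {rss_mib():.0f} MiB")

path = "model_weights.bin"  # hypothetical flat int8 weight file

# Eager load mimicking the reported behaviour: the raw file buffer and the
# converted tensor are alive at the same time, so peak usage is roughly 2x.
raw = np.fromfile(path, dtype=np.int8)
weights = raw.copy()  # second full-size allocation
del raw
print(f"after eager load: {rss_mib():.0f} MiB")

# Memory-mapped alternative: no transient duplicate of the weights.
weights_mmap = np.memmap(path, dtype=np.int8, mode="r")
print(f"after mmap load:  {rss_mib():.0f} MiB")
```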
