I'm trying to run inference with Qwen2-VL 7B at batch size 2 on a 4090 (24 GB), and I hit an OOM. How can I avoid it? In my opinion, batch size 2 is not too large for a 4090 (24 GB).
I found that the LLM engine is 15 GB, but after building ModelRunnerCpp, memory usage increases by almost 20 GB. Is that expected?
Hi @YSF-A. Please use the latest code; you can run it with INT4/INT8/FP8 quantization, referring to examples/qwen. Please let me know if you have any other questions about it.
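For reference, a minimal sketch of loading such an engine with the TensorRT-LLM Python runtime, assuming a quantized engine has already been built per examples/qwen (the engine path and the fraction value below are illustrative, not from this thread). The C++ runtime pre-allocates a KV-cache pool from the remaining free GPU memory after loading the engine, which is typically where the extra memory after ModelRunnerCpp construction goes; lowering `kv_cache_free_gpu_memory_fraction` caps that pool:

```python
from tensorrt_llm.runtime import ModelRunnerCpp

# Engine built from examples/qwen with INT4/INT8/FP8 quantization
# (the path is illustrative).
runner = ModelRunnerCpp.from_dir(
    engine_dir="./qwen2_vl_7b_int4_engine",
    # By default the C++ runtime reserves most of the free GPU memory
    # for the KV-cache pool; 0.3 is an example value chosen to leave
    # headroom on a 24 GB card, not a recommended setting.
    kv_cache_free_gpu_memory_fraction=0.3,
)
```

Combined with a quantized (INT4/INT8/FP8) engine, this should bring batch size 2 within a 4090's 24 GB.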