Describe the bug
Hi,
I am running DeepSpeed inference for Llama 3.1 70B on 2 nodes, each with 2 GPUs and 24 GB of VRAM per GPU.
Loading is slow on node 1 but fast on node 2, and the run ends with an out-of-memory (OOM) error. What is the problem?
To Reproduce
Run the code below with the following command, on node 1 only.
Expected behavior
I expect the complete Llama 3.1 70B model to be loaded, distributed across the VRAM of both nodes, and inference to run.
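As a rough sanity check of this expectation, the weight footprint can be compared against the combined VRAM. This is only a back-of-the-envelope sketch: the parameter count is taken as a nominal 70e9, "24GB" is treated as 24 GiB, the dtype is not stated in this report (so several are shown), and the KV cache, activations, and framework overhead are ignored. The helper name `weight_memory_gib` is hypothetical.

```python
# Rough estimate of weight memory vs. combined VRAM (weights only).
# Assumptions: nominal 70e9 parameters, 24 GiB per GPU, 2 nodes x 2 GPUs.

def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30

NUM_PARAMS = 70e9                # nominal size of a "70B" model
TOTAL_VRAM_GIB = 2 * 2 * 24      # nodes * GPUs per node * GiB per GPU

for dtype, nbytes in [("fp16/bf16", 2), ("int8", 1), ("int4", 0.5)]:
    need = weight_memory_gib(NUM_PARAMS, nbytes)
    verdict = "fits" if need < TOTAL_VRAM_GIB else "does NOT fit"
    print(f"{dtype:10s} ~{need:6.1f} GiB of weights, "
          f"{verdict} in {TOTAL_VRAM_GIB} GiB total VRAM")
```

Under these assumptions, 16-bit weights alone are larger than the 96 GiB available across the four GPUs, so an OOM would be expected unless the model is quantized or partly offloaded.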
ds_report output
System info (please complete the following information):