Generation speed suddenly became very slow, what might be the cause? #8886

zhuwenfei-wintech · 2024-09-27T05:20:43Z

The models I deployed suddenly became very slow without any warning, with generation throughput dropping to around 1 token/s.
Here are the details:
I deployed Qwen2-72B-Instruct-GPTQ-Int4 with vLLM v0.5.1, using docker compose and the official v0.5.1 docker image.
I deployed the same model with the same settings on two machines: a production machine with 4x 4090 GPUs, 256 GB of memory, and a 36-core Xeon 8352V; and a development machine with 2x H100 GPUs (only one was used for the model), 256 GB of memory, and a 32-core AMD EPYC 9354.
Both deployments had been running for over a month without any problem, but then both became very slow (not at the same time, but within 24 hours of each other). There were no error logs, and VRAM usage, GPU utilization, CPU, and memory all looked normal. I couldn't find a cause.
Here is part of the log; you can see the generation throughput suddenly decrease.

After a clean restart of the containers, the problem seems to be gone so far.
Any idea what might be the cause?
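
In case it helps anyone narrow down when the slowdown started, here is a minimal sketch for scanning a saved container log for the point where generation throughput collapses. It rests on several assumptions: that each stats line contains text of the form `generation throughput: <number> tokens/s` (the exact format may differ between vLLM versions), that samples of 0 tokens/s mean the server was idle and can be skipped, and that a "drop" means falling below 20% of the recent average. The function names, the `window` and `ratio` parameters, and the sample values in the usage note are all made up for illustration.

```python
import re
from typing import List, Optional

# Assumed line format: "... generation throughput: <number> tokens/s ...".
# Adjust the pattern if your log lines look different.
THROUGHPUT_RE = re.compile(
    r"generation throughput:\s*([0-9]+(?:\.[0-9]+)?)\s*tokens/s",
    re.IGNORECASE,
)


def extract_throughputs(log_text: str) -> List[float]:
    """Return every generation-throughput value found in the log, in order."""
    return [float(m.group(1)) for m in THROUGHPUT_RE.finditer(log_text)]


def first_drop_index(
    values: List[float], window: int = 3, ratio: float = 0.2
) -> Optional[int]:
    """Return the index of the first non-idle sample that falls below
    `ratio` times the mean of the previous `window` non-idle samples,
    or None if no such sample exists."""
    recent: List[float] = []  # last `window` non-idle samples seen so far
    for i, value in enumerate(values):
        if value <= 0:
            continue  # treat 0 tokens/s as idle time, not as a slowdown
        if len(recent) == window:
            baseline = sum(recent) / window
            if value < ratio * baseline:
                return i
        recent.append(value)
        recent = recent[-window:]
    return None
```

With made-up samples such as `[20.0, 22.0, 0.0, 21.0, 1.1, 1.0]`, `first_drop_index` points at the `1.1` sample. Mapping that index back to the timestamp on the matching log line gives the time to compare against other events on the host.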
