TensorRT-LLM offers better kernel optimization, broader model ecosystem support, and more quantization algorithms than other large-model inference libraries. In practice, however, it is hard to demonstrate these advantages in throughput-oriented scenarios (#73 #1097 #819 #965 #1255). The current benchmark is limited to fixed, hand-picked parameters (for instance, batch sizes), which does not reflect realistic workloads.
Would it be possible to replace the benchmark method with something like vLLM's benchmark_throughput, and improve throughput in this scenario? #632
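For reference, a minimal sketch of what a vLLM-style throughput benchmark measures: all requests are submitted to the engine at once, the engine schedules and batches them internally (no fixed batch size), and throughput is reported as total tokens divided by wall-clock time. The model name and prompt set below are placeholders, not part of the original issue.

```python
# Minimal sketch of a vLLM-style throughput benchmark (model and prompts are placeholders).
import time
from vllm import LLM, SamplingParams

prompts = ["Hello, my name is"] * 1000          # placeholder request set
sampling_params = SamplingParams(temperature=0.0, max_tokens=128)

llm = LLM(model="meta-llama/Llama-2-7b-hf")     # placeholder model

start = time.perf_counter()
# The engine decides batching/scheduling internally; no fixed batch size is imposed.
outputs = llm.generate(prompts, sampling_params)
elapsed = time.perf_counter() - start

total_tokens = sum(
    len(o.prompt_token_ids) + len(o.outputs[0].token_ids) for o in outputs
)
print(f"Throughput: {total_tokens / elapsed:.2f} tokens/s "
      f"({len(prompts) / elapsed:.2f} requests/s)")
```

An equivalent benchmark for TensorRT-LLM would submit the same request set to its runtime with in-flight batching enabled and report the same tokens/s and requests/s numbers, making the two libraries directly comparable under realistic, mixed-length workloads.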