TensorRT-LLM offers better kernel optimization, broader model ecosystem support, and more quantization algorithms than other large-model inference libraries. In practice, however, it is hard to demonstrate these advantages in throughput-oriented scenarios (#73 #1097 #819 #965 #1255). The current benchmark is limited to fixed, hand-picked parameters (for instance, batch sizes), which does not reflect realistic workloads.
Would it be possible to replace the benchmark method with something like vLLM's benchmark_throughput, and improve throughput in this scenario? #632
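For reference, a minimal sketch of what a vLLM-style throughput benchmark measures: all requests are submitted to the engine at once, the engine schedules and batches them internally (no fixed batch size), and throughput is reported as total tokens divided by wall-clock time. The model name and prompt set below are placeholders, not part of the original issue.

```python
# Minimal sketch of a vLLM-style throughput benchmark (model and prompts are placeholders).
import time
from vllm import LLM, SamplingParams

prompts = ["Hello, my name is"] * 1000          # placeholder request set
sampling_params = SamplingParams(temperature=0.0, max_tokens=128)

llm = LLM(model="meta-llama/Llama-2-7b-hf")     # placeholder model

start = time.perf_counter()
# The engine decides batching/scheduling internally; no fixed batch size is imposed.
outputs = llm.generate(prompts, sampling_params)
elapsed = time.perf_counter() - start

total_tokens = sum(
    len(o.prompt_token_ids) + len(o.outputs[0].token_ids) for o in outputs
)
print(f"Throughput: {total_tokens / elapsed:.2f} tokens/s "
      f"({len(prompts) / elapsed:.2f} requests/s)")
```

An equivalent benchmark for TensorRT-LLM would submit the same request set to its runtime with in-flight batching enabled and report the same tokens/s and requests/s numbers, making the two libraries directly comparable under realistic, mixed-length workloads.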