[Feature Request] More realistic benchmark and throughput optimization #1292

Closed
tp-nan opened this issue Mar 13, 2024 · 3 comments
Labels: feature request, stale

Comments

tp-nan commented Mar 13, 2024

Compared with other large-model inference libraries, TensorRT-LLM has better kernel optimization, broader model ecosystem support, and support for more quantization algorithms. In practice, however, it is hard to demonstrate these advantages in throughput scenarios (#73, #1097, #819, #965, #1255). The current benchmark is limited to fixed parameters (for instance, batch sizes), which is not very close to real workloads.

Would it be possible to replace the benchmark method with something like vLLM's benchmark_throughput, and to improve throughput in that scenario? See also #632.
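To make the request concrete, here is a rough sketch of the kind of offline throughput measurement meant above: hand the engine a whole set of requests with varied prompt and output lengths, let it batch them however it likes, and report requests/s and tokens/s. This is not code from vLLM or TensorRT-LLM. `sample_requests`, `measure_throughput`, and `fake_engine` are hypothetical names, and the length ranges are arbitrary assumptions.

```python
import random
import time
from typing import Callable, List, Tuple

# (prompt_len, output_len), both in tokens
Request = Tuple[int, int]


def sample_requests(n: int, seed: int = 0) -> List[Request]:
    """Draw requests with varied lengths instead of one fixed shape.

    The ranges below are arbitrary placeholders, not measured traffic.
    """
    rng = random.Random(seed)
    return [(rng.randint(16, 1024), rng.randint(16, 512)) for _ in range(n)]


def measure_throughput(
    run_all: Callable[[List[Request]], None],
    requests: List[Request],
) -> Tuple[float, float]:
    """Return (requests/s, tokens/s) for processing the whole request set."""
    start = time.perf_counter()
    run_all(requests)  # the engine chooses its own batching
    elapsed = time.perf_counter() - start
    total_tokens = sum(p + o for p, o in requests)
    return len(requests) / elapsed, total_tokens / elapsed


def fake_engine(requests: List[Request]) -> None:
    """Hypothetical stand-in for a real inference engine call."""
    time.sleep(0.01)


if __name__ == "__main__":
    reqs = sample_requests(100)
    rps, tps = measure_throughput(fake_engine, reqs)
    print(f"{rps:.2f} requests/s, {tps:.2f} tokens/s")
```

With a real engine, `fake_engine` would be replaced by whatever call submits all requests and blocks until they finish. The point is that batch size is no longer a benchmark parameter.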

byshiue added the feature request label Mar 14, 2024
@Missmiaom

Any updates?

github-actions bot commented Jun 8, 2024

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.

github-actions bot added the stale label Jun 8, 2024

This issue was closed because it has been stalled for 15 days with no activity.

github-actions bot closed this as not planned Jun 26, 2024
5 participants