
[Feature] Response Metrics #2673

Open
nathan-az opened this issue Oct 28, 2024 · 2 comments
nathan-az commented Oct 28, 2024

Motivation

Response metrics are very useful for benchmarking the performance of different configurations. LMDeploy could implement metrics similar to vLLM's RequestMetrics.

From skimming the source code, I think adding basic metrics like first-token time, finish time, etc. to AsyncEngine should be pretty straightforward. I'm not sure whether changes would be required in other areas.

If there is interest, I am happy to make a PR.
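For illustration, here's a rough sketch of what such a per-request metrics object could look like. The class, field, and method names are just placeholders (loosely inspired by vLLM's RequestMetrics), not LMDeploy's or vLLM's actual API:

```python
from dataclasses import dataclass, field
from typing import Optional
import time


@dataclass
class ResponseMetrics:
    """Illustrative per-request timing metrics (names are placeholders)."""
    arrival_time: float = field(default_factory=time.time)
    first_token_time: Optional[float] = None
    finished_time: Optional[float] = None

    def record_first_token(self) -> None:
        # Record only the first streamed token.
        if self.first_token_time is None:
            self.first_token_time = time.time()

    def record_finish(self) -> None:
        self.finished_time = time.time()

    @property
    def time_to_first_token(self) -> Optional[float]:
        # Roughly covers queueing + prefill.
        if self.first_token_time is None:
            return None
        return self.first_token_time - self.arrival_time

    @property
    def total_latency(self) -> Optional[float]:
        if self.finished_time is None:
            return None
        return self.finished_time - self.arrival_time
```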

@lvhan028
Collaborator

LMDeploy has a related PR: #1423
But we haven't updated it in a long time :(
Is that what you are looking for?

@nathan-az
Author

Yes, that's exactly what I was referring to (an even more comprehensive version, with a new endpoint and Prometheus integration 😅)!

If @AllentDan is still active, I'll monitor progress on that PR :) One suggestion would be to also measure "time to first token" for each generation, since it's a good indicator of the prefill duration! A minimal sketch of the timing logic is below.
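This is just to show the idea; `engine.stream_generate` is a placeholder for whatever streaming interface the engine actually exposes, and only the timing logic matters:

```python
import time


async def generate_with_ttft(engine, prompt: str):
    """Stream a response and record time-to-first-token and total latency.

    `engine.stream_generate` is a hypothetical streaming API, not
    LMDeploy's real one; only the timing measurements are the point.
    """
    start = time.perf_counter()
    ttft = None
    chunks = []

    async for chunk in engine.stream_generate(prompt):
        if ttft is None:
            # The first token arrives at roughly the end of prefill,
            # so TTFT approximates queueing + prefill time.
            ttft = time.perf_counter() - start
        chunks.append(chunk)

    total = time.perf_counter() - start
    return "".join(chunks), {"ttft_s": ttft, "total_s": total}
```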
