Motivation

Response metrics are very useful for benchmarking the performance of different configurations. LMDeploy could implement metrics similar to vLLM's RequestMetrics.

From skimming the source code, I think adding basic metrics such as first-token time, finish time, etc. should be pretty straightforward for AsyncEngine. I'm not sure whether changes would be required elsewhere.

If there is interest, I am happy to make a PR.
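For concreteness, here is a minimal sketch of the kind of object I have in mind, loosely modeled on vLLM's RequestMetrics. The field and method names below are just placeholders, not anything that exists in LMDeploy today:

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RequestMetrics:
    """Per-request timings, loosely modeled on vLLM's RequestMetrics (placeholder sketch)."""

    arrival_time: float = field(default_factory=time.time)
    first_token_time: Optional[float] = None  # set when the first token is streamed
    finished_time: Optional[float] = None     # set when generation completes

    def mark_first_token(self) -> None:
        # Only record the very first token of the generation.
        if self.first_token_time is None:
            self.first_token_time = time.time()

    def mark_finished(self) -> None:
        self.finished_time = time.time()

    @property
    def time_to_first_token(self) -> Optional[float]:
        # Rough proxy for queueing plus prefill duration.
        if self.first_token_time is None:
            return None
        return self.first_token_time - self.arrival_time

    @property
    def total_time(self) -> Optional[float]:
        if self.finished_time is None:
            return None
        return self.finished_time - self.arrival_time
```

The idea would be for AsyncEngine's streaming loop to call mark_first_token() on the first yielded output and mark_finished() when the request completes, and to attach the object to the final response.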
Yes, that's exactly what I was referring to (a more comprehensive version with a new endpoint and Prometheus integration 😅)!
If @AllentDan is still active, I'll monitor progress on that PR :) One suggestion would be to also measure "time to first token" for each generation, since it's a good indicator of prefill duration!
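For reference, a rough sketch of what the Prometheus side could look like, building on the hypothetical RequestMetrics object above. The metric names and port are made up, not an existing LMDeploy interface:

```python
# Rough sketch: exposing per-request timings through prometheus_client.
# The metric names, port, and the RequestMetrics object from the sketch
# above are assumptions, not an existing LMDeploy feature.
import time

from prometheus_client import Histogram, start_http_server

TTFT_SECONDS = Histogram(
    "lmdeploy_time_to_first_token_seconds",
    "Time from request arrival to the first generated token (prefill proxy).",
)
REQUEST_SECONDS = Histogram(
    "lmdeploy_request_duration_seconds",
    "End-to-end request duration.",
)


def observe_request(metrics) -> None:
    """Record one finished request, e.g. the RequestMetrics sketch above."""
    if metrics.time_to_first_token is not None:
        TTFT_SECONDS.observe(metrics.time_to_first_token)
    if metrics.total_time is not None:
        REQUEST_SECONDS.observe(metrics.total_time)


if __name__ == "__main__":
    # Serve Prometheus text format on http://localhost:8001/metrics; in the
    # real server this would be wired into the existing HTTP app instead.
    start_http_server(8001)
    while True:
        time.sleep(60)
```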