Motivation

Response metrics are very useful for benchmarking the performance of different configurations. LMDeploy could implement metrics similar to vLLM's RequestMetrics.

From skimming the source code, I think adding basic metrics such as first-token time, finish time, etc. should be pretty straightforward for AsyncEngine. I'm not sure whether changes would be required elsewhere.

If there is interest, I am happy to make a PR.
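For concreteness, here is a minimal sketch of the kind of object I have in mind, loosely modeled on vLLM's RequestMetrics. The field and method names below are just placeholders, not anything that exists in LMDeploy today:

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RequestMetrics:
    """Per-request timings, loosely modeled on vLLM's RequestMetrics (placeholder sketch)."""

    arrival_time: float = field(default_factory=time.time)
    first_token_time: Optional[float] = None  # set when the first token is streamed
    finished_time: Optional[float] = None     # set when generation completes

    def mark_first_token(self) -> None:
        # Only record the very first token of the generation.
        if self.first_token_time is None:
            self.first_token_time = time.time()

    def mark_finished(self) -> None:
        self.finished_time = time.time()

    @property
    def time_to_first_token(self) -> Optional[float]:
        # Rough proxy for queueing plus prefill duration.
        if self.first_token_time is None:
            return None
        return self.first_token_time - self.arrival_time

    @property
    def total_time(self) -> Optional[float]:
        if self.finished_time is None:
            return None
        return self.finished_time - self.arrival_time
```

The idea would be for AsyncEngine's streaming loop to call mark_first_token() on the first yielded output and mark_finished() when the request completes, and to attach the object to the final response.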
Yes, that's exactly what I was referring to (a more comprehensive version with a new endpoint and Prometheus integration 😅)!
If @AllentDan is still active, I'll monitor progress on that PR :) One suggestion would be to also measure "time to first token" for each generation, since it's a good indicator of prefill duration!
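For reference, a rough sketch of what the Prometheus side could look like, building on the hypothetical RequestMetrics object above. The metric names and port are made up, not an existing LMDeploy interface:

```python
# Rough sketch: exposing per-request timings through prometheus_client.
# The metric names, port, and the RequestMetrics object from the sketch
# above are assumptions, not an existing LMDeploy feature.
import time

from prometheus_client import Histogram, start_http_server

TTFT_SECONDS = Histogram(
    "lmdeploy_time_to_first_token_seconds",
    "Time from request arrival to the first generated token (prefill proxy).",
)
REQUEST_SECONDS = Histogram(
    "lmdeploy_request_duration_seconds",
    "End-to-end request duration.",
)


def observe_request(metrics) -> None:
    """Record one finished request, e.g. the RequestMetrics sketch above."""
    if metrics.time_to_first_token is not None:
        TTFT_SECONDS.observe(metrics.time_to_first_token)
    if metrics.total_time is not None:
        REQUEST_SECONDS.observe(metrics.total_time)


if __name__ == "__main__":
    # Serve Prometheus text format on http://localhost:8001/metrics; in the
    # real server this would be wired into the existing HTTP app instead.
    start_http_server(8001)
    while True:
        time.sleep(60)
```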