feat: Report more histogram metrics #61
Conversation
src/utils/metrics.py (outdated)
@@ -172,3 +228,21 @@ def log(self, stats: VllmStats) -> None:
    self.metrics.histogram_time_per_output_token,
    stats.time_per_output_tokens_iter,
)
# Request stats
I would suggest refactoring this as follows:

triton_metrics = [self.metrics.histogram_time_to_first_token, ...]
vllm_stats = [stats.time_to_first_tokens_iter, ...]
for metric, stat in zip(triton_metrics, vllm_stats):
    self._log_histogram(metric, stat)

This way it is more extensible, in my mind: the Triton metrics are all defined in one place, as are vLLM's stats.
Potentially there's even a way to extract all Triton metrics and vLLM stats without explicitly composing the lists. I would recommend checking
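The suggested refactor can be sketched as a small runnable toy. The names here (`_log_histogram`, the metric strings, the sample values) are stand-ins for the real class attributes in the diff above, not the actual implementation:

```python
# Parallel lists: the i-th Triton histogram pairs with the i-th vLLM stat.
triton_metrics = [
    "histogram_time_to_first_token",
    "histogram_time_per_output_token",
]
vllm_stats = [
    [0.12, 0.30],  # stand-in for stats.time_to_first_tokens_iter
    [0.01, 0.02],  # stand-in for stats.time_per_output_tokens_iter
]

logged = []

def _log_histogram(metric, observations):
    # Stand-in for observing each value on the Triton histogram.
    logged.append((metric, observations))

# zip pairs each metric with its stat list, so adding a new histogram
# only means appending one entry to each list.
for metric, stat in zip(triton_metrics, vllm_stats):
    _log_histogram(metric, stat)

print(len(logged))  # 2
```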
More on zip: https://realpython.com/python-zip-function/
Addressed
Force-pushed from fca7a71 to 2394079
self.assertEqual(
    metrics_dict["vllm:e2e_request_latency_seconds_count"], total_prompts
)
nit: I think it would be easier to read/review/maintain this function if we could group the common parts:

histogram_metrics = ["vllm:e2e_request_latency_seconds", "..."]
for metric in histogram_metrics:
    # Expect exactly one observation and bucket per prompt
    self.assertEqual(metrics_dict[f"{metric}_count"], total_prompts)
    self.assertEqual(metrics_dict[f"{metric}_bucket"], total_prompts)
    # Compare the exact expected sum where it makes sense, otherwise assert non-zero
    if metric.endswith("_best_of"):
        self.assertEqual(metrics_dict[f"{metric}_sum"], best_of * total_prompts)
    elif metric.endswith("_n"):
        self.assertEqual(metrics_dict[f"{metric}_sum"], n * total_prompts)
    else:
        self.assertGreater(metrics_dict[f"{metric}_sum"], 0)

for as many of the metrics as make sense to follow the same pattern. We can have separate/special cases for the metrics that don't fit it.
Feel free to modify or change this if any of the above is incorrect; it's just an example.
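A self-contained version of that grouped pattern, with plain asserts and a fake metrics_dict standing in for the parsed metrics endpoint (the metric names and values here are illustrative, not captured from a real run):

```python
# Fake parsed metrics; names mirror the vLLM families discussed above.
total_prompts, best_of, n = 3, 2, 2
metrics_dict = {
    "vllm:e2e_request_latency_seconds_count": 3,
    "vllm:e2e_request_latency_seconds_bucket": 3,
    "vllm:e2e_request_latency_seconds_sum": 1.5,
    "vllm:request_params_best_of_count": 3,
    "vllm:request_params_best_of_bucket": 3,
    "vllm:request_params_best_of_sum": 6,
}
histogram_metrics = [
    "vllm:e2e_request_latency_seconds",
    "vllm:request_params_best_of",
]
for metric in histogram_metrics:
    # One observation and one bucket entry per prompt.
    assert metrics_dict[f"{metric}_count"] == total_prompts
    assert metrics_dict[f"{metric}_bucket"] == total_prompts
    # Exact expected sum where it is deterministic, otherwise non-zero.
    if metric.endswith("_best_of"):
        assert metrics_dict[f"{metric}_sum"] == best_of * total_prompts
    elif metric.endswith("_n"):
        assert metrics_dict[f"{metric}_sum"] == n * total_prompts
    else:
        assert metrics_dict[f"{metric}_sum"] > 0
print("all metric checks passed")
```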
Thanks for your input. I personally don't think it would be maintainable to add new metric tests that way, since there are a lot of special cases. Counterexample:

# vllm:time_per_output_token_seconds
self.assertEqual(metrics_dict["vllm:time_per_output_token_seconds_count"], 45)
# This line is fine
self.assertGreater(metrics_dict["vllm:time_per_output_token_seconds_sum"], 0)
self.assertEqual(metrics_dict["vllm:time_per_output_token_seconds_bucket"], 45)
LGTM overall, just a nit on cleaning up the testing, if you agree with the comment.
What does the PR do?
Report more histogram metrics from vLLM to Triton metrics server.
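For context on what the tests above inspect: a Prometheus-style histogram family surfaces as `_count`, `_sum`, and per-bucket `_bucket` samples on the metrics endpoint. A minimal sketch of parsing that text into the kind of metrics_dict the test uses (the sample text is illustrative, not real server output):

```python
# Illustrative Prometheus-style exposition text for one histogram family.
sample = """\
vllm:e2e_request_latency_seconds_count 3
vllm:e2e_request_latency_seconds_sum 1.5
vllm:e2e_request_latency_seconds_bucket{le="0.5"} 1
vllm:e2e_request_latency_seconds_bucket{le="+Inf"} 3
"""

metrics_dict = {}
for line in sample.splitlines():
    name, value = line.rsplit(" ", 1)
    # Drop the label set so all bucket samples aggregate under one key.
    name = name.split("{")[0]
    metrics_dict[name] = metrics_dict.get(name, 0) + float(value)

print(metrics_dict["vllm:e2e_request_latency_seconds_count"])  # 3.0
```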
Checklist
<commit_type>: <Title>
Commit Type: check the conventional commit type box here and add the label to the GitHub PR.
Related PRs:
n/a
Where should the reviewer start?
n/a
Test plan:
ci/L0_backend_vllm/metrics_test
17681888
Caveats:
n/a
Background
Customers requested additional histogram metrics from vLLM.
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
n/a