feat: Report more histogram metrics #61
Conversation
src/utils/metrics.py (outdated)
@@ -172,3 +228,21 @@ def log(self, stats: VllmStats) -> None:
    self.metrics.histogram_time_per_output_token,
    stats.time_per_output_tokens_iter,
)
# Request stats
I would suggest refactoring this as follows:

triton_metrics = [self.metrics.histogram_time_to_first_token, ...]
vllm_stats = [stats.time_to_first_tokens_iter, ...]
for metric, stat in zip(triton_metrics, vllm_stats):
    self._log_histogram(metric, stat)

This way it is more extensible, in my mind: the Triton metrics are all defined in one place, as are vLLM's stats.
Potentially there's even a way to extract all Triton metrics and vLLM stats without explicitly composing the lists. I would recommend checking
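The suggested refactor can be sketched as a small runnable toy. The names here (`_log_histogram`, the metric strings, the sample values) are stand-ins for the real class attributes in the diff above, not the actual implementation:

```python
# Parallel lists: the i-th Triton histogram pairs with the i-th vLLM stat.
triton_metrics = [
    "histogram_time_to_first_token",
    "histogram_time_per_output_token",
]
vllm_stats = [
    [0.12, 0.30],  # stand-in for stats.time_to_first_tokens_iter
    [0.01, 0.02],  # stand-in for stats.time_per_output_tokens_iter
]

logged = []

def _log_histogram(metric, observations):
    # Stand-in for observing each value on the Triton histogram.
    logged.append((metric, observations))

# zip pairs each metric with its stat list, so adding a new histogram
# only means appending one entry to each list.
for metric, stat in zip(triton_metrics, vllm_stats):
    _log_histogram(metric, stat)

print(len(logged))  # 2
```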
More on zip: https://realpython.com/python-zip-function/
Addressed
Force-pushed from fca7a71 to 2394079
self.assertEqual(
    metrics_dict["vllm:e2e_request_latency_seconds_count"], total_prompts
)
nit: I think it would be easier to read/review/maintain this function if we could group the common parts:

histogram_metrics = ["vllm:e2e_request_latency_seconds", "..."]
for metric in histogram_metrics:
    # Expect exactly one observation and bucket per prompt
    self.assertEqual(metrics_dict[f"{metric}_count"], total_prompts)
    self.assertEqual(metrics_dict[f"{metric}_bucket"], total_prompts)
    # Compare the exact expected sum where it makes sense, otherwise assert non-zero
    if metric.endswith("_best_of"):
        self.assertEqual(metrics_dict[f"{metric}_sum"], best_of * total_prompts)
    elif metric.endswith("_n"):
        self.assertEqual(metrics_dict[f"{metric}_sum"], n * total_prompts)
    else:
        self.assertGreater(metrics_dict[f"{metric}_sum"], 0)

for as many of the metrics as make sense to follow the same pattern. We can have separate/special cases for the metrics that don't fit it.
Feel free to modify or change this if any of the above is incorrect; it's just an example.
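A self-contained version of that grouped pattern, with plain asserts and a fake metrics_dict standing in for the parsed metrics endpoint (the metric names and values here are illustrative, not captured from a real run):

```python
# Fake parsed metrics; names mirror the vLLM families discussed above.
total_prompts, best_of, n = 3, 2, 2
metrics_dict = {
    "vllm:e2e_request_latency_seconds_count": 3,
    "vllm:e2e_request_latency_seconds_bucket": 3,
    "vllm:e2e_request_latency_seconds_sum": 1.5,
    "vllm:request_params_best_of_count": 3,
    "vllm:request_params_best_of_bucket": 3,
    "vllm:request_params_best_of_sum": 6,
}
histogram_metrics = [
    "vllm:e2e_request_latency_seconds",
    "vllm:request_params_best_of",
]
for metric in histogram_metrics:
    # One observation and one bucket entry per prompt.
    assert metrics_dict[f"{metric}_count"] == total_prompts
    assert metrics_dict[f"{metric}_bucket"] == total_prompts
    # Exact expected sum where it is deterministic, otherwise non-zero.
    if metric.endswith("_best_of"):
        assert metrics_dict[f"{metric}_sum"] == best_of * total_prompts
    elif metric.endswith("_n"):
        assert metrics_dict[f"{metric}_sum"] == n * total_prompts
    else:
        assert metrics_dict[f"{metric}_sum"] > 0
print("all metric checks passed")
```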
Thanks for your input. I personally don't think it would be maintainable to add new metric tests that way, since there are a lot of special cases. Counterexample:

# vllm:time_per_output_token_seconds
self.assertEqual(metrics_dict["vllm:time_per_output_token_seconds_count"], 45)
# This line is fine
self.assertGreater(metrics_dict["vllm:time_per_output_token_seconds_sum"], 0)
self.assertEqual(metrics_dict["vllm:time_per_output_token_seconds_bucket"], 45)
LGTM overall, just a nit on cleaning up the testing, if you agree with the comment.
What does the PR do?
Report more histogram metrics from vLLM to Triton metrics server.
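For context on what the tests above inspect: a Prometheus-style histogram family surfaces as `_count`, `_sum`, and per-bucket `_bucket` samples on the metrics endpoint. A minimal sketch of parsing that text into the kind of metrics_dict the test uses (the sample text is illustrative, not real server output):

```python
# Illustrative Prometheus-style exposition text for one histogram family.
sample = """\
vllm:e2e_request_latency_seconds_count 3
vllm:e2e_request_latency_seconds_sum 1.5
vllm:e2e_request_latency_seconds_bucket{le="0.5"} 1
vllm:e2e_request_latency_seconds_bucket{le="+Inf"} 3
"""

metrics_dict = {}
for line in sample.splitlines():
    name, value = line.rsplit(" ", 1)
    # Drop the label set so all bucket samples aggregate under one key.
    name = name.split("{")[0]
    metrics_dict[name] = metrics_dict.get(name, 0) + float(value)

print(metrics_dict["vllm:e2e_request_latency_seconds_count"])  # 3.0
```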
Checklist
<commit_type>: <Title>
Commit Type: check the conventional commit type box here and add the label to the GitHub PR.
Related PRs:
n/a
Where should the reviewer start?
n/a
Test plan:
ci/L0_backend_vllm/metrics_test
17681888
Caveats:
n/a
Background
Customers requested additional histogram metrics from vLLM.
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
n/a