
feat: Report more histogram metrics #61

Merged
merged 5 commits into from
Aug 24, 2024

Conversation

yinggeh (Contributor) commented Aug 20, 2024

What does the PR do?

Report more histogram metrics from vLLM to Triton metrics server.

# Histogram of end to end request latency in seconds.
histogram_e2e_time_request

# Number of prefill tokens processed.
histogram_num_prompt_tokens_request

# Number of generation tokens processed.
histogram_num_generation_tokens_request

# Histogram of the best_of request parameter.
histogram_best_of_request

# Histogram of the n request parameter.
histogram_n_request
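Each of these is a Prometheus-style histogram, which exposes three series per metric: an observation `_count`, a `_sum` of observed values, and cumulative `_bucket` counters. A minimal pure-Python sketch of that bookkeeping (illustrative only; this is not the Triton or vLLM implementation):

```python
import bisect

class Histogram:
    """Prometheus-style histogram: tracks _count, _sum, and cumulative buckets."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds)  # finite bucket boundaries
        # One cumulative counter per finite bound, plus a final +Inf bucket.
        self.bucket_counts = [0] * (len(self.upper_bounds) + 1)
        self.count = 0
        self.sum = 0.0

    def observe(self, value):
        # Cumulative "le" semantics: a value lands in the first bucket whose
        # upper bound is >= value, and in every larger bucket as well.
        idx = bisect.bisect_left(self.upper_bounds, value)
        for i in range(idx, len(self.bucket_counts)):
            self.bucket_counts[i] += 1
        self.count += 1
        self.sum += value

h = Histogram([0.1, 1.0, 10.0])
for latency in (0.05, 0.5, 5.0):
    h.observe(latency)
# h.count == 3, h.sum == 5.55, bucket_counts == [1, 2, 3, 3]
```

The tests later in this PR check exactly these three series per metric: `_count`, `_sum`, and `_bucket`.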

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated github labels field
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

Check the conventional commit type box here and add the label to the GitHub PR.

  • feat

Related PRs:

n/a

Where should the reviewer start?

n/a

Test plan:

ci/L0_backend_vllm/metrics_test

  • CI Pipeline ID:
    17681888

Caveats:

n/a

Background

Customers requested additional histogram metrics from vLLM.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

n/a

@yinggeh yinggeh added the enhancement New feature or request label Aug 20, 2024
@yinggeh yinggeh self-assigned this Aug 20, 2024
@@ -172,3 +228,21 @@ def log(self, stats: VllmStats) -> None:
self.metrics.histogram_time_per_output_token,
stats.time_per_output_tokens_iter,
)
# Request stats
Collaborator commented:
I would suggest refactoring this as follows:

triton_metrics = [self.metrics.histogram_time_to_first_token, ...]
vllm_stats = [stats.time_to_first_tokens_iter, ...]

for metric, stat in zip(triton_metrics, vllm_stats):
    self._log_histogram(metric, stat)

This way it is more extensible, to my mind: the Triton metrics are all defined in one place, as are vLLM's stats.
Potentially there's a way to extract all Triton metrics and vLLM stats without explicitly composing the lists; I would recommend checking.
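Run standalone, the zip-based pattern suggested above might look as follows; the `_Stub` class and sample values here are made-up stand-ins for Triton's histogram metric objects and vLLM's stat lists:

```python
class _Stub:
    """Stand-in for a Triton histogram metric; records observed values."""

    def __init__(self, name):
        self.name = name
        self.observed = []

    def observe(self, value):
        self.observed.append(value)

def _log_histogram(metric, data):
    # Each vLLM stat is a list of raw values; observe each one.
    for datum in data:
        metric.observe(datum)

# One list defines the metrics, another the matching stats; zip pairs them up,
# so adding a new histogram is one entry in each list.
triton_metrics = [
    _Stub("histogram_time_to_first_token"),
    _Stub("histogram_time_per_output_token"),
]
vllm_stats = [
    [0.12, 0.30],        # time_to_first_tokens_iter
    [0.01, 0.02, 0.03],  # time_per_output_tokens_iter
]

for metric, stat in zip(triton_metrics, vllm_stats):
    _log_histogram(metric, stat)
```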

yinggeh (Contributor Author) replied:
Addressed

@yinggeh yinggeh force-pushed the yinggeh-DLIS-7229-more-vllm-histogram-metrcs branch from fca7a71 to 2394079 on August 21, 2024 at 21:50
@yinggeh yinggeh requested a review from oandreeva-nv August 21, 2024 21:51
Comment on lines +151 to +153
self.assertEqual(
metrics_dict["vllm:e2e_request_latency_seconds_count"], total_prompts
)
@rmccorm4 (Contributor) commented Aug 22, 2024:
nit: I think it would be easier to read/review/maintain this function if we could group the common parts:

histogram_metrics = ["vllm:e2e_request_latency_seconds", "..."]
for metric in histogram_metrics:
  # Expect exactly one observation and bucket entry per prompt
  self.assertEqual(metrics_dict[f"{metric}_count"], total_prompts)
  self.assertEqual(metrics_dict[f"{metric}_bucket"], total_prompts)

  # Compare the exact expected sum where it makes sense, otherwise assert non-zero
  if metric.endswith("_best_of"):
    self.assertEqual(metrics_dict[f"{metric}_sum"], best_of * total_prompts)
  elif metric.endswith("_n"):
    self.assertEqual(metrics_dict[f"{metric}_sum"], n * total_prompts)
  else:
    self.assertGreater(metrics_dict[f"{metric}_sum"], 0)

for as many of the metrics as make sense to follow the same pattern.

We can have separate/special cases for the metrics that don't fit this pattern.

Feel free to modify or change this if any of the above is incorrect; it's just an example.
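As a self-contained illustration of the suggested pattern (the scraped values in `metrics_dict` are made up for this example; the metric names follow vLLM's naming):

```python
total_prompts, best_of, n = 3, 2, 2

# Hypothetical scraped metric values: one observation per prompt.
metrics_dict = {
    "vllm:e2e_request_latency_seconds_count": 3,
    "vllm:e2e_request_latency_seconds_bucket": 3,
    "vllm:e2e_request_latency_seconds_sum": 4.2,
    "vllm:request_params_best_of_count": 3,
    "vllm:request_params_best_of_bucket": 3,
    "vllm:request_params_best_of_sum": 6,
}

histogram_metrics = [
    "vllm:e2e_request_latency_seconds",
    "vllm:request_params_best_of",
]
for metric in histogram_metrics:
    # Expect exactly one observation and bucket entry per prompt.
    assert metrics_dict[f"{metric}_count"] == total_prompts
    assert metrics_dict[f"{metric}_bucket"] == total_prompts

    # Exact expected sum where it is known; otherwise just non-zero.
    if metric.endswith("_best_of"):
        assert metrics_dict[f"{metric}_sum"] == best_of * total_prompts
    elif metric.endswith("_n"):
        assert metrics_dict[f"{metric}_sum"] == n * total_prompts
    else:
        assert metrics_dict[f"{metric}_sum"] > 0
```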

yinggeh (Contributor Author) replied:
Thanks for your input. I personally don't think the grouped pattern stays maintainable as new metric tests are added, since there are a lot of special cases. Counter-example:

# vllm:time_per_output_token_seconds
# _count and _bucket are 45 (one per output token), not total_prompts,
# so only the _sum assertion fits the generic pattern.
self.assertEqual(metrics_dict["vllm:time_per_output_token_seconds_count"], 45)
self.assertGreater(metrics_dict["vllm:time_per_output_token_seconds_sum"], 0)
self.assertEqual(metrics_dict["vllm:time_per_output_token_seconds_bucket"], 45)
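If special cases like this one accumulate, one possible compromise (a sketch, not something from this PR) is to table-drive the expected counts, so the loop survives metrics whose observation counts differ from `total_prompts`:

```python
total_prompts = 3

# Hypothetical scraped values; the names follow vLLM's metric naming but
# the numbers are made up for illustration.
metrics_dict = {
    "vllm:e2e_request_latency_seconds_count": 3,
    "vllm:e2e_request_latency_seconds_bucket": 3,
    "vllm:time_per_output_token_seconds_count": 45,
    "vllm:time_per_output_token_seconds_bucket": 45,
}

# Map each metric to its expected _count/_bucket value: a metric observed
# once per output token simply gets a different entry than one observed
# once per prompt.
expected_counts = {
    "vllm:e2e_request_latency_seconds": total_prompts,
    "vllm:time_per_output_token_seconds": 45,
}

for metric, expected in expected_counts.items():
    assert metrics_dict[f"{metric}_count"] == expected
    assert metrics_dict[f"{metric}_bucket"] == expected
```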

@rmccorm4 (Contributor) left a review comment:
LGTM overall; just a nit on cleaning up the testing, if you agree with the comment.

@yinggeh yinggeh merged commit 98947a7 into main Aug 24, 2024
3 checks passed
@yinggeh yinggeh deleted the yinggeh-DLIS-7229-more-vllm-histogram-metrcs branch September 18, 2024 12:52