fix: Fix L0_backend_vllm* jobs #62
Conversation
@@ -134,7 +124,7 @@ def test_vllm_metrics(self):
         # vllm:prompt_tokens_total
         self.assertEqual(metrics_dict["vllm:prompt_tokens_total"], 18)
         # vllm:generation_tokens_total
-        self.assertEqual(metrics_dict["vllm:generation_tokens_total"], 188)
+        self.assertEqual(metrics_dict["vllm:generation_tokens_total"], 48)
Could you please update the comment above with the expected answer and explain why we check for 48? I would like to avoid magic numbers in the code, to speed up debugging in the future.
I didn't find a way to print the generation tokens directly from the vLLM engine, and the calculation of generation_tokens_total isn't straightforward (see https://github.com/vllm-project/vllm/blob/da1f7cc12a12ea4a744d26122e9a13ea4b3f4c7b/vllm/engine/llm_engine.py#L1086-L1088).
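From reading those lines, my understanding is that the counter accrues once per engine iteration rather than once per request. Here is a minimal sketch of that accumulation pattern (illustrative only; the names and numbers are made up and are not the actual vLLM internals or the test inputs):

```python
# Sketch of a per-iteration token counter. Illustrative only: this is
# not the vLLM implementation, just the general accumulation shape.
generation_tokens_total = 0


def record_iteration(new_tokens_per_sequence):
    # Each engine step adds the tokens generated by every live sequence
    # in that step, so the final total depends on how many sequences
    # each request expands into, not just on a single response length.
    global generation_tokens_total
    generation_tokens_total += sum(new_tokens_per_sequence)


# e.g. three sequences decoding one token each per step, for five steps
for _ in range(5):
    record_iteration([1, 1, 1])

print(generation_tokens_total)  # 15
```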
How did you define 48 as the expected number of tokens?
Assuming vLLM reports the correct number of tokens in the current version (0.5.3-post1), this test makes sure the number stays consistent.
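For reference, while debugging the value can be inspected straight from Triton's metrics endpoint (this assumes the default metrics port 8002 and that the vLLM model has already served at least one request):

```python
# Fetch the Prometheus-format metrics exposed by Triton and print the
# current value of the vLLM generation-token counter.
import requests

metrics_text = requests.get("http://localhost:8002/metrics").text
for line in metrics_text.splitlines():
    if line.startswith("vllm:generation_tokens_total"):
        # Prometheus text format: "<name>{<labels>} <value>"
        print(line.rsplit(" ", 1)[-1])
```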
Amazing, thanks for this update!
LGTM, thanks!
What does the PR do?
Fix the L0_backend_vllm* jobs. The metric outputs vary across platforms when non-greedy sampling is used, so the checks for sampling-parameter-related metrics are moved into a separate test case.
Broken pipeline ID: 109056988
Checklist
<commit_type>: <Title>
Commit Type: check the conventional commit type box here and add the label to the GitHub PR.
Related PRs:
n/a
Where should the reviewer start?
n/a
Test plan:
VLLMTritonMetricsTest.test_custom_sampling_params
18003888
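For reviewers unfamiliar with the new case, here is a hypothetical sketch of its shape. It is not the actual implementation in this PR, and helper names such as vllm_infer and parse_vllm_metrics are placeholders rather than the repository's real helpers:

```python
import unittest


class VLLMTritonMetricsTest(unittest.TestCase):
    def test_custom_sampling_params(self):
        # Hypothetical sketch only; see the diff for the real test.
        sampling_parameters = {"temperature": "0.8", "top_p": "0.95"}
        self.vllm_infer(sampling_parameters=sampling_parameters)
        metrics_dict = self.parse_vllm_metrics()

        # Prompt token counts do not depend on the sampling strategy.
        self.assertEqual(metrics_dict["vllm:prompt_tokens_total"], 18)
        # Generated token counts vary by platform under non-greedy
        # sampling, so assert only that tokens were produced.
        self.assertGreater(metrics_dict["vllm:generation_tokens_total"], 0)
```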
Caveats:
n/a
Background
n/a
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
n/a