feat: Add vLLM counter metrics access through Triton #53
Conversation
Force-pushed from e867687 to 0686a7c (Compare)
test?
Force-pushed from acc216d to 21e2356 (Compare)
src/model.py
Outdated
"version": self.args["model_version"], | ||
} | ||
logger = VllmStatLogger(labels=labels) | ||
self.llm_engine.add_logger("triton", logger) |
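For context, here is a minimal sketch of what such a stat logger could look like. It assumes vLLM's `StatLoggerBase`/`Stats` interface from `vllm.engine.metrics` and a `TritonMetrics` helper along the lines sketched further down; the `Stats` field names (`num_prompt_tokens_iter`, `num_generation_tokens_iter`) are taken from vLLM's metrics module and may differ across versions.

```python
from vllm.engine.metrics import StatLoggerBase, Stats


class VllmStatLogger(StatLoggerBase):
    """Forwards vLLM iteration stats to Triton custom metrics (sketch)."""

    def __init__(self, labels: dict) -> None:
        # TritonMetrics is defined alongside this class in model.py (see sketch below);
        # it wraps pb_utils.MetricFamily counters.
        self.metrics = TritonMetrics(labels=labels)

    def info(self, type: str, obj) -> None:
        # Only counters are reported in this sketch; info-type metrics are ignored.
        pass

    def log(self, stats: Stats) -> None:
        # Called by the vLLM engine; the discussion below covers how often.
        self.metrics.counter_prompt_tokens.increment(stats.num_prompt_tokens_iter)
        self.metrics.counter_generation_tokens.increment(stats.num_generation_tokens_iter)
```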
What is the cadence at which the logger gets called? CC @kthui, as this will involve round trips with core, similar to your investigation into request cancellation frequency.
Can you elaborate on this?
How often will the metrics get updated? Every request, every token, every full response, etc.? In other words, how often will the vLLM engine call this attached Triton stats logger?
every iteration
That will probably significantly affect total throughput then, if the core round-trip communication interrupts generation at every iteration, based on Jacky and Iman's recent findings. We probably want this feature either way - just calling out that we'll likely need optimizations for this feature similar to the ones @kthui is working on right now. Please work together to align on the best path forward for the metrics feature + parity with vLLM performance.
@kthui will run benchmarks.
The current path forward is to allow metrics to be turned off. There is still room to improve in the future, i.e. perform the core round-trip communication on a side branch.
At this point, the impact of having metrics (counter and gauge) on performance with the `--disable-log-stats` flag set is negligible for FastAPI completion vs. Triton generate_stream. The delta between FastAPI completion and Triton generate_stream without any metrics functionality added is approximately the same as the delta with metrics added and the `--disable-log-stats` flag set.
Do we have a corresponding PR on the server side? Right now during the build the container only copies ...
I would like to have a look at this PR as well before it gets merged.
Force-pushed from 4f6e9f7 to 321faa0 (Compare)
Added
class TritonMetrics:
    def __init__(self, labels):
        # Initialize metric families
        # Iteration stats
Can you elaborate on the meaning of "Iteration stats"?
That's one of the vLLM metric categories. See https://github.com/vllm-project/vllm/blob/fc93e5614374688bddc432279244ba7fbf8169c2/vllm/engine/metrics.py#L68
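For illustration, here is a minimal sketch of what counter families for those iteration stats could look like via the python_backend custom-metrics API (`pb_utils.MetricFamily`). The metric names and descriptions are assumptions for the sketch, not necessarily the ones this PR registers.

```python
import triton_python_backend_utils as pb_utils


class TritonMetrics:
    def __init__(self, labels):
        # Initialize metric families
        # Iteration stats: counters that grow with every vLLM engine step.
        self.counter_prompt_tokens_family = pb_utils.MetricFamily(
            name="vllm:prompt_tokens_total",  # assumed name
            description="Number of prefill tokens processed.",
            kind=pb_utils.MetricFamily.COUNTER,
        )
        self.counter_generation_tokens_family = pb_utils.MetricFamily(
            name="vllm:generation_tokens_total",  # assumed name
            description="Number of generation tokens processed.",
            kind=pb_utils.MetricFamily.COUNTER,
        )
        # One labeled Metric per family, e.g. labels={"model": ..., "version": ...}.
        self.counter_prompt_tokens = self.counter_prompt_tokens_family.Metric(labels=labels)
        self.counter_generation_tokens = self.counter_generation_tokens_family.Metric(labels=labels)
```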
…TRICS" in config.pbtxt.
following lines to its config.pbtxt.
```bash
parameters: {
  key: "REPORT_CUSTOM_METRICS"
  value: { string_value: "yes" }
}
```
nit: not sure if it should be in caps
To be consistent with the only `parameters` example found in our code:
parameters: {
  key: "FORCE_CPU_ONLY_INPUT_TENSORS"
  value: {
    string_value: "no"
  }
}
It's one example; here's a lowercase one as well: https://github.com/triton-inference-server/server/blob/53200091b84f08a5e4921f5073137784570283e9/docs/user_guide/optimization.md#onnx-with-tensorrt-optimization-ort-trt
I am more inclined to upper case for boolean keys.
Can use `key_value.upper()` before the comparison:
>>> "nO".upper()
'NO'
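For illustration, a short sketch of a case-insensitive check along those lines; the parameter access pattern follows the snippet above, and `_setup_metrics()` is a hypothetical helper:

```python
params = self.model_config.get("parameters", {})
value = params.get("REPORT_CUSTOM_METRICS", {}).get("string_value", "no")

# Normalize case so "yes", "Yes", and "YES" are all accepted.
if value.upper() == "YES":
    self._setup_metrics()  # hypothetical helper that builds and attaches the stat logger
```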
I'd rather make all parameters either case-insensitive at the time config.pbtxt is loaded, or all case-sensitive.
"REPORT_CUSTOM_METRICS" in self.model_config["parameters"] | ||
and self.model_config["parameters"]["REPORT_CUSTOM_METRICS"]["string_value"] | ||
== "yes" | ||
): |
nit: potentially we can also check if `disable_log_stats` is true
Nice catch. If `FORCE_CPU_ONLY_INPUT_TENSORS = true` but `disable_log_stats=true`, `add_logger()` throws an exception. Test added.
You mean `REPORT_CUSTOM_METRICS`, not `FORCE_CPU_ONLY_INPUT_TENSORS`?
Sorry. Yes, I meant `REPORT_CUSTOM_METRICS`.
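To make the resolution concrete, a hedged sketch of the guard this exchange points at, reusing the snippet from above; `engine_args.disable_log_stats` and the `model_name` key are assumed names for illustration:

```python
# Attach the Triton stat logger only when the model opts in via config.pbtxt
# AND vLLM stat logging is enabled; calling add_logger() on an engine created
# with disable_log_stats=True raises an exception (per the discussion above).
report_metrics = (
    "REPORT_CUSTOM_METRICS" in self.model_config["parameters"]
    and self.model_config["parameters"]["REPORT_CUSTOM_METRICS"]["string_value"]
    == "yes"
)
if report_metrics and not engine_args.disable_log_stats:  # attribute name assumed
    labels = {
        "model": self.args["model_name"],  # key assumed
        "version": self.args["model_version"],
    }
    logger = VllmStatLogger(labels=labels)
    self.llm_engine.add_logger("triton", logger)
```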
LGTM, please make sure @oandreeva-nv's comments are addressed. Thanks Yingge!
LGTM! Thanks for this work!
LGTM, nice work!
Report vLLM counter metrics through Triton server
Co-authored-by: Yingge He <[email protected]>
Sample endpoint output
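The actual sample output is not reproduced here; purely as an illustration, counter output scraped from Triton's Prometheus endpoint (default metrics port 8002) could look roughly like this, with metric names and label values being assumptions:

```bash
# Illustrative only: metric and label names below are assumptions.
curl localhost:8002/metrics

# HELP vllm:prompt_tokens_total Number of prefill tokens processed.
# TYPE vllm:prompt_tokens_total counter
vllm:prompt_tokens_total{model="vllm_model",version="1"} 10
# HELP vllm:generation_tokens_total Number of generation tokens processed.
# TYPE vllm:generation_tokens_total counter
vllm:generation_tokens_total{model="vllm_model",version="1"} 16
```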
What does the PR do?
Add vLLM counter metrics access through python_backend custom metrics.
Checklist
<commit_type>: <Title>
Commit Type: Check the conventional commit type box here and add the label to the GitHub PR.
Related PRs:
triton-inference-server/server#7493
Where should the reviewer start?
n/a
Test plan:
L0_backend_vllm/metrics_test
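For context, a hedged sketch of the kind of check such a metrics test might perform, assuming the server exposes Prometheus metrics on the default port 8002 and registers a counter named `vllm:prompt_tokens_total` (both assumptions for illustration):

```python
import re

import requests  # assumed to be available in the test environment


def test_vllm_counter_metrics_exposed():
    # Scrape Triton's Prometheus endpoint after at least one inference has run.
    metrics = requests.get("http://localhost:8002/metrics").text

    # The metric name is an assumption; adjust to whatever the backend registers.
    match = re.search(r"vllm:prompt_tokens_total\{[^}]*\} (\d+)", metrics)
    assert match is not None, "counter metric not found in /metrics output"
    assert int(match.group(1)) > 0
```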
17372863
Caveats:
Background
Customers requested.
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
n/a