From 7623a36b6efe428834ead611b20d796643da4ce3 Mon Sep 17 00:00:00 2001
From: Neal Vaidya
Date: Mon, 20 Nov 2023 18:08:31 -0500
Subject: [PATCH] Add first-token latency

Co-authored-by: Matthew Kotila
---
 src/c++/perf_analyzer/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/c++/perf_analyzer/README.md b/src/c++/perf_analyzer/README.md
index 806a3db8f..89063c9ba 100644
--- a/src/c++/perf_analyzer/README.md
+++ b/src/c++/perf_analyzer/README.md
@@ -73,8 +73,8 @@ changes in performance as you experiment with different optimization
 strategies.
   [TorchServe](docs/benchmarking.md#benchmarking-torchserve) can be used as the
   inference server in addition to the default Triton server
-- [LLMs](docs/llm.md) can also be measured and charcterized with specific metrics
-  like token-to-token latency
+- [LLMs](docs/llm.md) can also be measured and characterized with specific metrics
+  like first-token latency and token-to-token latency