diff --git a/src/c++/perf_analyzer/README.md b/src/c++/perf_analyzer/README.md
index 881447651..806a3db8f 100644
--- a/src/c++/perf_analyzer/README.md
+++ b/src/c++/perf_analyzer/README.md
@@ -73,6 +73,9 @@ changes in performance as you experiment with different optimization strategies.
   [TorchServe](docs/benchmarking.md#benchmarking-torchserve) can be used as the
   inference server in addition to the default Triton server
 
+- [LLMs](docs/llm.md) can also be measured and characterized with specific metrics
+  like token-to-token latency
+
 # Quick Start