
Commit

Add first-token latency
Co-authored-by: Matthew Kotila <[email protected]>
nealvaidya and matthewkotila committed Nov 27, 2023
1 parent 6f4b27e commit 7623a36
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions src/c++/perf_analyzer/README.md
@@ -73,8 +73,8 @@ changes in performance as you experiment with different optimization strategies.
[TorchServe](docs/benchmarking.md#benchmarking-torchserve) can be used as the
inference server in addition to the default Triton server

-- [LLMs](docs/llm.md) can also be measured and charcterized with specific metrics
-  like token-to-token latency
+- [LLMs](docs/llm.md) can also be measured and characterized with specific metrics
+  like first-token latency and token-to-token latency

<br>
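
The two metrics named in the updated README line can be illustrated with a small sketch. It assumes the common definitions: first-token latency is the time from sending the request to receiving the first token, and token-to-token latency is the gap between consecutive tokens. These definitions and the function names `first_token_latency` and `token_to_token_latencies` are illustrative assumptions, not taken from perf_analyzer itself; see `docs/llm.md` for how the tool measures them.

```python
from statistics import mean


def first_token_latency(request_sent_ms, token_times_ms):
    """Time from sending the request to receiving the first token.

    Hypothetical helper for illustration; not part of perf_analyzer.
    """
    if not token_times_ms:
        raise ValueError("at least one token timestamp is required")
    return token_times_ms[0] - request_sent_ms


def token_to_token_latencies(token_times_ms):
    """Gaps between consecutive token arrival times.

    Hypothetical helper for illustration; not part of perf_analyzer.
    """
    return [
        later - earlier
        for earlier, later in zip(token_times_ms, token_times_ms[1:])
    ]


if __name__ == "__main__":
    # Made-up timestamps in milliseconds for a single streamed response.
    request_sent_ms = 1000
    token_times_ms = [1250, 1300, 1360, 1400]

    gaps = token_to_token_latencies(token_times_ms)
    print("first-token latency (ms):", first_token_latency(request_sent_ms, token_times_ms))
    print("token-to-token latencies (ms):", gaps)
    print("mean token-to-token latency (ms):", mean(gaps))
```

Keeping the two metrics separate matters because the first token typically includes queueing and prompt processing, while the later gaps reflect steady-state generation speed.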

