From b827aa435c9c41a92d918a137b0a2f6ab00ee645 Mon Sep 17 00:00:00 2001
From: David Yastremsky
Date: Fri, 13 Dec 2024 11:40:44 -0800
Subject: [PATCH] Update README template

---
 templates/genai-perf-templates/README_template | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/templates/genai-perf-templates/README_template b/templates/genai-perf-templates/README_template
index 31a1dc3a..6d0b9b86 100644
--- a/templates/genai-perf-templates/README_template
+++ b/templates/genai-perf-templates/README_template
@@ -33,6 +33,7 @@ generative AI models as served through an inference server. For large language
 models (LLMs), GenAI-Perf provides metrics such as
 [output token throughput](#output_token_throughput_metric),
 [time to first token](#time_to_first_token_metric),
+[time to second token](#time_to_second_token_metric),
 [inter token latency](#inter_token_latency_metric), and
 [request throughput](#request_throughput_metric).
 For a full list of metrics please see the [Metrics section](#metrics).
@@ -355,6 +356,7 @@ the inference server.
 | Metric | Description | Aggregations |
 | - | - | - |
 | Time to First Token | Time between when a request is sent and when its first response is received, one value per request in benchmark | Avg, min, max, p99, p90, p75 |
+| Time to Second Token | Time between when the first streaming response is received and when the second streaming response is received, one value per request in benchmark | Avg, min, max, p99, p90, p75 |
 | Inter Token Latency | Time between intermediate responses for a single request divided by the number of generated tokens of the latter response, one value per response per request in benchmark | Avg, min, max, p99, p90, p75 |
 | Request Latency | Time between when a request is sent and when its final response is received, one value per request in benchmark | Avg, min, max, p99, p90, p75 |
 | Output Sequence Length | Total number of output tokens of a request, one value per request in benchmark | Avg, min, max, p99, p90, p75 |
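
For readers comparing the metric definitions in the table above, here is a minimal Python sketch (illustrative only, not GenAI-Perf source code; the function name and parameters are hypothetical) of how time to first token, the newly documented time to second token, and inter token latency could be derived from per-request streaming response timestamps:

```python
# Illustrative sketch only -- not GenAI-Perf source code. All names
# (streaming_metrics, its parameters) are hypothetical.

def streaming_metrics(
    request_sent: float,
    response_times: list[float],
    tokens_per_response: list[int],
) -> dict:
    """Derive per-request streaming latency metrics from timestamps.

    request_sent: time the request was sent (seconds)
    response_times: arrival time of each streaming response (seconds)
    tokens_per_response: generated-token count of each response
    """
    metrics = {}
    # Time to First Token: request sent -> first response received.
    metrics["time_to_first_token"] = response_times[0] - request_sent
    # Time to Second Token: first streaming response -> second
    # streaming response (the metric added by this patch).
    if len(response_times) > 1:
        metrics["time_to_second_token"] = response_times[1] - response_times[0]
    # Inter Token Latency: time between consecutive responses divided by
    # the token count of the latter response, one value per response.
    metrics["inter_token_latencies"] = [
        (response_times[i] - response_times[i - 1]) / tokens_per_response[i]
        for i in range(1, len(response_times))
        if tokens_per_response[i] > 0
    ]
    return metrics


# Example: request sent at t=0.0s, responses arriving at 0.50s, 0.60s,
# and 0.72s, each carrying one generated token.
print(streaming_metrics(0.0, [0.50, 0.60, 0.72], [1, 1, 1]))
# -> time_to_first_token ~= 0.50, time_to_second_token ~= 0.10,
#    inter_token_latencies ~= [0.10, 0.12]
```

Each per-request value would then be aggregated across the benchmark (avg, min, max, p99, p90, p75) as listed in the table's Aggregations column.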