From b827aa435c9c41a92d918a137b0a2f6ab00ee645 Mon Sep 17 00:00:00 2001
From: David Yastremsky
Date: Fri, 13 Dec 2024 11:40:44 -0800
Subject: [PATCH] Update README template

---
 templates/genai-perf-templates/README_template | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/templates/genai-perf-templates/README_template b/templates/genai-perf-templates/README_template
index 31a1dc3a..6d0b9b86 100644
--- a/templates/genai-perf-templates/README_template
+++ b/templates/genai-perf-templates/README_template
@@ -33,6 +33,7 @@ generative AI models as served through an inference server. For large language
 models (LLMs), GenAI-Perf provides metrics such as
 [output token throughput](#output_token_throughput_metric),
 [time to first token](#time_to_first_token_metric),
+[time to second token](#time_to_second_token_metric),
 [inter token latency](#inter_token_latency_metric), and
 [request throughput](#request_throughput_metric).
 For a full list of metrics please see the [Metrics section](#metrics).
@@ -355,6 +356,7 @@ the inference server.
 | Metric | Description | Aggregations |
 | - | - | - |
 | Time to First Token | Time between when a request is sent and when its first response is received, one value per request in benchmark | Avg, min, max, p99, p90, p75 |
+| Time to Second Token | Time between when the first streaming response is received and when the second streaming response is received, one value per request in benchmark | Avg, min, max, p99, p90, p75 |
 | Inter Token Latency | Time between intermediate responses for a single request divided by the number of generated tokens of the latter response, one value per response per request in benchmark | Avg, min, max, p99, p90, p75 |
 | Request Latency | Time between when a request is sent and when its final response is received, one value per request in benchmark | Avg, min, max, p99, p90, p75 |
 | Output Sequence Length | Total number of output tokens of a request, one value per request in benchmark | Avg, min, max, p99, p90, p75 |
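
For readers comparing the metric definitions in the table above, here is a minimal Python sketch (illustrative only, not GenAI-Perf source code; the function name and parameters are hypothetical) of how time to first token, the newly documented time to second token, and inter token latency could be derived from per-request streaming response timestamps:

```python
# Illustrative sketch only -- not GenAI-Perf source code. All names
# (streaming_metrics, its parameters) are hypothetical.

def streaming_metrics(
    request_sent: float,
    response_times: list[float],
    tokens_per_response: list[int],
) -> dict:
    """Derive per-request streaming latency metrics from timestamps.

    request_sent: time the request was sent (seconds)
    response_times: arrival time of each streaming response (seconds)
    tokens_per_response: generated-token count of each response
    """
    metrics = {}
    # Time to First Token: request sent -> first response received.
    metrics["time_to_first_token"] = response_times[0] - request_sent
    # Time to Second Token: first streaming response -> second
    # streaming response (the metric added by this patch).
    if len(response_times) > 1:
        metrics["time_to_second_token"] = response_times[1] - response_times[0]
    # Inter Token Latency: time between consecutive responses divided by
    # the token count of the latter response, one value per response.
    metrics["inter_token_latencies"] = [
        (response_times[i] - response_times[i - 1]) / tokens_per_response[i]
        for i in range(1, len(response_times))
        if tokens_per_response[i] > 0
    ]
    return metrics


# Example: request sent at t=0.0s, responses arriving at 0.50s, 0.60s,
# and 0.72s, each carrying one generated token.
print(streaming_metrics(0.0, [0.50, 0.60, 0.72], [1, 1, 1]))
# -> time_to_first_token ~= 0.50, time_to_second_token ~= 0.10,
#    inter_token_latencies ~= [0.10, 0.12]
```

Each per-request value would then be aggregated across the benchmark (avg, min, max, p99, p90, p75) as listed in the table's Aggregations column.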