diff --git a/genai-perf/README.md b/genai-perf/README.md
index 8df0b009..84e7b4d4 100644
--- a/genai-perf/README.md
+++ b/genai-perf/README.md
@@ -128,7 +128,7 @@ the GPT-2 model running on Triton Inference Server with a TensorRT-LLM engine.
### Serve GPT-2 TensorRT-LLM model using Triton CLI
You can follow the [quickstart guide](https://github.com/triton-inference-server/triton_cli?tab=readme-ov-file#serving-a-trt-llm-model)
-on Triton CLI github repo to run GPT-2 model locally.
+in the Triton CLI GitHub repository to serve GPT-2 on Triton Inference Server with the TensorRT-LLM backend.
The full instructions are copied below for convenience:
```bash
@@ -139,12 +139,11 @@ docker run -ti \
--network=host \
--shm-size=1g --ulimit memlock=-1 \
-v /tmp:/tmp \
- -v ${HOME}/models:/root/models \
-v ${HOME}/.cache/huggingface:/root/.cache/huggingface \
nvcr.io/nvidia/tritonserver:24.08-trtllm-python-py3
# Install the Triton CLI
-pip install git+https://github.com/triton-inference-server/triton_cli.git@0.0.8
+pip install git+https://github.com/triton-inference-server/triton_cli.git@0.0.11
# Build TRT LLM engine and generate a Triton model repository pointing at it
triton remove -m all
@@ -156,48 +155,27 @@ triton start
### Running GenAI-Perf
-Now we can run GenAI-Perf from Triton Inference Server SDK container:
+Now we can run GenAI-Perf inside the Triton Inference Server SDK container:
```bash
-export RELEASE="24.08"
-
-docker run -it --net=host --rm --gpus=all nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk
-
-# Run GenAI-Perf in the container:
-genai-perf profile \
- -m gpt2 \
- --service-kind triton \
- --backend tensorrtllm \
- --num-prompts 100 \
- --random-seed 123 \
- --synthetic-input-tokens-mean 200 \
- --synthetic-input-tokens-stddev 0 \
- --streaming \
- --output-tokens-mean 100 \
- --output-tokens-stddev 0 \
- --output-tokens-mean-deterministic \
- --tokenizer hf-internal-testing/llama-tokenizer \
- --concurrency 1 \
- --measurement-interval 4000 \
- --profile-export-file my_profile_export.json \
- --url localhost:8001
+genai-perf profile -m gpt2 --service-kind triton --backend tensorrtllm --streaming
```
Example output:
```
- LLM Metrics
-┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
-┃ Statistic ┃ avg ┃ min ┃ max ┃ p99 ┃ p90 ┃ p75 ┃
-┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
-│ Time to first token (ms) │ 11.70 │ 9.88 │ 17.21 │ 14.35 │ 12.01 │ 11.87 │
-│ Inter token latency (ms) │ 1.46 │ 1.08 │ 1.89 │ 1.87 │ 1.62 │ 1.52 │
-│ Request latency (ms) │ 161.24 │ 153.45 │ 200.74 │ 200.66 │ 179.43 │ 162.23 │
-│ Output sequence length │ 103.39 │ 95.00 │ 134.00 │ 120.08 │ 107.30 │ 105.00 │
-│ Input sequence length │ 200.01 │ 200.00 │ 201.00 │ 200.13 │ 200.00 │ 200.00 │
-└──────────────────────────┴────────┴────────┴────────┴────────┴────────┴────────┘
-Output token throughput (per sec): 635.61
-Request throughput (per sec): 6.15
+ NVIDIA GenAI-Perf | LLM Metrics
+┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
+┃ Statistic ┃ avg ┃ min ┃ max ┃ p99 ┃ p90 ┃ p75 ┃
+┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
+│ Time to first token (ms) │ 16.26 │ 12.39 │ 17.25 │ 17.09 │ 16.68 │ 16.56 │
+│ Inter token latency (ms) │ 1.85 │ 1.55 │ 2.04 │ 2.02 │ 1.97 │ 1.92 │
+│ Request latency (ms) │ 499.20 │ 451.01 │ 554.61 │ 548.69 │ 526.13 │ 514.19 │
+│ Output sequence length │ 261.90 │ 256.00 │ 298.00 │ 296.60 │ 270.00 │ 265.00 │
+│ Input sequence length │ 550.06 │ 550.00 │ 553.00 │ 551.60 │ 550.00 │ 550.00 │
+│ Output token throughput (per sec) │ 520.87 │ N/A │ N/A │ N/A │ N/A │ N/A │
+│ Request throughput (per sec) │ 1.99 │ N/A │ N/A │ N/A │ N/A │ N/A │
+└───────────────────────────────────┴────────┴────────┴────────┴────────┴────────┴────────┘
```
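+
+The command above relies on GenAI-Perf's defaults for the synthetic workload. For finer
+control over prompt lengths, concurrency, and output location, you can pass additional
+options; the values below are only illustrative:
+
+```bash
+genai-perf profile \
+  -m gpt2 \
+  --service-kind triton \
+  --backend tensorrtllm \
+  --streaming \
+  --num-prompts 100 \
+  --random-seed 123 \
+  --synthetic-input-tokens-mean 200 \
+  --synthetic-input-tokens-stddev 0 \
+  --output-tokens-mean 100 \
+  --output-tokens-stddev 0 \
+  --concurrency 1 \
+  --measurement-interval 4000 \
+  --profile-export-file my_profile_export.json \
+  --url localhost:8001
+```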
See [Tutorial](docs/tutorial.md) for additional examples.
diff --git a/genai-perf/docs/tutorial.md b/genai-perf/docs/tutorial.md
index 1a31511d..1b5464de 100644
--- a/genai-perf/docs/tutorial.md
+++ b/genai-perf/docs/tutorial.md
@@ -28,192 +28,121 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Profile Large Language Models with GenAI-Perf
-- [Profile GPT2 running on Triton + TensorRT-LLM](#tensorrt-llm)
-- [Profile GPT2 running on Triton + vLLM](#triton-vllm)
-- [Profile GPT2 running on OpenAI Chat Completions API-Compatible Server](#openai-chat)
-- [Profile GPT2 running on OpenAI Completions API-Compatible Server](#openai-completions)
-
----
-
-## Profile GPT2 running on Triton + TensorRT-LLM
-
-### Run GPT2 on Triton Inference Server using TensorRT-LLM
-
-
-See instructions
-
-Run Triton Inference Server with TensorRT-LLM backend container:
+This tutorial demonstrates how to use GenAI-Perf to measure the performance of
+inference endpoints that implement widely used protocols such as the
+[KServe inference protocol](https://github.com/kserve/kserve/tree/master/docs/predict-api/v2)
+and the [OpenAI API](https://platform.openai.com/docs/api-reference/introduction).
-```bash
-export RELEASE="24.08"
+### Table of Contents
-docker run -it --net=host --gpus=all --shm-size=2g --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/tritonserver:${RELEASE}-trtllm-python-py3
+- [Profile GPT-2 running on Triton + TensorRT-LLM Backend](#tensorrt-llm)
+- [Profile GPT-2 running on Triton + vLLM Backend](#triton-vllm)
+- [Profile Zephyr-7B-Beta running on OpenAI Chat Completions API-Compatible Server](#openai-chat)
+- [Profile GPT-2 running on OpenAI Completions API-Compatible Server](#openai-completions)
-# Install Triton CLI (~5 min):
-pip install "git+https://github.com/triton-inference-server/triton_cli@0.0.8"
+
-# Download model:
-triton import -m gpt2 --backend tensorrtllm
+## Profile GPT-2 running on Triton + TensorRT-LLM
-# Run server:
-triton start
-```
-
-
+You can follow the [quickstart guide](https://github.com/triton-inference-server/triton_cli?tab=readme-ov-file#serving-a-trt-llm-model)
+in the Triton CLI GitHub repository to serve GPT-2 on Triton Inference Server with the TensorRT-LLM backend.
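+
+For reference, the serving steps from that guide look roughly like the following; the
+container tag and Triton CLI version shown here are illustrative, so check the guide for
+the currently supported combination:
+
+```bash
+export RELEASE="24.08"
+
+# Start the Triton container that ships the TensorRT-LLM backend:
+docker run -it --net=host --gpus=all --shm-size=2g --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/tritonserver:${RELEASE}-trtllm-python-py3
+
+# Install the Triton CLI (~5 min):
+pip install "git+https://github.com/triton-inference-server/triton_cli@0.0.11"
+
+# Build the TRT-LLM engine and generate a Triton model repository for GPT-2:
+triton import -m gpt2 --backend tensorrtllm
+
+# Run the server:
+triton start
+```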
### Run GenAI-Perf
-Run GenAI-Perf from Triton Inference Server SDK container:
+Run GenAI-Perf inside the Triton Inference Server SDK container:
```bash
-export RELEASE="24.08"
-
-docker run -it --net=host --gpus=all nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk
-
-# Run GenAI-Perf in the container:
genai-perf profile \
-m gpt2 \
--service-kind triton \
--backend tensorrtllm \
- --num-prompts 100 \
- --random-seed 123 \
--synthetic-input-tokens-mean 200 \
--synthetic-input-tokens-stddev 0 \
- --streaming \
--output-tokens-mean 100 \
--output-tokens-stddev 0 \
--output-tokens-mean-deterministic \
- --tokenizer hf-internal-testing/llama-tokenizer \
- --concurrency 1 \
- --measurement-interval 4000 \
- --profile-export-file my_profile_export.json \
- --url localhost:8001
+ --streaming
```
Example output:
```
- NVIDIA GenAI-Perf | LLM Metrics
-┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
-┃ Statistic ┃ avg ┃ min ┃ max ┃ p99 ┃ p90 ┃ p75 ┃
-┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
-│ Time to first token (ns) │ 13,266,974 │ 11,818,732 │ 18,351,779 │ 16,513,479 │ 13,741,986 │ 13,544,376 │
-│ Inter token latency (ns) │ 2,069,766 │ 42,023 │ 15,307,799 │ 3,256,375 │ 3,020,580 │ 2,090,930 │
-│ Request latency (ns) │ 223,532,625 │ 219,123,330 │ 241,004,192 │ 238,198,306 │ 229,676,183 │ 224,715,918 │
-│ Output sequence length │ 104 │ 100 │ 129 │ 128 │ 109 │ 105 │
-│ Input sequence length │ 199 │ 199 │ 199 │ 199 │ 199 │ 199 │
-└──────────────────────────┴─────────────┴─────────────┴─────────────┴─────────────┴─────────────┴─────────────┘
-Output token throughput (per sec): 460.42
-Request throughput (per sec): 4.44
+ NVIDIA GenAI-Perf | LLM Metrics
+┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
+┃ Statistic ┃ avg ┃ min ┃ max ┃ p99 ┃ p90 ┃ p75 ┃
+┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
+│ Time to first token (ms) │ 13.68 │ 11.07 │ 21.50 │ 18.81 │ 14.29 │ 13.97 │
+│ Inter token latency (ms) │ 1.86 │ 1.28 │ 2.11 │ 2.11 │ 2.01 │ 1.95 │
+│ Request latency (ms) │ 203.70 │ 180.33 │ 228.30 │ 225.45 │ 216.48 │ 211.72 │
+│ Output sequence length │ 103.46 │ 95.00 │ 134.00 │ 122.96 │ 108.00 │ 104.75 │
+│ Input sequence length │ 200.00 │ 200.00 │ 200.00 │ 200.00 │ 200.00 │ 200.00 │
+│ Output token throughput (per sec) │ 504.02 │ N/A │ N/A │ N/A │ N/A │ N/A │
+│ Request throughput (per sec) │ 4.87 │ N/A │ N/A │ N/A │ N/A │ N/A │
+└───────────────────────────────────┴────────┴────────┴────────┴────────┴────────┴────────┘
```
-## Profile GPT2 running on Triton + vLLM
-
-### Run GPT2 on Triton Inference Server using vLLM
-
-
-See instructions
-
-Run Triton Inference Server with vLLM backend container:
-
-```bash
-export RELEASE="24.08"
-
-
-docker run -it --net=host --gpus=1 --shm-size=2g --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/tritonserver:${RELEASE}-vllm-python-py3
+## Profile GPT-2 running on Triton + vLLM
-# Install Triton CLI (~5 min):
-pip install "git+https://github.com/triton-inference-server/triton_cli@0.0.8"
-
-# Download model:
-triton import -m gpt2 --backend vllm
-
-# Run server:
-triton start
-```
-
-
+You can follow the [quickstart guide](https://github.com/triton-inference-server/triton_cli?tab=readme-ov-file#serving-a-vllm-model)
+in the Triton CLI GitHub repository to serve GPT-2 on Triton Inference Server with the vLLM backend.
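+
+For reference, the serving steps from that guide look roughly like the following; the
+container tag and Triton CLI version shown here are illustrative, so check the guide for
+the currently supported combination:
+
+```bash
+export RELEASE="24.08"
+
+# Start the Triton container that ships the vLLM backend:
+docker run -it --net=host --gpus=1 --shm-size=2g --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/tritonserver:${RELEASE}-vllm-python-py3
+
+# Install the Triton CLI (~5 min):
+pip install "git+https://github.com/triton-inference-server/triton_cli@0.0.11"
+
+# Generate a Triton model repository for GPT-2 with the vLLM backend:
+triton import -m gpt2 --backend vllm
+
+# Run the server:
+triton start
+```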
### Run GenAI-Perf
-Run GenAI-Perf from Triton Inference Server SDK container:
+Run GenAI-Perf inside the Triton Inference Server SDK container:
```bash
-export RELEASE="24.08"
-
-docker run -it --net=host --gpus=1 nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk
-
-# Run GenAI-Perf in the container:
genai-perf profile \
-m gpt2 \
--service-kind triton \
--backend vllm \
- --num-prompts 100 \
- --random-seed 123 \
--synthetic-input-tokens-mean 200 \
--synthetic-input-tokens-stddev 0 \
- --streaming \
--output-tokens-mean 100 \
--output-tokens-stddev 0 \
--output-tokens-mean-deterministic \
- --tokenizer hf-internal-testing/llama-tokenizer \
- --concurrency 1 \
- --measurement-interval 4000 \
- --profile-export-file my_profile_export.json \
- --url localhost:8001
+ --streaming
```
Example output:
```
- NVIDIA GenAI-Perf | LLM Metrics
-┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
-┃ Statistic ┃ avg ┃ min ┃ max ┃ p99 ┃ p90 ┃ p75 ┃
-┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
-│ Time to first token (ns) │ 15,786,560 │ 11,437,189 │ 49,550,549 │ 40,129,652 │ 21,248,091 │ 17,824,695 │
-│ Inter token latency (ns) │ 3,543,380 │ 591,898 │ 10,013,690 │ 6,152,260 │ 5,039,278 │ 4,060,982 │
-│ Request latency (ns) │ 388,415,721 │ 312,552,612 │ 528,229,817 │ 518,189,390 │ 484,281,365 │ 459,417,637 │
-│ Output sequence length │ 113 │ 105 │ 123 │ 122 │ 119 │ 115 │
-│ Input sequence length │ 199 │ 199 │ 199 │ 199 │ 199 │ 199 │
-└──────────────────────────┴─────────────┴─────────────┴─────────────┴─────────────┴─────────────┴─────────────┘
-Output token throughput (per sec): 290.24
-Request throughput (per sec): 2.57
+ NVIDIA GenAI-Perf | LLM Metrics
+┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
+┃ Statistic ┃ avg ┃ min ┃ max ┃ p99 ┃ p90 ┃ p75 ┃
+┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
+│ Time to first token (ms) │ 22.04 │ 14.00 │ 26.02 │ 25.73 │ 24.41 │ 24.06 │
+│ Inter token latency (ms) │ 4.58 │ 3.45 │ 5.34 │ 5.33 │ 5.11 │ 4.86 │
+│ Request latency (ms) │ 542.48 │ 468.10 │ 622.39 │ 615.67 │ 584.73 │ 555.90 │
+│ Output sequence length │ 115.15 │ 103.00 │ 143.00 │ 138.00 │ 120.00 │ 118.50 │
+│ Input sequence length │ 200.00 │ 200.00 │ 200.00 │ 200.00 │ 200.00 │ 200.00 │
+│ Output token throughput (per sec) │ 212.04 │ N/A │ N/A │ N/A │ N/A │ N/A │
+│ Request throughput (per sec) │ 1.84 │ N/A │ N/A │ N/A │ N/A │ N/A │
+└───────────────────────────────────┴────────┴────────┴────────┴────────┴────────┴────────┘
```
-## Profile Zephyr running on OpenAI Chat API-Compatible Server
-
-### Run Zephyr on [OpenAI Chat Completions API](https://platform.openai.com/docs/api-reference/chat)-compatible server
+## Profile Zephyr-7B-Beta running on OpenAI Chat Completions API-Compatible Server
-
-See instructions
-
-Run the vLLM inference server:
+Serve the model on the vLLM server, which exposes an [OpenAI Chat Completions API](https://platform.openai.com/docs/api-reference/chat)-compatible endpoint:
```bash
docker run -it --net=host --gpus=all vllm/vllm-openai:latest --model HuggingFaceH4/zephyr-7b-beta --dtype float16
```
-
-
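+
+Before profiling, you can optionally confirm the endpoint is reachable with a small
+request; this assumes vLLM's default port of 8000, so adjust it if you changed the
+server configuration:
+
+```bash
+curl -s http://localhost:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{"model": "HuggingFaceH4/zephyr-7b-beta", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 16}'
+```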
### Run GenAI-Perf
-Run GenAI-Perf from Triton Inference Server SDK container:
+Run GenAI-Perf inside the Triton Inference Server SDK container:
```bash
-export RELEASE="24.08"
-
-docker run -it --net=host --gpus=all nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk
-
-# Run GenAI-Perf in the container:
genai-perf profile \
-m HuggingFaceH4/zephyr-7b-beta \
--service-kind openai \
--endpoint-type chat \
--synthetic-input-tokens-mean 200 \
--synthetic-input-tokens-stddev 0 \
- --streaming \
--output-tokens-mean 100 \
--output-tokens-stddev 0 \
+ --streaming \
--tokenizer HuggingFaceH4/zephyr-7b-beta
```
@@ -234,54 +163,33 @@ Example output:
└───────────────────────────────────┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┘
```
-## Profile GPT2 running on OpenAI Completions API-Compatible Server
-
-### Running GPT2 on [OpenAI Completions API](https://platform.openai.com/docs/api-reference/completions)-compatible server
+## Profile GPT-2 running on OpenAI Completions API-Compatible Server
-
-See instructions
-
-Run the vLLM inference server:
+Serve the model on the vLLM server, which exposes an [OpenAI Completions API](https://platform.openai.com/docs/api-reference/completions)-compatible endpoint:
```bash
docker run -it --net=host --gpus=all vllm/vllm-openai:latest --model gpt2 --dtype float16 --max-model-len 1024
```
-
-
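+
+As with the chat endpoint, you can optionally send a quick request first to verify that
+the server is up (again assuming vLLM's default port of 8000):
+
+```bash
+curl -s http://localhost:8000/v1/completions \
+  -H "Content-Type: application/json" \
+  -d '{"model": "gpt2", "prompt": "Hello, my name is", "max_tokens": 16}'
+```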
### Run GenAI-Perf
-Run GenAI-Perf from Triton Inference Server SDK container:
+Run GenAI-Perf inside the Triton Inference Server SDK container:
```bash
-export RELEASE="24.08"
-
-docker run -it --net=host --gpus=all nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk
-
-
-# Run GenAI-Perf in the container:
genai-perf profile \
-m gpt2 \
--service-kind openai \
- --endpoint v1/completions \
--endpoint-type completions \
- --num-prompts 100 \
- --random-seed 123 \
--synthetic-input-tokens-mean 200 \
--synthetic-input-tokens-stddev 0 \
--output-tokens-mean 100 \
- --output-tokens-stddev 0 \
- --tokenizer hf-internal-testing/llama-tokenizer \
- --concurrency 1 \
- --measurement-interval 4000 \
- --profile-export-file my_profile_export.json \
- --url localhost:8000
+ --output-tokens-stddev 0
```
Example output:
```
- NVIDIA GenAI-Perf | LLM Metrics
+ NVIDIA GenAI-Perf | LLM Metrics
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ Statistic ┃ avg ┃ min ┃ max ┃ p99 ┃ p90 ┃ p75 ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
diff --git a/templates/genai-perf-templates/README_template b/templates/genai-perf-templates/README_template
index 2c742b22..eb88a141 100644
--- a/templates/genai-perf-templates/README_template
+++ b/templates/genai-perf-templates/README_template
@@ -128,7 +128,7 @@ the GPT-2 model running on Triton Inference Server with a TensorRT-LLM engine.
### Serve GPT-2 TensorRT-LLM model using Triton CLI
You can follow the [quickstart guide](https://github.com/triton-inference-server/triton_cli?tab=readme-ov-file#serving-a-trt-llm-model)
-on Triton CLI github repo to run GPT-2 model locally.
+in the Triton CLI GitHub repository to serve GPT-2 on Triton Inference Server with the TensorRT-LLM backend.
The full instructions are copied below for convenience:
```bash
@@ -139,7 +139,6 @@ docker run -ti \
--network=host \
--shm-size=1g --ulimit memlock=-1 \
-v /tmp:/tmp \
- -v ${HOME}/models:/root/models \
-v ${HOME}/.cache/huggingface:/root/.cache/huggingface \
nvcr.io/nvidia/tritonserver:{{ release }}-trtllm-python-py3
@@ -156,48 +155,27 @@ triton start
### Running GenAI-Perf
-Now we can run GenAI-Perf from Triton Inference Server SDK container:
+Now we can run GenAI-Perf inside the Triton Inference Server SDK container:
```bash
-export RELEASE="{{ release }}"
-
-docker run -it --net=host --rm --gpus=all nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk
-
-# Run GenAI-Perf in the container:
-genai-perf profile \
- -m gpt2 \
- --service-kind triton \
- --backend tensorrtllm \
- --num-prompts 100 \
- --random-seed 123 \
- --synthetic-input-tokens-mean 200 \
- --synthetic-input-tokens-stddev 0 \
- --streaming \
- --output-tokens-mean 100 \
- --output-tokens-stddev 0 \
- --output-tokens-mean-deterministic \
- --tokenizer hf-internal-testing/llama-tokenizer \
- --concurrency 1 \
- --measurement-interval 4000 \
- --profile-export-file my_profile_export.json \
- --url localhost:8001
+genai-perf profile -m gpt2 --service-kind triton --backend tensorrtllm --streaming
```
Example output:
```
- LLM Metrics
-┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
-┃ Statistic ┃ avg ┃ min ┃ max ┃ p99 ┃ p90 ┃ p75 ┃
-┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
-│ Time to first token (ms) │ 11.70 │ 9.88 │ 17.21 │ 14.35 │ 12.01 │ 11.87 │
-│ Inter token latency (ms) │ 1.46 │ 1.08 │ 1.89 │ 1.87 │ 1.62 │ 1.52 │
-│ Request latency (ms) │ 161.24 │ 153.45 │ 200.74 │ 200.66 │ 179.43 │ 162.23 │
-│ Output sequence length │ 103.39 │ 95.00 │ 134.00 │ 120.08 │ 107.30 │ 105.00 │
-│ Input sequence length │ 200.01 │ 200.00 │ 201.00 │ 200.13 │ 200.00 │ 200.00 │
-└──────────────────────────┴────────┴────────┴────────┴────────┴────────┴────────┘
-Output token throughput (per sec): 635.61
-Request throughput (per sec): 6.15
+ NVIDIA GenAI-Perf | LLM Metrics
+┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
+┃ Statistic ┃ avg ┃ min ┃ max ┃ p99 ┃ p90 ┃ p75 ┃
+┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
+│ Time to first token (ms) │ 16.26 │ 12.39 │ 17.25 │ 17.09 │ 16.68 │ 16.56 │
+│ Inter token latency (ms) │ 1.85 │ 1.55 │ 2.04 │ 2.02 │ 1.97 │ 1.92 │
+│ Request latency (ms) │ 499.20 │ 451.01 │ 554.61 │ 548.69 │ 526.13 │ 514.19 │
+│ Output sequence length │ 261.90 │ 256.00 │ 298.00 │ 296.60 │ 270.00 │ 265.00 │
+│ Input sequence length │ 550.06 │ 550.00 │ 553.00 │ 551.60 │ 550.00 │ 550.00 │
+│ Output token throughput (per sec) │ 520.87 │ N/A │ N/A │ N/A │ N/A │ N/A │
+│ Request throughput (per sec) │ 1.99 │ N/A │ N/A │ N/A │ N/A │ N/A │
+└───────────────────────────────────┴────────┴────────┴────────┴────────┴────────┴────────┘
```
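+
+The command above relies on GenAI-Perf's defaults for the synthetic workload. For finer
+control over prompt lengths, concurrency, and output location, you can pass additional
+options; the values below are only illustrative:
+
+```bash
+genai-perf profile \
+  -m gpt2 \
+  --service-kind triton \
+  --backend tensorrtllm \
+  --streaming \
+  --num-prompts 100 \
+  --random-seed 123 \
+  --synthetic-input-tokens-mean 200 \
+  --synthetic-input-tokens-stddev 0 \
+  --output-tokens-mean 100 \
+  --output-tokens-stddev 0 \
+  --concurrency 1 \
+  --measurement-interval 4000 \
+  --profile-export-file my_profile_export.json \
+  --url localhost:8001
+```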
See [Tutorial](docs/tutorial.md) for additional examples.
diff --git a/templates/genai-perf-templates/tutorial_template b/templates/genai-perf-templates/tutorial_template
index d36fc88b..43271fe9 100644
--- a/templates/genai-perf-templates/tutorial_template
+++ b/templates/genai-perf-templates/tutorial_template
@@ -28,192 +28,121 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Profile Large Language Models with GenAI-Perf
-- [Profile GPT2 running on Triton + TensorRT-LLM](#tensorrt-llm)
-- [Profile GPT2 running on Triton + vLLM](#triton-vllm)
-- [Profile GPT2 running on OpenAI Chat Completions API-Compatible Server](#openai-chat)
-- [Profile GPT2 running on OpenAI Completions API-Compatible Server](#openai-completions)
-
----
-
-## Profile GPT2 running on Triton + TensorRT-LLM
-
-### Run GPT2 on Triton Inference Server using TensorRT-LLM
-
-
-See instructions
-
-Run Triton Inference Server with TensorRT-LLM backend container:
+This tutorial demonstrates how to use GenAI-Perf to measure the performance of
+inference endpoints that implement widely used protocols such as the
+[KServe inference protocol](https://github.com/kserve/kserve/tree/master/docs/predict-api/v2)
+and the [OpenAI API](https://platform.openai.com/docs/api-reference/introduction).
-```bash
-export RELEASE="{{ release }}"
+### Table of Contents
-docker run -it --net=host --gpus=all --shm-size=2g --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/tritonserver:${RELEASE}-trtllm-python-py3
+- [Profile GPT-2 running on Triton + TensorRT-LLM Backend](#tensorrt-llm)
+- [Profile GPT-2 running on Triton + vLLM Backend](#triton-vllm)
+- [Profile Zephyr-7B-Beta running on OpenAI Chat Completions API-Compatible Server](#openai-chat)
+- [Profile GPT-2 running on OpenAI Completions API-Compatible Server](#openai-completions)
-# Install Triton CLI (~5 min):
-pip install "git+https://github.com/triton-inference-server/triton_cli@{{ triton_cli_version }}"
+
-# Download model:
-triton import -m gpt2 --backend tensorrtllm
+## Profile GPT-2 running on Triton + TensorRT-LLM
-# Run server:
-triton start
-```
-
-
+You can follow the [quickstart guide](https://github.com/triton-inference-server/triton_cli?tab=readme-ov-file#serving-a-trt-llm-model)
+in the Triton CLI GitHub repository to serve GPT-2 on Triton Inference Server with the TensorRT-LLM backend.
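+
+For reference, the serving steps from that guide look roughly like the following; the
+container tag and Triton CLI version shown here are illustrative, so check the guide for
+the currently supported combination:
+
+```bash
+export RELEASE="{{ release }}"
+
+# Start the Triton container that ships the TensorRT-LLM backend:
+docker run -it --net=host --gpus=all --shm-size=2g --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/tritonserver:${RELEASE}-trtllm-python-py3
+
+# Install the Triton CLI (~5 min):
+pip install "git+https://github.com/triton-inference-server/triton_cli@{{ triton_cli_version }}"
+
+# Build the TRT-LLM engine and generate a Triton model repository for GPT-2:
+triton import -m gpt2 --backend tensorrtllm
+
+# Run the server:
+triton start
+```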
### Run GenAI-Perf
-Run GenAI-Perf from Triton Inference Server SDK container:
+Run GenAI-Perf inside the Triton Inference Server SDK container:
```bash
-export RELEASE="{{ release }}"
-
-docker run -it --net=host --gpus=all nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk
-
-# Run GenAI-Perf in the container:
genai-perf profile \
-m gpt2 \
--service-kind triton \
--backend tensorrtllm \
- --num-prompts 100 \
- --random-seed 123 \
--synthetic-input-tokens-mean 200 \
--synthetic-input-tokens-stddev 0 \
- --streaming \
--output-tokens-mean 100 \
--output-tokens-stddev 0 \
--output-tokens-mean-deterministic \
- --tokenizer hf-internal-testing/llama-tokenizer \
- --concurrency 1 \
- --measurement-interval 4000 \
- --profile-export-file my_profile_export.json \
- --url localhost:8001
+ --streaming
```
Example output:
```
- NVIDIA GenAI-Perf | LLM Metrics
-┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
-┃ Statistic ┃ avg ┃ min ┃ max ┃ p99 ┃ p90 ┃ p75 ┃
-┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
-│ Time to first token (ns) │ 13,266,974 │ 11,818,732 │ 18,351,779 │ 16,513,479 │ 13,741,986 │ 13,544,376 │
-│ Inter token latency (ns) │ 2,069,766 │ 42,023 │ 15,307,799 │ 3,256,375 │ 3,020,580 │ 2,090,930 │
-│ Request latency (ns) │ 223,532,625 │ 219,123,330 │ 241,004,192 │ 238,198,306 │ 229,676,183 │ 224,715,918 │
-│ Output sequence length │ 104 │ 100 │ 129 │ 128 │ 109 │ 105 │
-│ Input sequence length │ 199 │ 199 │ 199 │ 199 │ 199 │ 199 │
-└──────────────────────────┴─────────────┴─────────────┴─────────────┴─────────────┴─────────────┴─────────────┘
-Output token throughput (per sec): 460.42
-Request throughput (per sec): 4.44
+ NVIDIA GenAI-Perf | LLM Metrics
+┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
+┃ Statistic ┃ avg ┃ min ┃ max ┃ p99 ┃ p90 ┃ p75 ┃
+┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
+│ Time to first token (ms) │ 13.68 │ 11.07 │ 21.50 │ 18.81 │ 14.29 │ 13.97 │
+│ Inter token latency (ms) │ 1.86 │ 1.28 │ 2.11 │ 2.11 │ 2.01 │ 1.95 │
+│ Request latency (ms) │ 203.70 │ 180.33 │ 228.30 │ 225.45 │ 216.48 │ 211.72 │
+│ Output sequence length │ 103.46 │ 95.00 │ 134.00 │ 122.96 │ 108.00 │ 104.75 │
+│ Input sequence length │ 200.00 │ 200.00 │ 200.00 │ 200.00 │ 200.00 │ 200.00 │
+│ Output token throughput (per sec) │ 504.02 │ N/A │ N/A │ N/A │ N/A │ N/A │
+│ Request throughput (per sec) │ 4.87 │ N/A │ N/A │ N/A │ N/A │ N/A │
+└───────────────────────────────────┴────────┴────────┴────────┴────────┴────────┴────────┘
```
-## Profile GPT2 running on Triton + vLLM
-
-### Run GPT2 on Triton Inference Server using vLLM
-
-
-See instructions
-
-Run Triton Inference Server with vLLM backend container:
-
-```bash
-export RELEASE="{{ release }}"
-
-
-docker run -it --net=host --gpus=1 --shm-size=2g --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/tritonserver:${RELEASE}-vllm-python-py3
+## Profile GPT-2 running on Triton + vLLM
-# Install Triton CLI (~5 min):
-pip install "git+https://github.com/triton-inference-server/triton_cli@0.0.8"
-
-# Download model:
-triton import -m gpt2 --backend vllm
-
-# Run server:
-triton start
-```
-
-
+You can follow the [quickstart guide](https://github.com/triton-inference-server/triton_cli?tab=readme-ov-file#serving-a-vllm-model)
+in the Triton CLI GitHub repository to serve GPT-2 on Triton Inference Server with the vLLM backend.
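+
+For reference, the serving steps from that guide look roughly like the following; the
+container tag and Triton CLI version shown here are illustrative, so check the guide for
+the currently supported combination:
+
+```bash
+export RELEASE="{{ release }}"
+
+# Start the Triton container that ships the vLLM backend:
+docker run -it --net=host --gpus=1 --shm-size=2g --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/tritonserver:${RELEASE}-vllm-python-py3
+
+# Install the Triton CLI (~5 min):
+pip install "git+https://github.com/triton-inference-server/triton_cli@{{ triton_cli_version }}"
+
+# Generate a Triton model repository for GPT-2 with the vLLM backend:
+triton import -m gpt2 --backend vllm
+
+# Run the server:
+triton start
+```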
### Run GenAI-Perf
-Run GenAI-Perf from Triton Inference Server SDK container:
+Run GenAI-Perf inside the Triton Inference Server SDK container:
```bash
-export RELEASE="{{ release }}"
-
-docker run -it --net=host --gpus=1 nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk
-
-# Run GenAI-Perf in the container:
genai-perf profile \
-m gpt2 \
--service-kind triton \
--backend vllm \
- --num-prompts 100 \
- --random-seed 123 \
--synthetic-input-tokens-mean 200 \
--synthetic-input-tokens-stddev 0 \
- --streaming \
--output-tokens-mean 100 \
--output-tokens-stddev 0 \
--output-tokens-mean-deterministic \
- --tokenizer hf-internal-testing/llama-tokenizer \
- --concurrency 1 \
- --measurement-interval 4000 \
- --profile-export-file my_profile_export.json \
- --url localhost:8001
+ --streaming
```
Example output:
```
- NVIDIA GenAI-Perf | LLM Metrics
-┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
-┃ Statistic ┃ avg ┃ min ┃ max ┃ p99 ┃ p90 ┃ p75 ┃
-┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
-│ Time to first token (ns) │ 15,786,560 │ 11,437,189 │ 49,550,549 │ 40,129,652 │ 21,248,091 │ 17,824,695 │
-│ Inter token latency (ns) │ 3,543,380 │ 591,898 │ 10,013,690 │ 6,152,260 │ 5,039,278 │ 4,060,982 │
-│ Request latency (ns) │ 388,415,721 │ 312,552,612 │ 528,229,817 │ 518,189,390 │ 484,281,365 │ 459,417,637 │
-│ Output sequence length │ 113 │ 105 │ 123 │ 122 │ 119 │ 115 │
-│ Input sequence length │ 199 │ 199 │ 199 │ 199 │ 199 │ 199 │
-└──────────────────────────┴─────────────┴─────────────┴─────────────┴─────────────┴─────────────┴─────────────┘
-Output token throughput (per sec): 290.24
-Request throughput (per sec): 2.57
+ NVIDIA GenAI-Perf | LLM Metrics
+┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
+┃ Statistic ┃ avg ┃ min ┃ max ┃ p99 ┃ p90 ┃ p75 ┃
+┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
+│ Time to first token (ms) │ 22.04 │ 14.00 │ 26.02 │ 25.73 │ 24.41 │ 24.06 │
+│ Inter token latency (ms) │ 4.58 │ 3.45 │ 5.34 │ 5.33 │ 5.11 │ 4.86 │
+│ Request latency (ms) │ 542.48 │ 468.10 │ 622.39 │ 615.67 │ 584.73 │ 555.90 │
+│ Output sequence length │ 115.15 │ 103.00 │ 143.00 │ 138.00 │ 120.00 │ 118.50 │
+│ Input sequence length │ 200.00 │ 200.00 │ 200.00 │ 200.00 │ 200.00 │ 200.00 │
+│ Output token throughput (per sec) │ 212.04 │ N/A │ N/A │ N/A │ N/A │ N/A │
+│ Request throughput (per sec) │ 1.84 │ N/A │ N/A │ N/A │ N/A │ N/A │
+└───────────────────────────────────┴────────┴────────┴────────┴────────┴────────┴────────┘
```
-## Profile Zephyr running on OpenAI Chat API-Compatible Server
-
-### Run Zephyr on [OpenAI Chat Completions API](https://platform.openai.com/docs/api-reference/chat)-compatible server
+## Profile Zephyr-7B-Beta running on OpenAI Chat Completions API-Compatible Server
-
-See instructions
-
-Run the vLLM inference server:
+Serve the model on the vLLM server, which exposes an [OpenAI Chat Completions API](https://platform.openai.com/docs/api-reference/chat)-compatible endpoint:
```bash
docker run -it --net=host --gpus=all vllm/vllm-openai:latest --model HuggingFaceH4/zephyr-7b-beta --dtype float16
```
-
-
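+
+Before profiling, you can optionally confirm the endpoint is reachable with a small
+request; this assumes vLLM's default port of 8000, so adjust it if you changed the
+server configuration:
+
+```bash
+curl -s http://localhost:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{"model": "HuggingFaceH4/zephyr-7b-beta", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 16}'
+```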
### Run GenAI-Perf
-Run GenAI-Perf from Triton Inference Server SDK container:
+Run GenAI-Perf inside the Triton Inference Server SDK container:
```bash
-export RELEASE="{{ release }}"
-
-docker run -it --net=host --gpus=all nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk
-
-# Run GenAI-Perf in the container:
genai-perf profile \
-m HuggingFaceH4/zephyr-7b-beta \
--service-kind openai \
--endpoint-type chat \
--synthetic-input-tokens-mean 200 \
--synthetic-input-tokens-stddev 0 \
- --streaming \
--output-tokens-mean 100 \
--output-tokens-stddev 0 \
+ --streaming \
--tokenizer HuggingFaceH4/zephyr-7b-beta
```
@@ -234,48 +163,27 @@ Example output:
└───────────────────────────────────┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┘
```
-## Profile GPT2 running on OpenAI Completions API-Compatible Server
-
-### Running GPT2 on [OpenAI Completions API](https://platform.openai.com/docs/api-reference/completions)-compatible server
+## Profile GPT-2 running on OpenAI Completions API-Compatible Server
-
-See instructions
-
-Run the vLLM inference server:
+Serve the model on the vLLM server, which exposes an [OpenAI Completions API](https://platform.openai.com/docs/api-reference/completions)-compatible endpoint:
```bash
docker run -it --net=host --gpus=all vllm/vllm-openai:latest --model gpt2 --dtype float16 --max-model-len 1024
```
-
-
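+
+As with the chat endpoint, you can optionally send a quick request first to verify that
+the server is up (again assuming vLLM's default port of 8000):
+
+```bash
+curl -s http://localhost:8000/v1/completions \
+  -H "Content-Type: application/json" \
+  -d '{"model": "gpt2", "prompt": "Hello, my name is", "max_tokens": 16}'
+```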
### Run GenAI-Perf
-Run GenAI-Perf from Triton Inference Server SDK container:
+Run GenAI-Perf inside the Triton Inference Server SDK container:
```bash
-export RELEASE="{{ release }}"
-
-docker run -it --net=host --gpus=all nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk
-
-
-# Run GenAI-Perf in the container:
genai-perf profile \
-m gpt2 \
--service-kind openai \
- --endpoint v1/completions \
--endpoint-type completions \
- --num-prompts 100 \
- --random-seed 123 \
--synthetic-input-tokens-mean 200 \
--synthetic-input-tokens-stddev 0 \
--output-tokens-mean 100 \
- --output-tokens-stddev 0 \
- --tokenizer hf-internal-testing/llama-tokenizer \
- --concurrency 1 \
- --measurement-interval 4000 \
- --profile-export-file my_profile_export.json \
- --url localhost:8000
+ --output-tokens-stddev 0
```
Example output:
diff --git a/templates/template_vars.yaml b/templates/template_vars.yaml
index 12e88eb6..373d0fef 100644
--- a/templates/template_vars.yaml
+++ b/templates/template_vars.yaml
@@ -1,6 +1,6 @@
General:
release: 24.08
- triton_cli_version: 0.0.8
+ triton_cli_version: 0.0.11
genai_perf_version: 0.0.6dev
README:
@@ -46,4 +46,4 @@ tutorial:
version:
filename: __init__.py
template: genai-perf-templates/version_template
- output_dir: ../genai-perf/genai_perf/
\ No newline at end of file
+ output_dir: ../genai-perf/genai_perf/