Commit

Add args to tutorial
dyastremsky committed Dec 7, 2024
1 parent 977e37c commit 558a04d
Showing 2 changed files with 30 additions and 9 deletions.
19 changes: 15 additions & 4 deletions genai-perf/docs/tutorial.md
@@ -55,14 +55,17 @@ Run GenAI-Perf inside the Triton Inference Server SDK container:
```bash
  genai-perf profile \
  -m gpt2 \
+ --tokenizer gpt2 \
  --service-kind triton \
  --backend tensorrtllm \
  --synthetic-input-tokens-mean 200 \
  --synthetic-input-tokens-stddev 0 \
  --output-tokens-mean 100 \
  --output-tokens-stddev 0 \
  --output-tokens-mean-deterministic \
- --streaming
+ --streaming \
+ --request-count 50 \
+ --warmup-request-count 10
```

Example output:
@@ -94,14 +97,17 @@ Run GenAI-Perf inside the Triton Inference Server SDK container:
```bash
  genai-perf profile \
  -m gpt2 \
+ --tokenizer gpt2 \
  --service-kind triton \
  --backend vllm \
  --synthetic-input-tokens-mean 200 \
  --synthetic-input-tokens-stddev 0 \
  --output-tokens-mean 100 \
  --output-tokens-stddev 0 \
  --output-tokens-mean-deterministic \
- --streaming
+ --streaming \
+ --request-count 50 \
+ --warmup-request-count 10
```

Example output:
@@ -136,14 +142,16 @@ Run GenAI-Perf inside the Triton Inference Server SDK container:
```bash
  genai-perf profile \
  -m HuggingFaceH4/zephyr-7b-beta \
+ --tokenizer HuggingFaceH4/zephyr-7b-beta \
  --service-kind openai \
  --endpoint-type chat \
  --synthetic-input-tokens-mean 200 \
  --synthetic-input-tokens-stddev 0 \
  --output-tokens-mean 100 \
  --output-tokens-stddev 0 \
  --streaming \
- --tokenizer HuggingFaceH4/zephyr-7b-beta
+ --request-count 50 \
+ --warmup-request-count 10
```

Example output:
@@ -178,12 +186,15 @@ Run GenAI-Perf inside the Triton Inference Server SDK container:
```bash
  genai-perf profile \
  -m gpt2 \
+ --tokenizer gpt2 \
  --service-kind openai \
  --endpoint-type completions \
  --synthetic-input-tokens-mean 200 \
  --synthetic-input-tokens-stddev 0 \
  --output-tokens-mean 100 \
- --output-tokens-stddev 0
+ --output-tokens-stddev 0 \
+ --request-count 50 \
+ --warmup-request-count 10
```

Example output:
20 changes: 15 additions & 5 deletions templates/genai-perf-templates/tutorial_template
@@ -55,14 +55,17 @@ Run GenAI-Perf inside the Triton Inference Server SDK container:
```bash
  genai-perf profile \
  -m gpt2 \
+ --tokenizer gpt2 \
  --service-kind triton \
  --backend tensorrtllm \
  --synthetic-input-tokens-mean 200 \
  --synthetic-input-tokens-stddev 0 \
  --output-tokens-mean 100 \
  --output-tokens-stddev 0 \
  --output-tokens-mean-deterministic \
- --streaming
+ --streaming \
+ --request-count 50 \
+ --warmup-request-count 10
```

Example output:
@@ -94,14 +97,17 @@ Run GenAI-Perf inside the Triton Inference Server SDK container:
```bash
  genai-perf profile \
  -m gpt2 \
+ --tokenizer gpt2 \
  --service-kind triton \
  --backend vllm \
  --synthetic-input-tokens-mean 200 \
  --synthetic-input-tokens-stddev 0 \
  --output-tokens-mean 100 \
  --output-tokens-stddev 0 \
  --output-tokens-mean-deterministic \
- --streaming
+ --streaming \
+ --request-count 50 \
+ --warmup-request-count 10
```

Example output:
@@ -136,14 +142,16 @@ Run GenAI-Perf inside the Triton Inference Server SDK container:
```bash
  genai-perf profile \
  -m HuggingFaceH4/zephyr-7b-beta \
+ --tokenizer HuggingFaceH4/zephyr-7b-beta \
  --service-kind openai \
  --endpoint-type chat \
  --synthetic-input-tokens-mean 200 \
  --synthetic-input-tokens-stddev 0 \
  --output-tokens-mean 100 \
  --output-tokens-stddev 0 \
  --streaming \
- --tokenizer HuggingFaceH4/zephyr-7b-beta
+ --request-count 50 \
+ --warmup-request-count 10
```

Example output:
@@ -178,12 +186,15 @@ Run GenAI-Perf inside the Triton Inference Server SDK container:
```bash
  genai-perf profile \
  -m gpt2 \
+ --tokenizer gpt2 \
  --service-kind openai \
  --endpoint-type completions \
  --synthetic-input-tokens-mean 200 \
  --synthetic-input-tokens-stddev 0 \
  --output-tokens-mean 100 \
- --output-tokens-stddev 0
+ --output-tokens-stddev 0 \
+ --request-count 50 \
+ --warmup-request-count 10
```

Example output:
@@ -200,4 +211,3 @@ Example output:
│ Request throughput (per sec) │ 2.28 │ N/A │ N/A │ N/A │ N/A │ N/A │
└───────────────────────────────────┴────────┴────────┴────────┴────────┴────────┴────────┘
```
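
The four commands touched by this commit share most of their arguments, including the newly added `--tokenizer`, `--request-count`, and `--warmup-request-count`. As a rough sketch only, the shared part could be assembled in one place with a small shell function. The function name `build_profile_cmd` is hypothetical and not part of GenAI-Perf, and the function only prints the command line rather than running it. It also assumes, as in all four examples above, that the tokenizer name equals the model name.

```bash
# Hypothetical helper (not part of GenAI-Perf): prints a `genai-perf profile`
# command line using the flags shown in the tutorial commands above.
# Usage: build_profile_cmd MODEL SERVICE_KIND [extra flags...]
build_profile_cmd() {
  # Plain assignments (no `local`) keep this POSIX-sh compatible.
  model="$1"
  service_kind="$2"
  shift 2
  # Rebuild the positional parameters: shared flags plus any extras passed in.
  set -- genai-perf profile \
    -m "$model" \
    --tokenizer "$model" \
    --service-kind "$service_kind" \
    "$@" \
    --synthetic-input-tokens-mean 200 \
    --synthetic-input-tokens-stddev 0 \
    --output-tokens-mean 100 \
    --output-tokens-stddev 0 \
    --request-count 50 \
    --warmup-request-count 10
  # Print the assembled command as a single space-separated line.
  echo "$*"
}

# Example: the OpenAI completions variant from the last hunk.
build_profile_cmd gpt2 openai --endpoint-type completions
```

Printing the command instead of executing it makes it easy to check the argument list before running it inside the SDK container.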
