Add doc generation script with doc templates #49

Merged · 4 commits · Aug 22, 2024
567 changes: 567 additions & 0 deletions templates/genai-perf-templates/README_template

Large diffs are not rendered by default.

250 changes: 250 additions & 0 deletions templates/genai-perf-templates/compare_template
@@ -0,0 +1,250 @@
<!--
Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of NVIDIA CORPORATION nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-->

# GenAI-Perf Compare Subcommand

There are two ways to use the `compare` subcommand to create plots across
multiple runs: pass the profile export files directly with the `--files` option,
or supply a YAML configuration file with the `--config` option.

## Running initially with `--files` option

If the user does not have a YAML configuration file,
they can run the `compare` subcommand with the `--files` option to generate a
set of default plots as well as a pre-filled YAML config file for the plots.

```bash
genai-perf compare --files profile1.json profile2.json profile3.json
```

This generates the default plots, comparing across the three runs.
GenAI-Perf also generates an initial YAML configuration file, `config.yaml`,
pre-filled with the plot configurations shown below:

```yaml
plot1:
  title: Time to First Token
  x_metric: ''
  y_metric: time_to_first_tokens
  x_label: Time to First Token (ms)
  y_label: ''
  width: 1200
  height: 700
  type: box
  paths:
  - profile1.json
  - profile2.json
  - profile3.json
  output: compare
plot2:
  title: Request Latency
  x_metric: ''
  y_metric: request_latencies
  x_label: Request Latency (ms)
  y_label: ''
  width: 1200
  height: 700
  type: box
  paths:
  - profile1.json
  - profile2.json
  - profile3.json
  output: compare
plot3:
  title: Distribution of Input Sequence Lengths to Output Sequence Lengths
  x_metric: input_sequence_lengths
  y_metric: output_sequence_lengths
  x_label: Input Sequence Length
  y_label: Output Sequence Length
  width: 1200
  height: 450
  type: heatmap
  paths:
  - profile1.json
  - profile2.json
  - profile3.json
  output: compare
plot4:
  title: Time to First Token vs Input Sequence Lengths
  x_metric: input_sequence_lengths
  y_metric: time_to_first_tokens
  x_label: Input Sequence Length
  y_label: Time to First Token (ms)
  width: 1200
  height: 700
  type: scatter
  paths:
  - profile1.json
  - profile2.json
  - profile3.json
  output: compare
plot5:
  title: Token-to-Token Latency vs Output Token Position
  x_metric: token_positions
  y_metric: inter_token_latencies
  x_label: Output Token Position
  y_label: Token-to-Token Latency (ms)
  width: 1200
  height: 700
  type: scatter
  paths:
  - profile1.json
  - profile2.json
  - profile3.json
  output: compare
```

Once the user has the YAML configuration file, they can iterate by editing the
config file and re-running with the `--config` option to regenerate the plots.

```bash
# edit
vi config.yaml

# re-generate the plots
genai-perf compare --config config.yaml
```

## Running directly with `--config` option

If the user would like to create a custom plot (beyond the default ones provided),
they can build their own YAML configuration file describing the plots they would
like to generate.
For instance, to see how the inter-token latencies change with the number of
output tokens, which is not part of the default plots, they could add the
following YAML block to the file:

```yaml
plot1:
  title: Inter Token Latency vs Output Tokens
  x_metric: num_output_tokens
  y_metric: inter_token_latencies
  x_label: Num Output Tokens
  y_label: Avg ITL (ms)
  width: 1200
  height: 450
  type: scatter
  paths:
  - <path-to-profile-export-file>
  - <path-to-profile-export-file>
  output: compare
```

After adding the lines, the user can run the following command to generate the
plots specified in the configuration file (in this case, `config.yaml`):

```bash
genai-perf compare --config config.yaml
```

The user can check the generated plots under the output directory:
```
compare/
├── inter_token_latency_vs_output_tokens.jpeg
└── ...
```

## YAML Schema

Here are more details about the YAML configuration file and its structure.
The general YAML schema for the plot configuration is as follows:

```yaml
plot1:
  title: [str]
  x_metric: [str]
  y_metric: [str]
  x_label: [str]
  y_label: [str]
  width: [int]
  height: [int]
  type: [scatter,box,heatmap]
  paths:
  - [str]
  - ...
  output: [str]

plot2:
  title: [str]
  x_metric: [str]
  y_metric: [str]
  x_label: [str]
  y_label: [str]
  width: [int]
  height: [int]
  type: [scatter,box,heatmap]
  paths:
  - [str]
  - ...
  output: [str]

# add more plots
```

The user can add as many plot blocks as they would like by appending them to the
configuration file (the keys follow a `plot<#>` pattern, but that is not
required; the user can set them to any arbitrary string).
For each plot block, the user can specify the following configurations
(a minimal example follows the note below):
- `title`: The title of the plot.
- `x_metric`: The name of the metric to be used on the x-axis.
- `y_metric`: The name of the metric to be used on the y-axis.
- `x_label`: The x-axis label (or description).
- `y_label`: The y-axis label (or description).
- `width`: The width of the entire plot.
- `height`: The height of the entire plot.
- `type`: The type of the plot. It must be one of the three: `scatter`, `box`,
or `heatmap`.
- `paths`: List of paths to the profile export files to compare.
- `output`: The path to the output directory that stores all the plots and the
YAML configuration file.

> [!NOTE]
> The user *MUST* provide at least one valid path to a profile export file.
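
For example, a plot block with an arbitrary key name could look like the sketch
below. The key name, title, and file paths are placeholders for illustration,
not part of the default output; the metric names are taken from the default
configuration shown earlier:

```yaml
ttft_comparison:
  title: Time to First Token Across Runs
  x_metric: ''
  y_metric: time_to_first_tokens
  x_label: Time to First Token (ms)
  y_label: ''
  width: 1200
  height: 700
  type: box
  paths:
  - run_a_profile_export.json
  - run_b_profile_export.json
  output: compare
```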



## Example Plots

Here is the list of sample plots that are created by default when running the
`compare` subcommand:

### Distribution of Input Sequence Lengths to Output Sequence Lengths
<img src="assets/distribution_of_input_sequence_lengths_to_output_sequence_lengths.jpeg" width="800" height="300" />

### Request Latency Analysis
<img src="assets/request_latency.jpeg" width="800" height="300" />

### Time to First Token Analysis
<img src="assets/time_to_first_token.jpeg" width="800" height="300" />

### Time to First Token vs. Input Sequence Lengths
<img src="assets/time_to_first_token_vs_input_sequence_lengths.jpeg" width="800" height="300" />

### Token-to-Token Latency vs. Output Token Position
<img src="assets/token-to-token_latency_vs_output_token_position.jpeg" width="800" height="300" />
106 changes: 106 additions & 0 deletions templates/genai-perf-templates/embeddings_template
@@ -0,0 +1,106 @@
<!--
Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of NVIDIA CORPORATION nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-->

# Profile Embeddings Models with GenAI-Perf

GenAI-Perf allows you to profile embedding models running on an
[OpenAI Embeddings API](https://platform.openai.com/docs/api-reference/embeddings)-compatible server.

## Create a Sample Embeddings Input File

To create a sample embeddings input file, use the following command:

```bash
echo '{"text": "What was the first car ever driven?"}
{"text": "Who served as the 5th President of the United States of America?"}
{"text": "Is the Sydney Opera House located in Australia?"}
{"text": "In what state did they film Shrek 2?"}' > embeddings.jsonl
```

This will generate a file named `embeddings.jsonl` with the following content:
```jsonl
{"text": "What was the first car ever driven?"}
{"text": "Who served as the 5th President of the United States of America?"}
{"text": "Is the Sydney Opera House located in Australia?"}
{"text": "In what state did they film Shrek 2?"}
```
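
If you want a larger input file, a short shell loop can produce one JSON line
per prompt. This is only a sketch: the prompts are arbitrary, and it assumes
the texts contain no double quotes or backslashes, since no JSON escaping is
performed.

```bash
# Write one JSON line per prompt into embeddings.jsonl (overwriting any existing file).
# Note: assumes prompts contain no characters that need JSON escaping.
rm -f embeddings.jsonl
for text in \
  "What was the first car ever driven?" \
  "Who served as the 5th President of the United States of America?" \
  "Is the Sydney Opera House located in Australia?" \
  "In what state did they film Shrek 2?"
do
  printf '{"text": "%s"}\n' "$text" >> embeddings.jsonl
done
```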

## Start an OpenAI Embeddings-Compatible Server
To start an OpenAI embeddings-compatible server, run the following command:
```bash
docker run -it --net=host --rm --gpus=all vllm/vllm-openai:latest --model intfloat/e5-mistral-7b-instruct --dtype float16 --max-model-len 1024
```
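
Optionally, you can check that the server is responding before profiling. The
request below is only a sketch: it assumes vLLM's default port (8000) and the
OpenAI-style `/v1/embeddings` route.

```bash
# Hypothetical smoke test against the local server (assumes vLLM's default port 8000)
curl http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "intfloat/e5-mistral-7b-instruct", "input": "Hello, world!"}'
```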

## Run GenAI-Perf
To profile embeddings models using GenAI-Perf, use the following command:

```bash
genai-perf profile \
  -m intfloat/e5-mistral-7b-instruct \
  --service-kind openai \
  --endpoint-type embeddings \
  --batch-size 2 \
  --input-file embeddings.jsonl
```

* `-m intfloat/e5-mistral-7b-instruct` specifies the model to run
(`intfloat/e5-mistral-7b-instruct`)
* `--service-kind openai` specifies that the server is OpenAI-API compatible
* `--endpoint-type embeddings` specifies that requests should be formatted to
follow the [embeddings
API](https://platform.openai.com/docs/api-reference/embeddings/create)
* `--batch-size 2` specifies that each request will contain the inputs for 2
individual inferences, making a batch size of 2
* `--input-file embeddings.jsonl` specifies the input data to be used for
inferencing

This will use default values for optional arguments. You can also pass in
additional arguments with the `--extra-inputs` [flag](../README.md#input-options).
For example, you could use this command:

```bash
genai-perf profile \
  -m intfloat/e5-mistral-7b-instruct \
  --service-kind openai \
  --endpoint-type embeddings \
  --extra-inputs user:sample_user
```

Example output:

```
Embeddings Metrics
┏━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ Statistic ┃ avg ┃ min ┃ max ┃ p99 ┃ p90 ┃ p75 ┃
┡━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ Request latency (ms) │ 42.21 │ 28.18 │ 318.61 │ 56.50 │ 49.21 │ 43.07 │
└──────────────────────┴───────┴───────┴────────┴───────┴───────┴───────┘
Request throughput (per sec): 23.63
```
