Update readme with new pip instructions and reorganize #230

Merged 5 commits on Dec 24, 2024
41 changes: 15 additions & 26 deletions genai-perf/README.md
@@ -73,45 +73,34 @@ INSTALLATION

## Installation

The easiest way to install GenAI-Perf is through pip.

### Install GenAI-Perf (Ubuntu 24.04, Python 3.10+)

```bash
pip install genai-perf
```

**NOTE**: you must already have CUDA 12 installed

<details>

<summary>Alternatively, to install the container:</summary>

[Triton Server SDK container](https://ngc.nvidia.com/catalog/containers/nvidia:tritonserver)

Pull the latest release using the following command:

```bash
export RELEASE="24.12"

docker run -it --net=host --gpus=all nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk

# Validate the genai-perf command works inside the container:
genai-perf --help
```

You can also build Perf Analyzer [from source](../docs/install.md#build-from-source) to use alongside GenAI-Perf as well.

</details>

</br>
@@ -142,7 +131,7 @@ docker run -ti \
--shm-size=1g --ulimit memlock=-1 \
-v /tmp:/tmp \
-v ${HOME}/.cache/huggingface:/root/.cache/huggingface \
nvcr.io/nvidia/tritonserver:24.12-trtllm-python-py3

# Install the Triton CLI
pip install git+https://github.com/triton-inference-server/[email protected]
6 changes: 3 additions & 3 deletions genai-perf/docs/lora.md
@@ -90,7 +90,7 @@ docker run -it --net=host --rm --gpus=all \
Run GenAI-Perf from the Triton Inference Server SDK container:

```bash
export RELEASE="24.12"

docker run -it --net=host --gpus=all nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk

@@ -149,7 +149,7 @@ docker run \
Run GenAI-Perf from the Triton Inference Server SDK container:

```bash
export RELEASE="24.12"

docker run -it --net=host --gpus=all nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk

@@ -207,7 +207,7 @@ docker run \
Run GenAI-Perf from the Triton Inference Server SDK container:

```bash
export RELEASE="24.12"

docker run -it --net=host --gpus=all nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk

50 changes: 25 additions & 25 deletions templates/genai-perf-templates/README_template
@@ -73,43 +73,34 @@ INSTALLATION

## Installation

The easiest way to install GenAI-Perf is through pip.

### Install GenAI-Perf (Ubuntu 24.04, Python 3.10+)

```bash
pip install genai-perf
```

**NOTE**: you must already have CUDA 12 installed

<details>

<summary>Alternatively, to install the container:</summary>

[Triton Server SDK container](https://ngc.nvidia.com/catalog/containers/nvidia:tritonserver)

Pull the latest release using the following command:

```bash
export RELEASE="{{ release }}"

docker run -it --net=host --gpus=all nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk

# Validate the genai-perf command works inside the container:
genai-perf --help
```

You can also build Perf Analyzer [from source](../docs/install.md#build-from-source) to use alongside GenAI-Perf as well.

</details>

</br>
@@ -182,6 +173,15 @@ See [Tutorial](docs/tutorial.md) for additional examples.

</br>

<!--
=====================
Analyze Subcommand
====================
-->
## Analyze
GenAI-Perf can sweep through Perf Analyzer (PA) or GenAI-Perf stimulus, allowing you to profile multiple scenarios with a single command.
See [Analyze](docs/analyze.md) for details on how to use this subcommand.
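A minimal sketch of what a sweep invocation could look like follows; the flag names are illustrative assumptions, not confirmed options, so defer to the Analyze documentation. The command is assembled as a string rather than executed, since a real run requires a live inference server:

```shell
# Sketch only: '--sweep-type' and '--sweep-range' are assumed flag names,
# and 'my_model' is a placeholder. Built as a string because actually
# running it requires a live inference server behind genai-perf.
SWEEP_CMD='genai-perf analyze -m my_model --sweep-type concurrency --sweep-range 1:64'
echo "$SWEEP_CMD"
```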

<!--
======================
VISUALIZATION
@@ -335,7 +335,7 @@ key authentication. To do so, you must add your API key directly in the command.
Add the following flags to your command.

```bash
-H "Authorization: Bearer ${API_KEY}" -H "Accept: text/event-stream"
```
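As a sketch of how the authorization flag expands, where the key value below is a placeholder rather than a real credential:

```shell
# Placeholder key for illustration only; substitute your real API key.
API_KEY="sk-example-key"
# This is the string the -H flag passes as a request header.
AUTH_HEADER="Authorization: Bearer ${API_KEY}"
# It would be supplied to genai-perf as:
#   -H "$AUTH_HEADER" -H "Accept: text/event-stream"
echo "$AUTH_HEADER"
```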

</br>
@@ -456,7 +456,7 @@ Alternatively, a string representing a json formatted dict can be provided.
(default: `None`)

##### `--header <str>`
##### `--H <str>`
Add a custom header to the requests. Headers must be specified as
'Header:Value'. You can repeat this flag for multiple headers.
(default: `None`)
@@ -164,3 +164,4 @@ To do so, create a test file in the tests directory.
You can reference existing converter tests named `test_**_converter.py`.
To run the test, run `pytest tests/test_new_converter.py`, replacing the
file name with the name of the file you created.

1 change: 1 addition & 0 deletions templates/genai-perf-templates/embeddings_template
@@ -122,3 +122,4 @@ Example output:
└──────────────────────┴───────┴───────┴────────┴───────┴───────┴───────┘
Request throughput (per sec): 23.63
```

31 changes: 1 addition & 30 deletions templates/genai-perf-templates/files_template
@@ -46,7 +46,7 @@ genai-perf/

## File Types
Within the artifacts and docs directories, several file types are generated,
including .csv, .json, .html, and .jpeg. Below is a detailed
explanation of each file and its purpose.

### Artifacts Directory
@@ -55,18 +55,6 @@

The data subdirectory contains the raw and processed performance data files.

##### JSON Files

- inputs.json: This contains the input prompts provided to the LLM during testing.

@@ -101,23 +89,6 @@ versus the input sequence lengths.

To use the generated files, navigate to the artifacts/data directory. Then,
the next steps depend on the file format you wish to work with.

### CSV and JSON Files
Open .csv and .json files with spreadsheet or JSON parsing tools for structured
data analysis. These can also be read via a text editor, like Vim.
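For example, a single metric can be pulled out of a CSV export from the shell; the file name and row label below are assumptions modeled on the example outputs in this repo, not guaranteed artifact names:

```shell
# Create a small stand-in for a GenAI-Perf CSV export
# (the file name and row label are assumptions for illustration).
printf 'Metric,Value\nRequest throughput (per sec),23.63\n' > sample_export.csv
# Select the throughput row and take its value column.
THROUGHPUT=$(grep 'Request throughput' sample_export.csv | cut -d, -f2)
echo "$THROUGHPUT"   # prints 23.63
```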
1 change: 1 addition & 0 deletions templates/genai-perf-templates/rankings_template
@@ -119,3 +119,4 @@ Example output:
└──────────────────────┴──────┴──────┴───────┴───────┴──────┴──────┘
Request throughput (per sec): 180.11
```

1 change: 1 addition & 0 deletions templates/genai-perf-templates/tutorial_template
@@ -211,3 +211,4 @@ Example output:
│ Request throughput (per sec) │ 2.28 │ N/A │ N/A │ N/A │ N/A │ N/A │
└───────────────────────────────────┴────────┴────────┴────────┴────────┴────────┴────────┘
```

6 changes: 3 additions & 3 deletions templates/template_vars.yaml
@@ -1,7 +1,7 @@
General:
  release: 24.12
  triton_cli_version: 0.0.11
  genai_perf_version: 0.0.9dev

README:
  filename: README.md
@@ -15,7 +15,7 @@ compare:

customizable_frontends:
  filename: customizable_frontends.md
  template: genai-perf-templates/customizable_frontends_template
  output_dir: ../genai-perf/docs/

embeddings: