Commit

Update readme with new pip instructions and reorganize (#230)
* Update readme with new pip instructions and reorganize

* Fix spacing and link

* Migrate to use the pypi.org release

* Updated headers and wording around installation

* Update templates for docs
debermudez authored Dec 24, 2024
1 parent bf54dfb commit ea2024d
Showing 9 changed files with 51 additions and 87 deletions.
41 changes: 15 additions & 26 deletions genai-perf/README.md
Original file line number Diff line number Diff line change
@@ -73,45 +73,34 @@ INSTALLATION

## Installation

The easiest way to install GenAI-Perf is through
[Triton Server SDK container](https://ngc.nvidia.com/catalog/containers/nvidia:tritonserver).
Install the latest release using the following command:
The easiest way to install GenAI-Perf is through pip.

### Install GenAI-Perf (Ubuntu 24.04, Python 3.10+)

```bash
export RELEASE="24.10"

docker run -it --net=host --gpus=all nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk

# Check out genai_perf command inside the container:
genai-perf --help
pip install genai-perf
```
**NOTE**: you must already have CUDA 12 installed

<details>

<summary>Alternatively, to install from source:</summary>
<details>

Since GenAI-Perf depends on Perf Analyzer,
you'll need to install the Perf Analyzer binary:
<summary>Alternatively, to install the container:</summary>

### Install Perf Analyzer (Ubuntu, Python 3.10+)
[Triton Server SDK container](https://ngc.nvidia.com/catalog/containers/nvidia:tritonserver)

**NOTE**: you must already have CUDA 12 installed
(check out the [CUDA installation guide](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html)).
Pull the latest release using the following command:

```bash
pip install tritonclient
export RELEASE="24.12"

sudo apt update && sudo apt install -y --no-install-recommends libb64-0d
```

You can also build Perf Analyzer [from source](../docs/install.md#build-from-source).

### Install GenAI-Perf from source
docker run -it --net=host --gpus=all nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk

```bash
pip install git+https://github.com/triton-inference-server/perf_analyzer.git#subdirectory=genai-perf
# Validate the genai-perf command works inside the container:
genai-perf --help
```

You can also build Perf Analyzer [from source](../docs/install.md#build-from-source) to use alongside GenAI-Perf.

</details>

</br>
@@ -142,7 +131,7 @@ docker run -ti \
--shm-size=1g --ulimit memlock=-1 \
-v /tmp:/tmp \
-v ${HOME}/.cache/huggingface:/root/.cache/huggingface \
nvcr.io/nvidia/tritonserver:24.10-trtllm-python-py3
nvcr.io/nvidia/tritonserver:24.12-trtllm-python-py3

# Install the Triton CLI
pip install git+https://github.com/triton-inference-server/[email protected]
6 changes: 3 additions & 3 deletions genai-perf/docs/lora.md
@@ -90,7 +90,7 @@ docker run -it --net=host --rm --gpus=all \
Run GenAI-Perf from the Triton Inference Server SDK container:

```bash
export RELEASE="24.10"
export RELEASE="24.12"

docker run -it --net=host --gpus=all nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk

@@ -149,7 +149,7 @@ docker run \
Run GenAI-Perf from the Triton Inference Server SDK container:

```bash
export RELEASE="24.10"
export RELEASE="24.12"

docker run -it --net=host --gpus=all nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk

@@ -207,7 +207,7 @@ docker run \
Run GenAI-Perf from the Triton Inference Server SDK container:

```bash
export RELEASE="24.10"
export RELEASE="24.12"

docker run -it --net=host --gpus=all nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk

50 changes: 25 additions & 25 deletions templates/genai-perf-templates/README_template
@@ -73,43 +73,34 @@ INSTALLATION

## Installation

The easiest way to install GenAI-Perf is through
[Triton Server SDK container](https://ngc.nvidia.com/catalog/containers/nvidia:tritonserver).
Install the latest release using the following command:
The easiest way to install GenAI-Perf is through pip.

### Install GenAI-Perf (Ubuntu 24.04, Python 3.10+)

```bash
export RELEASE="{{ release }}"

docker run -it --net=host --gpus=all nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk

# Check out genai_perf command inside the container:
genai-perf --help
pip install genai-perf
```
**NOTE**: you must already have CUDA 12 installed


<details>

<summary>Alternatively, to install from source:</summary>
<summary>Alternatively, to install the container:</summary>

Since GenAI-Perf depends on Perf Analyzer,
you'll need to install the Perf Analyzer binary:
[Triton Server SDK container](https://ngc.nvidia.com/catalog/containers/nvidia:tritonserver)

### Install Perf Analyzer (Ubuntu, Python 3.10+)

**NOTE**: you must already have CUDA 12 installed
(check out the [CUDA installation guide](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html)).
Pull the latest release using the following command:

```bash
pip install tritonclient
```

You can also build Perf Analyzer [from source](../docs/install.md#build-from-source).
export RELEASE="{{ release }}"

### Install GenAI-Perf from source
docker run -it --net=host --gpus=all nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk

```bash
pip install git+https://github.com/triton-inference-server/perf_analyzer.git#subdirectory=genai-perf
# Validate the genai-perf command works inside the container:
genai-perf --help
```

You can also build Perf Analyzer [from source](../docs/install.md#build-from-source) to use alongside GenAI-Perf.

</details>

</br>
@@ -182,6 +173,15 @@ See [Tutorial](docs/tutorial.md) for additional examples.

</br>

<!--
=====================
Analyze Subcommand
====================
-->
## Analyze
GenAI-Perf can be used to sweep through Perf Analyzer (PA) or GenAI-Perf stimulus, allowing the user to profile multiple scenarios with a single command.
See [Analyze](docs/analyze.md) for details on how this subcommand can be utilized.

<!--
======================
VISUALIZATION
@@ -335,7 +335,7 @@ key authentication. To do so, you must add your API key directly in the command.
Add the following flag to your command.

```bash
-h "Authorization: Bearer ${API_KEY}" -H "Accept: text/event-stream"
-H "Authorization: Bearer ${API_KEY}" -H "Accept: text/event-stream"
```

</br>
@@ -456,7 +456,7 @@ Alternatively, a string representing a json formatted dict can be provided.
(default: `None`)

##### `--header <str>`
##### `--h <str>`
##### `--H <str>`
Add a custom header to the requests. Headers must be specified as
'Header:Value'. You can repeat this flag for multiple headers.
(default: `None`)
@@ -164,3 +164,4 @@ To do so, create a test file in the tests directory.
You can reference existing converter tests named `test_**_converter.py`.
To run the test, run `pytest tests/test_new_converter.py`, replacing the
file name with the name of the file you created.

1 change: 1 addition & 0 deletions templates/genai-perf-templates/embeddings_template
@@ -122,3 +122,4 @@ Example output:
└──────────────────────┴───────┴───────┴────────┴───────┴───────┴───────┘
Request throughput (per sec): 23.63
```

31 changes: 1 addition & 30 deletions templates/genai-perf-templates/files_template
@@ -46,7 +46,7 @@ genai-perf/

## File Types
Within the artifacts and docs directories, several file types are generated,
including .gzip, .csv, .json, .html, and .jpeg. Below is a detailed
including .csv, .json, .html, and .jpeg. Below is a detailed
explanation of each file and its purpose.

### Artifacts Directory
@@ -55,18 +55,6 @@ explanation of each file and its purpose.

The data subdirectory contains the raw and processed performance data files.

##### GZIP Files

- all_data.gzip: Aggregated performance data from all collected metrics.
- input_sequence_lengths_vs_output_sequence_lengths.gzip: This contains data on
the input sequence lengths versus the output sequence lengths for each request.
- request_latency.gzip: This contains the latency for each request.
- time_to_first_token.gzip: This contains the time to first token for each request.
- token_to_token_vs_output_position.gzip: This contains the time from one token
generation to the next versus the position of the output token for each token.
- ttft_vs_input_sequence_lengths.gzip: This contains the time to first token
versus the input sequence length for each request.

##### JSON Files

- inputs.json: This contains the input prompts provided to the LLM during testing.
@@ -101,23 +89,6 @@ versus the input sequence lengths.
To use the generated files, navigate to the artifacts/data directory. Then,
the next steps depend on the file format you wish to work with.

### GZIP Files

The GZIP files contain Parquet files with calculated data, which can be read
with Pandas in Python. For example, you can create a dataframe with these files:

```python
import pandas

df = pandas.read_parquet(path_to_file)
```

You can then use Pandas to work with the data.

```python
print(df.head())      # See the first few rows of the data.
print(df.describe())  # Get summary statistics for the data.
```

### CSV and JSON Files
Open .csv and .json files with spreadsheet or JSON parsing tools for structured
data analysis. These can also be read via a text editor, like Vim.
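For example, here is a minimal sketch of inspecting such exports programmatically with Python's standard library; note that the metric names and JSON keys below are illustrative placeholders, not the exact GenAI-Perf export schema:

```python
import csv
import json
from io import StringIO

# Illustrative sample only: these metric names and keys are placeholders,
# not the exact fields GenAI-Perf writes.
csv_text = """Metric,avg,p50,p99
Time To First Token (ms),11.70,11.73,12.01
Request Latency (ms),594.89,597.72,602.31
"""

# Read the CSV rows into dictionaries keyed by the header row.
rows = list(csv.DictReader(StringIO(csv_text)))
avg_ttft = float(rows[0]["avg"])

# Parse a JSON export the same way you would any JSON file.
json_text = '{"request_throughput": {"unit": "requests/sec", "avg": 23.63}}'
stats = json.loads(json_text)
throughput = stats["request_throughput"]["avg"]
```

From here the rows can be filtered, aggregated, or loaded into a spreadsheet or pandas for further analysis.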
1 change: 1 addition & 0 deletions templates/genai-perf-templates/rankings_template
@@ -119,3 +119,4 @@ Example output:
└──────────────────────┴──────┴──────┴───────┴───────┴──────┴──────┘
Request throughput (per sec): 180.11
```

1 change: 1 addition & 0 deletions templates/genai-perf-templates/tutorial_template
@@ -211,3 +211,4 @@ Example output:
│ Request throughput (per sec) │ 2.28 │ N/A │ N/A │ N/A │ N/A │ N/A │
└───────────────────────────────────┴────────┴────────┴────────┴────────┴────────┴────────┘
```

6 changes: 3 additions & 3 deletions templates/template_vars.yaml
@@ -1,7 +1,7 @@
General:
release: 24.10
release: 24.12
triton_cli_version: 0.0.11
genai_perf_version: 0.0.8dev
genai_perf_version: 0.0.9dev

README:
filename: README.md
@@ -15,7 +15,7 @@ compare:

customizable_frontends:
filename: customizable_frontends.md
template: genai-perf-templates/customizable_fronetnds_template
template: genai-perf-templates/customizable_frontends_template
output_dir: ../genai-perf/docs/

embeddings:
