Commit

Update readme with new pip instructions and reorganize (#230)
* Update readme with new pip instructions and reorganize

* Fix spacing and link

* Migrate to use the pypi.org release

* Updated headers and wording around installation

* Update templates for docs
debermudez authored Dec 24, 2024
1 parent bf54dfb commit ea2024d
Showing 9 changed files with 51 additions and 87 deletions.
41 changes: 15 additions & 26 deletions genai-perf/README.md
Original file line number Diff line number Diff line change
@@ -73,45 +73,34 @@ INSTALLATION

## Installation

The easiest way to install GenAI-Perf is through
[Triton Server SDK container](https://ngc.nvidia.com/catalog/containers/nvidia:tritonserver).
Install the latest release using the following command:
The easiest way to install GenAI-Perf is through pip.

### Install GenAI-Perf (Ubuntu 24.04, Python 3.10+)

```bash
export RELEASE="24.10"

docker run -it --net=host --gpus=all nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk

# Check out genai_perf command inside the container:
genai-perf --help
pip install genai-perf
```
**NOTE**: you must already have CUDA 12 installed

<details>

<summary>Alternatively, to install from source:</summary>
<details>

Since GenAI-Perf depends on Perf Analyzer,
you'll need to install the Perf Analyzer binary:
<summary>Alternatively, to install the container:</summary>

### Install Perf Analyzer (Ubuntu, Python 3.10+)
[Triton Server SDK container](https://ngc.nvidia.com/catalog/containers/nvidia:tritonserver)

**NOTE**: you must already have CUDA 12 installed
(check out the [CUDA installation guide](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html)).
Pull the latest release using the following command:

```bash
pip install tritonclient
export RELEASE="24.12"

sudo apt update && sudo apt install -y --no-install-recommends libb64-0d
```

You can also build Perf Analyzer [from source](../docs/install.md#build-from-source).

### Install GenAI-Perf from source
docker run -it --net=host --gpus=all nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk

```bash
pip install git+https://github.com/triton-inference-server/perf_analyzer.git#subdirectory=genai-perf
# Validate the genai-perf command works inside the container:
genai-perf --help
```

You can also build Perf Analyzer [from source](../docs/install.md#build-from-source) to use alongside GenAI-Perf.

</details>

</br>
@@ -142,7 +131,7 @@ docker run -ti \
--shm-size=1g --ulimit memlock=-1 \
-v /tmp:/tmp \
-v ${HOME}/.cache/huggingface:/root/.cache/huggingface \
nvcr.io/nvidia/tritonserver:24.10-trtllm-python-py3
nvcr.io/nvidia/tritonserver:24.12-trtllm-python-py3

# Install the Triton CLI
pip install git+https://github.com/triton-inference-server/[email protected]
6 changes: 3 additions & 3 deletions genai-perf/docs/lora.md
@@ -90,7 +90,7 @@ docker run -it --net=host --rm --gpus=all \
Run GenAI-Perf from the Triton Inference Server SDK container:

```bash
export RELEASE="24.10"
export RELEASE="24.12"

docker run -it --net=host --gpus=all nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk

@@ -149,7 +149,7 @@ docker run \
Run GenAI-Perf from the Triton Inference Server SDK container:

```bash
export RELEASE="24.10"
export RELEASE="24.12"

docker run -it --net=host --gpus=all nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk

@@ -207,7 +207,7 @@ docker run \
Run GenAI-Perf from the Triton Inference Server SDK container:

```bash
export RELEASE="24.10"
export RELEASE="24.12"

docker run -it --net=host --gpus=all nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk

50 changes: 25 additions & 25 deletions templates/genai-perf-templates/README_template
@@ -73,43 +73,34 @@ INSTALLATION

## Installation

The easiest way to install GenAI-Perf is through
[Triton Server SDK container](https://ngc.nvidia.com/catalog/containers/nvidia:tritonserver).
Install the latest release using the following command:
The easiest way to install GenAI-Perf is through pip.

### Install GenAI-Perf (Ubuntu 24.04, Python 3.10+)

```bash
export RELEASE="{{ release }}"

docker run -it --net=host --gpus=all nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk

# Check out genai_perf command inside the container:
genai-perf --help
pip install genai-perf
```
**NOTE**: you must already have CUDA 12 installed


<details>

<summary>Alternatively, to install from source:</summary>
<summary>Alternatively, to install the container:</summary>

Since GenAI-Perf depends on Perf Analyzer,
you'll need to install the Perf Analyzer binary:
[Triton Server SDK container](https://ngc.nvidia.com/catalog/containers/nvidia:tritonserver)

### Install Perf Analyzer (Ubuntu, Python 3.10+)

**NOTE**: you must already have CUDA 12 installed
(check out the [CUDA installation guide](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html)).
Pull the latest release using the following command:

```bash
pip install tritonclient
```

You can also build Perf Analyzer [from source](../docs/install.md#build-from-source).
export RELEASE="{{ release }}"

### Install GenAI-Perf from source
docker run -it --net=host --gpus=all nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk

```bash
pip install git+https://github.com/triton-inference-server/perf_analyzer.git#subdirectory=genai-perf
# Validate the genai-perf command works inside the container:
genai-perf --help
```

You can also build Perf Analyzer [from source](../docs/install.md#build-from-source) to use alongside GenAI-Perf.

</details>

</br>
@@ -182,6 +173,15 @@ See [Tutorial](docs/tutorial.md) for additional examples.

</br>

<!--
=====================
Analyze Subcommand
====================
-->
## Analyze
GenAI-Perf can be used to sweep through Perf Analyzer (PA) or GenAI-Perf stimulus, allowing the user to profile multiple scenarios with a single command.
See [Analyze](docs/analyze.md) for details on how this subcommand can be utilized.

<!--
======================
VISUALIZATION
@@ -335,7 +335,7 @@ key authentication. To do so, you must add your API key directly in the command.
Add the following flag to your command.

```bash
-h "Authorization: Bearer ${API_KEY}" -H "Accept: text/event-stream"
-H "Authorization: Bearer ${API_KEY}" -H "Accept: text/event-stream"
```

</br>
@@ -456,7 +456,7 @@ Alternatively, a string representing a json formatted dict can be provided.
(default: `None`)

##### `--header <str>`
##### `--h <str>`
##### `--H <str>`
Add a custom header to the requests. Headers must be specified as
'Header:Value'. You can repeat this flag for multiple headers.
(default: `None`)
@@ -164,3 +164,4 @@ To do so, create a test file in the tests directory.
You can reference existing converter tests named `test_**_converter.py`.
To run the test, run `pytest tests/test_new_converter.py`, replacing the
file name with the name of the file you created.

1 change: 1 addition & 0 deletions templates/genai-perf-templates/embeddings_template
@@ -122,3 +122,4 @@ Example output:
└──────────────────────┴───────┴───────┴────────┴───────┴───────┴───────┘
Request throughput (per sec): 23.63
```

31 changes: 1 addition & 30 deletions templates/genai-perf-templates/files_template
@@ -46,7 +46,7 @@ genai-perf/

## File Types
Within the artifacts and docs directories, several file types are generated,
including .gzip, .csv, .json, .html, and .jpeg. Below is a detailed
including .csv, .json, .html, and .jpeg. Below is a detailed
explanation of each file and its purpose.

### Artifacts Directory
@@ -55,18 +55,6 @@ explanation of each file and its purpose.

The data subdirectory contains the raw and processed performance data files.

##### GZIP Files

- all_data.gzip: Aggregated performance data from all collected metrics.
- input_sequence_lengths_vs_output_sequence_lengths.gzip: This contains data on
the input sequence lengths versus the output sequence lengths for each request.
- request_latency.gzip: This contains the latency for each request.
- time_to_first_token.gzip: This contains the time to first token for each request.
- token_to_token_vs_output_position.gzip: This contains the time from one token
generation to the next versus the position of the output token for each token.
- ttft_vs_input_sequence_lengths.gzip: This contains the time to first token
versus the input sequence length for each request.

##### JSON Files

- inputs.json: This contains the input prompts provided to the LLM during testing.
@@ -101,23 +89,6 @@ versus the input sequence lengths.
To use the generated files, navigate to the artifacts/data directory. Then,
the next steps depend on the file format you wish to work with.

### GZIP Files

The GZIP files contain Parquet files with calculated data, which can be read
with Pandas in Python. For example, you can create a dataframe with these files:

```python
import pandas

df = pandas.read_parquet(path_to_file)
```

You can then use Pandas to work with the data.

```python
print(df.head())      # See the first few rows of the data.
print(df.describe())  # Get summary statistics for the data.
```

### CSV and JSON Files
Open .csv and .json files with spreadsheet or JSON parsing tools for structured
data analysis. These can also be read via a text editor, like Vim.
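For example, here is a minimal sketch of inspecting such exports programmatically with Python's standard library; note that the metric names and JSON keys below are illustrative placeholders, not the exact GenAI-Perf export schema:

```python
import csv
import json
from io import StringIO

# Illustrative sample only: these metric names and keys are placeholders,
# not the exact fields GenAI-Perf writes.
csv_text = """Metric,avg,p50,p99
Time To First Token (ms),11.70,11.73,12.01
Request Latency (ms),594.89,597.72,602.31
"""

# Read the CSV rows into dictionaries keyed by the header row.
rows = list(csv.DictReader(StringIO(csv_text)))
avg_ttft = float(rows[0]["avg"])

# Parse a JSON export the same way you would any JSON file.
json_text = '{"request_throughput": {"unit": "requests/sec", "avg": 23.63}}'
stats = json.loads(json_text)
throughput = stats["request_throughput"]["avg"]
```

From here the rows can be filtered, aggregated, or loaded into a spreadsheet or pandas for further analysis.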
1 change: 1 addition & 0 deletions templates/genai-perf-templates/rankings_template
@@ -119,3 +119,4 @@ Example output:
└──────────────────────┴──────┴──────┴───────┴───────┴──────┴──────┘
Request throughput (per sec): 180.11
```

1 change: 1 addition & 0 deletions templates/genai-perf-templates/tutorial_template
@@ -211,3 +211,4 @@ Example output:
│ Request throughput (per sec) │ 2.28 │ N/A │ N/A │ N/A │ N/A │ N/A │
└───────────────────────────────────┴────────┴────────┴────────┴────────┴────────┴────────┘
```

6 changes: 3 additions & 3 deletions templates/template_vars.yaml
@@ -1,7 +1,7 @@
General:
release: 24.10
release: 24.12
triton_cli_version: 0.0.11
genai_perf_version: 0.0.8dev
genai_perf_version: 0.0.9dev

README:
filename: README.md
@@ -15,7 +15,7 @@ compare:

customizable_frontends:
filename: customizable_frontends.md
template: genai-perf-templates/customizable_fronetnds_template
template: genai-perf-templates/customizable_frontends_template
output_dir: ../genai-perf/docs/

embeddings:
