diff --git a/genai-perf/README.md b/genai-perf/README.md
index 0f2609aa..8fc93ac2 100644
--- a/genai-perf/README.md
+++ b/genai-perf/README.md
@@ -73,45 +73,34 @@ INSTALLATION
 ## Installation
-The easiest way to install GenAI-Perf is through
-[Triton Server SDK container](https://ngc.nvidia.com/catalog/containers/nvidia:tritonserver).
-Install the latest release using the following command:
+The easiest way to install GenAI-Perf is through pip.
+### Install GenAI-Perf (Ubuntu 24.04, Python 3.10+)
 ```bash
-export RELEASE="24.10"
-
-docker run -it --net=host --gpus=all nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk
-
-# Check out genai_perf command inside the container:
-genai-perf --help
+pip install genai-perf
 ```
+**NOTE**: you must already have CUDA 12 installed.
-
-Alternatively, to install from source:
+
-Since GenAI-Perf depends on Perf Analyzer,
-you'll need to install the Perf Analyzer binary:
+Alternatively, to install the container:
-### Install Perf Analyzer (Ubuntu, Python 3.10+)
-**NOTE**: you must already have CUDA 12 installed
-(checkout the [CUDA installation guide](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html)).
+[Triton Server SDK container](https://ngc.nvidia.com/catalog/containers/nvidia:tritonserver)
+Pull the latest release using the following command:
 ```bash
-pip install tritonclient
+export RELEASE="24.12"
-sudo apt update && sudo apt install -y --no-install-recommends libb64-0d
-```
-
-You can also build Perf Analyzer [from source](../docs/install.md#build-from-source) as well.
-
-### Install GenAI-Perf from source
+docker run -it --net=host --gpus=all nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk
-```bash
-pip install git+https://github.com/triton-inference-server/perf_analyzer.git#subdirectory=genai-perf
+# Validate that the genai-perf command works inside the container:
+genai-perf --help
 ```
+You can also build Perf Analyzer [from source](../docs/install.md#build-from-source) to use alongside GenAI-Perf.
+
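Since the new pip path drops the link to the CUDA installation guide, a pre-flight check for the CUDA 12 requirement may help readers. A guarded sketch — the `parse_cuda_major` helper and the parsing of the `nvidia-smi` banner are illustrative, not part of GenAI-Perf:

```bash
# Hypothetical pre-install check for the CUDA 12 requirement noted above.
# parse_cuda_major extracts the major CUDA version from nvidia-smi's banner,
# which contains a "CUDA Version: <major>.<minor>" field.
parse_cuda_major() {
  echo "$1" | sed -n 's/.*CUDA Version: \([0-9][0-9]*\)\..*/\1/p'
}

if [ "$(parse_cuda_major "$(nvidia-smi 2>/dev/null | head -n 4)")" = "12" ]; then
  pip install genai-perf
else
  echo "CUDA 12 not detected; install it before running pip install genai-perf." >&2
fi
```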

@@ -142,7 +131,7 @@ docker run -ti \
     --shm-size=1g --ulimit memlock=-1 \
     -v /tmp:/tmp \
     -v ${HOME}/.cache/huggingface:/root/.cache/huggingface \
-    nvcr.io/nvidia/tritonserver:24.10-trtllm-python-py3
+    nvcr.io/nvidia/tritonserver:24.12-trtllm-python-py3
 
 # Install the Triton CLI
 pip install git+https://github.com/triton-inference-server/triton_cli.git@0.0.11
diff --git a/genai-perf/docs/lora.md b/genai-perf/docs/lora.md
index e086464a..4ea25d3e 100644
--- a/genai-perf/docs/lora.md
+++ b/genai-perf/docs/lora.md
@@ -90,7 +90,7 @@ docker run -it --net=host --rm --gpus=all \
 Run GenAI-Perf from the Triton Inference Server SDK container:
 ```bash
-export RELEASE="24.10"
+export RELEASE="24.12"
 
 docker run -it --net=host --gpus=all nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk
@@ -149,7 +149,7 @@ docker run \
 Run GenAI-Perf from the Triton Inference Server SDK container:
 ```bash
-export RELEASE="24.10"
+export RELEASE="24.12"
 
 docker run -it --net=host --gpus=all nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk
@@ -207,7 +207,7 @@ docker run \
 Run GenAI-Perf from the Triton Inference Server SDK container:
 ```bash
-export RELEASE="24.10"
+export RELEASE="24.12"
 
 docker run -it --net=host --gpus=all nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk
diff --git a/templates/genai-perf-templates/README_template b/templates/genai-perf-templates/README_template
index edbba913..610990c5 100644
--- a/templates/genai-perf-templates/README_template
+++ b/templates/genai-perf-templates/README_template
@@ -73,43 +73,34 @@ INSTALLATION
 ## Installation
-The easiest way to install GenAI-Perf is through
-[Triton Server SDK container](https://ngc.nvidia.com/catalog/containers/nvidia:tritonserver).
-Install the latest release using the following command:
+The easiest way to install GenAI-Perf is through pip.
+### Install GenAI-Perf (Ubuntu 24.04, Python 3.10+)
 ```bash
-export RELEASE="{{ release }}"
-
-docker run -it --net=host --gpus=all nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk
-
-# Check out genai_perf command inside the container:
-genai-perf --help
+pip install genai-perf
 ```
+**NOTE**: you must already have CUDA 12 installed.
+
-Alternatively, to install from source:
+Alternatively, to install the container:
-Since GenAI-Perf depends on Perf Analyzer,
-you'll need to install the Perf Analyzer binary:
+[Triton Server SDK container](https://ngc.nvidia.com/catalog/containers/nvidia:tritonserver)
-### Install Perf Analyzer (Ubuntu, Python 3.10+)
-
-**NOTE**: you must already have CUDA 12 installed
-(checkout the [CUDA installation guide](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html)).
+Pull the latest release using the following command:
 ```bash
-pip install tritonclient
-```
-
-You can also build Perf Analyzer [from source](../docs/install.md#build-from-source) as well.
+export RELEASE="{{ release }}"
-### Install GenAI-Perf from source
+docker run -it --net=host --gpus=all nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk
-```bash
-pip install git+https://github.com/triton-inference-server/perf_analyzer.git#subdirectory=genai-perf
+# Validate that the genai-perf command works inside the container:
+genai-perf --help
 ```
+You can also build Perf Analyzer [from source](../docs/install.md#build-from-source) to use alongside GenAI-Perf.
+
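As a companion to the SDK-container path above, a sketch of the full loop: derive the image tag from `RELEASE`, then run a first profile from inside the container. The model name (`gpt2`), backend, and endpoint URL are illustrative placeholders for whatever server is being measured:

```bash
# Build the SDK image reference the docs use; the tag shape is "<release>-py3-sdk".
RELEASE="24.12"
SDK_IMAGE="nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk"
echo "Pull and run: docker run -it --net=host --gpus=all ${SDK_IMAGE}"

# Inside the container, a minimal profile run (guarded so this is a no-op
# where genai-perf is not installed). Model and URL are placeholders.
if command -v genai-perf >/dev/null 2>&1; then
  genai-perf profile \
    -m gpt2 \
    --service-kind triton \
    --backend tensorrtllm \
    --streaming \
    --concurrency 1 \
    --url localhost:8001
fi
```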

@@ -182,6 +173,15 @@ See [Tutorial](docs/tutorial.md) for additional examples.
+
+## Analyze
+GenAI-Perf can sweep through Perf Analyzer (PA) or GenAI-Perf stimulus, letting you profile multiple scenarios with a single command.
+See [Analyze](docs/analyze.md) for details on using this subcommand.
+
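To make the new Analyze blurb concrete: conceptually, the subcommand replaces a hand-rolled sweep like the one below with a single invocation. The loop is only an emulation of what Analyze automates (model and URL are placeholders); docs/analyze.md is the authority on the subcommand's actual flags:

```bash
# Manual sweep that the analyze subcommand is meant to replace: profile the
# same model at several concurrency levels. Guarded so it is a no-op where
# genai-perf is not installed.
scenarios=0
for c in 1 2 4 8; do
  scenarios=$((scenarios + 1))
  if command -v genai-perf >/dev/null 2>&1; then
    genai-perf profile -m gpt2 --service-kind triton --concurrency "$c" --url localhost:8001
  else
    echo "would profile gpt2 at concurrency $c"
  fi
done
```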