
Add Conda instructions.
dyastremsky committed Oct 10, 2023
1 parent aa8a105 commit ed108d0
Showing 3 changed files with 187 additions and 1 deletion.
11 changes: 10 additions & 1 deletion README.md
@@ -65,7 +65,6 @@ The backend repository should look like this:
|-- triton_python_backend_utils.py
```


## Using the vLLM Backend

You can see an example model_repository in the `samples` folder.
@@ -78,6 +77,16 @@ This client is meant to function similarly to the Triton
By default, this will test `prompts.txt`, which we have included in the samples folder.


## Running the Latest vLLM Version

By default, the vLLM backend uses the version of vLLM that is available via pip.
This version is compatible with the newer versions of CUDA running in Triton.
If you would like to use a specific vLLM commit or the latest version of vLLM, you
will need to use a
[custom execution environment](https://github.com/triton-inference-server/python_backend#creating-custom-execution-environments).
Please see the
[conda](samples/conda) subdirectory of the `samples` folder for information on how to do so.
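
As a quick sanity check (a sketch only, assuming you are inside a Triton container where the backend's vLLM is installed via pip), you can confirm which vLLM version is currently in use:
```
# Example command, not part of this commit: prints the pip-installed vLLM version, if present.
pip show vllm
```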

## Important Notes

* At present, Triton only supports one Python-based backend per server. If you try to start multiple vLLM models, you will get an error.
72 changes: 72 additions & 0 deletions samples/conda/README.md
@@ -0,0 +1,72 @@
<!--
# Copyright 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-->

If you would like to run the latest version of vLLM, you will need to create a
[custom execution environment](https://github.com/triton-inference-server/python_backend#creating-custom-execution-environments).
This is because vLLM currently does not support the latest versions of CUDA in the Triton environment.
Instructions for creating a custom execution environment with the latest vLLM version are below.

## Step 1: Build a Custom Execution Environment With vLLM and Other Dependencies

The provided script builds the package environment
that Triton will use to load the model.

Run the following command from this directory. You can use any version of Triton.
```
docker run --gpus all -it --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 --shm-size=8G --ulimit memlock=-1 --ulimit stack=67108864 -v ${PWD}:/work -w /work nvcr.io/nvidia/tritonserver:23.09-py3 bash
./gen_vllm_env.sh
```

Building the environment packages may take a while. Once complete, the current folder will contain
`triton_python_backend_stub` and `vllm_env`.

## Step 2: Update Your Model Repository

Place the stub and environment in your model's directory.
The model directory should look something like this:
```
model_repository/
`-- vllm_model
|-- 1
| `-- model.json
|-- config.pbtxt
|-- triton_python_backend_stub
`-- vllm_env
```
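
For reference, `model.json` holds the arguments passed to the vLLM engine when the model loads. A minimal sketch is shown below; the model name and values are placeholders for illustration, not part of this commit:
```
{
    "model": "facebook/opt-125m",
    "disable_log_requests": true,
    "gpu_memory_utilization": 0.5
}
```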

Also add this section to your model's `config.pbtxt`:
```
parameters: {
  key: "EXECUTION_ENV_PATH",
  value: {string_value: "$$TRITON_MODEL_DIRECTORY/vllm_env"}
}
```
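
Triton substitutes `$$TRITON_MODEL_DIRECTORY` with the model's directory at load time, so the path keeps working wherever the model repository lives.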

## Step 3: Run Your Model

You can now start Triton server with your model!
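
For example (a sketch, assuming you launch from the directory that contains `model_repository` inside the Triton container):
```
# Example invocation: start Triton and point it at the model repository prepared above.
tritonserver --model-repository=./model_repository
```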
105 changes: 105 additions & 0 deletions samples/conda/gen_vllm_env.sh
@@ -0,0 +1,105 @@
#!/bin/bash
# Copyright 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

#
# This script creates a conda environment for Triton with vllm
# dependencies.
#

# Pick the release tag from the container environment variable
RELEASE_TAG="r${NVIDIA_TRITON_SERVER_VERSION}"

# Save target directories for conda environment and Python backend stubs
ENV_DIR="./model_repository/vllm/vllm_env/"
STUB_FILE="./model_repository/vllm/triton_python_backend_stub"

# If targets already exist, print a message and exit.
if [ -d "$ENV_DIR" ] && [ -f "$STUB_FILE" ]; then
  echo "The conda environment directory and Python backend stubs already exist."
  echo "Exiting environment set-up."
  exit 0
fi

# Otherwise, clean up any previous targets.
rm -rf "$ENV_DIR" "$STUB_FILE"

# Install and setup conda environment
FILE_NAME="Miniconda3-latest-Linux-x86_64.sh"
rm -rf ./miniconda $FILE_NAME
wget https://repo.anaconda.com/miniconda/$FILE_NAME

# Install miniconda in silent mode
bash $FILE_NAME -p ./miniconda -b

# Activate conda
eval "$(./miniconda/bin/conda shell.bash hook)"

# Installing cmake and dependencies
apt update && apt install software-properties-common rapidjson-dev libarchive-dev zlib1g-dev -y
# Using CMake installation instructions from: https://apt.kitware.com/
apt install -y gpg wget && \
wget -O - https://apt.kitware.com/keys/kitware-archive-latest.asc 2>/dev/null | \
gpg --dearmor - | \
tee /usr/share/keyrings/kitware-archive-keyring.gpg >/dev/null && \
. /etc/os-release && \
echo "deb [signed-by=/usr/share/keyrings/kitware-archive-keyring.gpg] https://apt.kitware.com/ubuntu/ $UBUNTU_CODENAME main" | \
tee /etc/apt/sources.list.d/kitware.list >/dev/null && \
apt-get update && \
apt-get install -y --no-install-recommends cmake cmake-data

conda create -n vllm_env python=3.10 -y
conda activate vllm_env
export PYTHONNOUSERSITE=True
conda install -c conda-forge libstdcxx-ng=12 -y
conda install -c conda-forge conda-pack -y

# vLLM needs cuda 11.8 to run properly
conda install -c "nvidia/label/cuda-11.8.0" cuda-toolkit -y

pip install numpy
pip install git+https://github.com/huggingface/transformers.git
pip install git+https://github.com/vllm-project/vllm.git


rm -rf python_backend
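# Build the Triton Python backend stub so it matches the Python 3.10 used by the conda
# environment above; this stub is what lets Triton load the model with that environment.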
git clone https://github.com/triton-inference-server/python_backend -b $RELEASE_TAG
(cd python_backend/ && mkdir builddir && cd builddir && \
cmake -DTRITON_ENABLE_GPU=ON -DTRITON_BACKEND_REPO_TAG=$RELEASE_TAG -DTRITON_COMMON_REPO_TAG=$RELEASE_TAG -DTRITON_CORE_REPO_TAG=$RELEASE_TAG ../ && \
make -j18 triton-python-backend-stub)

mv python_backend/builddir/triton_python_backend_stub ./model_repository/vllm/

# Prepare and copy the conda environment
cp -r $CONDA_PREFIX/lib/python3.10/site-packages/conda_pack/scripts/posix/activate $CONDA_PREFIX/bin/
rm -r $CONDA_PREFIX/nsight*
cp -r $CONDA_PREFIX ./model_repository/vllm/

conda deactivate

# Clean-up
rm -rf ./miniconda $FILE_NAME
rm -rf python_backend
