
Add Conda instructions.
dyastremsky committed Oct 10, 2023
1 parent aa8a105 commit ed108d0
Showing 3 changed files with 187 additions and 1 deletion.
11 changes: 10 additions & 1 deletion README.md
@@ -65,7 +65,6 @@ The backend repository should look like this:
|-- triton_python_backend_utils.py
```


## Using the vLLM Backend

You can see an example model_repository in the `samples` folder.
@@ -78,6 +77,16 @@ This client is meant to function similarly to the Triton
By default, this will test `prompts.txt`, which we have included in the samples folder.


## Running the Latest vLLM Version

By default, the vLLM backend uses the version of vLLM that is available via pip.
This version is compatible with the newer versions of CUDA running in Triton.
If you would like to use a specific vLLM commit or the latest version of vLLM, you
will need to use a
[custom execution environment](https://github.com/triton-inference-server/python_backend#creating-custom-execution-environments).
Please see the
[conda](samples/conda) subdirectory of the `samples` folder for information on how to do so.
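
As a quick sanity check (a sketch only, assuming you are inside a Triton container where the backend's vLLM is installed via pip), you can confirm which vLLM version is currently in use:
```
# Example command, not part of this commit: prints the pip-installed vLLM version, if present.
pip show vllm
```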

## Important Notes

* At present, Triton only supports one Python-based backend per server. If you try to start multiple vLLM models, you will get an error.
72 changes: 72 additions & 0 deletions samples/conda/README.md
@@ -0,0 +1,72 @@
<!--
# Copyright 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-->

If you would like to run the latest version of vLLM, you will need to create a
[custom execution environment](https://github.com/triton-inference-server/python_backend#creating-custom-execution-environments).
This is because vLLM currently does not support the latest versions of CUDA in the Triton environment.
Instructions for creating a custom execution environment with the latest vLLM version are below.

## Step 1: Build a Custom Execution Environment With vLLM and Other Dependencies

The provided script builds the package environment
that Triton will use to load the model.

Run the following command from this directory. You can use any version of Triton.
```
docker run --gpus all -it --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 --shm-size=8G --ulimit memlock=-1 --ulimit stack=67108864 -v ${PWD}:/work -w /work nvcr.io/nvidia/tritonserver:23.09-py3 bash
./gen_vllm_env.sh
```

Building the environment packages may take a while. Once complete, the current folder will contain
`triton_python_backend_stub` and `vllm_env`.

## Step 2: Update Your Model Repository

Place the stub and environment in your model's directory.
The model directory should look something like this:
```
model_repository/
`-- vllm_model
|-- 1
| `-- model.json
|-- config.pbtxt
|-- triton_python_backend_stub
`-- vllm_env
```
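
For reference, `model.json` holds the arguments passed to the vLLM engine when the model loads. A minimal sketch is shown below; the model name and values are placeholders for illustration, not part of this commit:
```
{
    "model": "facebook/opt-125m",
    "disable_log_requests": true,
    "gpu_memory_utilization": 0.5
}
```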

Also add this section to your model's `config.pbtxt`:
```
parameters: {
  key: "EXECUTION_ENV_PATH",
  value: {string_value: "$$TRITON_MODEL_DIRECTORY/vllm_env"}
}
```
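
Triton substitutes `$$TRITON_MODEL_DIRECTORY` with the model's directory at load time, so the path keeps working wherever the model repository lives.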

## Step 3: Run Your Model

You can now start Triton server with your model!
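
For example (a sketch, assuming you launch from the directory that contains `model_repository` inside the Triton container):
```
# Example invocation: start Triton and point it at the model repository prepared above.
tritonserver --model-repository=./model_repository
```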
105 changes: 105 additions & 0 deletions samples/conda/gen_vllm_env.sh
@@ -0,0 +1,105 @@
#!/bin/bash
# Copyright 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

#
# This script creates a conda environment for Triton with vllm
# dependencies.
#

# Pick the release tag from the container environment variable
RELEASE_TAG="r${NVIDIA_TRITON_SERVER_VERSION}"

# Save target directories for conda environment and Python backend stubs
ENV_DIR="./model_repository/vllm/vllm_env/"
STUB_FILE="./model_repository/vllm/triton_python_backend_stub"

# If targets already exist, print a message and exit.
if [ -d "$ENV_DIR" ] && [ -f "$STUB_FILE" ]; then
  echo "The conda environment directory and Python backend stubs already exist."
  echo "Exiting environment set-up."
  exit 0
fi

# Otherwise, clean up any previous targets.
rm -rf "$ENV_DIR" "$STUB_FILE"

# Install and setup conda environment
FILE_NAME="Miniconda3-latest-Linux-x86_64.sh"
rm -rf ./miniconda $FILE_NAME
wget https://repo.anaconda.com/miniconda/$FILE_NAME

# Install miniconda in silent mode
bash $FILE_NAME -p ./miniconda -b

# Activate conda
eval "$(./miniconda/bin/conda shell.bash hook)"

# Installing cmake and dependencies
apt update && apt install software-properties-common rapidjson-dev libarchive-dev zlib1g-dev -y
# Using CMake installation instructions from: https://apt.kitware.com/
apt install -y gpg wget && \
wget -O - https://apt.kitware.com/keys/kitware-archive-latest.asc 2>/dev/null | \
gpg --dearmor - | \
tee /usr/share/keyrings/kitware-archive-keyring.gpg >/dev/null && \
. /etc/os-release && \
echo "deb [signed-by=/usr/share/keyrings/kitware-archive-keyring.gpg] https://apt.kitware.com/ubuntu/ $UBUNTU_CODENAME main" | \
tee /etc/apt/sources.list.d/kitware.list >/dev/null && \
apt-get update && \
apt-get install -y --no-install-recommends cmake cmake-data

conda create -n vllm_env python=3.10 -y
conda activate vllm_env
export PYTHONNOUSERSITE=True
conda install -c conda-forge libstdcxx-ng=12 -y
conda install -c conda-forge conda-pack -y

# vLLM needs cuda 11.8 to run properly
conda install -c "nvidia/label/cuda-11.8.0" cuda-toolkit -y

pip install numpy
pip install git+https://github.com/huggingface/transformers.git
pip install git+https://github.com/vllm-project/vllm.git


rm -rf python_backend
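# Build the Triton Python backend stub so it matches the Python 3.10 used by the conda
# environment above; this stub is what lets Triton load the model with that environment.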
git clone https://github.com/triton-inference-server/python_backend -b $RELEASE_TAG
(cd python_backend/ && mkdir builddir && cd builddir && \
cmake -DTRITON_ENABLE_GPU=ON -DTRITON_BACKEND_REPO_TAG=$RELEASE_TAG -DTRITON_COMMON_REPO_TAG=$RELEASE_TAG -DTRITON_CORE_REPO_TAG=$RELEASE_TAG ../ && \
make -j18 triton-python-backend-stub)

mv python_backend/builddir/triton_python_backend_stub ./model_repository/vllm/

# Prepare and copy the conda environment
cp -r $CONDA_PREFIX/lib/python3.10/site-packages/conda_pack/scripts/posix/activate $CONDA_PREFIX/bin/
rm -r $CONDA_PREFIX/nsight*
cp -r $CONDA_PREFIX ./model_repository/vllm/

conda deactivate

# Clean-up
rm -rf ./miniconda $FILE_NAME
rm -rf python_backend
