Fixes for README
dyastremsky committed Oct 10, 2023
1 parent a4921c1 commit 92124bf
Showing 1 changed file, README.md, with 9 additions and 7 deletions.
You can learn more about Triton backends in the [backend
repo](https://github.com/triton-inference-server/backend). Ask
questions or report problems on the [issues
page](https://github.com/triton-inference-server/server/issues).
This backend is designed to run [vLLM](https://github.com/vllm-project/vllm)
with [one of the HuggingFace models](https://vllm.readthedocs.io/en/latest/models/supported_models.html) it supports.

Where can I ask general questions about Triton and Triton backends?
Be sure to read all the information below as well as the [general
Triton documentation](https://github.com/triton-inference-server/server) available in the main
[server](https://github.com/triton-inference-server/server) repo. If you do not find your answer there, you can ask questions on the
main Triton [issues page](https://github.com/triton-inference-server/server/issues).

## Build the vLLM Backend

As a Python-based backend, your Triton server just needs to have the [Python backend](https://github.com/triton-inference-server/python_backend)
located in the backends directory: `/opt/tritonserver/backends/python`. After that, you can save the vLLM backend in the backends folder as `/opt/tritonserver/backends/vllm`. The `model.py` file in the `src` directory should go in that `vllm` folder and will function as your Python-based backend.

In other words, there are no build steps. You only need to copy this to your Triton backends repository. If you use the official Triton vLLM container, this is already set up for you.
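Concretely, a minimal setup could look like the sketch below; the destination paths match the defaults above, and `src/model.py` refers to this repository's source tree, so adjust the paths to your installation:

```
# Place the vLLM backend next to the Python backend under Triton's backends directory.
mkdir -p /opt/tritonserver/backends/vllm
cp src/model.py /opt/tritonserver/backends/vllm/
```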

The backend repository should look like this:
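For instance, assuming the default `/opt/tritonserver/backends` directory used above (the `python` entry is shown only for context), the layout would be roughly:

```
/opt/tritonserver/backends/
├── python/
└── vllm/
    └── model.py
```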

You can see an example model_repository in the `samples` folder.
You can use this as is and change the model by changing the `model` value in `model.json`.
You can change the GPU utilization and logging parameters in that file as well.
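As an illustration, a `model.json` along these lines is what the backend reads; the field names follow vLLM's engine arguments, and the model name and values here are placeholders rather than the exact contents of the sample:

```
{
    "model": "facebook/opt-125m",
    "disable_log_requests": "true",
    "gpu_memory_utilization": 0.5
}
```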

In the `samples` folder, you can also find a sample client, `client.py`.
This client is meant to function similarly to the Triton
[vLLM example](https://github.com/triton-inference-server/tutorials/tree/main/Quick_Deploy/vLLM).
By default, this will test `prompts.txt`, which we have included in the samples folder.
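With a Triton server already serving the sample model repository, running the client could look like the sketch below; the script's exact flags and defaults are defined by `client.py` itself, so treat this invocation as an assumption:

```
# Run the sample client against a locally running Triton server;
# by default it reads prompts from prompts.txt in this folder.
cd samples
python3 client.py
```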


## Running Multiple Instances of Triton Server

When running multiple instances of Triton Server with Python-based backends on the same machine, give each instance a different `shm-region-prefix-name` via the `--backend-config` flag to avoid shared-memory collisions:

```
# Triton instance 1
tritonserver --model-repository=/models --backend-config=python,shm-region-prefix-name=prefix1

# Triton instance 2
tritonserver --model-repository=/models --backend-config=python,shm-region-prefix-name=prefix2
```
Note that the hangs would only occur if `/dev/shm` is shared between the two instances of the server. If you run the servers in different containers that do not share this location, you do not need to specify `shm-region-prefix-name`.
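For example, launching the two servers in separate containers gives each one a private `/dev/shm`; this is only a sketch, and `<triton-vllm-image>` is a placeholder for whichever Triton image you use:

```
# Each container gets its own /dev/shm, so shm-region-prefix-name is not needed.
docker run --gpus all --rm -v $(pwd)/model_repository:/models <triton-vllm-image> \
    tritonserver --model-repository=/models
```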
