Remove limitation, model name
dyastremsky committed Oct 11, 2023
1 parent b08f426 commit 682ad0c
Showing 2 changed files with 1 addition and 6 deletions.
6 changes: 1 addition & 5 deletions README.md
@@ -86,11 +86,7 @@ will need to use a
 Please see the
 [conda](samples/conda) subdirectory of the `samples` folder for information on how to do so.
 
-## Important Notes
-
-* At present, Triton only supports one Python-based backend per server. If you try to start multiple vLLM models, you will get an error.
-
-### Running Multiple Instances of Triton Server
+## Running Multiple Instances of Triton Server
 
 Python-based backends use shared memory to transfer requests to the stub process. When running multiple instances of Triton Server on the same machine that use Python-based backend models, there would be shared memory region name conflicts that can result in segmentation faults or hangs. In order to avoid this issue, you need to specify different shm-region-prefix-name using the --backend-config flag.
 ```
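The README code block that follows this hunk is truncated by the diff view. As a rough sketch of what the flag usage looks like, under the assumption that the backend-config key follows the Python backend convention (the repository paths, ports, and prefix names below are placeholders, not the repo's exact example), two server instances could be started with distinct shared-memory prefixes like this:

```
# Start two Triton Server instances on the same machine, each with its own
# shared-memory region prefix so their Python-based backend stubs do not collide.
# Paths, ports, and prefix values are illustrative only.
tritonserver --model-repository=/opt/models_a \
    --backend-config=python,shm-region-prefix-name=prefix0 \
    --http-port 8000 --grpc-port 8001 --metrics-port 8002

tritonserver --model-repository=/opt/models_b \
    --backend-config=python,shm-region-prefix-name=prefix1 \
    --http-port 8003 --grpc-port 8004 --metrics-port 8005
```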
1 change: 0 additions & 1 deletion samples/model_repository/vllm_model/config.pbtxt
@@ -29,7 +29,6 @@
 # instructions in the samples/conda README on how to add a parameter
 # to use a custom execution environment.
 
-name: "vllm_model"
 backend: "vllm"
 
 # Disabling batching in Triton, let vLLM handle the batching on its own.
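With the explicit `name` field removed, Triton falls back to using the model's directory name (here `vllm_model`) as the model name. A minimal sketch of the resulting configuration, keeping only the `backend` line shown in the diff above and omitting the rest of the sample's settings, would be:

```
# samples/model_repository/vllm_model/config.pbtxt (minimal sketch, not the full sample file)
# No "name" field: Triton derives the model name from the directory, i.e. "vllm_model".
backend: "vllm"
```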
