Remove limitation, model name
dyastremsky committed Oct 11, 2023
1 parent b08f426 commit 682ad0c
Showing 2 changed files with 1 addition and 6 deletions.
6 changes: 1 addition & 5 deletions README.md
@@ -86,11 +86,7 @@ will need to use a
 Please see the
 [conda](samples/conda) subdirectory of the `samples` folder for information on how to do so.
 
-## Important Notes
-
-* At present, Triton only supports one Python-based backend per server. If you try to start multiple vLLM models, you will get an error.
-
-### Running Multiple Instances of Triton Server
+## Running Multiple Instances of Triton Server
 
 Python-based backends use shared memory to transfer requests to the stub process. When running multiple instances of Triton Server on the same machine that use Python-based backend models, there would be shared memory region name conflicts that can result in segmentation faults or hangs. In order to avoid this issue, you need to specify different shm-region-prefix-name using the --backend-config flag.
 ```
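The README code block that follows this hunk is truncated by the diff view. As a rough sketch of what the flag usage looks like, under the assumption that the backend-config key follows the Python backend convention (the repository paths, ports, and prefix names below are placeholders, not the repo's exact example), two server instances could be started with distinct shared-memory prefixes like this:

```
# Start two Triton Server instances on the same machine, each with its own
# shared-memory region prefix so their Python-based backend stubs do not collide.
# Paths, ports, and prefix values are illustrative only.
tritonserver --model-repository=/opt/models_a \
    --backend-config=python,shm-region-prefix-name=prefix0 \
    --http-port 8000 --grpc-port 8001 --metrics-port 8002

tritonserver --model-repository=/opt/models_b \
    --backend-config=python,shm-region-prefix-name=prefix1 \
    --http-port 8003 --grpc-port 8004 --metrics-port 8005
```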
1 change: 0 additions & 1 deletion samples/model_repository/vllm_model/config.pbtxt
@@ -29,7 +29,6 @@
 # instructions in the samples/conda README on how to add a parameter
 # to use a custom execution environment.
 
-name: "vllm_model"
 backend: "vllm"
 
 # Disabling batching in Triton, let vLLM handle the batching on its own.
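With the explicit `name` field removed, Triton falls back to using the model's directory name (here `vllm_model`) as the model name. A minimal sketch of the resulting configuration, keeping only the `backend` line shown in the diff above and omitting the rest of the sample's settings, would be:

```
# samples/model_repository/vllm_model/config.pbtxt (minimal sketch, not the full sample file)
# No "name" field: Triton derives the model name from the directory, i.e. "vllm_model".
backend: "vllm"
```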
