Fixes for README
dyastremsky committed Oct 10, 2023
1 parent a4921c1 commit 92124bf
Showing 1 changed file, README.md, with 9 additions and 7 deletions.
You can learn more about Triton backends in the [backend
repo](https://github.com/triton-inference-server/backend). Ask
questions or report problems on the [issues
page](https://github.com/triton-inference-server/server/issues).
This backend is designed to run [vLLM](https://github.com/vllm-project/vllm)
with [one of the HuggingFace models](https://vllm.readthedocs.io/en/latest/models/supported_models.html) it supports.

Where can I ask general questions about Triton and Triton backends?
Be sure to read all the information below as well as the [general
Triton documentation](https://github.com/triton-inference-server/server) available in the main
[server](https://github.com/triton-inference-server/server) repo. If you do not find your answer there, you can ask questions on the
main Triton [issues page](https://github.com/triton-inference-server/server/issues).

## Build the vLLM Backend

As a Python-based backend, your Triton server just needs to have the [Python backend](https://github.com/triton-inference-server/python_backend)
located in the backends directory: `/opt/tritonserver/backends/python`. After that, you can save the vLLM backend in the backends folder as `/opt/tritonserver/backends/vllm`. The `model.py` file in the `src` directory should go in that `vllm` folder and will function as your Python-based backend.

In other words, there are no build steps. You only need to copy this to your Triton backends repository. If you use the official Triton vLLM container, this is already set up for you.
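Concretely, a minimal setup could look like the sketch below; the destination paths match the defaults above, and `src/model.py` refers to this repository's source tree, so adjust the paths to your installation:

```
# Place the vLLM backend next to the Python backend under Triton's backends directory.
mkdir -p /opt/tritonserver/backends/vllm
cp src/model.py /opt/tritonserver/backends/vllm/
```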

The backend repository should look like this:
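For instance, assuming the default `/opt/tritonserver/backends` directory used above (the `python` entry is shown only for context), the layout would be roughly:

```
/opt/tritonserver/backends/
├── python/
└── vllm/
    └── model.py
```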

You can see an example model_repository in the `samples` folder.
You can use this as is and change the model by changing the `model` value in `model.json`.
You can change the GPU utilization and logging parameters in that file as well.
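As an illustration, a `model.json` along these lines is what the backend reads; the field names follow vLLM's engine arguments, and the model name and values here are placeholders rather than the exact contents of the sample:

```
{
    "model": "facebook/opt-125m",
    "disable_log_requests": "true",
    "gpu_memory_utilization": 0.5
}
```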

In the `samples` folder, you can also find a sample client, `client.py`.
This client is meant to function similarly to the Triton
[vLLM example](https://github.com/triton-inference-server/tutorials/tree/main/Quick_Deploy/vLLM).
By default, this will test `prompts.txt`, which we have included in the samples folder.
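With a Triton server already serving the sample model repository, running the client could look like the sketch below; the script's exact flags and defaults are defined by `client.py` itself, so treat this invocation as an assumption:

```
# Run the sample client against a locally running Triton server;
# by default it reads prompts from prompts.txt in this folder.
cd samples
python3 client.py
```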


## Running Multiple Instances of Triton Server

When running multiple instances of Triton Server with Python-based backends on the same machine, give each instance a different `shm-region-prefix-name` via the `--backend-config` flag to avoid shared-memory collisions:

```
# Triton instance 1
tritonserver --model-repository=/models --backend-config=python,shm-region-prefix-name=prefix1

# Triton instance 2
tritonserver --model-repository=/models --backend-config=python,shm-region-prefix-name=prefix2
```
Note that the hangs would only occur if `/dev/shm` is shared between the two instances of the server. If you run the servers in different containers that do not share this location, you do not need to specify `shm-region-prefix-name`.
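For example, launching the two servers in separate containers gives each one a private `/dev/shm`; this is only a sketch, and `<triton-vllm-image>` is a placeholder for whichever Triton image you use:

```
# Each container gets its own /dev/shm, so shm-region-prefix-name is not needed.
docker run --gpus all --rm -v $(pwd)/model_repository:/models <triton-vllm-image> \
    tritonserver --model-repository=/models
```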
