diff --git a/README.md b/README.md
index 87d8156e..f4eb27ee 100644
--- a/README.md
+++ b/README.md
@@ -39,7 +39,7 @@
 You can learn more about Triton backends in the [backend repo](https://github.com/triton-inference-server/backend).
-This is a [Python-based backend](https://github.com/triton-inference-server/backend/blob/main/docs/python_based_backends.md#python-based-backends).
+This is a [Python-based backend](https://github.com/triton-inference-server/backend/blob/main/docs/python_based_backends.md#python-based-backends). When using this backend, all requests are placed on the vLLM AsyncEngine as soon as they are received. Inflight batching and paged attention are handled by the vLLM engine.