diff --git a/README.md b/README.md
index 86abb10d..6a45a619 100644
--- a/README.md
+++ b/README.md
@@ -508,10 +508,8 @@ Supported error codes:
 #### Request Cancellation Handling
 
 One or more requests may be cancelled by the client during execution. Starting
-from 23.10, `request.is_cancelled()` returns whether the request is cancelled.
-
-If a request is cancelled, the model may respond with any dummy object in place
-of the normal output tensors on the request. For example:
+from 23.10, `request.is_cancelled()` returns whether the request is cancelled or
+not. For example:
 
 ```python
 import triton_python_backend_utils as pb_utils
@@ -524,7 +522,8 @@ class TritonPythonModel:
 
         for request in requests:
             if request.is_cancelled():
-                responses.append(None)
+                responses.append(pb_utils.InferenceResponse(
+                    error=pb_utils.TritonError("Message", pb_utils.TritonError.CANCELLED)))
             else:
                 ...
 
@@ -600,8 +599,6 @@ full power of what can be achieved from decoupled API. Read
 [Decoupled Backends and Models](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/decoupled_models.md)
 for more details on how to host a decoupled model.
 
-#####
-
 ##### Known Issues
 
 * Currently, decoupled Python models can not make async infer requests.
diff --git a/src/pb_stub.cc b/src/pb_stub.cc
index c379998d..370df866 100644
--- a/src/pb_stub.cc
+++ b/src/pb_stub.cc
@@ -771,6 +771,11 @@ Stub::ProcessRequests(RequestBatch* request_batch_shm_ptr)
           std::to_string(response_size) + "\n";
       throw PythonBackendException(err);
     }
+    for (auto& response : responses) {
+      if (!py::isinstance(response)) {
+        std::string str = py::str(response.get_type());
+      }
+    }
     for (size_t i = 0; i < response_size; i++) {
       // If the model has checked for cancellation and the request is cancelled,
      // replace returned type with a cancelled response.
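
Note on the `src/pb_stub.cc` hunk: as posted, the new loop computes a type name but `py::isinstance(response)` is missing a target type and the resulting string is never used, so the check has no effect. Below is a minimal sketch of how a check like this is commonly completed, assuming the intent is to require that every item returned from the model's `execute()` function is an `InferenceResponse`. The target type, the error message, and the helper name `ValidateResponses` are assumptions for illustration and are not part of this diff; `InferenceResponse` and `PythonBackendException` are taken from the surrounding python_backend sources.

```cpp
// Sketch only: a possible completion of the type check added in
// Stub::ProcessRequests(). Assumes InferenceResponse is the pybind11-bound
// class and PythonBackendException is the exception type used elsewhere in
// pb_stub.cc (both declared in the python_backend sources, not shown here).
#include <string>

#include <pybind11/pybind11.h>

namespace py = pybind11;

void
ValidateResponses(py::list& responses)
{
  for (py::handle response : responses) {
    // Reject any returned item that is not an InferenceResponse (assumed
    // intent of the incomplete check in this diff), reporting its actual type.
    if (!py::isinstance<InferenceResponse>(response)) {
      std::string str = py::str(response.get_type());
      throw PythonBackendException(
          "Expected an 'InferenceResponse' object in the execute function "
          "return list, found type '" + str + "'.");
    }
  }
}
```

Raising a `PythonBackendException` here (rather than silently discarding the string, as the posted hunk does) matches the error-handling pattern of the surrounding code, which throws the same exception type when the response count does not match the request count.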