diff --git a/README.md b/README.md
index 86abb10d..6a45a619 100644
--- a/README.md
+++ b/README.md
@@ -508,10 +508,8 @@ Supported error codes:
 #### Request Cancellation Handling
 
 One or more requests may be cancelled by the client during execution. Starting
-from 23.10, `request.is_cancelled()` returns whether the request is cancelled.
-
-If a request is cancelled, the model may respond with any dummy object in place
-of the normal output tensors on the request. For example:
+from 23.10, `request.is_cancelled()` returns whether the request is cancelled or
+not. For example:
 
 ```python
 import triton_python_backend_utils as pb_utils
@@ -524,7 +522,8 @@ class TritonPythonModel:
 
         for request in requests:
             if request.is_cancelled():
-                responses.append(None)
+                responses.append(pb_utils.InferenceResponse(
+                    error=pb_utils.TritonError("Message", pb_utils.TritonError.CANCELLED)))
             else:
                 ...
 
@@ -600,8 +599,6 @@ full power of what can be achieved from decoupled API. Read
 [Decoupled Backends and Models](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/decoupled_models.md)
 for more details on how to host a decoupled model.
 
-#####
-
 ##### Known Issues
 
 * Currently, decoupled Python models can not make async infer requests.
diff --git a/src/pb_stub.cc b/src/pb_stub.cc
index c379998d..370df866 100644
--- a/src/pb_stub.cc
+++ b/src/pb_stub.cc
@@ -771,6 +771,11 @@ Stub::ProcessRequests(RequestBatch* request_batch_shm_ptr)
           std::to_string(response_size) + "\n";
       throw PythonBackendException(err);
     }
+    for (auto& response : responses) {
+      if (!py::isinstance(response)) {
+        std::string str = py::str(response.get_type());
+      }
+    }
     for (size_t i = 0; i < response_size; i++) {
       // If the model has checked for cancellation and the request is cancelled,
      // replace returned type with a cancelled response.
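
Note on the `src/pb_stub.cc` hunk: as posted, the new loop computes a type name but `py::isinstance(response)` is missing a target type and the resulting string is never used, so the check has no effect. Below is a minimal sketch of how a check like this is commonly completed, assuming the intent is to require that every item returned from the model's `execute()` function is an `InferenceResponse`. The target type, the error message, and the helper name `ValidateResponses` are assumptions for illustration and are not part of this diff; `InferenceResponse` and `PythonBackendException` are taken from the surrounding python_backend sources.

```cpp
// Sketch only: a possible completion of the type check added in
// Stub::ProcessRequests(). Assumes InferenceResponse is the pybind11-bound
// class and PythonBackendException is the exception type used elsewhere in
// pb_stub.cc (both declared in the python_backend sources, not shown here).
#include <string>

#include <pybind11/pybind11.h>

namespace py = pybind11;

void
ValidateResponses(py::list& responses)
{
  for (py::handle response : responses) {
    // Reject any returned item that is not an InferenceResponse (assumed
    // intent of the incomplete check in this diff), reporting its actual type.
    if (!py::isinstance<InferenceResponse>(response)) {
      std::string str = py::str(response.get_type());
      throw PythonBackendException(
          "Expected an 'InferenceResponse' object in the execute function "
          "return list, found type '" + str + "'.");
    }
  }
}
```

Raising a `PythonBackendException` here (rather than silently discarding the string, as the posted hunk does) matches the error-handling pattern of the surrounding code, which throws the same exception type when the response count does not match the request count.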