Clean up ResponseFactory when a final complete flag is set #358
Conversation
README.md (Outdated)

@@ -599,6 +599,11 @@ Starting from 23.10, request cancellation can be checked directly on the
the TRITONSERVER_RESPONSE_COMPLETE_FINAL flag at the end of response is still
needed even the request is cancelled.

*Note:* If `response_sender` is reused across multiple requests, the model must
Can you elaborate on "`response_sender` is reused across multiple requests"? I thought each request has exactly one response sender?
Apologies for the confusion. It is correct that each request has exactly one response sender. What I meant is specific to the case the TRT-LLM team was experiencing: a single long-lived thread is launched to handle all requests instead of spawning a new thread for each request, and that thread is only terminated in the `finalize` stage. In this scenario, the last `response_sender` held by the thread is not cleaned up before the model gets unloaded, so the model cannot be unloaded properly. (You can refer to the testing model here for more details.)

On second thought, I think we can remove this section since the complete final flag is required for both request rescheduling and cancellation. In that case, even with this thread implementation, the `response_sender` should clean up the `response_factory` when the complete final flag is set.

Let me know if this makes sense to you!
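For readers unfamiliar with that pattern, here is a minimal sketch of it, assuming a decoupled model, a queue-based hand-off, and an illustrative `OUT` output tensor. This is not the actual TRT-LLM or test model; the class structure and names are only for illustration.

```python
import queue
import threading

import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    """Illustrative decoupled model that services all requests on one thread."""

    def initialize(self, args):
        self._requests = queue.Queue()
        self._worker = threading.Thread(target=self._work)
        self._worker.start()

    def execute(self, requests):
        # Hand requests to the long-lived worker instead of responding here.
        for request in requests:
            self._requests.put(request)
        return None  # decoupled mode: responses go through response_sender

    def _work(self):
        while True:
            request = self._requests.get()
            if request is None:  # shutdown sentinel from finalize()
                break
            response_sender = request.get_response_sender()
            out = pb_utils.Tensor("OUT", np.zeros(1, dtype=np.float32))
            response_sender.send(
                pb_utils.InferenceResponse(output_tensors=[out]),
                flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL)
            # `response_sender` stays bound until it is reassigned on the next
            # iteration, so the sender for the last request lives until the
            # thread exits.

    def finalize(self):
        # The worker (and the last response_sender it still holds) is only
        # torn down here, i.e. while the model is already being unloaded.
        self._requests.put(None)
        self._worker.join()
```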
Thanks for the clarification! It makes sense.

One last question: does the thread let the `response_sender` go out of scope before the thread is terminated?
For this particular example, I think the `response_sender` only goes out of scope when the thread terminates, i.e. around the `thread.join()` call. The issue was that `thread.join()` is called within `finalize`, so the `response_sender` only goes out of scope after `finalize` is invoked. If the thread manually calls `del response_sender`, then the `response_factory` can be cleaned up earlier.
Shouldn't the thread let the `response_sender`(s) go out of scope as soon as they are no longer used, i.e. after sending the final response flag and before `thread.join()` is called?
Yes, I think that is the ideal way to manage `response_sender`. One case where the `response_sender` does not go out of scope as soon as the final flag is sent is, for example, when the `response_sender` variable in this model is only cleaned up once the next request arrives and the variable is reassigned to the new `response_sender`; that is when the destructor of the previous `response_sender` gets called. So unless users manually call `del response_sender` at the end of each iteration in the thread, the last `response_sender` will not go out of scope until `thread.join()` is called.
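A minimal sketch of that workaround, assuming a decoupled model and an illustrative `OUT` output tensor (the helper function is hypothetical, not code from this PR):

```python
import numpy as np
import triton_python_backend_utils as pb_utils


def handle_request(request):
    """Send one final response, then drop the sender immediately."""
    response_sender = request.get_response_sender()
    out = pb_utils.Tensor("OUT", np.zeros(1, dtype=np.float32))
    response_sender.send(
        pb_utils.InferenceResponse(output_tensors=[out]),
        flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL)
    # Drop the reference right away so the sender's destructor (and the
    # response_factory cleanup) runs now, rather than waiting for the variable
    # to be reassigned or for thread.join() in finalize.
    del response_sender
```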
// The flag to indicate if the response sender is closed. It is set to true
// once the TRITONSERVER_RESPONSE_COMPLETE_FINAL flag is set, meaning that the
// response_sender should not be used anymore. This flag is separate from the
// `is_response_factory_cleaned_` flag because errors might occur after
Can you elaborate on the errors that might occur after sending complete_final flag but before deleting the response_factory? Is it the model that is holding on to the response_sender forever?
When the complete final flag is set, it is still possible that some error occurs before the response actually gets sent to the backend (for example, here and here). In that case, the `response_factory` will not be automatically deleted in the backend; it can only be deleted within the destructor of `response_sender`, which sends the cleanup message to the backend.

The bool variable `closed_` is just there to make sure users are not using the response sender after they have set the final flag; it cannot be used to determine whether the `response_factory` has been cleaned up on the backend side.
// Queue a cleanup message so the parent process deletes the response factory
// identified by `response_factory_address_`.
stub->EnqueueCleanupId(
    reinterpret_cast<void*>(response_factory_address_),
    PYTHONSTUB_DecoupledResponseFactoryCleanup);
}
Could we remove `EnqueueCleanupId` (and associated logic on the server side) now that the response factory is deleted when it sees the final flag?
I think we would still need to keep it for rescheduled requests. It's possible that the response sender hasn't sent the final flag yet and the request gets rescheduled. There will be a new response factory created for each request even if the request is a rescheduled one, so I think we would still need to clean up the response factory in the destructor.
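To illustrate the rescheduling case, here is a rough sketch based on the request-rescheduling API described in the python_backend README; the `OUT` tensor and the `_needs_another_pass` check are made up. The request is released with the reschedule flag before any final flag is sent, so the factory created for that pass still relies on the destructor cleanup path.

```python
import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        for request in requests:
            response_sender = request.get_response_sender()
            out = pb_utils.Tensor("OUT", np.zeros(1, dtype=np.float32))

            if self._needs_another_pass(request):
                # Stream an intermediate response WITHOUT the final flag and
                # ask Triton to reschedule the request for another pass.
                response_sender.send(
                    pb_utils.InferenceResponse(output_tensors=[out]))
                request.set_release_flags(
                    pb_utils.TRITONSERVER_REQUEST_RELEASE_RESCHEDULE)
                # No final flag was sent, so this pass's response factory can
                # only be cleaned up through the sender's destructor.
            else:
                response_sender.send(
                    pb_utils.InferenceResponse(output_tensors=[out]),
                    flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL)
        return None

    def _needs_another_pass(self, request):
        # Illustrative placeholder for per-request generation state.
        return False
```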
Closing: as discussed, a better approach would be to handle this in the Triton core instead of PYBE. Filed ticket DLIS-6795 to track this.
In the current implementation, users need to ensure that the `response_sender` goes out of scope or is properly cleaned up before unloading the model to guarantee that the unloading process executes correctly. This is because the `response_sender` is responsible for cleaning up the `response_factory`, which holds a shared_ptr to the underlying model; only when the reference count of the model's shared_ptr drops to 0 will the model be unloaded.

This PR enhances the lifetime management of `response_factory` so that when the complete final flag is sent, the `response_factory` is cleaned up immediately. This allows users to not worry about the lifetime of `response_sender` and to reuse `response_sender` if they choose to.

For rescheduled or cancelled requests, the documentation was updated instead, since it is hard and tricky to propagate that information back to the `response_sender`. I think we can enhance that after Jacky's ongoing changes for `response_sender` are in place.

Testing: triton-inference-server/server#7246
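For comparison with the workaround sketched earlier in the conversation, the usage this PR aims to enable is that sending the final flag alone releases the factory, so the explicit `del response_sender` would no longer be needed. A minimal sketch under the same assumptions (illustrative `OUT` tensor, hypothetical helper):

```python
import numpy as np
import triton_python_backend_utils as pb_utils


def handle_request(request):
    response_sender = request.get_response_sender()
    out = pb_utils.Tensor("OUT", np.zeros(1, dtype=np.float32))
    # With this change, the backend-side response_factory is intended to be
    # cleaned up as soon as the final flag is sent, so keeping the Python
    # `response_sender` object alive (e.g. until the worker thread is joined
    # in finalize) should no longer block model unloading.
    response_sender.send(
        pb_utils.InferenceResponse(output_tensors=[out]),
        flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL)
```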