Clean up ResponseFactory when a final complete flag is set #358
Conversation
README.md (Outdated)

@@ -599,6 +599,11 @@ Starting from 23.10, request cancellation can be checked directly on the
the TRITONSERVER_RESPONSE_COMPLETE_FINAL flag at the end of response is still
needed even the request is cancelled.

*Note:* If `response_sender` is reused across multiple requests, the model must
Can you elaborate on "`response_sender` is reused across multiple requests"? I thought each request has exactly one response sender?
Apologies for the confusion. It is correct that each request has exactly one response sender. What I meant is specific to the case the TRT-LLM team was experiencing: a single long-lived thread is launched to handle all requests instead of spawning a new thread for each request, and that thread is only terminated in the `finalize` stage. In this scenario, the last `response_sender` held by the thread is not cleaned up before the model gets unloaded, so the model cannot be unloaded properly. (You can refer to the testing model here for more details.)

On second thought, I think we can remove this section since the complete final flag is required for both request rescheduling and cancellation. In that case, even with this thread implementation, the `response_sender` should clean up the `response_factory` when the complete final flag is set.

Let me know if this makes sense to you!
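For readers unfamiliar with that pattern, here is a minimal sketch of it, assuming a decoupled model, a queue-based hand-off, and an illustrative `OUT` output tensor. This is not the actual TRT-LLM or test model; the class structure and names are only for illustration.

```python
import queue
import threading

import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    """Illustrative decoupled model that services all requests on one thread."""

    def initialize(self, args):
        self._requests = queue.Queue()
        self._worker = threading.Thread(target=self._work)
        self._worker.start()

    def execute(self, requests):
        # Hand requests to the long-lived worker instead of responding here.
        for request in requests:
            self._requests.put(request)
        return None  # decoupled mode: responses go through response_sender

    def _work(self):
        while True:
            request = self._requests.get()
            if request is None:  # shutdown sentinel from finalize()
                break
            response_sender = request.get_response_sender()
            out = pb_utils.Tensor("OUT", np.zeros(1, dtype=np.float32))
            response_sender.send(
                pb_utils.InferenceResponse(output_tensors=[out]),
                flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL)
            # `response_sender` stays bound until it is reassigned on the next
            # iteration, so the sender for the last request lives until the
            # thread exits.

    def finalize(self):
        # The worker (and the last response_sender it still holds) is only
        # torn down here, i.e. while the model is already being unloaded.
        self._requests.put(None)
        self._worker.join()
```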
Thanks for the clarification! It makes sense.

One last question: does the thread let the `response_sender` go out of scope before the thread is terminated?
For this particular example, I think the `response_sender` only goes out of scope when the thread terminates, i.e. around the `thread.join()` call. The issue was that `thread.join()` is called within `finalize`, so the `response_sender` only goes out of scope after `finalize` is invoked. If the thread manually calls `del response_sender`, then the `response_factory` can be cleaned up earlier.
Shouldn't the thread let the `response_sender`(s) go out of scope as soon as they are no longer used, i.e. after sending the final response flag and before `thread.join()` is called?
Yes, I think that is the ideal way to manage `response_sender`. One case where the `response_sender` does not go out of scope as soon as the final flag is sent is, for example, when the `response_sender` variable in this model is only cleaned up once the next request arrives and the variable is reassigned to the new `response_sender`; that is when the destructor of the previous `response_sender` gets called. So unless users manually call `del response_sender` at the end of each iteration in the thread, the last `response_sender` will not go out of scope until `thread.join()` is called.
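A minimal sketch of that workaround, assuming a decoupled model and an illustrative `OUT` output tensor (the helper function is hypothetical, not code from this PR):

```python
import numpy as np
import triton_python_backend_utils as pb_utils


def handle_request(request):
    """Send one final response, then drop the sender immediately."""
    response_sender = request.get_response_sender()
    out = pb_utils.Tensor("OUT", np.zeros(1, dtype=np.float32))
    response_sender.send(
        pb_utils.InferenceResponse(output_tensors=[out]),
        flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL)
    # Drop the reference right away so the sender's destructor (and the
    # response_factory cleanup) runs now, rather than waiting for the variable
    # to be reassigned or for thread.join() in finalize.
    del response_sender
```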
// The flag to indicate if the response sender is closed. It is set to true
// once the TRITONSERVER_RESPONSE_COMPLETE_FINAL flag is set, meaning that the
// response_sender should not be used anymore. This flag is separate from the
// `is_response_factory_cleaned_` flag because errors might occur after
Can you elaborate on the errors that might occur after sending complete_final flag but before deleting the response_factory? Is it the model that is holding on to the response_sender forever?
When the complete final flag is set, it is still possible that some error occurs before the response actually gets sent to the backend (for example, here and here). In that case, the `response_factory` will not be automatically deleted in the backend; it can only be deleted within the destructor of `response_sender`, which sends the cleanup message to the backend.

The bool variable `closed_` is just there to make sure users are not using the response sender after they have set the final flag; it cannot be used to determine whether the `response_factory` has been cleaned up on the backend side.
// Queue a cleanup message so the parent process deletes the response factory
// identified by `response_factory_address_`.
stub->EnqueueCleanupId(
    reinterpret_cast<void*>(response_factory_address_),
    PYTHONSTUB_DecoupledResponseFactoryCleanup);
}
Could we remove `EnqueueCleanupId` (and associated logic on the server side) now that the response factory is deleted when it sees the final flag?
I think we would still need to keep it for rescheduled requests. It's possible that the response sender hasn't sent the final flag yet and the request gets rescheduled. There will be a new response factory created for each request even if the request is a rescheduled one, so I think we would still need to clean up the response factory in the destructor.
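To illustrate the rescheduling case, here is a rough sketch based on the request-rescheduling API described in the python_backend README; the `OUT` tensor and the `_needs_another_pass` check are made up. The request is released with the reschedule flag before any final flag is sent, so the factory created for that pass still relies on the destructor cleanup path.

```python
import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        for request in requests:
            response_sender = request.get_response_sender()
            out = pb_utils.Tensor("OUT", np.zeros(1, dtype=np.float32))

            if self._needs_another_pass(request):
                # Stream an intermediate response WITHOUT the final flag and
                # ask Triton to reschedule the request for another pass.
                response_sender.send(
                    pb_utils.InferenceResponse(output_tensors=[out]))
                request.set_release_flags(
                    pb_utils.TRITONSERVER_REQUEST_RELEASE_RESCHEDULE)
                # No final flag was sent, so this pass's response factory can
                # only be cleaned up through the sender's destructor.
            else:
                response_sender.send(
                    pb_utils.InferenceResponse(output_tensors=[out]),
                    flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL)
        return None

    def _needs_another_pass(self, request):
        # Illustrative placeholder for per-request generation state.
        return False
```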
Closing: as discussed, a better approach would be to handle this in the Triton core instead of PYBE. Filed ticket DLIS-6795 to track this.
In the current implementation, users need to ensure that the `response_sender` goes out of scope or is properly cleaned up before unloading the model to guarantee that the unloading process executes correctly. This is because the `response_sender` is responsible for cleaning up the `response_factory`, which holds a shared_ptr to the underlying model; only when the reference count of the model's shared_ptr drops to 0 will the model be unloaded.

This PR enhances the lifetime management of `response_factory` so that when the complete final flag is sent, the `response_factory` is cleaned up immediately. This allows users to not worry about the lifetime of `response_sender` and to reuse `response_sender` if they choose to.

For rescheduled or cancelled requests, the documentation was updated instead, since it is hard and tricky to propagate that information back to the `response_sender`. I think we can enhance that after Jacky's ongoing changes for `response_sender` are in place.

Testing: triton-inference-server/server#7246
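For comparison with the workaround sketched earlier in the conversation, the usage this PR aims to enable is that sending the final flag alone releases the factory, so the explicit `del response_sender` would no longer be needed. A minimal sketch under the same assumptions (illustrative `OUT` tensor, hypothetical helper):

```python
import numpy as np
import triton_python_backend_utils as pb_utils


def handle_request(request):
    response_sender = request.get_response_sender()
    out = pb_utils.Tensor("OUT", np.zeros(1, dtype=np.float32))
    # With this change, the backend-side response_factory is intended to be
    # cleaned up as soon as the final flag is sent, so keeping the Python
    # `response_sender` object alive (e.g. until the worker thread is joined
    # in finalize) should no longer block model unloading.
    response_sender.send(
        pb_utils.InferenceResponse(output_tensors=[out]),
        flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL)
```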