-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Core] Retryable grpc client #47981
base: master
Are you sure you want to change the base?
[Core] Retryable grpc client #47981
Conversation
Signed-off-by: Jiajun Yao <[email protected]>
Signed-off-by: Jiajun Yao <[email protected]>
Signed-off-by: Jiajun Yao <[email protected]>
Signed-off-by: Jiajun Yao <[email protected]>
Signed-off-by: Jiajun Yao <[email protected]>
Signed-off-by: Jiajun Yao <[email protected]>
Signed-off-by: Jiajun Yao <[email protected]>
Signed-off-by: Jiajun Yao <[email protected]>
Signed-off-by: Jiajun Yao <[email protected]>
Signed-off-by: Jiajun Yao <[email protected]>
Signed-off-by: Jiajun Yao <[email protected]>
Signed-off-by: Jiajun Yao <[email protected]>
Signed-off-by: Jiajun Yao <[email protected]>
Signed-off-by: Jiajun Yao <[email protected]>
grpc_client->template CallMethod<Request, Reply>( | ||
prepare_async_function, | ||
request, | ||
[weak_retryable_grpc_client, retryable_grpc_request, callback]( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, a lot of the captured variables should be moved, no need for copy.
|
||
RAY_CHECK(server_unavailable_timeout_time_.has_value()); | ||
|
||
auto status = channel_->GetState(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment what is false
.
Signed-off-by: Jiajun Yao <[email protected]>
* Status::TimedOut. If the whole client does not reconnect within | ||
* server_unavailable_timeout_seconds, server_unavailable_timeout_callback is invoked. | ||
* | ||
* When all callers of the client release the shared_ptr of the client, the client |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since the destruction logic is the same across all callsites, dtor is better than deleter
Signed-off-by: Jiajun Yao <[email protected]>
@@ -763,7 +791,11 @@ void CoreWorker::Shutdown() { | |||
|
|||
// Now that gcs_client is not used within io service, we can reset the pointer and clean | |||
// it up. | |||
gcs_client_.reset(); | |||
if (gcs_client_) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I were you, I would place the disconnection logic into gcs_client_
's deleter.
The benefit of which is: we keep invariant when gcs client is valid, connection is guaranteed to be valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll leave it for a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you plan to do that, maybe leave a TODO in case we forget?
Feel free to leave it under my name.
@@ -321,6 +321,10 @@ struct GcsServerMocker { | |||
drain_raylet_callbacks.push_back(callback); | |||
}; | |||
|
|||
void IsLocalWorkerDead( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class should be named as Fake
, not Mock
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is already big. Let's do these cleanups in a separate PR.
Signed-off-by: Jiajun Yao <[email protected]>
Signed-off-by: Jiajun Yao <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Chatted offline, most of my prev comments are caused by the wide-spread usage of shared pointer... Let's proceed, I will take another look
Why are these changes needed?
Currently gcs_rpc_client has retries for gcs rpc calls and this PR moves the retry functionality to RetryableGrpcClient so that it can be used by non-gcs rpc client (e.g. core worker client).
Also enable retry for
ReportGeneratorItemReturns
rpc since it's idempotent.Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.