Gracefully cancel the replication when the replica shard is closed #8089
Comments
Discussed separately with @mch2, as we are close to PR #9234. CC @anasalkouz @mch2
I'm not sure whether this is an issue with cancellation on the replica, but it looks like a corrupt file is breaking recovery after failover. I don't know whether this is because the replica is still writing concurrently from a previous sync, or because it's not handling the checksum read properly.
I log the local files inside of syncSegmentsFromRemoteSegmentStore:
[2023-10-17T01:43:37,936][INFO ][o.o.i.s.IndexShard ] [node_t3] [test-idx-1][0] File _0.cfe already exists
ReplicationSource defines a cancel method intended to allow source implementations to cancel and clean up any ongoing requests. The remote store source does not currently implement it.
We need to validate that cancellation with remote store completes successfully. We can reuse the ITs introduced with #8478 to do this validation.
I think we will need to treat the remote store implementation as a synchronous call and wrap it in CancellableThreads so that it can be interrupted, but maybe there is a cancel mechanism similar to RetryableTransportClient.cancel that could support this.
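To make the interrupt-based idea concrete, here is a minimal, JDK-only sketch of wrapping a blocking call so that a cancel from another thread interrupts it. It does not use the OpenSearch CancellableThreads class or any remote store code: `CancellableRunner`, `CancelledException`, `Interruptible`, and the `Thread.sleep` standing in for a blocking segment download are all hypothetical names chosen for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for illustration; NOT the OpenSearch CancellableThreads class.
class CancellableFetchSketch {

    /** Thrown by execute() when the wrapped call was cancelled. */
    static class CancelledException extends RuntimeException {
        CancelledException(String reason) { super(reason); }
    }

    /** A blocking task that responds to thread interruption. */
    interface Interruptible {
        void run() throws InterruptedException;
    }

    static class CancellableRunner {
        private Thread running;     // thread currently inside execute(), guarded by this
        private boolean cancelled;  // guarded by this
        private String reason;      // guarded by this

        void execute(Interruptible task) {
            synchronized (this) {
                if (cancelled) throw new CancelledException(reason);
                running = Thread.currentThread();
            }
            boolean interrupted = false;
            try {
                task.run();
            } catch (InterruptedException e) {
                interrupted = true;
            } finally {
                synchronized (this) { running = null; }
            }
            synchronized (this) {
                if (cancelled) {
                    Thread.interrupted(); // clear the flag that cancel() may have set
                    throw new CancelledException(reason);
                }
            }
            if (interrupted) {
                // The interrupt did not come from cancel(): preserve it for the caller.
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted without cancel");
            }
        }

        /** Marks the runner cancelled and interrupts the thread inside execute(), if any. */
        synchronized void cancel(String why) {
            if (cancelled) return;
            cancelled = true;
            reason = why;
            if (running != null) running.interrupt();
        }
    }

    /** Starts a blocking "download" on a worker thread, then cancels it. */
    static String demo() throws InterruptedException {
        CancellableRunner runner = new CancellableRunner();
        CountDownLatch started = new CountDownLatch(1);
        AtomicReference<String> outcome = new AtomicReference<>("completed");
        Thread worker = new Thread(() -> {
            try {
                runner.execute(() -> {
                    started.countDown();
                    Thread.sleep(60_000); // stand-in for a blocking segment download
                });
            } catch (CancelledException e) {
                outcome.set("cancelled: " + e.getMessage());
            }
        });
        worker.start();
        started.await();                       // the worker is now inside execute()
        runner.cancel("replica shard closed"); // e.g. called from the source's cancel method
        worker.join();
        return outcome.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo());
    }
}
```

In this sketch the source's cancel method would only need to call `runner.cancel(...)`. The approach relies on the blocking call responding to thread interruption; whether the actual remote store download path does so is exactly what the ITs would need to confirm.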