Limit RW separation to remote store enabled clusters and update recovery flow #16760
base: main
Conversation
❌ Gradle check result for a932d59: Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Codecov Report
Attention: Patch coverage is …

Additional details and impacted files

@@             Coverage Diff              @@
##               main   #16760      +/-   ##
============================================
+ Coverage     72.11%   72.18%    +0.06%
- Complexity    65237    65298       +61
============================================
  Files          5318     5318
  Lines        304003   304032       +29
  Branches      43992    44005       +13
============================================
+ Hits         219228   219461      +233
+ Misses        66874    66564      -310
- Partials      17901    18007      +106

☔ View full report in Codecov by Sentry.
This PR includes multiple changes to search replica recovery:

1. Change search-only replica copies to recover as EMPTY_STORE instead of PEER. This runs a store recovery that syncs segments directly from the remote store and eliminates any primary communication (see the sketch below).
2. Remove search replicas from the in-sync allocation ID set and update the routing table to exclude them from allAllocationIds. This ensures primaries aren't tracking search replicas or validating the routing table for their presence.
3. Change search replica validation to require remote store.

There are versions of the above changes that are still possible with primary-based node-to-node replication, but I don't think they are worth making at this time.

Signed-off-by: Marc Handalian <[email protected]>
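To make point 1 concrete, here is a minimal sketch of the recovery-source selection, assuming a helper of this shape (the helper name is hypothetical; EmptyStoreRecoverySource and PeerRecoverySource are the existing RecoverySource types):

```java
import org.opensearch.cluster.routing.RecoverySource;

// Hypothetical helper illustrating point 1: search-only copies start from an
// empty store and then pull segments from the remote store, while regular
// replicas keep peer recovery from the primary.
static RecoverySource initialRecoverySource(boolean isSearchOnly) {
    if (isSearchOnly) {
        return RecoverySource.EmptyStoreRecoverySource.INSTANCE; // no primary communication
    }
    return RecoverySource.PeerRecoverySource.INSTANCE; // copy segments from the primary
}
```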
Thanks @mch2 for the changes. At a high level the changes look good.
Also, just noticed: indices with auto-expand replicas shouldn't support search replicas, otherwise the total number of copies can exceed the number of data nodes.
Thanks for taking a look @shwetathareja, you bring up great points.
Not yet. Regular segrep has a check to fail lagging replicas based on their replication checkpoint (basically segment infos version plus a timer) compared to the primary's active set of segments. We could implement something similar that fetches and compares checkpoints only within the search replica group and removes any outlier (rough sketch below). To start, I am thinking we rely on a failure mechanism local to the shards that fails a copy if a single download event takes too long without making progress. Will add a separate task for this.
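A rough sketch of that outlier check; all names here are hypothetical, but the field compared mirrors the segment infos version carried by the real segrep checkpoint:

```java
import java.util.List;

// Hypothetical model of a search replica's replication checkpoint.
record Checkpoint(String allocationId, long segmentInfosVersion) {}

// Flag copies whose segment infos version lags the most up-to-date copy in
// the search-replica group by more than maxAllowedLag.
static List<String> findLaggingSearchReplicas(List<Checkpoint> group, long maxAllowedLag) {
    long newest = group.stream().mapToLong(Checkpoint::segmentInfosVersion).max().orElse(0L);
    return group.stream()
        .filter(c -> newest - c.segmentInfosVersion() > maxAllowedLag)
        .map(Checkpoint::allocationId)
        .toList();
}
```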
SR would need to continue syncing up to the latest set of segments uploaded by the dropped primary. Will add a test specifically for a red cluster outside of the scale-to-zero case and see if we need any special handling.
ack, will add some qa tests with this PR.
Yeah, I will add a task to handle this. We had brought this up a while back; I was thinking to support this for search replicas within their own set of nodes if they exist and otherwise reject it, but I think blocking it outright makes sense to start (sketch below).
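A hedged sketch of what blocking it outright could look like; both setting keys are assumptions approximated from the feature's settings, not confirmed names:

```java
import org.opensearch.common.settings.Settings;

// Hypothetical validation: reject combining search-only replicas with
// auto-expand replicas, per the discussion above. Setting keys are assumed.
static void validateSearchReplicaSettings(Settings indexSettings) {
    boolean hasSearchReplicas = indexSettings.getAsInt("index.number_of_search_only_replicas", 0) > 0;
    boolean autoExpand = indexSettings.get("index.auto_expand_replicas") != null;
    if (hasSearchReplicas && autoExpand) {
        throw new IllegalArgumentException(
            "search-only replicas cannot be used with index.auto_expand_replicas");
    }
}
```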
@@ -440,7 +440,7 @@ public ShardRouting moveToUnassigned(UnassignedInfo unassignedInfo) {
     assert state != ShardRoutingState.UNASSIGNED : this;
     final RecoverySource recoverySource;
     if (active()) {
-        if (primary()) {
+        if (primary() || isSearchOnly()) {
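For context, a paraphrased sketch of how this branch resolves after the change; the surrounding method body is approximated from the upstream code, so treat the exact details as assumptions:

```java
// An active search-only replica moved to unassigned now recovers from an
// existing store (and then the remote store) instead of from the primary.
final RecoverySource recoverySource;
if (active()) {
    recoverySource = (primary() || isSearchOnly())
        ? RecoverySource.ExistingStoreRecoverySource.INSTANCE
        : RecoverySource.PeerRecoverySource.INSTANCE;
} else {
    recoverySource = recoverySource(); // keep whatever source it already had
}
```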
isSearchOnly() recovery would be EmptyStore or Existing; then a search replica can turn red as well?
Yes, this is possible. Do you mean in terms of reporting cluster health? Right now the cluster would report as yellow; we need to revisit the API to report accordingly.
… the AllAllocationIds set in the routing table Signed-off-by: Marc Handalian <[email protected]>
…e store cluster. This check had previously only checked for segrep Signed-off-by: Marc Handalian <[email protected]>
Signed-off-by: Marc Handalian <[email protected]>
…ases Signed-off-by: Marc Handalian <[email protected]>
❌ Gradle check result for 8e0240f: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
… a remote store cluster." reverting this, we already check for remote store earlier. This reverts commit 48ca1a3.
Signed-off-by: Marc Handalian <[email protected]>
@shwetathareja, got some time to work on this. I've added a test for this case to ensure a search replica is still searchable post primary failure: first when there is no writer replica and the cluster turns red, and second when there is a writer replica, the cluster turns yellow, and the new primary writes to the store. This works without issue. However, in the first case we should still be able to restore only the writers if required and assign the primary to a new node; restore fails in this case because an open index with the same name already exists in the cluster. Taking a look there.

edit - this should be possible with the _remotestore/_restore API; I think I'm calling the wrong restore API in this test.

edit 2 - updated the remote store restore logic to support this and the test passes (see the sketch below).
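For reference, a minimal sketch of that restore call from a test, assuming the remote store restore transport API; the index name is hypothetical and the exact request setters are approximated from the codebase, so treat them as assumptions:

```java
import org.opensearch.action.admin.cluster.remotestore.restore.RestoreRemoteStoreRequest;
import org.opensearch.action.support.PlainActionFuture;

// Re-seed the writer copies of the red index from the remote store.
RestoreRemoteStoreRequest request = new RestoreRemoteStoreRequest();
request.indices("test-index");   // hypothetical index name
request.restoreAllShards(true);  // restore primaries even though the index is open
client().admin().cluster().restoreRemoteStore(request, PlainActionFuture.newFuture());
```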
… only writers when red Signed-off-by: Marc Handalian <[email protected]>
I don't see any existing QA package with remote store enabled to test full cluster / rolling restarts; looking at adding one.
❌ Gradle check result for f915a31: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Description
This PR includes multiple changes to search replica recovery to further decouple these shards from primaries.
Related Issues
Resolves #15952
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.