[Batch Fetch] Fix for hasInitiatedFetching to fix allocation explain and manual reroute APIs #14972
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #14972 +/- ##
============================================
- Coverage 71.78% 71.76% -0.03%
- Complexity 62694 62706 +12
============================================
Files 5160 5161 +1
Lines 294211 294370 +159
Branches 42553 42579 +26
============================================
+ Hits 211212 211263 +51
- Misses 65599 65728 +129
+ Partials 17400 17379 -21
☔ View full report in Codecov by Sentry.
d08c425
into
opensearch-project:main
…and manual reroute APIs (#14972) * Fix for hasInitiatedFetching() in batch mode Signed-off-by: Rahul Karajgikar <[email protected]> (cherry picked from commit d08c425) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
…and manual reroute APIs (#14972) (#14994) * Fix for hasInitiatedFetching() in batch mode (cherry picked from commit d08c425) Signed-off-by: Rahul Karajgikar <[email protected]> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
…and manual reroute APIs (opensearch-project#14972) * Fix for hasInitiatedFetching() in batch mode Signed-off-by: Rahul Karajgikar <[email protected]>
Description
This change fixes a bug in hasInitiatedFetching() where it always returns true in batch mode, causing allocation explain to show AWAITING_INFO even for shards for which the deciders return NO.
The idea is that we want to populate shard store information (number of matching bytes per node) in addition to the decision made by the deciders.
This function is meant to check whether that information can be obtained without triggering a new asyncFetch.
The intended behaviour is for this function to return true if a fetch has ever happened before, or is ongoing, for a batch.
It should return false if there has never been a fetch for this batch.
This way it does not trigger a new asyncFetch call, but still uses the data from existing fetches, or waits for ongoing ones, to populate shard store info.
Details of the above points are in [#14903].
We start by comparing batch mode vs non-batch mode implementations.
Non-batch mode logic for deciding if shard is eligible for fetching
OpenSearch/server/src/main/java/org/opensearch/gateway/ReplicaShardAllocator.java
Lines 202 to 228 in fcc231d
Batch mode logic for deciding if shard is eligible for fetching - logic is same.
OpenSearch/server/src/main/java/org/opensearch/gateway/ReplicaShardBatchAllocator.java
Lines 171 to 196 in fcc231d
hasInitiatedFetching() has a different implementation though, and this is the key difference. In non-batch mode it works as intended, but in batch mode it always returns true.
Details of fix for hasInitiatedFetching()
Implementation in non-batch mode:
OpenSearch/server/src/main/java/org/opensearch/gateway/GatewayAllocator.java
Lines 351 to 353 in fcc231d
Here, we check if asyncFetchStore has an entry for this shardId.
The intention here is to check if a fetchData() call has ever happened for this shard before.
asyncFetchStore entries are populated as part of GatewayAllocator$InternalReplicaShardAllocator.fetchData().
We are essentially using the fact that if GatewayAllocator$InternalReplicaShardAllocator.fetchData() was ever called for this shard, then asyncFetchStore would have an entry.
OpenSearch/server/src/main/java/org/opensearch/gateway/GatewayAllocator.java
Lines 313 to 355 in fcc231d
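As a rough illustration of the non-batch idea, the sketch below models asyncFetchStore as a plain map keyed by shard id. NonBatchFetchModel and its String shard ids are hypothetical stand-ins for illustration, not the OpenSearch classes; the real logic lives in GatewayAllocator.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified, hypothetical model of the non-batch bookkeeping (not the OpenSearch classes).
final class NonBatchFetchModel {
    // Stand-in for asyncFetchStore: one entry per shard for which fetchData() has run.
    private final Map<String, Object> asyncFetchStore = new ConcurrentHashMap<>();

    // Stand-in for fetchData(): the entry is created on the first call for a shard.
    void fetchData(String shardId) {
        asyncFetchStore.computeIfAbsent(shardId, id -> new Object());
    }

    // An entry exists only if fetchData() was called for this shard at some point.
    boolean hasInitiatedFetching(String shardId) {
        return asyncFetchStore.containsKey(shardId);
    }

    public static void main(String[] args) {
        NonBatchFetchModel model = new NonBatchFetchModel();
        // No fetch has happened yet for this shard.
        System.out.println(model.hasInitiatedFetching("shard-0"));
        model.fetchData("shard-0");
        // A fetch has now been initiated for this shard.
        System.out.println(model.hasInitiatedFetching("shard-0"));
    }
}
```

Because the entry is created inside fetchData() itself, its presence is a reliable signal in this model that a fetch was initiated.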
For batch mode, we need a similar idea: we need to know if a fetchData() call has ever happened for this batch before.
Let's check the corresponding code for fetchData in batch mode and see if there is any property we can check to establish this fact.
Implementation in batch mode:
Corresponding to asyncFetchStore in non-batch mode, we have batchIdToStoreShardBatch.
However, the logic in batch mode is different: batchIdToStoreShardBatch is populated when we create the batches, which happens before any fetching/assignment logic is called.
Reroute flow:
OpenSearch/server/src/main/java/org/opensearch/gateway/ShardsBatchGatewayAllocator.java
Lines 238 to 241 in fcc231d
Allocation explain flow:
OpenSearch/server/src/main/java/org/opensearch/gateway/ShardsBatchGatewayAllocator.java
Lines 444 to 458 in fcc231d
Batch creation logic:
Line 349: addBatch updates batchIdToStoreShardBatch with the entry.
OpenSearch/server/src/main/java/org/opensearch/gateway/ShardsBatchGatewayAllocator.java
Lines 333 to 355 in fcc231d
OpenSearch/server/src/main/java/org/opensearch/gateway/ShardsBatchGatewayAllocator.java
Lines 381 to 387 in fcc231d
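To make the ordering problem concrete, here is a minimal sketch. BatchMapModel, naiveHasInitiatedFetching() and the String ids are hypothetical names used only for illustration; the only thing taken from the flow above is that addBatch runs at batch creation, before any fetchData() call.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of batch bookkeeping (not the OpenSearch classes).
final class BatchMapModel {
    // Stand-in for batchIdToStoreShardBatch: batch id -> shard ids in that batch.
    private final Map<String, List<String>> batchIdToStoreShardBatch = new HashMap<>();
    // Counts how many times a fetch was actually requested.
    private int fetchCalls = 0;

    // Stand-in for addBatch: runs when batches are created, before any fetching.
    void addBatch(String batchId, List<String> shardIds) {
        batchIdToStoreShardBatch.put(batchId, List.copyOf(shardIds));
    }

    // Stand-in for fetchData(): only records that a fetch was requested.
    void fetchData(String batchId) {
        fetchCalls++;
    }

    // A naive check modelled on the non-batch approach: "is there a map entry?"
    boolean naiveHasInitiatedFetching(String batchId) {
        return batchIdToStoreShardBatch.containsKey(batchId);
    }

    int fetchCalls() {
        return fetchCalls;
    }

    public static void main(String[] args) {
        BatchMapModel model = new BatchMapModel();
        model.addBatch("batch-1", List.of("shard-0", "shard-1"));
        // The naive check already reports a fetch although none was requested.
        System.out.println(model.naiveHasInitiatedFetching("batch-1"));
        System.out.println(model.fetchCalls());
    }
}
```

In this model the map entry says only that the batch exists, not that any fetch was ever started for it.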
Since batchIdToStoreShardBatch is updated before fetchData(), we cannot use it to determine if fetching has happened before.
Corresponding to GatewayAllocator$InternalReplicaShardAllocator.fetchData() in non-batch mode, we have ShardsBatchGatewayAllocator$InternalReplicaBatchShardAllocator.fetchData() in batch mode.
Let's check the implementation of this function in batch mode to see if there are other ways.
OpenSearch/server/src/main/java/org/opensearch/gateway/ShardsBatchGatewayAllocator.java
Lines 561 to 575 in fcc231d
Nothing we can use here either:
OpenSearch/server/src/main/java/org/opensearch/gateway/ShardsBatchGatewayAllocator.java
Lines 584 to 627 in fcc231d
If the batch is empty or all nodes are ignored, asyncFetcher.fetchData() will not get called.
If the batch is non-empty, then asyncFetcher.fetchData() will get called (line 619).
asyncFetcher is of type AsyncShardBatchFetch, which extends the AsyncShardFetch class but does not override fetchData(). So we can check AsyncShardFetch.fetchData() for the exact implementation here.
OpenSearch/server/src/main/java/org/opensearch/gateway/AsyncShardFetch.java
Line 146 in fcc231d
OpenSearch/server/src/main/java/org/opensearch/gateway/AsyncShardFetch.java
Lines 172 to 181 in fcc231d
We can see in line 172 that the batch cache entries are created for any missing nodes here.
In line 173, we see if any node still needs to be fetched.
If so, we mark those nodes as fetching and trigger asyncFetch().
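The three steps above can be sketched roughly as follows. This is a simplified model, not the AsyncShardFetch code: FetchFlowModel, NodeState and the recorded request list are assumptions made for illustration, and the actual asynchronous call is replaced by appending to a list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of the fetch flow (not the OpenSearch classes).
final class FetchFlowModel {
    enum NodeState { NOT_FETCHED, FETCHING, HAS_DATA }

    // Stand-in for the per-node cache of a batch: node id -> state.
    private final Map<String, NodeState> cache = new LinkedHashMap<>();
    // Records each simulated asyncFetch() request (the nodes it targeted).
    private final List<List<String>> asyncFetchRequests = new ArrayList<>();

    void fetchData(List<String> dataNodes) {
        // Step 1: create cache entries for any nodes that are missing.
        for (String node : dataNodes) {
            cache.putIfAbsent(node, NodeState.NOT_FETCHED);
        }
        // Step 2: see whether any node still needs to be fetched.
        List<String> nodesToFetch = findNodesToFetch();
        // Step 3: mark those nodes as fetching and trigger the (simulated) async fetch.
        if (!nodesToFetch.isEmpty()) {
            for (String node : nodesToFetch) {
                cache.put(node, NodeState.FETCHING);
            }
            asyncFetchRequests.add(nodesToFetch);
        }
    }

    // Nodes that have no data and no fetch in progress.
    List<String> findNodesToFetch() {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, NodeState> entry : cache.entrySet()) {
            if (entry.getValue() == NodeState.NOT_FETCHED) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    int asyncFetchCount() {
        return asyncFetchRequests.size();
    }

    NodeState stateOf(String node) {
        return cache.get(node);
    }
}
```

In this model, calling fetchData() a second time with the same nodes triggers no further request, because every node is already marked as fetching.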
OpenSearch/server/src/main/java/org/opensearch/gateway/AsyncShardBatchFetch.java
Line 94 in fcc231d
Structure of batch cache ^
OpenSearch/server/src/main/java/org/opensearch/gateway/AsyncShardFetchCache.java
Lines 94 to 103 in fcc231d
initData() has a custom implementation for the batch cache:
OpenSearch/server/src/main/java/org/opensearch/gateway/AsyncShardBatchFetch.java
Lines 141 to 144 in fcc231d
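So if this code has run and triggered the async fetch, we can conclude that the following statements will be true:
1. The batch cache has entries for the nodes (it is not empty).
2. Every node in the batch cache either already has data or is marked as fetching.
So we can use these 2 facts as an invariant to see if any async fetching has actually happened before for a batch.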
We add checks for these 2 facts in hasInitiatedFetching() to validate if a fetch has happened before or not.
To check 1, we use:
To check 2, we use the findNodesToFetch() function.
This function returns all nodes that have no data and also have no fetches initiated. So if this function returns even 1 node, we return false from hasInitiatedFetching(), because we don't have info from all nodes to populate all the shard store information.
OpenSearch/server/src/main/java/org/opensearch/gateway/AsyncShardFetchCache.java
Lines 109 to 117 in fcc231d
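Put together, the two checks could look roughly like the sketch below. It is a simplified model under stated assumptions: BatchFetchCheckModel, NodeState and the map-based cache are hypothetical, and only findNodesToFetch() is a name taken from the description. The real change is in ShardsBatchGatewayAllocator.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of the fixed check (not the OpenSearch classes).
final class BatchFetchCheckModel {
    enum NodeState { NOT_FETCHED, FETCHING, HAS_DATA }

    // Nodes that have no data and no fetch in progress.
    static List<String> findNodesToFetch(Map<String, NodeState> batchCache) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, NodeState> entry : batchCache.entrySet()) {
            if (entry.getValue() == NodeState.NOT_FETCHED) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    static boolean hasInitiatedFetching(Map<String, NodeState> batchCache) {
        // Check 1: an empty (or missing) cache means fetchData() never ran for this batch.
        if (batchCache == null || batchCache.isEmpty()) {
            return false;
        }
        // Check 2: if any node has neither data nor an ongoing fetch,
        // the shard store info cannot be completed without a new fetch.
        return findNodesToFetch(batchCache).isEmpty();
    }

    public static void main(String[] args) {
        Map<String, NodeState> cache = new HashMap<>();
        // New batch, nothing fetched yet.
        System.out.println(hasInitiatedFetching(cache));
        cache.put("node-1", NodeState.FETCHING);
        cache.put("node-2", NodeState.HAS_DATA);
        // All nodes are fetching or have data.
        System.out.println(hasInitiatedFetching(cache));
        cache.put("node-3", NodeState.NOT_FETCHED);
        // One node would need a new fetch.
        System.out.println(hasInitiatedFetching(cache));
    }
}
```

The design choice here is that the check only reads existing state, so a caller such as allocation explain never starts a fetch as a side effect.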
Related Issues
Resolves #14903
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Testing
Added a new UT for batch mode and non batch mode to simulate allocation explain case.
Verified that allocation explain returns a NO decision and does not show AWAITING_INFO.
Also manually tested manual reroutes on a setup with 15 data nodes and 3 master nodes, and verified that shards with decision NO are no longer added to the batches.
To repro this:
inFlightFetches metric and arthas profiling)