[Weighted Shard Routing] Fail open requests on search shard failures #5072
Merged: Bukhtawar merged 37 commits into opensearch-project:main from anshu1106:feature/poc-fail-open on Jan 10, 2023
37 commits
02f1fa8
Fail open changes
9e69711
Refactor code
5ee2b02
Merge branch 'main' into feature/poc-fail-open
c0e5b67
Fix test
f3006a5
Add integ test with network disruption and refactor code
683786d
Merge branch 'main' into feature/poc-fail-open
2834af8
Add log statement
4ad2bed
Add integ test for search aggregations and flag open flag
644fa60
Refactor integ tests
7e0df4a
Add integ test for multiget with fail open
a22b341
Add changelog
614d85e
Refactor code
a02f332
Make fail open enabled by default
aac9d4b
Fail open on unassigned shard copies
836af1c
Add tests
2dc6699
Fix tests
16e295e
Merge branch 'main' into feature/poc-fail-open
605a4dc
Fix precommit build
4a90174
Fix test
9ce9be8
Change internal error logic to check for 5xx status
e76deba
Fix test
048eea6
Merge branch 'main' into feature/poc-fail-open
a429c21
Fix integ test failure
d2cf1a2
Merge branch 'main' into feature/poc-fail-open
d0a48e3
Address review comments
fca7ae9
Fix precommit failure
14499b0
Merge branch 'main' into feature/poc-fail-open
6f7aafa
Merge branch 'main' into feature/poc-fail-open
4c8e50c
Fix tests
df3e416
Modify changelog
cc6d1e9
Address review comments
7ca22c5
Remove duplicate shards from routing interator
f376eab
add test to valiate request state persistence
33a04b5
fix test comment
ef961bd
Address review comments
97ea739
log exception
7df7358
Address review comments
643 changes: 640 additions & 3 deletions
server/src/internalClusterTest/java/org/opensearch/search/SearchWeightedRoutingIT.java
Large diffs are not rendered by default.
156 changes: 156 additions & 0 deletions
server/src/main/java/org/opensearch/cluster/routing/FailAwareWeightedRouting.java
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,156 @@ | ||
/* | ||
* SPDX-License-Identifier: Apache-2.0 | ||
* | ||
* The OpenSearch Contributors require contributions made to | ||
* this file be licensed under the Apache-2.0 license or a | ||
* compatible open source license. | ||
*/ | ||
|
||
package org.opensearch.cluster.routing; | ||
|
||
import org.apache.logging.log4j.LogManager; | ||
import org.apache.logging.log4j.Logger; | ||
import org.apache.logging.log4j.message.ParameterizedMessage; | ||
import org.opensearch.OpenSearchException; | ||
import org.opensearch.action.search.SearchShardIterator; | ||
import org.opensearch.cluster.ClusterState; | ||
import org.opensearch.cluster.metadata.WeightedRoutingMetadata; | ||
import org.opensearch.cluster.node.DiscoveryNode; | ||
import org.opensearch.index.shard.ShardId; | ||
import org.opensearch.rest.RestStatus; | ||
import org.opensearch.search.SearchShardTarget; | ||
|
||
import java.util.List; | ||
import java.util.Map; | ||
import java.util.stream.Stream; | ||
|
||
/** | ||
* This class contains the logic to find the next shard copy on which to retry a search request after another copy fails. | ||
* It decides whether retryable shard search requests can be tried on shard copies hosted on data | ||
* nodes whose attribute value has a weighted shard routing weight of zero. | ||
*/ | ||
|
||
public enum FailAwareWeightedRouting { | ||
INSTANCE; | ||
|
||
private static final Logger logger = LogManager.getLogger(FailAwareWeightedRouting.class); | ||
|
||
private final static List<RestStatus> internalErrorRestStatusList = List.of( | ||
RestStatus.INTERNAL_SERVER_ERROR, | ||
RestStatus.BAD_GATEWAY, | ||
RestStatus.SERVICE_UNAVAILABLE, | ||
RestStatus.GATEWAY_TIMEOUT | ||
); | ||
|
||
public static FailAwareWeightedRouting getInstance() { | ||
return INSTANCE; | ||
|
||
} | ||
|
||
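/** | ||
* Checks whether the exception carries one of the 5xx statuses listed above. | ||
* | ||
* @param exception the failure returned by a shard copy | ||
* @return true if the exception is due to cluster availability issues | ||
*/ | ||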
private boolean isInternalFailure(Exception exception) { | ||
if (exception instanceof OpenSearchException) { | ||
// checking for 5xx failures | ||
return internalErrorRestStatusList.contains(((OpenSearchException) exception).status()); | ||
} | ||
return false; | ||
} | ||
|
||
/** | ||
* Checks whether the shard copy is hosted on a data node whose weighted routing weight is set to 0. | ||
* In such cases we fail open if the shard search request for the shard fails on the other shard copies with a | ||
* non-retryable exception. | ||
* | ||
* @param nodeId the node with the shard copy | ||
* @param clusterState the current cluster state | ||
* @return true if the node has an attribute value whose shard routing weight is set to zero, else false | ||
*/ | ||
private boolean isWeighedAway(String nodeId, ClusterState clusterState) { | ||
DiscoveryNode node = clusterState.nodes().get(nodeId); | ||
WeightedRoutingMetadata weightedRoutingMetadata = clusterState.metadata().weightedRoutingMetadata(); | ||
if (weightedRoutingMetadata != null) { | ||
WeightedRouting weightedRouting = weightedRoutingMetadata.getWeightedRouting(); | ||
if (weightedRouting != null && weightedRouting.isSet()) { | ||
// Fetch weighted routing attributes with weight set as zero | ||
Stream<String> keys = weightedRouting.weights() | ||
.entrySet() | ||
.stream() | ||
.filter(entry -> entry.getValue().intValue() == WeightedRoutingMetadata.WEIGHED_AWAY_WEIGHT) | ||
.map(Map.Entry::getKey); | ||
|
||
for (Object key : keys.toArray()) { | ||
if (node.getAttributes().get(weightedRouting.attributeName()).equals(key.toString())) { | ||
return true; | ||
} | ||
} | ||
} | ||
} | ||
return false; | ||
} | ||
|
||
/** | ||
* Returns the next shard copy on which to retry a search request after the previous copy returned | ||
* by the iterator has failed. It contains the fail-open logic, i.e. it may return shard copies on nodes whose | ||
* weighted shard routing weight is set to zero. | ||
* | ||
* @param shardIt shard iterator defining the order in which the copies of a shard are requested | ||
* @param clusterState the current cluster state | ||
* @param exception the failure from the previous shard copy | ||
* @return the next shard copy, or null if none is left | ||
*/ | ||
public SearchShardTarget findNext(final SearchShardIterator shardIt, ClusterState clusterState, Exception exception) { | ||
SearchShardTarget next = shardIt.nextOrNull(); | ||
while (next != null && isWeighedAway(next.getNodeId(), clusterState)) { | ||
SearchShardTarget nextShard = next; | ||
if (canFailOpen(nextShard.getShardId(), exception, clusterState)) { | ||
logger.info( | ||
() -> new ParameterizedMessage("{}: Fail open executed due to exception {}", nextShard.getShardId(), exception) | ||
); | ||
break; | ||
} | ||
next = shardIt.nextOrNull(); | ||
} | ||
return next; | ||
} | ||
|
||
/** | ||
* Returns the next shard copy on which to retry a search request after the previous copy returned | ||
* by the iterator has failed. It contains the fail-open logic, i.e. it may return shard copies on nodes whose | ||
* weighted shard routing weight is set to zero. | ||
* | ||
* @param shardsIt shards iterator defining the order in which the copies of a shard are requested | ||
* @param clusterState the current cluster state | ||
* @param exception the failure from the previous shard copy | ||
* @return the next shard copy, or null if none is left | ||
*/ | ||
public ShardRouting findNext(final ShardsIterator shardsIt, ClusterState clusterState, Exception exception) { | ||
ShardRouting next = shardsIt.nextOrNull(); | ||
|
||
while (next != null && isWeighedAway(next.currentNodeId(), clusterState)) { | ||
ShardRouting nextShard = next; | ||
if (canFailOpen(nextShard.shardId(), exception, clusterState)) { | ||
logger.info(() -> new ParameterizedMessage("{}: Fail open executed due to exception {}", nextShard.shardId(), exception)); | ||
Review comment: nit: incorrect usage of logger with exception |
||
break; | ||
} | ||
next = shardsIt.nextOrNull(); | ||
} | ||
return next; | ||
} | ||
|
||
/** | ||
* Decides whether the request may fail open for the given shard. | ||
* | ||
* @return true if we can fail open, i.e. request shard copies on nodes whose weighted shard | ||
* routing weight is set to zero | ||
*/ | ||
private boolean canFailOpen(ShardId shardId, Exception exception, ClusterState clusterState) { | ||
return isInternalFailure(exception) || hasInActiveShardCopies(clusterState, shardId); | ||
} | ||
|
||
private boolean hasInActiveShardCopies(ClusterState clusterState, ShardId shardId) { | ||
List<ShardRouting> shards = clusterState.routingTable().shardRoutingTable(shardId).shards(); | ||
for (ShardRouting shardRouting : shards) { | ||
if (!shardRouting.active()) { | ||
return true; | ||
} | ||
} | ||
return false; | ||
} | ||
} |
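
The control flow of `findNext` and `canFailOpen` above may be easier to follow in isolation. The following is a minimal sketch, not the PR's code: `FailOpenSketch`, `Copy`, the integer `weight` field and the raw HTTP status codes are hypothetical stand-ins for `ShardRouting`, the cluster-state weight lookup and `RestStatus`. It only illustrates the rule that a copy on a weighed-away node is skipped unless the previous failure was one of the listed 5xx statuses or the shard has inactive copies.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Simplified model of the fail-open decision; all names here are illustrative assumptions.
final class FailOpenSketch {

    /** Hypothetical stand-in for a shard copy: the hosting node and that node's routing weight. */
    static final class Copy {
        final String nodeId;
        final int weight;

        Copy(String nodeId, int weight) {
            this.nodeId = nodeId;
            this.weight = weight;
        }
    }

    /** Same four statuses as internalErrorRestStatusList in the diff, written as HTTP codes. */
    private static final Set<Integer> INTERNAL_ERROR_CODES = Set.of(500, 502, 503, 504);

    /** Fail open on a listed 5xx failure or when the shard has inactive copies. */
    static boolean canFailOpen(int failureStatus, boolean hasInactiveCopies) {
        return INTERNAL_ERROR_CODES.contains(failureStatus) || hasInactiveCopies;
    }

    /** Returns the first copy that is not weighed away, or any copy once fail-open is allowed. */
    static Copy findNext(Iterator<Copy> copies, int failureStatus, boolean hasInactiveCopies) {
        while (copies.hasNext()) {
            Copy next = copies.next();
            boolean weighedAway = next.weight == 0;
            if (!weighedAway || canFailOpen(failureStatus, hasInactiveCopies)) {
                return next;
            }
        }
        return null; // iterator exhausted: nothing left to retry on
    }

    public static void main(String[] args) {
        List<Copy> copies = List.of(new Copy("zoneA-node", 0), new Copy("zoneB-node", 1));
        // A 4xx failure skips the weighed-away copy; a 503 is allowed to use it.
        System.out.println(findNext(copies.iterator(), 400, false).nodeId);
        System.out.println(findNext(copies.iterator(), 503, false).nodeId);
    }
}
```

In the sketch the failure status and the inactive-copy flag are passed in directly, whereas the PR derives them from the exception and the routing table in the cluster state.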
I don't think this is the most apt way to create singletons, you could achieve it by a static property
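
For context on this comment, the two singleton styles can be compared side by side. This is a sketch under assumptions: the class names are hypothetical, and "static property" is read here as a private constructor plus a `static final` field, which may not be exactly what the reviewer had in mind. The enum form is the one used by `FailAwareWeightedRouting` in the diff.

```java
// Enum-based singleton, the style used in the diff: the JVM creates exactly one INSTANCE.
enum EnumStyleRouting {
    INSTANCE;

    static EnumStyleRouting getInstance() {
        return INSTANCE;
    }
}

// Static-field singleton: a private constructor and an eagerly initialised static final field.
final class StaticFieldRouting {
    private static final StaticFieldRouting INSTANCE = new StaticFieldRouting();

    private StaticFieldRouting() {
        // private so that no other instance can be constructed
    }

    static StaticFieldRouting getInstance() {
        return INSTANCE;
    }
}
```

Both variants hand out the same object on every `getInstance()` call. The enum form additionally protects against extra instances created through serialization or reflection, while the static-field form reads more like an ordinary class.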