
[BUG] improper shard distribution in search nodes after scaling. #14747

Closed
Dileep-Dora opened this issue Jul 15, 2024 · 2 comments

Comments

@Dileep-Dora

Describe the bug

When adding search nodes to an existing cluster, the expectation is that all search nodes end up with an equal number of shards, but this does not happen. We had to delete all the indices and restore them all at once to get an equal distribution, which affects the availability of the overall service.

Related component

Search:Searchable Snapshots

To Reproduce

  1. Create a cluster with search nodes.
  2. Ingest some data into 2-3 indices.
  3. Take snapshots.
  4. Delete the indices and restore them on the search nodes (see the sketch after this list).
  5. Add extra search nodes and verify the shard distribution.
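
As a concrete illustration of step 4, here is a minimal sketch of restoring a snapshot as a searchable snapshot through the REST API, using Java's built-in HTTP client. The host, repository, and snapshot names are placeholders; the "storage_type": "remote_snapshot" restore option is the documented flag that allocates the restored shards to search nodes.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RestoreAsSearchableSnapshot {
    public static void main(String[] args) throws Exception {
        // Placeholders: adjust the host, repository, and snapshot names for your cluster.
        String url = "http://localhost:9200/_snapshot/my-repo/my-snapshot/_restore";
        // "remote_snapshot" asks OpenSearch to restore the indices as searchable
        // snapshots, so their shards are allocated to search nodes.
        String body = "{ \"storage_type\": \"remote_snapshot\" }";

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(url))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}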

Expected behavior

After adding extra search nodes, all search nodes should hold an equal number of shards.

Additional Details

OpenSearch version: 2.13

@kkhatua (Member) commented Jul 24, 2024

Adding new nodes will not necessarily move shards from the older nodes to the new nodes to rebalance. This is because moving shards is essentially a peer recovery activity and consumes resources that could be critical to traffic.

Lucene shards tend to be sticky in nature and will stay on a node as long as there are no violations.

When users add nodes, it might be because the cluster already has a high shard count or high disk usage. In the latter scenario, adding new nodes will cause any old nodes exceeding the disk threshold to relocate shards to nodes with more free disk space (in this case, the new nodes).

A simple way to achieve a balanced cluster after adding new nodes is to explicitly set cluster.routing.allocation.total_shards_per_node to a value that approximates the average shard count per node given the larger cluster's node count.
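
As a rough sketch of that suggestion (the host, total shard count, and node count below are hypothetical placeholders), the target value can be derived from the cluster's shard total and its post-scaling node count, then applied through the standard _cluster/settings endpoint:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SetShardsPerNode {
    public static void main(String[] args) throws Exception {
        int totalShards = 120; // hypothetical: total shards in the cluster
        int nodeCount = 6;     // hypothetical: node count after scaling out
        // Ceiling of the average, so the cap never blocks a legitimate allocation.
        int target = (totalShards + nodeCount - 1) / nodeCount;

        String body = "{ \"persistent\": { "
            + "\"cluster.routing.allocation.total_shards_per_node\": " + target + " } }";

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:9200/_cluster/settings"))
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(body))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}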

There has been discussion of making this automatic, but the resource cost of shard migration is why it has not been automated. A monitoring system, however, could poll the _cat/allocation API and, based on the observed skew, periodically update this threshold to achieve the same effect.
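
A minimal sketch of such a monitor, assuming the plain-text _cat/allocation output with hand-picked columns; the skew tolerance of 2 shards and the localhost endpoint are assumptions, not recommendations:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class AllocationSkewCheck {
    public static void main(String[] args) throws Exception {
        // Ask the cat API for just the shard count and node name columns.
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:9200/_cat/allocation?h=shards,node"))
            .GET()
            .build();
        String body = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString())
            .body();

        int min = Integer.MAX_VALUE, max = 0, total = 0, nodes = 0;
        for (String line : body.split("\n")) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length < 2 || "UNASSIGNED".equals(cols[1])) {
                continue; // skip blank lines and the unassigned-shards row
            }
            int shards = Integer.parseInt(cols[0]);
            min = Math.min(min, shards);
            max = Math.max(max, shards);
            total += shards;
            nodes++;
        }
        if (nodes == 0) {
            return;
        }
        int avg = (total + nodes - 1) / nodes; // ceiling of the average
        if (max - min > 2) {                   // assumed skew tolerance
            System.out.println("Skewed: consider setting total_shards_per_node to " + avg);
        } else {
            System.out.println("Balanced enough (min=" + min + ", max=" + max + ")");
        }
    }
}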

(Closing this as it is working as designed)

kkhatua closed this as completed Jul 24, 2024
@jed326 (Collaborator) commented Jul 24, 2024

Also, for Searchable Snapshots specifically, we balance only by the average primary shard count:

/**
 * Performs heuristic, naive weight-based balancing for remote shards within the cluster by using average nodes per
 * cluster as the metric for shard distribution.
 * It does so without accounting for the local shards located on any nodes within the cluster.
 */
@Override
void balance() {
    List<RoutingNode> remoteRoutingNodes = getRemoteRoutingNodes();
    logger.trace("Performing balancing for remote shards.");
    if (remoteRoutingNodes.isEmpty()) {
        logger.debug("No eligible remote nodes found to perform balancing");
        return;
    }

    final Map<String, Integer> nodePrimaryShardCount = calculateNodePrimaryShardCount(remoteRoutingNodes);
    int totalPrimaryShardCount = nodePrimaryShardCount.values().stream().reduce(0, Integer::sum);
    totalPrimaryShardCount += routingNodes.unassigned().getNumPrimaries();
    int avgPrimaryPerNode = (totalPrimaryShardCount + routingNodes.size() - 1) / routingNodes.size();

    ArrayDeque<RoutingNode> sourceNodes = new ArrayDeque<>();
    ArrayDeque<RoutingNode> targetNodes = new ArrayDeque<>();
    for (RoutingNode node : remoteRoutingNodes) {
        if (nodePrimaryShardCount.get(node.nodeId()) > avgPrimaryPerNode) {
            sourceNodes.add(node);
        } else if (nodePrimaryShardCount.get(node.nodeId()) < avgPrimaryPerNode) {
            targetNodes.add(node);
        }
    }

    while (sourceNodes.isEmpty() == false && targetNodes.isEmpty() == false) {
        RoutingNode sourceNode = sourceNodes.poll();
        tryRebalanceNode(sourceNode, targetNodes, avgPrimaryPerNode, nodePrimaryShardCount);
    }
}
