[BUG] Scaling down nodePool doesn't reassign all shards #870

Closed
pbagona opened this issue Sep 11, 2024 · 4 comments

Labels: bug (Something isn't working)

pbagona commented Sep 11, 2024

What is the bug?

When scaling down a nodePool, the operator logs messages about draining the removed node, but after the drain finishes the cluster health status is red and some shards remain unassigned.

How can one reproduce the bug?

My current setup has 4 nodePools: master with 3 replicas (role master), nodes with 2 replicas and 300Gi storage each (roles data+ingest), ingests with 3 replicas and 100Gi storage each (role ingest), and data with 5 replicas and 1Ti storage each (role data). Scaling down the nodes nodePool introduces issues with shard allocation and the cluster health status.
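
For reference, the nodePools section of the manifest looks roughly like this (a rough sketch; the master pool's disk size and the resource requests are left out here):

  nodePools:
    - component: master
      replicas: 3
      roles:
        - "master"
    - component: nodes        # the pool being scaled down
      replicas: 2
      diskSize: "300Gi"
      roles:
        - "data"
        - "ingest"
    - component: ingests
      replicas: 3
      diskSize: "100Gi"
      roles:
        - "ingest"
    - component: data
      replicas: 5
      diskSize: "1Ti"
      roles:
        - "data"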

What is the expected behavior?

The expected behavior is that after the operator drains a node and decommissions it, the cluster health status is green.

What is your host/environment?

k8s v1.27.13, OpenSearch k8s operator 2.5.1, OpenSearch cluster 1.3.16

Do you have any screenshots?

Yes, screenshots are posted below.

Do you have any additional context?

The nodes and data nodePools existed first, then ingests was added, and the goal now is to remove the old nodes nodePool.

I used the same setup with an OpenSearch 2.x cluster on a different k8s cluster and it worked as expected: when a nodePool was removed, the operator drained the nodes of that nodePool one by one and removed them, there was no interruption to the service, and after it finished the cluster health status remained green.

When performing the same steps on an OpenSearch 1.3.16 cluster, the result is cluster health status red and some shards unable to allocate. Sometimes a single shard remains unallocated, sometimes more.

I tried removing the nodePool from the manifest all at once, and I also tried scaling it down by just one replica, but got the same outcome.

In the operator logs I see that it correctly waits for the node to drain and then decommissions it, but at that very moment the cluster goes into the red state and I see allocation errors.

When I add the removed nodePool/replica back to the manifest, the cluster status goes back to green once the pod is up and running, and everything behaves normally.

I tried this several times and got one of a few allocation errors every time.

Also, as seen in the screenshots below, before scaling down the nodes show 12.3gb of used storage under disk.indices. When one of the nodes in the nodePool is removed, the shards appear to be redistributed, but the disk.indices value stays the same for all remaining nodes (or changes only minimally) and does not account for the 12.3gb that should have been relocated to them. And when the nodePool is scaled back up to its original size and the recreated pod mounts its old PV, everything goes back to the normal green state.

{"level":"debug","ts":"2024-09-11T17:39:26.493Z","logger":"events","msg":"Start to Exclude int2-opensearch/int2-opensearch","type":"Normal","object":{"kind":"OpenSearchCluster","namespace":"int2-opensearch","name":"int2-opensearch","uid":"4b093d1c-5644-411a-a010-af0c78faf969","apiVersion":"opensearch.opster.io/v1","resourceVersion":"173437926"},"reason":"Scaler"}
{"level":"info","ts":"2024-09-11T17:39:26.531Z","msg":"Group: nodes, Node int2-opensearch-nodes-1 is drained","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"int2-opensearch","namespace":"int2-opensearch"},"namespace":"int2-opensearch","name":"int2-opensearch","reconcileID":"b99653cd-8ca2-46c7-ba4a-b558f966345a"}
{"level":"info","ts":"2024-09-11T17:39:26.546Z","msg":"Reconciling OpenSearchCluster","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"int2-opensearch","namespace":"int2-opensearch"},"namespace":"int2-opensearch","name":"int2-opensearch","reconcileID":"93813285-fb36-4c9e-b2d1-1b2d08e28df5","cluster":{"name":"int2-opensearch","namespace":"int2-opensearch"}}
...
...
{"level":"debug","ts":"2024-09-11T17:39:26.637Z","logger":"events","msg":"Start to Drain int2-opensearch/int2-opensearch","type":"Normal","object":{"kind":"OpenSearchCluster","namespace":"int2-opensearch","name":"int2-opensearch","uid":"4b093d1c-5644-411a-a010-af0c78faf969","apiVersion":"opensearch.opster.io/v1","resourceVersion":"173438036"},"reason":"Scaler"}
{"level":"debug","ts":"2024-09-11T17:39:26.637Z","logger":"events","msg":"Start to decreaseing node int2-opensearch-nodes-1 on nodes ","type":"Normal","object":{"kind":"OpenSearchCluster","namespace":"int2-opensearch","name":"int2-opensearch","uid":"4b093d1c-5644-411a-a010-af0c78faf969","apiVersion":"opensearch.opster.io/v1","resourceVersion":"173438036"},"reason":"Scaler"}

Cluster health status

{
  "cluster_name" : "int2-opensearch",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 12,
  "number_of_data_nodes" : 6,
  "discovered_master" : true,
  "active_primary_shards" : 92,
  "active_shards" : 183,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 1,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 99.45652173913044
}
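
(The output above is from the _cluster/health API; a minimal sketch of the query, assuming the cluster is reachable on localhost:9200 and using placeholder credentials:)

curl -sk -u admin:<password> "https://localhost:9200/_cluster/health?pretty"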

Allocation before change

[screenshot]

Example of allocation after change

[screenshots]
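
(The per-node shard counts and disk.indices values in the screenshots come from the _cat/allocation API; a sketch of the query, with endpoint and credentials as placeholders:)

curl -sk -u admin:<password> "https://localhost:9200/_cat/allocation?v&h=node,shards,disk.indices,disk.used,disk.avail"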

Example of unallocated shard explanation

{
  "index" : "***********",
  "shard" : 1,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "NODE_LEFT",
    "at" : "2024-09-11T18:04:19.246Z",
    "details" : "node_left [pDXfgCn9TQuRA5bGR1DKPw]",
    "last_allocation_status" : "no_valid_shard_copy"
  },
  "can_allocate" : "no_valid_shard_copy",
  "allocate_explanation" : "cannot allocate because all found copies of the shard are either stale or corrupt",
  "node_allocation_decisions" : [

EDIT:
I tried it again to collect more information and noticed the following: when I scale down the nodes nodePool from 2 to 1 replicas, the operator goes from int2-opensearch-nodes-0, int2-opensearch-nodes-1 to just int2-opensearch-nodes-0 and drains node int2-opensearch-nodes-1. During this process, some shards are reallocated to the node that is being drained; the pod is then terminated and removed from the cluster, and the operator logs are as posted above.

[screenshot]
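
(To double-check whether the drained node is actually excluded from allocation during this phase, one can inspect the cluster settings while the drain is in progress; a sketch, assuming the operator uses the standard allocation exclude setting:)

curl -sk -u admin:<password> "https://localhost:9200/_cluster/settings?flat_settings=true&pretty"
# expected to show something like "cluster.routing.allocation.exclude._name" : "int2-opensearch-nodes-1" while the drain is running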

pbagona added the bug (Something isn't working) and untriaged (Issues that have not yet been triaged) labels on Sep 11, 2024
prudhvigodithi removed the untriaged (Issues that have not yet been triaged) label on Sep 12, 2024
prudhvigodithi (Member) commented:

[Triage]
Thanks @pbagona for the detailed description. I assume this has something to do with the 1.3.16 version of OpenSearch (since, as you mentioned, it works with 2.x). Also, since 1.3.x is only in maintenance mode, I would recommend using the latest 2.x version of OpenSearch.

Also, when the state is red, have you tried scaling down to zero and then scaling back up (or a fresh restart)?

Thank you
@swoehrl-mw @getsaurabh02

swoehrl-mw (Collaborator) commented:

I concur with @prudhvigodithi here. This looks like a problem with OpenSearch itself. From your description, OpenSearch is not able to correctly recover some shards if one of the replicas is removed. Since the 1.x version is no longer being developed, it does not make sense to implement special logic for this in the operator.

pbagona (Author) commented Oct 2, 2024

@prudhvigodithi I tried a fresh restart and a scale down & up, and the status stayed the same.

Thanks for the help and for confirming where the issue lies. We are running 2.x where we can, but on some k8s clusters we need to keep 1.x due to application compatibility. A new application version that works with OpenSearch 2.x should be available next year, so at least we now have another reason to push for it ASAP.

For now I have asked the team to be extra careful when scaling down and removing replicas.

Should I close the issue with "Close as not planned"?

prudhvigodithi (Member) commented:

Thanks for the confirmation @pbagona. I will close this issue for now; please feel free to comment or re-open if required.
@swoehrl-mw @getsaurabh02.
