
Migrations fail on single node clusters due to unavailable shards exception #157968

Closed · rudolf opened this issue May 17, 2023 · 5 comments

Labels: Feature:Migrations, Team:Core (Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc)


rudolf commented May 17, 2023

We are seeing frequent migration-related failures on CI with a message like:

Not enough active copies to meet shard count of [ALL] (have 1, needed 2)

E.g. #156117 (comment)

After speaking to the Elasticsearch team this appears to be a race condition in Elasticsearch that only happens on single node clusters. We create indices with "auto_expand_replicas": "0-1" and wait for a shards_acknowledged=true response. On a single node cluster this creates an index with 0 replicas.
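For context, `"auto_expand_replicas": "0-1"` means the replica count tracks the cluster size. A rough sketch of the expansion rule (an illustrative helper, not Elasticsearch source):

```typescript
// Illustrative only: roughly how an "auto_expand_replicas" range such as
// "0-1" resolves to a concrete replica count for a given number of data nodes.
function resolveAutoExpandReplicas(range: string, dataNodes: number): number {
  const [minStr, maxStr] = range.split('-');
  const min = Number(minStr);
  // "all" expands replicas to every other node in the cluster
  const max = maxStr === 'all' ? dataNodes - 1 : Number(maxStr);
  // A replica cannot be allocated to the node holding the primary,
  // so the effective count is capped at dataNodes - 1
  return Math.max(min, Math.min(max, dataNodes - 1));
}

console.log(resolveAutoExpandReplicas('0-1', 1)); // 0 — single node cluster, no replica
console.log(resolveAutoExpandReplicas('0-1', 3)); // 1
```

With one data node the range `0-1` bottoms out at 0 replicas, matching the behaviour described above.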

info [o.e.c.m.MetadataCreateIndexService] [node-01] [.kibana_8.8.0_reindex_temp] creating index, cause [api], templates [], shards [1]/[1]

However, even if the create index API responds that all shards are active, there is a brief window during which the index is configured with 1 replica that cannot be assigned on a single node cluster. ES then immediately adjusts the replica count down to 0:

info [o.e.c.r.a.AllocationService] [node-01] updating number_of_replicas to [0] for indices [.kibana_8.8.0_reindex_temp]

If Kibana indexes any data or searches against the index during the brief window between these two log messages, we get "Not enough active copies to meet shard count of [ALL] (have 1, needed 2)" errors.

This should be fixed upstream by ES, but in the meantime we can work around this problem by always waiting for the index status to turn "green".
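A minimal sketch of that workaround (illustrative, not the actual Kibana code). The health check is injected as a function so the polling logic is self-contained; real code would call something like `esClient.cluster.health({ index, wait_for_status: 'green' })` via the dedicated `waitForIndexStatus` action:

```typescript
type IndexStatus = 'red' | 'yellow' | 'green';

// Poll the index health until it reports "green" instead of trusting
// the shards_acknowledged flag of the create index response.
async function waitForGreen(
  getStatus: () => Promise<IndexStatus>,
  { retries = 10, delayMs = 100 }: { retries?: number; delayMs?: number } = {}
): Promise<void> {
  for (let attempt = 0; attempt < retries; attempt++) {
    // A "green" status guarantees all shards (primary and any replicas) are started
    if ((await getStatus()) === 'green') return;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error(`index did not turn green after ${retries} attempts`);
}
```

On a single node cluster the status turns green as soon as ES has adjusted the replica count down to 0, which is exactly the signal the migration needs before indexing or searching.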

Note: the error message is similar to #127136 but this is a different issue.

@rudolf rudolf added Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc Feature:Migrations labels May 17, 2023
@elasticmachine

Pinging @elastic/kibana-core (Team:Core)

@rudolf

rudolf commented May 17, 2023

To do this we'd need to remove the `if (res.acknowledged && res.shardsAcknowledged)` check, something like:

--- a/packages/core/saved-objects/core-saved-objects-migration-server-internal/src/actions/create_index.ts
+++ b/packages/core/saved-objects/core-saved-objects-migration-server-internal/src/actions/create_index.ts
@@ -146,25 +146,20 @@ export const createIndex = ({
       AcknowledgeResponse,
       'create_index_succeeded'
     >((res) => {
-      if (res.acknowledged && res.shardsAcknowledged) {
-        // If the cluster state was updated and all shards started we're done
-        return TaskEither.right('create_index_succeeded');
-      } else {
-        // Otherwise, wait until the target index has a 'green' status meaning
-        // the primary (and on multi node clusters) the replica has been started
-        return pipe(
-          waitForIndexStatus({
-            client,
-            index: indexName,
-            timeout: DEFAULT_TIMEOUT,
-            status: 'green',
-          }),
-          TaskEither.map(() => {
-            /** When the index status is 'green' we know that all shards were started */
-            return 'create_index_succeeded';
-          })
-        );
-      }
+      // Wait until the target index has a 'green' status, meaning that
+      // the primary (and, on multi node clusters, the replica) has been started
+      return pipe(
+        waitForIndexStatus({
+          client,
+          index: indexName,
+          timeout: DEFAULT_TIMEOUT,
+          status: 'green',
+        }),
+        TaskEither.map(() => {
+          /** When the index status is 'green' we know that all shards were started */
+          return 'create_index_succeeded';
+        })
+      );
     })
   );
 };

gsoldevila added a commit that referenced this issue May 17, 2023

[Migrations] Systematically wait for newly created indices to turn green (#157973)

Tackles #157968

When creating new indices during SO migrations, we used to rely on the
`res.acknowledged && res.shardsAcknowledged` of the
`esClient.indices.create(...)` to determine that the indices are ready
to use.

However, we believe that due to certain race conditions, this can cause
Kibana migrations to fail (refer to the [related
issue](#157968)).

This PR aims at fixing recent CI failures by adding a systematic
`waitForIndexStatus` after creating an index.
kibanamachine pushed a commit to kibanamachine/kibana that referenced this issue May 17, 2023

[Migrations] Systematically wait for newly created indices to turn green (elastic#157973)

(cherry picked from commit 71125b1)
kibanamachine referenced this issue May 17, 2023

[Migrations] Systematically wait for newly created indices to turn green (#157973) (#157993)

# Backport

This will backport the following commits from `main` to `8.8`:
- [[Migrations] Systematically wait for newly created indices to turn green (#157973)](#157973)

Co-authored-by: Gerard Soldevila <[email protected]>
@pgayvallet

pgayvallet commented May 22, 2023

@gsoldevila I guess #157973 should have closed this one?

EDIT: or maybe not given the last messages from @dmlemeshko on slack

@rudolf

rudolf commented May 24, 2023

Second attempt to fix this: #158182

@rudolf

rudolf commented Apr 15, 2024

Closing, as we have not seen further failures on CI.

@rudolf rudolf closed this as completed Apr 15, 2024