Truncate errors at 256 characters, fix flaky test. #556

Merged
dblock merged 3 commits into opensearch-project:main from the truncate-at-256 branch
Sep 5, 2024

dblock
Member

@dblock dblock commented Sep 5, 2024

Description

I was trying to debug #555 and couldn't reproduce it locally. I wish I had seen the error in CI, but we're being too clever with the truncation of the error message. This PR is less clever and simply cuts the message at 256 characters, which produces the following output.

ERROR   RESPONSE STATUS (Expected status 200, but received 400: application/json. [move_allocation] can't move 0, failed to find it on node {opensearch-node1}{FHz2nfjpSWCchpbn2wjpXA}{CMFbSOh-S0ikZznn5M5qKA}{172.18.0.3}{172.18.0.3:9300}{dimr}{zone=zoneA, shard_indexing_pressure_ena, ...)

The relocation test is flaky because the shard is sometimes allocated on opensearch-node1 and sometimes on opensearch-node2, so a move from opensearch-node1 fails whenever the shard starts on the other node.

Three possible fixes:

  1. Query the routing status to find the node on which the shard of the newly created movies index is allocated, then relocate it to the other one. The problem is knowing what this "other" node is in the test.
  2. A suggestion in Slack (https://opensearch.slack.com/archives/C04UM4D6XN2/p1725549928203249) was to add index.routing.allocation.include.zone: zoneA, which would put the shard on node1, disabling rebalance, and then removing that setting.
  3. Attempt to force the shard onto node1 in a prologue and ignore errors.

I chose (3) because it's simpler to think about.
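
For illustration, option (3) could be sketched roughly as follows. This is not the code in this PR: `move_command` and `run_step` are hypothetical names, and the sketch assumes the prologue tries to move the shard from opensearch-node2 to opensearch-node1 and ignores the failure when the shard is already on node1. The payload shape follows the `move` command of the cluster reroute API.

```typescript
// Shape of a `move` command in a POST /_cluster/reroute request body.
interface MoveCommand {
  move: { index: string; shard: number; from_node: string; to_node: string }
}

// Hypothetical helper: builds one move command.
function move_command(index: string, shard: number, from_node: string, to_node: string): MoveCommand {
  return { move: { index, shard, from_node, to_node } }
}

// Hypothetical step runner. `send` stands in for whatever issues the request
// and throws on a non-2xx response. Returns true if the step succeeded,
// false if it failed and the failure was ignored.
function run_step(send: () => void, ignore_errors: boolean): boolean {
  try {
    send()
    return true
  } catch (e) {
    if (ignore_errors) return false
    throw e
  }
}

// Prologue (assumption): force the shard onto node1; fails harmlessly if it is already there.
const prologue_body = { commands: [move_command('movies', 0, 'opensearch-node2', 'opensearch-node1')] }

// Chapter under test: the shard is now known to be on node1, so this move is deterministic.
const chapter_body = { commands: [move_command('movies', 0, 'opensearch-node1', 'opensearch-node2')] }
```

The point of the design is that the test never needs to discover where the shard is: the prologue normalizes the starting state and the chapter can hard-code both node names.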

Closes #555.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Contributor

github-actions bot commented Sep 5, 2024

Changes Analysis

Commit SHA: 6a0f9c4
Comparing To SHA: 62e21f0

API Changes

Summary

NO CHANGES

Report

The full API changes report is available at: https://github.com/opensearch-project/opensearch-api-specification/actions/runs/10724577950/artifacts/1897365853

API Coverage

| | Before | After | Δ |
|---|---|---|---|
| Covered (%) | 533 (52.2 %) | 533 (52.2 %) | 0 (0 %) |
| Uncovered (%) | 488 (47.8 %) | 488 (47.8 %) | 0 (0 %) |
| Unknown | 26 | 26 | 0 |

Contributor

github-actions bot commented Sep 5, 2024

Spec Test Coverage Analysis

| Total | Tested |
|---|---|
| 559 | 267 (47.76 %) |

@dblock dblock changed the title from "Truncate errors at 256 characters." to "Truncate errors at 256 characters, fix flaky test." Sep 5, 2024
@@ -51,5 +63,8 @@ chapters:
shard: 0
from_node: opensearch-node1
to_node: opensearch-node2
retry:
Member Author

The shard may still be relocating, so retry.
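
As a rough sketch of what retrying a step means here: the helper below is hypothetical (`with_retry`, `count`, and `wait_ms` are illustrative names, not the test framework's API, and the actual keys under `retry:` are not shown in this diff). It re-runs an action that fails while the shard is still relocating, waiting between attempts.

```typescript
// Hypothetical retry helper: runs `action` once, then up to `count` more times
// on failure, sleeping `wait_ms` milliseconds between attempts.
async function with_retry<T>(action: () => Promise<T>, count: number, wait_ms: number): Promise<T> {
  let last_error: unknown
  for (let attempt = 0; attempt <= count; attempt++) {
    try {
      return await action()
    } catch (e) {
      last_error = e
      // Only wait if another attempt is coming.
      if (attempt < count) {
        await new Promise<void>((resolve) => setTimeout(resolve, wait_ms))
      }
    }
  }
  // All attempts failed: surface the most recent error.
  throw last_error
}
```

With `count` set to 3 the action runs at most four times, so a shard that finishes relocating after the second failure still lets the step pass.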

@@ -152,8 +152,9 @@ export class ConsoleResultLogger implements ResultLogger {
}

#maybe_shorten_error_message(message: string | undefined): string | undefined {
if (message === undefined || message.length <= 128 || this._verbose) return message
const part = message.split(',')[0]
const cut_at = 256
Collaborator

Nit: This is better off as a configurable number in the constructor, especially when we refactor the test framework to be a standalone product.

Member Author

I'll leave it as is for now, but we can expose it as an option when someone actually wants to change this value programmatically.
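
For reference, if the cut-off were ever exposed as an option, it might look something like the sketch below. This is not the `ConsoleResultLogger` code: the class name `ErrorMessageShortener` and the `cut_at` and `verbose` options are hypothetical, and the `, ...` suffix is an assumption modeled on the sample output in the description.

```typescript
// Hypothetical standalone version of the truncation logic with a configurable cut-off.
class ErrorMessageShortener {
  private readonly cut_at: number
  private readonly verbose: boolean

  constructor(options: { cut_at?: number; verbose?: boolean } = {}) {
    this.cut_at = options.cut_at ?? 256 // default matches the constant in this PR
    this.verbose = options.verbose ?? false
  }

  shorten(message: string | undefined): string | undefined {
    // Leave short messages alone, and never truncate in verbose mode.
    if (message === undefined || message.length <= this.cut_at || this.verbose) return message
    // Keep the first `cut_at` characters and mark the cut (suffix is an assumption).
    return message.slice(0, this.cut_at) + ', ...'
  }
}
```

Callers that are happy with 256 would construct it with no arguments; a caller that wants something else could pass, for example, `{ cut_at: 512 }`.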

@dblock dblock merged commit f150005 into opensearch-project:main Sep 5, 2024
18 checks passed
@dblock dblock deleted the truncate-at-256 branch September 5, 2024 19:22
Development

Successfully merging this pull request may close these issues.

[BUG] Reroute an index shard between nodes test is flaky
2 participants