[BUG] RemoteClusterClientTests.testConnectAndExecuteRequest is flaky #12338
Comments
This test is not related to RemoteStore. It seems to be related to cross-cluster functionality; can we create a label for cross cluster?
@peternied I've got some circumstantial evidence from local testing that suggests #11957 may have introduced this flakiness. Can you take a look?
1000 iterations, couldn't reproduce locally.
Reviewing how OpenSearch/server/src/main/java/org/opensearch/transport/ClusterConnectionManager.java Lines 269 to 286 in 87ac374
@andrross Did you have theories on how that PR impacted this test case? Otherwise I'm leaning towards closing this as 'not reproducible', and we can always reopen if rediscovered.
@peternied No theories, sorry! @kotwanikunal was doing some testing with a local Jenkins instance running
Seeing this test fail: https://build.ci.opensearch.org/job/gradle-check/35068/
Muting the test until we provide the fix.
Already muted as part of #12720
This test is not related to RemoteStore. This is related to cross cluster.
Ran the following command
Also ran the above command with different iteration counts (1000, 10000, and 100000), but it passes every time.
@mohitamg Did you remove the
I didn't, @andrross. Should I remove it and then run?
Commented
Result for 20k iterations
I think the test failed because of non-ideal conditions, most likely a resource constraint that prevented the connection from being established as expected. Running the test in isolation removes that constraint, which would explain why the failure cannot be reproduced locally. Could we introduce a resource constraint in the test case to validate this? Also, when running the invalid API path on constrained resources, can we assert that the node is healthy if we reach
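One way to probe the resource-constraint theory outside the test framework might be to run the connect-and-execute step from many workers at once and count how often it fails. The following is only a minimal sketch under stated assumptions: `StressHarness`, `countFailures`, and the `attempt` callable are hypothetical names for illustration and are not part of OpenSearch or its test framework. The `attempt` in `main` is a stand-in that fails on every tenth call; a real experiment would replace it with the actual connection logic.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not an OpenSearch API.
final class StressHarness {

    /**
     * Runs {@code attempt} {@code iterations} times on a fixed pool of
     * {@code threads} workers and returns how many runs threw an exception.
     */
    static int countFailures(Callable<Void> attempt, int iterations, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger failures = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(iterations);
        try {
            for (int i = 0; i < iterations; i++) {
                pool.execute(() -> {
                    try {
                        attempt.call();
                    } catch (Exception e) {
                        // Any exception from the attempt counts as one failed run.
                        failures.incrementAndGet();
                    } finally {
                        done.countDown();
                    }
                });
            }
            // Wait until every submitted run has finished.
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for runs", e);
        } finally {
            pool.shutdown();
        }
        return failures.get();
    }

    public static void main(String[] args) {
        // Stand-in for the real connect-and-execute step: fails on every 10th call.
        AtomicInteger calls = new AtomicInteger();
        Callable<Void> attempt = () -> {
            if (calls.incrementAndGet() % 10 == 0) {
                throw new IOException("simulated connection failure");
            }
            return null;
        };
        int iterations = 100;
        int failures = countFailures(attempt, iterations, 4);
        System.out.println("failures: " + failures + " / " + iterations);
    }
}
```

Raising `threads` well above the number of available cores is one crude way to approximate a starved CI host. It does not reproduce the exact conditions of the failing build, so a clean run here would not rule the theory out.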
Describe the bug
org.opensearch.transport.RemoteClusterClientTests.testConnectAndExecuteRequest appears to intermittently hit network exceptions during this workflow
Related component
Storage:Remote
To Reproduce
The initial failure occurred on a developer desktop; I was not able to reproduce it on rerun
Expected behavior
All tests should pass reliably
Additional Details
Host/Environment (please complete the following information):