test failed in CI: omicron-cockroach-admin cockroach_cli::tests::test_node_decommission_compatibility
#6506
Weird! I think a panic there means stdout was empty, so (presumably?) the command failed. I'll add some logging that should help the next time it flakes, assuming that guess is correct.
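A minimal sketch of that idea, assuming a hypothetical helper `parse_cli_stdout` (not an omicron function): instead of panicking when the CLI prints nothing, return an error that carries stderr, so the next flake reports why the command produced no output.

```rust
/// Hypothetical helper (not part of omicron): turn the pieces of a finished
/// CLI invocation into a `Result`, keeping stderr for diagnosis instead of
/// panicking when stdout is empty.
pub fn parse_cli_stdout(
    succeeded: bool,
    stdout: &[u8],
    stderr: &[u8],
) -> Result<String, String> {
    let out = String::from_utf8_lossy(stdout).into_owned();
    let err = String::from_utf8_lossy(stderr);
    if !succeeded {
        // Non-zero exit: surface stderr rather than trying to parse stdout.
        return Err(format!("command failed; stderr: {}", err.trim()));
    }
    if out.trim().is_empty() {
        // Successful exit but nothing on stdout: the case that showed up as a panic.
        return Err(format!(
            "command produced no stdout; stderr: {}",
            err.trim()
        ));
    }
    Ok(out)
}
```

The caller can then log the error (including stderr) and fail the test with context, rather than hitting an opaque panic during parsing.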
Hoping to shed light on the test flake in #6506
Some logging now! https://buildomat.eng.oxide.computer/wg/0/details/01J8QWRHJZ9MWKRFH6N67ZE6TA/L16FQY2MG9K1N2SdCCur5TZDpyIdjJBGkNDcoewBXHw7U8gH/01J8QWSDBXPAZZ83A3GDJ8JYAM#S5865
Hmm, interesting! So it looks like the test thinks cockroach is up and tries to decommission a node, but the node it's talking to isn't actually up yet. I think the logs confirm this. From https://buildomat.eng.oxide.computer/wg/0/artefact/01J8QWRHJZ9MWKRFH6N67ZE6TA/L16FQY2MG9K1N2SdCCur5TZDpyIdjJBGkNDcoewBXHw7U8gH/01J8QWSDBXPAZZ83A3GDJ8JYAM/01J8QZYHP5MQ110BP4NF9JSR6Q/omicron_cockroach_admin-953fa556bc995186-test_node_decommission_compatibility.43654.0.log?format=x-bunyan, we see these log lines (note the timestamps):
The last two are from `omicron/test-utils/src/dev/mod.rs`, lines 126 to 128 (at commit c727c3f).
Immediately following that is this match for populating the database: `omicron/test-utils/src/dev/mod.rs`, lines 130 to 139 (at commit c727c3f).
In this test we're using But if we switch over to the cockroach startup logfile (https://buildomat.eng.oxide.computer/wg/0/artefact/01J8QWRHJZ9MWKRFH6N67ZE6TA/L16FQY2MG9K1N2SdCCur5TZDpyIdjJBGkNDcoewBXHw7U8gH/01J8QWSDBXPAZZ83A3GDJ8JYAM/01J8QZYASHFKGGAKVXYBXM1ZNJ/cockroach.log?format=x-bunyan, which I believe is the right one based on the command line args matching what
This is nearly a full second later than when the test thought the database was up. My gut feeling is that
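If the race described above is the cause, one common mitigation is to poll until the node actually answers before issuing the decommission. The sketch below is a generic, std-only polling loop; `wait_until_ready` and its `probe` closure are hypothetical (in practice the probe might run a trivial query against the node), and this is not necessarily how omicron addressed the flake.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: call `probe` until it returns true or `timeout`
/// elapses, sleeping `interval` between attempts. Returns the number of
/// attempts on success.
pub fn wait_until_ready<F>(
    mut probe: F,
    timeout: Duration,
    interval: Duration,
) -> Result<u32, String>
where
    F: FnMut() -> bool,
{
    let deadline = Instant::now() + timeout;
    let mut attempts = 0u32;
    loop {
        attempts += 1;
        if probe() {
            return Ok(attempts);
        }
        if Instant::now() >= deadline {
            return Err(format!(
                "not ready after {} attempts ({:?})",
                attempts, timeout
            ));
        }
        thread::sleep(interval);
    }
}
```

Bounding the wait with a deadline keeps a node that never comes up from hanging the test indefinitely; the error message still records how many probes were attempted.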
This test failed on a CI run on #6499:
Unfortunately, I rebased the PR and have since lost the GitHub check URL for the failure...
Log showing the specific test failure:
https://buildomat.eng.oxide.computer/wg/0/details/01J6Q8748MZKW18KSEF4540CQZ/iy5XpuAznwyLAetcf2Mfjh1o9XO11hzknMn8Mr0yhl9xfXK3/01J6Q87HTCRRS09RECVKRHBBSR#S5100
Excerpt from the log showing the failure:
It looks like we are waiting for output from a CRDB CLI process that never seems to appear? The test has passed successfully on subsequent runs.