[Fleet] Unskip test suite for agentless and replace waitForNextUpdate with waitFor #198125

criamico · 2024-10-29T09:32:18Z

Closes #189038
Closes #192126

Summary

Attempting again to fix the failures related to waitForNextUpdate. I merged #197951, but the test failed again right after the merge. This time I'm removing waitForNextUpdate from the whole file altogether and using waitFor or rerender as needed.

NOTE: This test never fails locally; I ran it many times and the failure never occurred. Some setting in CI may make it behave differently from a local run. There's also currently no way to use the flaky test runner.

… with waitFor

elasticmachine · 2024-10-29T09:34:18Z

Pinging @elastic/fleet (Team:Fleet)

kpollich · 2024-10-29T12:28:48Z

> NOTE: This test never fails locally; I ran it many times and the failure never occurred. Some setting in CI may make it behave differently from a local run. There's also currently no way to use the flaky test runner.

The only way I know of to get this to run a bunch of times is to wrap the whole test suite in a for loop, but even doing that I've never seen a failure come out of it 😞
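For illustration, the loop approach could be sketched roughly as below. This is a minimal, self-contained sketch and not code from this PR: `runRepeatedly` and `flakyCheck` are hypothetical names, and the check that fails on every 10th call only stands in for a flaky test. In a Jest file the same idea would mean placing the describe block inside the loop.

```typescript
// Hypothetical helper: run an async check many times and record which runs failed.
type Check = () => Promise<void>;

async function runRepeatedly(check: Check, runs: number): Promise<number[]> {
  const failedRuns: number[] = [];
  for (let i = 0; i < runs; i++) {
    try {
      await check();
    } catch {
      failedRuns.push(i); // remember the zero-based index of the failing run
    }
  }
  return failedRuns;
}

// Illustrative stand-in for a flaky test: throws on every 10th call.
let calls = 0;
const flakyCheck: Check = async () => {
  calls += 1;
  if (calls % 10 === 0) {
    throw new Error(`failed on call ${calls}`);
  }
};
```

Collecting the failing run indices instead of stopping at the first failure makes it easier to see how often a failure shows up across the repeated runs.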

criamico · 2024-10-29T13:40:58Z

@elasticmachine merge upstream

criamico · 2024-10-29T13:45:17Z

> The only way I know of to get this to run a bunch of times is to wrap the whole test suite in a for loop, but even doing that I've never seen a failure come out of it

@kpollich I suspect that the CI environment differs from the local one in some way, so the test only fails there. However, even the changes merged yesterday to the testRenderer function didn't help :(

opauloh · 2024-10-29T18:44:12Z

/ci

opauloh · 2024-10-29T19:33:44Z

@criamico / @kpollich Since our team introduced most of these tests, would you like me to take over these flaky issues as part of the new Reliability Epic?

There are tasks to migrate waitForNextUpdate to waitFor, as well as to identify and document flakiness.

I can also see that we missed adding await in a couple of other places that use waitFor, so the assertions run earlier than they should, without actually waiting for waitFor to resolve. It probably fails on CI but not locally because of the slower execution that can occasionally occur on CI.
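A rough illustration of the missing-await problem, as a self-contained sketch rather than code from this PR. The `waitFor` below is a simplified stand-in written for this example, not Testing Library's implementation, and `startUpdate`, `expectName`, `withoutAwait`, and `withAwait` are hypothetical names.

```typescript
// Simplified stand-in for waitFor: retry the callback until it stops
// throwing or the timeout elapses. Not the real Testing Library code.
async function waitFor(
  callback: () => void,
  { timeout = 1000, interval = 10 } = {}
): Promise<void> {
  const deadline = Date.now() + timeout;
  for (;;) {
    try {
      callback();
      return;
    } catch (err) {
      if (Date.now() >= deadline) throw err;
      await new Promise<void>((resolve) => setTimeout(resolve, interval));
    }
  }
}

// Hypothetical state that updates asynchronously, like a hook after a request.
function startUpdate(state: { name: string }, delayMs: number): void {
  setTimeout(() => {
    state.name = 'agentless-policy';
  }, delayMs);
}

function expectName(state: { name: string }, expected: string): void {
  if (state.name !== expected) {
    throw new Error(`expected ${expected}, got ${state.name}`);
  }
}

async function withoutAwait(): Promise<string> {
  const state = { name: '' };
  startUpdate(state, 30);
  // Bug: the promise is dropped, so the next line runs before the update lands.
  // The catch only keeps this sketch free of unhandled rejections.
  waitFor(() => expectName(state, 'agentless-policy')).catch(() => {});
  return state.name; // read too early: the update has not happened yet
}

async function withAwait(): Promise<string> {
  const state = { name: '' };
  startUpdate(state, 30);
  await waitFor(() => expectName(state, 'agentless-policy'));
  return state.name; // read only after the expectation has passed
}
```

In the version without await, whatever follows the waitFor call races against the pending update, which would fit a test that passes on a fast local machine and fails on a slower CI worker.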

Fixing those will also help with the React 18 preparation PR, where some tests are failing due to the missing await.

Also, since it's very easy to forget the await, I'm considering writing an ESLint rule for our plugin to catch a missing await when using waitFor.
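Such a rule might look roughly like the sketch below. It follows the general shape of an ESLint rule module (a `meta` object plus a `create(context)` function that returns visitors keyed by node type), but the rule name `requireAwaitWaitFor`, the message text, and the minimal `AstNode` and `RuleContext` types are assumptions made for this example. It only flags a bare `waitFor(...)` statement and does not handle member calls or other patterns.

```typescript
// Minimal structural types for the parts of the ESLint rule API used here.
interface AstNode {
  type: string;
  name?: string;
  callee?: AstNode;
  parent?: AstNode;
}

interface RuleContext {
  report(descriptor: { node: AstNode; message: string }): void;
}

// Hypothetical rule: flag waitFor(...) calls that are used as bare statements.
const requireAwaitWaitFor = {
  meta: {
    type: 'problem',
    docs: { description: 'require await on waitFor calls' },
    schema: [],
  },
  create(context: RuleContext) {
    return {
      CallExpression(node: AstNode): void {
        const callee = node.callee;
        const isWaitFor =
          callee !== undefined &&
          callee.type === 'Identifier' &&
          callee.name === 'waitFor';
        // A call whose parent is an ExpressionStatement was neither awaited,
        // returned, nor assigned, so its promise is dropped.
        if (isWaitFor && node.parent?.type === 'ExpressionStatement') {
          context.report({ node, message: 'waitFor must be awaited.' });
        }
      },
    };
  },
};
```

Checking the parent node type keeps the sketch short; a production rule would also need to decide how to treat promises that are returned, assigned, or chained.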

criamico · 2024-10-30T08:22:45Z

@elasticmachine merge upstream

criamico · 2024-10-30T10:44:54Z

> I'm also considering writing an ESLint rule for our plugin to catch a missing await when using waitFor.

@opauloh I fixed the missing await, but it would be great to have a rule to enforce it, since it's pretty easy to miss.

> Since our team introduced most of these tests, would you like me to take over these flaky issues as part of the new Reliability Epic?

Let me know whether you want me to keep this PR open or whether it's better to proceed with your epic. The only thing that concerns me is that keeping too many tests skipped might lead to not catching some bugs early, especially on serverless/agentless. @kpollich what do you think?

elasticmachine · 2024-10-30T12:11:46Z

💔 Build Failed

Buildkite Build
Commit: 3507556

Failed CI Steps

Test Failures

[job] [logs] FTR Configs #78 / alerting api integration security and spaces enabled - Group 4 Alerts alerts alerts connector adapters should use connector adapters correctly on system actions
[job] [logs] Jest Tests #15 / useSetupTechnology should update agentless policy name to match integration name if agentless is enabled
[job] [logs] Jest Tests #15 / useSetupTechnology should update agentless policy name to match integration name if agentless is enabled

Metrics [docs]

✅ unchanged

History

💚 Build #246851 succeeded 17d77bd
💛 Build #246695 was flaky 4b3a224
💔 Build #246561 failed 4b3a224

cc @criamico

kpollich · 2024-10-30T12:40:52Z

> @kpollich what do you think?

Since @opauloh's team is already tracking work here, it might make sense to have them take this over the finish line. @criamico do you feel like we have a path forward to get this PR green and land it, or is the scope of work needed here larger? It seems like there are a lot of moving parts in the reliability epic Paulo linked, and I am not sure whether we need to do a lot of that foundational work to fix the root cause of the flakiness here, rather than just patching smaller things to get this PR green.

criamico · 2024-10-30T13:14:29Z

> Since @opauloh's team is already tracking work here, it might make sense to have them take this over the finish line.

I think it would be better if @opauloh's team takes over this work, since they wrote most of the tests and intend to take care of the larger scope. I'll close this PR in favor of their solution.

opauloh · 2024-10-31T01:44:14Z

Thanks! I already updated the relevant tickets to include the agentless scope of Fleet!

> The only thing that concerns me is that keeping too many tests skipped might lead to not catching some bugs early, especially on serverless/agentless.

Yes, that's a valid concern. Those tasks are top priority for this sprint, so we'll make sure the tests don't stay skipped for long and are also fixed as part of the larger scope.

[Fleet] Unskip test suite for agentless and replace waitForNextUpdate…

4b3a224

… with waitFor

criamico self-assigned this Oct 29, 2024

criamico added the Team:Fleet (Team label for Observability Data Collection Fleet team) and release_note:skip (Skip the PR/issue when compiling release notes) labels Oct 29, 2024

criamico marked this pull request as ready for review October 29, 2024 09:34

criamico requested review from a team as code owners October 29, 2024 09:34

criamico added the backport:prev-minor (Backport to (8.x) the previous minor version, i.e. one version back from main) label Oct 29, 2024

elasticmachine and others added 2 commits October 30, 2024 09:22

Merge branch 'main' into 189038_remove_waitfornextupdate_agentless

17d77bd

Add missing await

3507556

criamico closed this Oct 30, 2024

criamico deleted the 189038_remove_waitfornextupdate_agentless branch October 30, 2024 16:12
