Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fleet] Unskip test suite for agentless and replace waitForNextUpdate with waitFor #198125

Conversation

criamico
Copy link
Contributor

@criamico criamico commented Oct 29, 2024

Closes #189038
Closes #192126

Summary

Attempting again to fix failures related to waitForNextUpdate. I merged #197951 but it failed again right after merging it, this time I'm removing altogether waitForNextUpdate from the whole file and using waitFor or rerender as needed.

NOTE: This test never fails locally, I tried it many times and it never occurs. So it could be some setting in the c.i. that makes it different from the local test. There's also currently no way to use the flaky test runner.

@criamico criamico self-assigned this Oct 29, 2024
@criamico criamico added Team:Fleet Team label for Observability Data Collection Fleet team release_note:skip Skip the PR/issue when compiling release notes labels Oct 29, 2024
@criamico criamico marked this pull request as ready for review October 29, 2024 09:34
@criamico criamico requested review from a team as code owners October 29, 2024 09:34
@elasticmachine
Copy link
Contributor

Pinging @elastic/fleet (Team:Fleet)

@criamico criamico added the backport:prev-minor Backport to (8.x) the previous minor version (i.e. one version back from main) label Oct 29, 2024
@kpollich
Copy link
Member

NOTE: This test never fails locally, I tried it many times and it never occurs. So it could be some setting in the c.i. that makes it different from the local test. There's also currently no way to use the flaky test runner.

The only way I know of to get this to run a bunch of times is to wrap the whole test suite in a for loop, but even doing that I've never seen a failure come out of it 😞

@criamico
Copy link
Contributor Author

@elasticmachine merge upstream

@criamico
Copy link
Contributor Author

criamico commented Oct 29, 2024

The only way I know of to get this to run a bunch of times is to wrap the whole test suite in a for loop, but even doing that I've never seen a failure come out of it

@kpollich I suspect that the ci env is in some way different from the local env so it only fails there. However, even the changes merged yesterday to the testRenderer function didn't help :(

@opauloh
Copy link
Contributor

opauloh commented Oct 29, 2024

/ci

@opauloh
Copy link
Contributor

opauloh commented Oct 29, 2024

@criamico / @kpollich Since our team introduced most of these tests, would you like me to take over these flaky issues as part of the new Reliability Epic?

There are tasks to migrate waitForNextUpdate to waitFor , as well for identifying and documenting flakiness.

I can also see we actually missed adding await to a couple of other places that use waitFor as well, leading the assertions to be executed earlier than they should without actually waiting for the waitFor method. It probably fails on CI but not locally due to the slow execution time that can occasionally occur on CI.

Fixing those will also help with the React 18 preparation PR, where some tests are falling due to the missing await.

Also, since it's very easy to miss the addition of await, I'm also considering writing an ES Lint rule for our plugin to prevent missing adding await when using waitFor.

@criamico
Copy link
Contributor Author

@elasticmachine merge upstream

@criamico
Copy link
Contributor Author

criamico commented Oct 30, 2024

I'm also considering writing an ES Lint rule for our plugin to prevent missing adding await when using waitFor.

@opauloh I fixed it the missing await, but it would be great to have a rule to enforce it, it's pretty easy to miss it.

Since our team introduced most of these tests, would you like me to take over these flaky issues as part of the new Reliability Epic?

Let me know if you want me to keep this PR open or it's better to proceed with your epic. The only thing that concerns me is that keeping too many tests skipped might lead to not catching some bugs early, especially on serverless/agentless. @kpollich what do you think?

@elasticmachine
Copy link
Contributor

elasticmachine commented Oct 30, 2024

💔 Build Failed

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #78 / alerting api integration security and spaces enabled - Group 4 Alerts alerts alerts connector adapters should use connector adapters correctly on system actions
  • [job] [logs] Jest Tests #15 / useSetupTechnology should update agentless policy name to match integration name if agentless is enabled
  • [job] [logs] Jest Tests #15 / useSetupTechnology should update agentless policy name to match integration name if agentless is enabled

Metrics [docs]

✅ unchanged

History

cc @criamico

@kpollich
Copy link
Member

@kpollich what do you think?

Since @opauloh's team is already tracking work here, it might make sense to have them take this over the finish line. @criamico do you feel like we have a path forward to get this PR green and land it, or is scope of work needed here larger? It seems like there's a lot of moving parts in the reliability epic Paulo linked, and I am not sure if we need to do a lot of that foundational work to fix the root cause of the flakiness here rather than just patching smaller things to get this PR green.

@criamico
Copy link
Contributor Author

Since @opauloh's team is already tracking work here, it might make sense to have them take this over the finish line.

I think it would be better if @opauloh team takes over this work, since they wrote most of the tests and intend to take care of the larger scope of it. I'll close this PR in favor of their solution.

@criamico criamico closed this Oct 30, 2024
@criamico criamico deleted the 189038_remove_waitfornextupdate_agentless branch October 30, 2024 16:12
@opauloh
Copy link
Contributor

opauloh commented Oct 31, 2024

Thanks! I already updated the relevant tickets to include the agentless scope of Fleet!

The only thing that concerns me is that keeping too many tests skipped might lead to not catching some bugs early, especially on serverless/agentless.

Yes, great concern, those tasks are top priority for this sprint, so we ensure they are not skipped for longer but also fixed in the large scope.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment