Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request] Make tests fail fast #15505

Open
andrross opened this issue Aug 29, 2024 · 7 comments · May be fixed by #15536
Open

[Feature Request] Make tests fail fast #15505

andrross opened this issue Aug 29, 2024 · 7 comments · May be fixed by #15536
Labels
Build Build Tasks/Gradle Plugin, groovy scripts, build tools, Javadoc enforcement. discuss Issues intended to help drive brainstorming and decision making enhancement Enhancement or improvement to existing feature or request

Comments

@andrross
Copy link
Member

andrross commented Aug 29, 2024

Is your feature request related to a problem? Please describe

This is potentially a bad idea, but our test suite takes about 45 minutes or so to complete, and when we encounter flaky failures we retry the whole test run until it passes. While the flaky tests failures are the root cause of the issue, I'm wondering if we can slightly improve the experience by configuring gradle to abort the test run after the first failure is encountered so we can more quickly start the retry.

Describe the solution you'd like

Add the following to build.gradle:

tasks.withType(Test).configureEach { Test test ->
    failFast = true
}

As soon as a failure is encountered, Gradle will stop the currently executing test target, wait for all forked JVMs to complete whatever task they were on, and then exit with a failure. Note that it doesn't abort the concurrently running tasks but it will not execute any tasks that had not yet started.

Related component

Build

Describe alternatives you've considered

Do nothing. There are some downsides here. If your PR introduced multiple failures you'll have to knock them out one-by-one versus getting a single test report with all failures. This will also change the number of data points we're collecting for the test failure metrics since test runs will bail out after a single failure.

Additional context

The fail fast behavior can be overridden on the command line for local tests:

./gradlew :server:internalClusterTest --no-fail-fast
@andrross andrross added enhancement Enhancement or improvement to existing feature or request discuss Issues intended to help drive brainstorming and decision making untriaged labels Aug 29, 2024
@github-actions github-actions bot added the Build Build Tasks/Gradle Plugin, groovy scripts, build tools, Javadoc enforcement. label Aug 29, 2024
@mgodwan
Copy link
Member

mgodwan commented Aug 29, 2024

Thanks @andrross for putting this proposal out. I agree that we should provide a way to fail fast and opt out of it as suggested. Around enabling failFast by default:

  1. From the Pull Request action perspective, the gradle check reports can have 3 status at a high level: SUCCEDED, FAILED and UNSTABLE. While the proposal works for SUCCEDED and FAILED; UNSTABLE is where we find that on the same commit, some test which earlier passed has failed, or vice versa, and is regarded as flaky and unblocks PRs. In such cases, it may be a good idea that the task continues to run at-least in our PR actions.

  2. For the PR action, I'm not sure if it is feasible to have a way to incrementally append a comment on a failure event from a single test as that would really be helpful. (maybe something within the gradle-check.sh to poll the consoleText from Jenkins after every n times we check the status of tests to see if we see something like REPRODUCE WITH in the output, and report the same?)

@andrross
Copy link
Member Author

UNSTABLE is where we find that on the same commit, some test which earlier passed has failed, or vice versa

@mgodwan I believe UNSTABLE means that one of our tests for which retries are enabled failed at least once and then succeeded on a retry. We'd need to confirm that the fail fast behavior doesn't cause the test to immediately fail out unless retries are exhausted.

@mgodwan
Copy link
Member

mgodwan commented Aug 29, 2024

Ah! I see. Thanks for the clarification on this :)

@mgodwan
Copy link
Member

mgodwan commented Aug 29, 2024

We'd need to confirm that the fail fast behavior doesn't cause the test to immediately fail out unless retries are exhausted.

I see some discussion around this here gradle/test-retry-gradle-plugin#75 but don't see any conclusion for supporting both retries and fail fast together.

@jainankitk
Copy link
Collaborator

We'd need to confirm that the fail fast behavior doesn't cause the test to immediately fail out unless retries are exhausted.

Yeah, we need to confirm this behavior before fail fast can be viable option. Also, I am wondering how much time are we going to save in most scenarios. Will it be from 45 min to 10 min, or something like 45 min to 30 min. And, do we need to worry about the increase in resource utilization due to ongoing tasks from prior fail fast?

@andrross andrross linked a pull request Aug 30, 2024 that will close this issue
3 tasks
@andrross
Copy link
Member Author

I created a draft PR to test this out: #15536

@dblock dblock removed the untriaged label Sep 16, 2024
@dblock
Copy link
Member

dblock commented Sep 16, 2024

[Catch All Triage - 1, 2, 3, 4, 5]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Build Build Tasks/Gradle Plugin, groovy scripts, build tools, Javadoc enforcement. discuss Issues intended to help drive brainstorming and decision making enhancement Enhancement or improvement to existing feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants