[BUG] Gradle Check reported as null even though step is executing in background causing multiple orphaned runs and busy fleet #15609

Closed
mgodwan opened this issue Sep 3, 2024 · 8 comments · Fixed by opensearch-project/opensearch-build#4993
Assignees
Labels
bug (Something isn't working), Build (Build Tasks/Gradle Plugin, groovy scripts, build tools, Javadoc enforcement.), enhancement (Enhancement or improvement to existing feature or request)

Comments

@mgodwan
Member

mgodwan commented Sep 3, 2024

Describe the bug

  1. A run gets queued in the Jenkins fleet.
  2. The GitHub Action reports the gradle check status as null and marks the step as failed.
  3. As the PR author, I re-trigger the gradle check because I'm not aware of the actual issue.
  4. In the background, the previously queued gradle check now executes. [This run is of no use to me, since the PR checks will continue to report failure.]
  5. The check I scheduled in step 3 meets the same fate, and this cascades into multiple runs that keep the fleet busy, which prolongs the error in step 2.

Related component

Build

Expected behavior

We need a way to cancel the ongoing gradle check, or to modify our integration so that it does not fail fast and instead waits for the queued step to execute, rather than marking the step as failed in PRs.

Additional Details

No response

@mgodwan mgodwan added bug Something isn't working untriaged labels Sep 3, 2024
@github-actions github-actions bot added the Build Build Tasks/Gradle Plugin, groovy scripts, build tools, Javadoc enforcement. label Sep 3, 2024
@prudhvigodithi
Member

prudhvigodithi commented Sep 3, 2024

[Triage]
Hey @mgodwan, thanks. Today we run the gradle check at the commit level: a PR can have multiple commits, and each commit triggers its own gradle check build. If we waited for a queued or running build for a PR to complete before triggering a new gradle check build for a new commit in the same PR, it would hurt the developer experience, since a gradle check build can sometimes take hours to complete. Today the old runs (the ones already queued or in progress) for the PR just add a comment to the PR, and a new comment (either passed or failed) is eventually added for the latest commit.

Adding to this, there is an open PR, opensearch-project/opensearch-build-libraries#485, that will handle the force-push scenario (more details in issue opensearch-project/opensearch-build#2292): the workflow will not even trigger the gradle check build for orphaned (force-pushed) commits.

Thanks
@dblock @andrross @reta @getsaurabh02

@mgodwan
Member Author

mgodwan commented Sep 4, 2024

@prudhvigodithi I think there is some misunderstanding around the issue shared here.

The issue is that when a gradle check is scheduled and the GH Action gets a null response (because the build was only queued and had not started executing), the builds continue to run even though they no longer contribute to the PR result, since they are orphaned. Regarding the point you mentioned:

Here if we wait for the queued or existing running build for a PR to get completed before we trigger a new gradle check build for a new commit part of the same PR

Even in such cases, existing builds against the PR should be canceled, and only the build for the new commit/force push should execute.
cc: @Bukhtawar @sachinpkale @ashking94 @gbbafna

@mgodwan
Member Author

mgodwan commented Sep 4, 2024

Recent related thread https://opensearch.slack.com/archives/C04UM4D6XN2/p1725454723121789

@prudhvigodithi
Member

I see. In that case we need a service/mechanism between GitHub and Jenkins that can trigger and abort jobs. This layer can hold information about PRs and their commits, and can either trigger a build for a new commit or, if a build for the same PR is already running, abort that job and start a new build with the latest commit. This layer can also handle null-response scenarios: when the Jenkins queue is full (the limit is reached), it can wait and re-trigger with the same commit. With this we can even handle force pushes so that no build is triggered for the orphaned commits.
@peterzhuamazon @getsaurabh02 @dblock @rishabh6788
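
A minimal sketch of the layer described in this comment, assuming it keeps at most one live build per PR. The class `PrBuildCoordinator` and the injected `trigger_build` and `abort_build` callables are hypothetical names for illustration, not part of any existing OpenSearch or Jenkins tooling. Reusing the existing build when the same commit is re-triggered is an added assumption aimed at the cascade in the bug report.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class PrBuildCoordinator:
    """Hypothetical layer between GitHub and Jenkins: one live build per PR."""

    # Both callables are assumptions; a real layer would wrap Jenkins calls here.
    trigger_build: Callable[[int, str], str]  # (pr_number, commit_sha) -> build id
    abort_build: Callable[[str], None]        # build id -> None
    # Maps PR number -> (commit sha, build id) of the build still queued or running.
    active: Dict[int, Tuple[str, str]] = field(default_factory=dict)

    def on_commit(self, pr_number: int, commit_sha: str) -> str:
        current = self.active.get(pr_number)
        if current is not None:
            current_sha, current_build = current
            if current_sha == commit_sha:
                # Same commit re-triggered: reuse the build already in flight
                # instead of queueing a duplicate.
                return current_build
            # Newer commit or force push: the old build can no longer affect
            # the PR result, so abort it before starting a new one.
            self.abort_build(current_build)
        build_id = self.trigger_build(pr_number, commit_sha)
        self.active[pr_number] = (commit_sha, build_id)
        return build_id

    def on_finished(self, pr_number: int, build_id: str) -> None:
        # Forget the build only if it is still the one tracked for this PR.
        current = self.active.get(pr_number)
        if current is not None and current[1] == build_id:
            del self.active[pr_number]
```

With this shape, a re-trigger for an unchanged commit returns the existing build id, and a new commit aborts the stale build first, so the fleet never holds more than one build per PR.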

@mgodwan
Member Author

mgodwan commented Sep 4, 2024

@peterzhuamazon Does the PR opensearch-project/opensearch-build#4993 address concerns laid out here?

@rishabh6788
Contributor

@mgodwan Yes, it should. It will now keep polling the Jenkins queue until it returns a submitted build job URL. There is a 2-hour timeout, which should be more than sufficient to return the build number.
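
The polling behaviour described here can be sketched roughly as follows. This is an illustration, not the code from the linked PR: `wait_for_build_url`, the injected `fetch_queue_item` callable, and the 30-second interval are assumptions, and the response shape (an `executable` object carrying a `url`, and a `cancelled` flag) is assumed from the Jenkins queue item JSON.

```python
import time
from typing import Callable, Optional


def wait_for_build_url(
    fetch_queue_item: Callable[[], dict],  # hypothetical: returns the queue item as a dict
    timeout_s: float = 2 * 60 * 60,        # the 2-hour limit mentioned in the comment
    poll_interval_s: float = 30.0,         # assumed interval, not taken from the thread
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll a queued job until it has a build URL; return None on cancel or timeout."""
    deadline = clock() + timeout_s
    while True:
        item = fetch_queue_item()
        executable = item.get("executable")
        if executable and executable.get("url"):
            # The job left the queue and was assigned a build.
            return executable["url"]
        if item.get("cancelled"):
            # The queue item was cancelled, so no build will ever start.
            return None
        if clock() >= deadline:
            # Still queued after the timeout: give up rather than wait forever.
            return None
        sleep(poll_interval_s)
```

The point of the design is that "still queued" is treated as a state to wait on rather than as a failure, so the check is not reported as null while the build is merely waiting for an executor.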

@peterzhuamazon
Member

@peterzhuamazon Does the PR opensearch-project/opensearch-build#4993 address concerns laid out here?

Yes @mgodwan it should resolve this now.

Thanks.

@peterzhuamazon
Member

Closing now as it is resolved.

Thanks.

@github-project-automation github-project-automation bot moved this from 🏗 In progress to ✅ Done in Engineering Effectiveness Board Sep 4, 2024

4 participants