-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix failed run count logic for alerting test failures #7554
Fix failed run count logic for alerting test failures #7554
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #7554 +/- ##
=======================================
Coverage 61.08% 61.08%
=======================================
Files 520 520
Lines 27125 27125
=======================================
+ Hits 16568 16570 +2
+ Misses 9093 9092 -1
+ Partials 1464 1463 -1 ☔ View full report in Codecov by Sentry. |
Radius functional test overview
Click here to see the list of tools in the current test run
Test Status⌛ Building Radius and pushing container images for functional tests... |
@@ -715,10 +715,14 @@ jobs: | |||
for (const run of response.data.workflow_runs) { | |||
if (run.conclusion === 'failure') { | |||
failureCount++; | |||
} else { | |||
} else if (run.conclusion === 'success') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am trying to understand better. So, listWorkflowRuns
above will bring 10 workflows. But I think what we should do is that we should take a look at the last completed workflow outcome and, if that is failure, then we can create an issue.
If we check the last 10 all the time, we may end up creating multiple issues for the same failures. I think adding the completed status check to the call above and setting per_page
to 1 would be sufficient. If the last completed workflow has failed, then just set the failureCount to 2 and create the issue.
Suggestions:
- We can add a status COMPLETED check to the GitHub call above. We can see different statuses here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see this screen shot. When I cancelled workflow, cancelled run was also in completed status.
Because we want to treat only failure
conclusion as a failed run, we need to get 1+ actions runs and then linear search to find the recent failure in the array. So I was fetching 10 action runs and then scan the workflow status to find the failure and ignore the other statuses, such as cancelled.
Also, I want to allow us to adjust the acceptable failures. that's why I add ISSUE_CREATE_THRESHOLD
variable. I can add more comments for the clarification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created new composite action to be shared. Please review the PR again.
@@ -0,0 +1,65 @@ | |||
name: "Count completed failed runs" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ytimocin I created the composite action to be shared in two workflows.
per_page: perpage | ||
}); | ||
|
||
// Scan `failure` conclusion runs to find the consecutive failures while |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ytimocin I added the comment why it scans the workflow run payloads
@@ -0,0 +1,65 @@ | |||
name: "Count completed failed runs" | |||
description: This actions counts the number of consecutive failed runs for a given workflow. | |||
inputs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is validated by this workflow run - https://github.com/radius-project/radius/actions/runs/8865909289/job/24342699250
Radius functional test overview
Click here to see the list of tools in the current test run
Test Status⌛ Building Radius and pushing container images for functional tests... |
0a93e98
to
eeae94e
Compare
Radius functional test overview
Click here to see the list of tools in the current test run
Test Status⌛ Building Radius and pushing container images for functional tests... |
eeae94e
to
77a0a74
Compare
Radius functional test overview
Click here to see the list of tools in the current test run
Test Status⌛ Building Radius and pushing container images for functional tests... |
77a0a74
to
a2f6a99
Compare
Radius functional test overview
Click here to see the list of tools in the current test run
Test Status⌛ Building Radius and pushing container images for functional tests... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ytimocin are you familiar with what caused the reporting to break in the place?
Signed-off-by: Young Bu Park <[email protected]>
Signed-off-by: Young Bu Park <[email protected]>
Signed-off-by: Young Bu Park <[email protected]>
Signed-off-by: Young Bu Park <[email protected]>
Signed-off-by: Young Bu Park <[email protected]>
Signed-off-by: Young Bu Park <[email protected]>
Signed-off-by: Young Bu Park <[email protected]>
Signed-off-by: Young Bu Park <[email protected]>
Signed-off-by: Young Bu Park <[email protected]>
Signed-off-by: Young Bu Park <[email protected]>
Signed-off-by: Young Bu Park <[email protected]>
Signed-off-by: Young Bu Park <[email protected]>
Signed-off-by: Young Bu Park <[email protected]>
a2f6a99
to
e9b3484
Compare
This pull request has been automatically marked as stale because it has been inactive for 90 days. Remove stale label or comment or this PR will be closed in 7 days. |
Description
When it counts the consecutive failed runs, the old code treats
in_progress
assuccess
. This will fix the issue to count the failed runs correctly by skipping non-success runs. I validated this logic in this action run.Type of change
Fixes: #issue_number