

[CI] investigate unstable scenarios – collect samples of flaky CI runs #563

Open
Shastick opened this issue Mar 15, 2024 · 1 comment
Labels
bug Something isn't working P3 Lower priority

Comments

Shastick (Contributor) commented Mar 15, 2024

Describe the bug
The CI occasionally fails for no clear reason, and re-running the pipeline usually succeeds.

To reproduce
There is no known way to reproduce this issue (or these issues) at the moment.

Difference from expected behavior
We would want the CI to fail or succeed consistently.

Possible solution
Currently, the workaround is to simply re-run flaky CI runs until they pass.


System on which behavior was encountered
This happens on the GitHub CI.

Codebase information
This happens on recent main commits; it is unclear since exactly which commit.

Additional context
As a first step towards finding a fix, we probably want to collect samples of failed CI runs to determine what they share in common, and what scenarios are subject to flakiness.

Observed failures

  • on main, fails:
    • FAILURE for "Nominal planning: conflict with higher priority" scenario
    • FAILURE for "ASTM F3411-19 NetRID DSS interoperability" scenario
  • on a PR, fails:
    • FAILURE for "Nominal planning: conflict with higher priority" scenario
    • FAILURE for "ASTM F3411-19 NetRID DSS interoperability" scenario
  • on a PR, fails:
    • FAILURE for "Nominal planning: conflict with higher priority" scenario
    • FAILURE for "ASTM F3411-19 NetRID DSS interoperability" scenario
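As samples accumulate, the failures above could be tallied mechanically to see which scenarios dominate. A minimal sketch (the function and the sample data are illustrative, not part of any existing tooling):

```python
from collections import Counter

def tally_failures(runs: list[list[str]]) -> Counter:
    """Count how often each scenario fails across a set of sampled CI runs."""
    counts: Counter = Counter()
    for failed_scenarios in runs:
        counts.update(failed_scenarios)
    return counts

# Illustrative samples mirroring the three observed runs above.
samples = [
    ["Nominal planning: conflict with higher priority",
     "ASTM F3411-19 NetRID DSS interoperability"],
    ["Nominal planning: conflict with higher priority",
     "ASTM F3411-19 NetRID DSS interoperability"],
    ["Nominal planning: conflict with higher priority",
     "ASTM F3411-19 NetRID DSS interoperability"],
]

for scenario, count in tally_failures(samples).most_common():
    print(f"{count}x {scenario}")
```

Once a scenario's failure count clearly stands out, it becomes the natural candidate for focused investigation.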
@Shastick Shastick added bug Something isn't working P3 Lower priority labels Mar 15, 2024
@Shastick Shastick changed the title [CI] investigate flaky scenarios – collect samples of flaky CI runs [CI] investigate unstable scenarios – collect samples of flaky CI runs Mar 15, 2024
Shastick (Contributor, Author) commented:

Unrelated to the above, I have had a few cases of scenarios failing locally due to clock skew between the mock_uss and the OS, which runs the qualifier. This concerns scenarios that query the interaction log endpoint of the mock_uss.

The details can be found in #750: if anyone runs into intermittent failures caused by missing interaction log entries, this could be an explanation.
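For intermittent failures of this kind, one possible mitigation, sketched here with hypothetical names (this is not existing uss_qualifier or mock_uss code, and the tolerance value is an assumption), is to apply an explicit skew tolerance when filtering interaction log entries by timestamp:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical tolerance; the acceptable skew depends on the deployment.
CLOCK_SKEW_TOLERANCE = timedelta(seconds=5)

def entry_in_window(entry_time: datetime, window_start: datetime) -> bool:
    """Accept log entries timestamped slightly before window_start, to
    tolerate clock skew between the mock_uss host and the machine
    running the qualifier."""
    return entry_time >= window_start - CLOCK_SKEW_TOLERANCE

start = datetime(2024, 3, 15, 12, 0, 0, tzinfo=timezone.utc)
# An entry stamped 2 s "before" the window start (due to skew) is kept;
# one stamped 10 s before is still rejected.
print(entry_in_window(start - timedelta(seconds=2), start))
print(entry_in_window(start - timedelta(seconds=10), start))
```

The trade-off is that a wider tolerance admits genuinely stale entries, so the window should stay as small as the observed skew allows.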
