fix: add pagination to job_collector task #8240

Merged

ClaudioMascaro
Contributor

⚠️ Pre Checklist

Please complete ALL items in this checklist, and remove before submitting

  • I have read through the Contributing Documentation.
  • I have added relevant tests.
  • I have added relevant documentation.
  • I will add labels to the PR, such as pr-type/bug-fix, pr-type/feature-development, etc.

Summary

Add pagination to the job collector task in the github_graphql plugin.
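
To illustrate the general idea (this is not the code from this PR), here is a minimal sketch of cursor-based pagination, assuming the API returns a page of items plus the `endCursor` and `hasNextPage` fields of a GraphQL connection. The names `iter_jobs`, `fetch_page`, and `make_fake_fetcher` are hypothetical, and the fetcher is an in-memory stand-in rather than a real API call.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical page shape: (items, end_cursor, has_next_page), mirroring the
# `pageInfo { endCursor hasNextPage }` fields of a GraphQL connection.
Page = Tuple[List[str], Optional[str], bool]


def iter_jobs(
    fetch_page: Callable[[Optional[str], int], Page],
    page_size: int = 20,
) -> Iterator[str]:
    """Yield every job by following cursors, one bounded page per request."""
    cursor: Optional[str] = None
    while True:
        items, cursor, has_next = fetch_page(cursor, page_size)
        yield from items
        if not has_next:
            break


def make_fake_fetcher(jobs: List[str]):
    """Stand-in for the API call (assumption): serves slices of a list."""
    calls = []

    def fetch_page(cursor: Optional[str], first: int) -> Page:
        start = int(cursor) if cursor is not None else 0
        end = min(start + first, len(jobs))
        calls.append((cursor, first))
        return jobs[start:end], str(end), end < len(jobs)

    return fetch_page, calls


# A run with 205 jobs is fetched in five requests of at most 50 jobs each,
# instead of one large request that may time out.
jobs = [f"job-{i}" for i in range(205)]
fetch_page, calls = make_fake_fetcher(jobs)
collected = list(iter_jobs(fetch_page, page_size=50))
print(len(collected), len(calls))  # prints "205 5"
```

Each request stays small no matter how many jobs a run has, so the total number of requests grows with the data while the cost of any single query stays bounded.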

Does this close any open issues?

Closes #8028

Screenshots

We needed to extract data from a large and complex repository that has over 30,000 workflow runs, some of which have more than 200 job runs.

Whenever the Collect Job task started, it simply wouldn't finish the query in time and entered the retry flow:

Screenshot from 2024-12-09 16-56-00

We tried reducing the API timeout, but that only increased the number of unsuccessful retries:

Screenshot from 2024-12-06 15-38-14

With the fix in place, our case was solved and the task collected all the data after 17 hours:

image
(Ignore the log line; I added it locally for debugging.)

image

For comparison, we also extracted data from a much less complex repository. Before the fix, it took the following time:

Screenshot from 2024-12-11 08-11-06

After applying the fix (rerunning in hard refresh mode, of course), there was no change in the overall pipeline time:

Screenshot from 2024-12-11 08-18-53

Other Information

Already merged into the main branch; reopening as requested: #8233 (review)

@dosubot dosubot bot added the size:M, component/plugins, and pr-type/bug-fix labels Dec 12, 2024
@ClaudioMascaro
Contributor Author

@klesh

@klesh
Contributor

klesh commented Dec 16, 2024

This looks good! Could you also update the end-to-end (e2e) test cases to reflect these changes? Thanks!

@ClaudioMascaro
Contributor Author

@klesh do you have an example of writing e2e tests for Collector tasks? I couldn't find a practical approach to that. Also, there are tests failing for the jira plugin.

@klesh klesh force-pushed the feat/pagination-collect-jobs-release branch from 2462665 to 43caa9e Compare December 18, 2024 02:31
@klesh
Contributor

klesh commented Dec 18, 2024

Yes, that's unexpected; collectors shouldn't impact e2e tests.

Further investigation revealed that PR #8223 introduced the error, which I've addressed in PR #8243. Everything should be working correctly now.

My apologies for the inconvenience, and thank you for your contribution!

@klesh klesh merged commit 7beae18 into apache:release-v1.0 Dec 18, 2024
9 checks passed