Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite deploy to minimize API requests #38

Merged
merged 19 commits into from
Apr 18, 2024

Conversation

tianon
Copy link
Member

@tianon tianon commented Apr 2, 2024

This replaces our crane cp/crane index append based deploy workflow with more custom code that can generate the full index/manifest list for each tag in a fully deterministic and offline way (based on the data we've already got locally in builds.json), and code that can take that result and push it directly with minimal API requests, especially in the no-op case (where crane index append would make something like $2*(N+1)$ API requests to no-op, this new code should be only $1$ total per tag in the no-op case, even if we later add a pre-flight HEAD request which this code currently skips in favor of directly doing PUT and handling errors as a miss).

This also includes new integration tests and even some new jq tests to cover the new code. As noted in a comment in the code, in theory we can use ocimem in the future to add more unit tests to this code, but it's harder to successfully/correctly test the edge cases that way (since we'd then have to synthesize content in-process instead of relying on pre-synthesized content pre-existing), so I felt pretty good about leaving that as a TODO for now.

(I have been using this successfully for deploying my own personal images, FWIW. 👍)

Closes #22

@tianon
Copy link
Member Author

tianon commented Apr 2, 2024

This is real big. For now, I've left it in "many WIP commits" mode, which might make it easier or more interesting to review, but I don't think we should merge all these commits (and I was planning to squash them).

@tianon
Copy link
Member Author

tianon commented Apr 2, 2024

For purposes of comparison, the Go coverage percentage on main is currently 77.1% total, and this PR is at 77.9% total 😄

Comment on lines +6 to +13
rateLimitBuilds([
count: 1,
durationName: 'hour',
userBoost: true,
]),
pipelineTriggers([
// TODO https://github.com/docker-library/meta-scripts/issues/22
//upstream(threshold: 'UNSTABLE', upstreamProjects: 'meta'),
cron('H H/2 * * *'),
// (we've dropped to only running this periodically to avoid it clogging the whole queue for a no-op, which also gives build+meta more time to cycle and get deps so they have a higher chance to all go out at once -- see the above linked issue)
upstream('meta'),
cron('H H/6 * * *'), // run every few hours whether we "need" it or not
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This effectively sets us to run deploy every time meta succeeds, but at most once per hour (unless we run manually). I don't personally think this is right, but it's maybe close enough for first pass (triggers relatively frequently, but not so frequently that we cause problems for anyone else).

I'd also add the reminder that this job running is only a blocker for users to see new builds, not for dependent/child builds, so it's pretty reasonable for them to have a 1-2 hour delay (and we can always run manually if we know the builds are done and need to get a deploy out faster than that).

Copy link
Collaborator

@whalelines whalelines left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, a couple questions

.test/oci-sort-platforms/test.jq Show resolved Hide resolved
// panic instead of error because this should've already been handled/normalized above (so this is a coding error, not a runtime error)
}

return normal, nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is quite a function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeahhhhhhh -- I actually wrote this as part of main() itself, and when I stepped back from the implementation I almost died and thus forced myself to re-write it in such a way that I could load it up with unit tests so I could have higher confidence that all the edge cases in every branch of this behemoth are actually covered properly.

@yosifkit yosifkit marked this pull request as ready for review April 18, 2024 18:52
@yosifkit yosifkit merged commit ab38f95 into docker-library:main Apr 18, 2024
1 check passed
@yosifkit yosifkit deleted the deploy branch April 18, 2024 18:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

scaling issues in deploy
3 participants