Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: build errors canceling deploy contexts #1225

Merged
merged 1 commit into from
Apr 11, 2024

Conversation

wesbillman
Copy link
Member

Fixes an issue where build errors would cancel the deploys for modules that build successfully.

For example, if we have time and echo (echo depends on time).

build: time
(time builds successfully)
deploy: time
build: echo
(echo fails to build)
>>> time's deploy ctx would get canceled here so it would never deploy

Then, we fix the build error in echo, but time never deployed and since it doesn't depend on echo it's never triggered to rebuild (unless a change to time is made).

This change introduces a buildCtx and deployCtx If there are any build failures we will report them after the deploys for successful builds complete.

@wesbillman
Copy link
Member Author

@alecthomas maybe we can chat about this approach and maybe if there's a better way to clean this up. The summary of what I want this change to do it to try to build and deploy whatever it can and then return the errors after it's attempted everything.

The reason for that is sort of described above, but maybe not super clearly.

During our build deploy loops we would cancel the whole loop if an error occurs. If this happens during initial deploy we might not build or deploy modules because of some previous error. So now with this change we attempt all of them (assuming their dependencies have build).

This cleaned up the code in some places, but made it perhaps more confusing in other areas. Open to refactor ideas as well if there's something we can do to make this more obvious.

@wesbillman
Copy link
Member Author

I also want to validate that we want to attempt to build and deploy as much as possible vs. erroring early when a build or deploy error occurs.

If we error early, we might have to keep a record of what was skipped because of the error so we can build them when the error is fixed. Right now, we fix the error and the things that haven't built will not build until a change is made to that module.

if module, ok := project.(Module); ok {
err := Deploy(deployCtx, module, replicas, waitForDeployOnline, e.client)
if err != nil {
deployErrors = append(deployErrors, err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not safe to concurrently append to a slice.

deployQueue := make(chan Project, len(projects))
wg, ctx := errgroup.WithContext(ctx)
buildGroup, buildCtx := errgroup.WithContext(ctx)
deployGroup, deployCtx := errgroup.WithContext(ctx)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If your goal is to avoid cancelling the context on failure, don't use errgroup.WithContext() - instead create an errgroup manually like so wg := errgroup.Group{}

@wesbillman wesbillman merged commit e397c88 into main Apr 11, 2024
12 checks passed
@wesbillman wesbillman deleted the fix-build-engine-context branch April 11, 2024 17:28
@wesbillman
Copy link
Member Author

@alecthomas I merged this one after some updates to try to get updates to the other teams. But happy to make more changes if I missed anything here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants