🐛 increase the timeout when creating and upgrading CAPI controllers #27
Conversation
When starting up, CAPI will try to "warm up" the configured RuntimeExtensions, and fail if any runtime extension Pods are not up.
This should fix the race conditions seen when creating and upgrading CAPI providers.
Thank you Dimitri. We went through the code.
- We also need to increase the timeout when invoking the clusterctl ApplyUpgrade API call, to ensure that it accounts for the registry warm-up timeout.
- We will start a discussion upstream to improve the algorithm for the CAPI component upgrade process.
```diff
 // Jitter is added as a random fraction of the duration multiplied by the jitter factor.
 return wait.Backoff{
 	Duration: 500 * time.Millisecond,
 	Factor:   1.5,
-	Steps:    10,
+	Steps:    14,
 }
```
I wonder if we should tweak Factor to make the max interval size smaller? Perhaps increasing Duration and Steps but reducing Factor could lead to a longer overall timeout while retaining better UX with less excessive waiting?
I thought about that too, but avoided it for now to reduce the risk of introducing another race somewhere else.
What this PR does / why we need it:
This PR is an attempt to fix the race conditions we've been seeing when creating and upgrading CAPI controllers.
There are 2 fixes here:
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):
Fixes #