-
Notifications
You must be signed in to change notification settings - Fork 377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix helm race condition #3633
Fix helm race condition #3633
Conversation
This field isn't honored at all and we want to remove concurrent reconciliation anyway so entirely remove the field. Signed-off-by: Juan Luis de Sousa-Valadas Castaño <[email protected]>
Could you add a bit of explanation of the problem and its fix in the PR description (and possibly the commit message)? Edit: oh I see the commit messages provide more details already. |
Oh sorry, added a proper explanation. |
@@ -206,6 +206,8 @@ func (hc *Commands) isInstallable(chart *chart.Chart) bool { | |||
return true | |||
} | |||
|
|||
// InstallChart installs a helm chart | |||
// InstallChart, UpgradeChart and UninstallRelease(releaseName are *NOT* thread-safe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it makes sense to mention this once int the struct's docs. Saying that the whole struct isn't meant to be used concurrently.
After speaking with Tom we agreed this isn't really thread safe. It's less racy as I couldn't reproduce the helm condition myself but it needs further work. Can't merge it yet |
Helm is unable to reconcile multiple charts at the same time because /var/lib/k0s/helmhome/cache/ is shared between charts and helm cache was not designed to be safe to be used concurrently. Signed-off-by: Juan Luis de Sousa-Valadas Castaño <[email protected]>
5f9d6ea
to
e49166c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
Up to now there was a context passed but it was never cancelled. Signed-off-by: Juan Luis de Sousa-Valadas Castaño <[email protected]>
Backport failed for Please cherry-pick the changes locally. git fetch origin release-1.25
git worktree add -d .worktree/backport-3633-to-release-1.25 origin/release-1.25
cd .worktree/backport-3633-to-release-1.25
git checkout -b backport-3633-to-release-1.25
ancref=$(git merge-base 9c1a0c7a7b3d2f7bd0e0479d36d1eddca0204fc0 62f0a7e8b6749d3e1937cbf23b878014e97bc181)
git cherry-pick -x $ancref..62f0a7e8b6749d3e1937cbf23b878014e97bc181 |
Backport failed for Please cherry-pick the changes locally. git fetch origin release-1.26
git worktree add -d .worktree/backport-3633-to-release-1.26 origin/release-1.26
cd .worktree/backport-3633-to-release-1.26
git checkout -b backport-3633-to-release-1.26
ancref=$(git merge-base 9c1a0c7a7b3d2f7bd0e0479d36d1eddca0204fc0 62f0a7e8b6749d3e1937cbf23b878014e97bc181)
git cherry-pick -x $ancref..62f0a7e8b6749d3e1937cbf23b878014e97bc181 |
Backport failed for Please cherry-pick the changes locally. git fetch origin release-1.27
git worktree add -d .worktree/backport-3633-to-release-1.27 origin/release-1.27
cd .worktree/backport-3633-to-release-1.27
git checkout -b backport-3633-to-release-1.27
ancref=$(git merge-base 9c1a0c7a7b3d2f7bd0e0479d36d1eddca0204fc0 62f0a7e8b6749d3e1937cbf23b878014e97bc181)
git cherry-pick -x $ancref..62f0a7e8b6749d3e1937cbf23b878014e97bc181 |
Successfully created backport PR for |
I pointed out in #3282 that there is a lack of documentation regarding the invocation with |
/xref #2974, which introduced the concurrencyLevel which has been removed here |
Description
The PR includes three commits.
The first commit removes the field concurrencyLevel in helm controller. This field isn't honored at all and we want to remove concurrent reconciliation anyway so entirely remove the field.
The second commit makes extensions controller thread-safe. Helm is unable to reconcile multiple charts at the same time because /var/lib/k0s/helmhome/cache/ is shared between charts and helm cache was not designed to be safe to be used concurrently. To fix it we document some functions of the Commands struct are not thread safe (they couldn't have a local lock unless it was shared across all instances) and modify the controller not concurrent.
The third commit implements timeouts, we were already passing a context but this wasn't used at all. This is to prevent a reconciler being stuck forever
Requirement to fix #3282 and #3433
Type of change
How Has This Been Tested?
Checklist: