Fix helm race condition #3633

Merged

Conversation

juanluisvaladas
Contributor

@juanluisvaladas juanluisvaladas commented Oct 23, 2023

Description

The PR includes three commits.

The first commit removes the concurrencyLevel field from the helm controller. The field isn't honored at all, and we want to remove concurrent reconciliation anyway, so the field is removed entirely.

The second commit makes the extensions controller thread-safe. Helm is unable to reconcile multiple charts at the same time because /var/lib/k0s/helmhome/cache/ is shared between charts and the helm cache was not designed to be used concurrently. To fix this, we document that some functions of the Commands struct are not thread-safe (a local lock wouldn't help unless it was shared across all instances) and make the controller non-concurrent.
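For illustration only, here is a minimal sketch of what "non-concurrent" reconciliation can look like: a single worker goroutine drains a queue, so at most one chart operation touches the shared cache at a time. The names reconcileAll and reconcile are hypothetical and are not taken from the k0s code base.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// reconcileAll is a hypothetical helper: it feeds every chart to a single
// worker goroutine, so reconcile is never called concurrently. It returns
// the highest number of simultaneous reconcile calls that was observed.
func reconcileAll(charts []string, reconcile func(chart string)) int32 {
	var inFlight, peak atomic.Int32
	queue := make(chan string)

	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // the one and only worker
		defer wg.Done()
		for chart := range queue {
			if n := inFlight.Add(1); n > peak.Load() {
				peak.Store(n)
			}
			reconcile(chart)
			inFlight.Add(-1)
		}
	}()

	for _, c := range charts {
		queue <- c
	}
	close(queue)
	wg.Wait()
	return peak.Load()
}

func main() {
	peak := reconcileAll([]string{"chart-a", "chart-b", "chart-c"}, func(chart string) {
		// Stand-in for an install/upgrade that uses the shared helm cache.
		fmt.Println("reconciling", chart)
	})
	fmt.Println("peak concurrent reconciles:", peak)
}
```

Serializing at the caller avoids the need for a lock inside Commands, which would only help if every instance shared the same lock.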

The third commit implements timeouts. We were already passing a context, but it wasn't used at all. This prevents a reconciler from being stuck forever.

Requirement to fix #3282 and #3433

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

How Has This Been Tested?

  • Manual test
  • Auto test added

Checklist:

  • My code follows the style guidelines of this project
  • My commit messages are signed-off
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

This field isn't honored at all and we want to remove concurrent
reconciliation anyway so entirely remove the field.

Signed-off-by: Juan Luis de Sousa-Valadas Castaño <[email protected]>
@juanluisvaladas juanluisvaladas requested a review from a team as a code owner October 23, 2023 15:23
@juanluisvaladas juanluisvaladas added backport/release-1.25 PR that needs to be backported/cherrypicked to release-1.25 branch backport/release-1.26 PR that needs to be backported/cherrypicked to release-1.26 branch backport/release-1.27 PR that needs to be backported/cherrypicked to release-1.27 branch backport/release-1.28 PR that needs to be backported/cherrypicked to release-1.28 branch labels Oct 23, 2023
@twz123
Member

twz123 commented Oct 23, 2023

Could you add a bit of explanation of the problem and its fix in the PR description (and possibly the commit message)?

Edit: oh I see the commit messages provide more details already.

@juanluisvaladas
Contributor Author

Could you add a bit of explanation of the problem and its fix in the PR description (and possibly the commit message)?

Oh sorry, added a proper explanation.

@twz123 twz123 added the bug Something isn't working label Oct 24, 2023
@twz123 twz123 added this to the 1.29 milestone Oct 24, 2023
twz123
twz123 previously approved these changes Oct 26, 2023
@@ -206,6 +206,8 @@ func (hc *Commands) isInstallable(chart *chart.Chart) bool {
return true
}

// InstallChart installs a helm chart
// InstallChart, UpgradeChart and UninstallRelease are *NOT* thread-safe
Member

Maybe it makes sense to mention this once in the struct's docs, saying that the whole struct isn't meant to be used concurrently.

@juanluisvaladas
Contributor Author

After speaking with Tom, we agreed this isn't really thread-safe. It's less racy, as I couldn't reproduce the helm race condition myself, but it needs further work. Can't merge it yet.

Helm is unable to reconcile multiple charts at the same time because
/var/lib/k0s/helmhome/cache/ is shared between charts and helm cache was
not designed to be safe to be used concurrently.

Signed-off-by: Juan Luis de Sousa-Valadas Castaño <[email protected]>
twz123
twz123 previously approved these changes Oct 26, 2023
Member

@twz123 twz123 left a comment

🎉

Up to now there was a context passed but it was never cancelled.

Signed-off-by: Juan Luis de Sousa-Valadas Castaño <[email protected]>
@juanluisvaladas juanluisvaladas merged commit 42845e9 into k0sproject:main Oct 26, 2023
71 checks passed
@k0s-bot

k0s-bot commented Oct 26, 2023

Backport failed for release-1.25, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally.

git fetch origin release-1.25
git worktree add -d .worktree/backport-3633-to-release-1.25 origin/release-1.25
cd .worktree/backport-3633-to-release-1.25
git checkout -b backport-3633-to-release-1.25
ancref=$(git merge-base 9c1a0c7a7b3d2f7bd0e0479d36d1eddca0204fc0 62f0a7e8b6749d3e1937cbf23b878014e97bc181)
git cherry-pick -x $ancref..62f0a7e8b6749d3e1937cbf23b878014e97bc181

@k0s-bot

k0s-bot commented Oct 26, 2023

Backport failed for release-1.26, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally.

git fetch origin release-1.26
git worktree add -d .worktree/backport-3633-to-release-1.26 origin/release-1.26
cd .worktree/backport-3633-to-release-1.26
git checkout -b backport-3633-to-release-1.26
ancref=$(git merge-base 9c1a0c7a7b3d2f7bd0e0479d36d1eddca0204fc0 62f0a7e8b6749d3e1937cbf23b878014e97bc181)
git cherry-pick -x $ancref..62f0a7e8b6749d3e1937cbf23b878014e97bc181

@k0s-bot

k0s-bot commented Oct 26, 2023

Backport failed for release-1.27, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally.

git fetch origin release-1.27
git worktree add -d .worktree/backport-3633-to-release-1.27 origin/release-1.27
cd .worktree/backport-3633-to-release-1.27
git checkout -b backport-3633-to-release-1.27
ancref=$(git merge-base 9c1a0c7a7b3d2f7bd0e0479d36d1eddca0204fc0 62f0a7e8b6749d3e1937cbf23b878014e97bc181)
git cherry-pick -x $ancref..62f0a7e8b6749d3e1937cbf23b878014e97bc181

@k0s-bot

k0s-bot commented Oct 26, 2023

Successfully created backport PR for release-1.28:

@jannispl

I pointed out in #3282 that documentation is lacking for the invocation with --wait for Helm chart installations. Is this something that should be addressed here, given that my original issue was closed with reference to this one?

@juanluisvaladas
Contributor Author

Hi @jannispl, your issue should never have been closed. I think the bot closed it because of the sentence "Requirement to fix #3282 and #3433", but that was never intended. Parts of those issues are solved now, but we're not done just yet.

@twz123
Member

twz123 commented Feb 22, 2024

/xref #2974, which introduced the concurrencyLevel which has been removed here

Labels
area/helm backport/release-1.25 PR that needs to be backported/cherrypicked to release-1.25 branch backport/release-1.26 PR that needs to be backported/cherrypicked to release-1.26 branch backport/release-1.27 PR that needs to be backported/cherrypicked to release-1.27 branch backport/release-1.28 PR that needs to be backported/cherrypicked to release-1.28 branch bug Something isn't working
Development

Successfully merging this pull request may close these issues.

Updates to helm extensions are not applied
4 participants