Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configurable leader election via chart values #1981

Merged
merged 1 commit into from
Dec 13, 2023

Conversation

aruiz14
Copy link
Contributor

@aruiz14 aruiz14 commented Nov 30, 2023

Part of #1491

It makes leader election parameters configurable via Helm chart values, and adapts the default values by applying the following reasoning:

  • The purpose of leader election on Fleet is to ensure only instance is running at a time.
    • fleet-controller deployment is configured to use only 1 replica, but users could still try to scale it up.
    • Rolling updates could cause to temporarily have 2 living instances of fleet.
    • This could be mitigated by using StatefulSet instead of Deployment, although this would still not prevent users from manually scaling up the StatefulSet too
  • The definition of LeaseDuration says that any candidate should wait at lease this duration before attempting to become leader, so long duration could slow down the rollout of new images.
    • For this reason, I've decreased LeaseDuration from 45 to 30 seconds.
      • core Kubernetes clients use 15 seconds, but they also keep a shorter retry period (2s), but 30s is enough for Fleet's use case.
  • RetryPeriod defines the wait period between actions, including renewing the leader lease. This mean that every $retryPeriod, it will acquire the lease for up to $leaseDuration. It has a default period of 2 seconds, which causes too much pressure on the Kubernetes API
    • For this reason, and taking into account the value used for LeaseDuration, I'm changing RetryPeriod from 2 to 10 seconds.
  • RenewalDeadline is the period during which an active master will keep trying to renew the lock before giving up, which in our implementation means exiting the program. Failures renewing the lease could happen due to network instability in the node running the controller.
    • Considering the LeaseDuration and RetryPeriod, in order to allow at least 2 attempts to renew the lease before it expires, I'm configuring RenewalDeadline to 25 seconds.

Reference: https://pkg.go.dev/k8s.io/client-go/tools/leaderelection#LeaderElectionConfig

@aruiz14 aruiz14 requested a review from a team as a code owner November 30, 2023 16:17
@moio
Copy link
Contributor

moio commented Dec 1, 2023

Reasoning sounds solid to me. The only part I believe not being accurate is:

Closes #1491

That issue is not only about making these parameters configurable, but crucially to make them configurable in Rancher for all downstream clusters at once.

This PR is a step in that direction, but crucially we will still need enablement in Rancher code to make use of these new Helm values, namely #1717.

Please do close #1491 when this PR is merged and backported and a fix for #1717 is also in place.

Thanks!

@aruiz14 aruiz14 changed the title Make leader election configurable via chart values and change defaults Configurable leader election via chart values Dec 13, 2023
@aruiz14 aruiz14 enabled auto-merge (squash) December 13, 2023 10:50
@aruiz14 aruiz14 merged commit 8ff1c09 into master Dec 13, 2023
10 checks passed
@aruiz14 aruiz14 deleted the configurable-leader-election branch December 13, 2023 11:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants