Configurable leader election via chart values #1981
Merged
Part of #1491
This PR makes leader election parameters configurable via Helm chart values, and adjusts the default values based on the following reasoning:
- The `fleet-controller` deployment is configured to use only 1 replica, but users could still try to scale it up.
  - A `StatefulSet` instead of a `Deployment` is an option, although this would still not prevent users from manually scaling up the `StatefulSet` too.
- `LeaseDuration` says that any candidate should wait at least this duration before attempting to become leader, so a long duration could slow down the rollout of new images.
  - `LeaseDuration` changes from 45 to 30 seconds. Core Kubernetes clients use 15 seconds, but they also keep a shorter retry period (2s); 30s is enough for Fleet's use case.
- `RetryPeriod` defines the wait period between actions, including renewing the leader lease. This means that every `$retryPeriod`, the leader will acquire the lease for up to `$leaseDuration`. It has a default period of 2 seconds, which causes too much pressure on the Kubernetes API.
  - Given `LeaseDuration`, I'm changing `RetryPeriod` from 2 to 10 seconds.
- `RenewalDeadline` is the period during which an active master will keep trying to renew the lock before giving up, which in our implementation means exiting the program. Failures renewing the lease could happen due to network instability in the node running the controller.
  - Given `LeaseDuration` and `RetryPeriod`, in order to allow at least 2 attempts to renew the lease before it expires, I'm configuring `RenewalDeadline` to 25 seconds.

Reference: https://pkg.go.dev/k8s.io/client-go/tools/leaderelection#LeaderElectionConfig