Convert Fleet-Controller to StatefulSet #1837

Closed
manno opened this issue Oct 5, 2023 · 3 comments

@manno
Member

manno commented Oct 5, 2023

Our controllers run in a Kubernetes Deployment. We see work being done outside of leader election when multiple replicas exist or during upgrades. For example, fleet-controller runs "cleanup" multiple times, controllers start up and create CRDs in parallel, and fleet-agent runs multiple registrations.

Leader election is an expensive operation; our default timing is 30s for renewal and 45s for the lease.
We also see problems with the lock files: client-go > 0.25 (?) will deprecate the current lock resource. The current lease mechanism causes problems with backup and restore. The leader election code lives in wrangler and keeps us from updating client-go.

From the controller-runtime docs:

Note that the Manager is run as a StatefulSet and not a Deployment. This is to ensure that only 1 instance of the Manager is run at a time (a Deployment may sometimes run multiple instances even with replicas set to 1).

See:

@manno manno added this to Fleet Oct 5, 2023
@manno manno converted this from a draft issue Oct 5, 2023
@manno manno added this to the 2024-Q1-2.8x milestone Oct 5, 2023
@p-se p-se assigned p-se and unassigned p-se Oct 9, 2023
@manno
Member Author

manno commented Oct 19, 2023

We discussed and will try to improve the existing leader election code first.

@raulcabello
Contributor

raulcabello commented Oct 24, 2023

Move all code that interacts with the cluster into leader election, e.g. https://github.com/rancher/fleet/blob/master/internal/cmd/controller/controllers/content/controller.go#L45

This is going to be moved into leader election when we split the cleanup into a different container https://github.com/raulcabello/fleet/blob/split-cleanup/internal/cmd/controller/cleanup/start.go#L39

@manno
Member Author

manno commented Oct 25, 2023

We're going to fix leader election instead.

  • bump to wrangler 2.1.2 (configurable leader election)
  • move code into leader election, done? removed addcrd, cleanup
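
The second item, moving startup work such as CRD creation and cleanup behind leader election, could look roughly like the sketch below. `Elector`, `staticElector`, and `Gate` are hypothetical names used only for illustration; real code would use whatever hook the leader election library provides instead of this hand-rolled gate.

```go
package main

import "sync"

// Elector is a hypothetical abstraction over the leader election library in use.
type Elector interface {
	IsLeader() bool
}

// staticElector returns a fixed answer; it exists only to demonstrate the gate.
type staticElector bool

func (s staticElector) IsLeader() bool { return bool(s) }

// Gate holds tasks that interact with the cluster (CRD creation, cleanup, ...).
type Gate struct {
	mu    sync.Mutex
	tasks []func()
}

// Register queues a task; nothing runs until Run is called.
func (g *Gate) Register(task func()) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.tasks = append(g.tasks, task)
}

// Run executes the queued tasks only when e reports leadership and
// returns how many tasks were executed.
func (g *Gate) Run(e Elector) int {
	if !e.IsLeader() {
		return 0
	}
	g.mu.Lock()
	tasks := g.tasks
	g.tasks = nil
	g.mu.Unlock()
	for _, task := range tasks {
		task()
	}
	return len(tasks)
}
```

If two replicas each register a cleanup task, only the one whose `Elector` reports leadership runs it, so the cleanup executes once rather than once per replica.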

@manno manno closed this as completed Oct 25, 2023
@manno manno removed the status in Fleet Oct 25, 2023
@manno manno removed this from Fleet Nov 3, 2023