Convert fleet-agent to controller-runtime #1772
Conversation
// This mechanism of triggering requeues for changes is not ideal.
// It's a workaround since we can't enqueue directly from the trigger
// mini controller. Triggering via a status update is expensive.
// It's hard to compute a stable hash to make this idempotent, because
// the hash would need to be computed over the whole change. We can't
// just use the resource version of the bundle deployment. We would
// need to look at the deployed resources and compute a hash over them.
// However this status update happens for every changed resource, maybe
// multiple times per resource. It will also trigger on a resync.
Happy to talk or brainstorm about this, but could it be an option to order all resources by kind + key, hash their resource versions and compare that against SyncGeneration? This should avoid unnecessary re-deployments, for example on fleet-agent restarts.
Also, we should consider using Generation instead of ResourceVersion, so that status updates are ignored and don't requeue bundle deployments?
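For illustration only, here is a minimal sketch of that idea, assuming a simplified view of the deployed resources; the `deployedResource` type and its fields are invented for this example and are not fleet's actual types:

```go
package sketch

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// deployedResource is a made-up, minimal view of one deployed resource.
type deployedResource struct {
	Kind            string
	Key             string // namespace/name
	ResourceVersion string
}

// hashResourceVersions orders resources by kind + key and hashes their
// resource versions, so the digest is stable across list order and only
// changes when a deployed resource's version changes. The result could be
// compared against a stored value such as SyncGeneration before redeploying.
func hashResourceVersions(resources []deployedResource) string {
	sort.Slice(resources, func(i, j int) bool {
		if resources[i].Kind != resources[j].Kind {
			return resources[i].Kind < resources[j].Kind
		}
		return resources[i].Key < resources[j].Key
	})

	h := sha256.New()
	for _, r := range resources {
		fmt.Fprintf(h, "%s/%s=%s\n", r.Kind, r.Key, r.ResourceVersion)
	}
	return hex.EncodeToString(h.Sum(nil))
}
```

The open question from this thread still applies: such a hash only stays stable if status-only updates, which bump ResourceVersion but not Generation, are excluded.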
Yes, let's talk. I'm currently assuming it's cheaper to trigger than to compute a proper hash.
yes, let's talk about this
> Yes, let's talk. I'm currently assuming it's cheaper to trigger than to compute a proper hash.

I'm not sure it's cheaper, especially since BundleDeployment updates may also trigger updates on other resources on the upstream cluster (e.g. updating the status of Bundles, GitRepos or Clusters). Also, beware of the multiplying factor in the case of many downstream clusters.
BundleDeployments update all the time, at least every 15 min, for the heartbeat timestamp in the "agent" status fields. We will need to ignore some of the status updates when we convert the fleetcontroller.
Since the new controller framework deduplicates events, this might not be too bad.
The alternative is to list all the resources of the bundle. When two resources change, like in the integration test, this will happen twice: some computation, then two updates to the bundle deployment status. That also happens every time we deploy. There is a TriggerSleep delay; any change after that makes the mini controller update the BundleDeployment.
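For reference, controller-runtime can enqueue reconcile requests from outside the watched type through a channel-based source, which is roughly what "enqueue directly from the trigger mini controller" would look like. The sketch below is an illustration under assumptions, not the approach taken in this PR: the setup function and channel wiring are invented, and the `source.Channel`/`WatchesRawSource` signatures differ between controller-runtime versions (this uses the struct-based form).

```go
package sketch

import (
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/event"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
	"sigs.k8s.io/controller-runtime/pkg/source"

	fleetv1 "github.com/rancher/fleet/pkg/apis/fleet.cattle.io/v1alpha1"
)

// triggerEvents would be fed by the trigger mini controller: for every
// changed deployed resource it sends an event naming the owning
// BundleDeployment, e.g. event.GenericEvent{Object: bd}.
var triggerEvents = make(chan event.GenericEvent)

// setupWithTrigger wires the channel into the bundle deployment reconciler's
// work queue, so changes enqueue the owner directly instead of going through
// a status update on the BundleDeployment.
func setupWithTrigger(mgr ctrl.Manager, r reconcile.Reconciler) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&fleetv1.BundleDeployment{}).
		WatchesRawSource(
			&source.Channel{Source: triggerEvents},
			&handler.EnqueueRequestForObject{},
		).
		Complete(r)
}
```

Requests for the same BundleDeployment collapse in the work queue, which lines up with the deduplication point above.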
Scheme: scheme,
Metrics: metricsserver.Options{BindAddress: metricsAddr},
HealthProbeBindAddress: probeAddr,
LeaderElection: false,
Are we sure we want to remove leader election? If that is the case, do we want to do it in gitjob and fleet controller as well?
We would be losing the ability to be highly available, and we would see some inconsistent behaviour if someone manually scales the statefulset.
We already removed leader election in https://github.com/rancher/fleet/pull/1905/files.
We can add a follow-up item to bring it back. We'll need to discuss how that works together with the upcoming sharding story.
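For that follow-up, re-enabling leader election is mostly a matter of manager options. A minimal sketch mirroring the diff above; the `enableLeaderElection` parameter and the election ID are illustrative assumptions, not what fleet ships:

```go
package sketch

import (
	"k8s.io/apimachinery/pkg/runtime"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/manager"
	metricsserver "sigs.k8s.io/controller-runtime/pkg/metrics/server"
)

// newManager re-enables leader election via standard controller-runtime
// options. The election ID below is a placeholder.
func newManager(scheme *runtime.Scheme, metricsAddr, probeAddr string, enableLeaderElection bool) (manager.Manager, error) {
	return ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Scheme:                 scheme,
		Metrics:                metricsserver.Options{BindAddress: metricsAddr},
		HealthProbeBindAddress: probeAddr,
		LeaderElection:         enableLeaderElection,
		LeaderElectionID:       "fleet-agent-leader-election", // hypothetical ID
	})
}
```

With sharding in the picture, the election ID would presumably need to include the shard, so that each shard elects its own leader.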
merr = append(merr, fmt.Errorf("failed refreshing drift detection: %w", err))
}

err = r.Cleanup.CleanupReleases(ctx, key, bd)
We iterate through all BundleDeployments each time any of them changes to see if they need to be cleaned up. That's not very efficient. We could use finalizers, as I mentioned in the previous comment.
Yes, that's old logic again (https://github.com/rancher/fleet/blob/master/internal/cmd/agent/deployer/cleanup/cleanup.go). I'd rather not fix it within this PR, but I'll add an item to the follow-up items.
maybe we can fix this and this using finalizers in a separate PR?
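A rough sketch of what the finalizer-based cleanup suggested here could look like in a controller-runtime reconciler; the finalizer name and the `cleanupReleases` helper are placeholders, not fleet's actual code:

```go
package sketch

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

	fleetv1 "github.com/rancher/fleet/pkg/apis/fleet.cattle.io/v1alpha1"
)

// hypothetical finalizer name
const bundleDeploymentFinalizer = "fleet.cattle.io/bundle-deployment-cleanup"

type BundleDeploymentReconciler struct {
	client.Client
}

func (r *BundleDeploymentReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	bd := &fleetv1.BundleDeployment{}
	if err := r.Get(ctx, req.NamespacedName, bd); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	if !bd.DeletionTimestamp.IsZero() {
		// Only this bundle deployment's releases are cleaned up, instead of
		// scanning all bundle deployments on every change.
		if controllerutil.ContainsFinalizer(bd, bundleDeploymentFinalizer) {
			if err := cleanupReleases(ctx, bd); err != nil {
				return ctrl.Result{}, err
			}
			controllerutil.RemoveFinalizer(bd, bundleDeploymentFinalizer)
			if err := r.Update(ctx, bd); err != nil {
				return ctrl.Result{}, err
			}
		}
		return ctrl.Result{}, nil
	}

	if !controllerutil.ContainsFinalizer(bd, bundleDeploymentFinalizer) {
		controllerutil.AddFinalizer(bd, bundleDeploymentFinalizer)
		if err := r.Update(ctx, bd); err != nil {
			return ctrl.Result{}, err
		}
	}

	// ... normal deploy/monitor logic ...
	return ctrl.Result{}, nil
}

// cleanupReleases stands in for the actual release cleanup logic.
func cleanupReleases(ctx context.Context, bd *fleetv1.BundleDeployment) error { return nil }
```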
// TODO is this needed with drift correction?
if len(bd.Status.ModifiedStatus) > 0 && monitor.ShouldRedeploy(bd) {
	result.RequeueAfter = durations.MonitorBundleDelay
why do we want to requeue here?
It's old behavior. I'm not sure we can remove it.
If we don't remove it, I think we will always requeue while there is drift. I think this should be removed, since the reconciler is triggered when one of the resources changes. We should avoid using RequeueAfter to prevent unnecessary (and maybe infinite?) reconcile calls.
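For context, a simplified, self-contained sketch of the two options being discussed; `shouldRedeploy` and `monitorBundleDelay` stand in for the `monitor.ShouldRedeploy` and `durations.MonitorBundleDelay` calls in the diff:

```go
package sketch

import (
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
)

const monitorBundleDelay = 15 * time.Second // placeholder value

// requeueOnDrift mirrors the old behaviour: as long as drift is detected,
// every reconcile schedules another one after a fixed delay.
func requeueOnDrift(modified int, shouldRedeploy bool) ctrl.Result {
	var result ctrl.Result
	if modified > 0 && shouldRedeploy {
		result.RequeueAfter = monitorBundleDelay
	}
	return result
}

// eventDriven is the suggested alternative: return an empty result and let
// the trigger/watch enqueue the bundle deployment only when a resource
// actually changes.
func eventDriven() ctrl.Result {
	return ctrl.Result{}
}
```

With the event-driven variant, drift would be picked up by the trigger watching the deployed resources, which is the argument made above.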
* new bundledeployment controller + reconciler
* move code from "handlers" into the deployer packages
* no more leader election for bundledeployment controller
* replace agent's logger with logr.Logger
* move newMappers into clusterstatus package, still uses wrangler
* move "list resources" up into reconciler, don't fetch twice
* agent reconciler only sets status when exiting
* trigger bundledeployment via status on updates
* move requeue and drift correction into reconciler
* simplify bundlestatus condition handling
* add controller-runtime args to agent
* zap logging config
* kubeconfig
* increase TriggerSleep delay by 3s
Co-authored-by: Corentin Néau <[email protected]>
Refers to #1734
Requires:
Follow up:
* e2e/single-cluster/bundle_diffs_test.go:145 is flaky -> Switch bundle diff test to use random target namespaces #1975
* requeueAfter with bd.Status.AppliedDeploymentID = "", or is it still needed? -> Remove bd requeue, rely on drift correction #1985
Manual testing:
, or is it still needed? - Remove bd requeue, rely on drift correction #1985Manual testing: