Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: effective tls policies reconciler #927

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

KevFan
Copy link
Contributor

@KevFan KevFan commented Oct 9, 2024

Description

Part of #819 #824

  • Refactor the logic for the management of TLSPolicy Certificates into a workflow task
  • Re-add validation for TLS Policy to check for if Issuer kind is correct and if the issuer is present on cluster, with the addition of integrations tests to test for this

Verification

  • Passing integration tests should be enough to test that nothing is broken from this change

@KevFan KevFan self-assigned this Oct 11, 2024
@KevFan KevFan added the kind/enhancement New feature or request label Oct 11, 2024
Comment on lines 73 to 96
if policy.DeletionTimestamp != nil {
logger.V(1).Info("policy is marked for deletion, nothing to do", "name", policy.Name, "namespace", policy.Namespace, "uid", policy.GetUID())
continue
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for finalizer since associated Certificate's will also be deleted by owner ref

logger := controller.LoggerFromContext(ctx).WithName("EffectiveTLSPoliciesReconciler").WithName("Reconcile")

// Get all TLS Policies
policies := lo.Filter(topology.Policies().Items(), func(item machinery.Policy, index int) bool {
Copy link
Collaborator

@maleck13 maleck13 Oct 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So are we reconciling every policy on the cluster with every change to the subscribed kinds? Not quite understanding how this is working

Copy link
Collaborator

@maleck13 maleck13 Oct 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If so do we intend to have a way to not need to do this? Like if a certificate changes, there is a link between that certificate and the TLSPolicy that ownes it so we shouldn't need to reconcile all TLSPolicies because a certificate changed right. Or am I misunderstanding?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair. There's is a way to do reconcile only affected tls policies instead by using the events received. I've pushed a commit for this so this should be addressed now cd35907

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reverted it since it looks to give more concerns that needed. Let's have a look can this be optimized in another PR or later point

@KevFan KevFan force-pushed the effective-tls-reconciler branch 5 times, most recently from af8c44c to cd35907 Compare October 16, 2024 09:21
@KevFan KevFan marked this pull request as ready for review October 16, 2024 09:24
Comment on lines 214 to 217
affectedPolicies = append(affectedPolicies, lo.Filter(policies, func(item machinery.Policy, index int) bool {
p := item.(*kuadrantv1alpha1.TLSPolicy)
return utils.IsOwnedBy(ob, p)
})...)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you could leverage the topology links here too:

Suggested change
affectedPolicies = append(affectedPolicies, lo.Filter(policies, func(item machinery.Policy, index int) bool {
p := item.(*kuadrantv1alpha1.TLSPolicy)
return utils.IsOwnedBy(ob, p)
})...)
c, _ := ob.(*controller.RuntimeObject).Object.(*certmanagerv1.Certificate)
listeners := topology.Targetables().Parents(machinery.LocatorFromObject(c))
affectedPolicies = append(affectedPolicies, lo.FlatMap(listeners, func(l machinery.Targetable) ([]machinery.Policy, bool) {
return l.(*machinery.Listener).Gateway.Policies()
})...)

In this particular case, not sure if it's better or worse. I'm just thinking about avoiding filtering the same list of policies so many times.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that's true. If you feel strongly on this I can update, but using the owner refernence looks a lot more readable unless filtering the list of policies should be avoided in general? 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not say we should avoid filtering policies. Rather, I would say we should avoid writing new logic for things otherwise represented in the topology.

I believe what we want here is get to the list of policies. There must always be a path in the topology between every object directly or indirectly related to a policy and the policy object itself. Getting to the policies some other way suggests to me that we may have failed to represent those links between objects in the topology, that we are underusing the links (and therefore the helper functions such as Parents and Children, thought to impose a given structure to the topology), or yet that we're using the wrong relationship between objects as topological links (e.g. the object is linked to a resource and it's owned by the resource).

If ownership in this case defines the link between objects, then we should use the link. (Using IsOwnedBy would be repeating ourselves.) If the ownership is not the basis for the link, but added only as a mean to facilitate cleaning up resources, then let's not use it as the strong link tying the objects together.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's fair, I've reverted this part where this is done, but I'll have a look in the other code changes to see can I better leverage the links 👍

Copy link
Contributor

@Boomatang Boomatang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of my initial thoughts. Have not ran the code base yet.

})

// Policy is deleted
if policy.DeletionTimestamp != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it not make more sense to do the deletion timestamp filtering when getting the list of policies in the first place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reverted the filtering logic in the latest commit and started looping through the attached policies of listeners/gateways instead. I've kept this kind of check but not sure is there any chance that an deleted policy would still be attached to the listener or gateway 🤔

})

// Create
if !ok {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we be more positive in our checks, doing if ok instead of `if !ok. I also wonder if it is better to have the more common action listed first. So if we do more creates that should be listed first but if we do more creates that should be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure but whether to do !ok or ok first depends also on whether we create or update first, so if we want to be more positive in our check here if ok, this would mean update is first 🤔

Not sure which action do we expect more of. Maybe @mikenairn can answer on this

}
}

// Clean up orphaned certs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cleaning up of orphaned certs gives me an odd feeling, which I have yet to put my finger on. Had the same feeling with the DNS records. The question I ask find asking my self is what does the cleaning up of orphaned certs have to do with TLS polices and their reconcile. I have this feeling that this clean up should be structured as a Task in the Workflow over being part of the TLS policy reconcile. There seems to be a mixing of concerns. I understand how this mixing of concerns is ingrained in us from using the controller runtime. This currently a feeling that this is a code smell, but yet I don't have a good though on what to do about it.

One thing that I don't know at time of writing is how do certs become orphaned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is because there is a indirect link between policies and the subresouce (certificates in this case) and so gives the expected state of the cluster. Though true, this probably can be done as a separate task, that I can look into if you feel strongly about this.

A cert can be orphaned when a gateway listener is removed, or is the target ref of a tls policy is changed to another gateway. In both of these cases the cert(s) is created and will be orphaned since the policy still exists and still valid

resource := t.client.Resource(CertManagerCertificatesResource).Namespace(cert.GetNamespace())

if err := resource.Delete(ctx, cert.Name, metav1.DeleteOptions{}); err != nil && !apierrors.IsNotFound(err) {
return err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want this returning an error at this stage? There may be more certs to delete and if there is an error on the first one. We would never try to delete the rest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I'll log the error here instead 👍

controllers/tls_workflow.go Outdated Show resolved Hide resolved
controllers/tlspolicy_status_updater.go Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants