Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: adding generate operation #3573

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

JaydipGabani
Copy link
Contributor

@JaydipGabani JaydipGabani commented Oct 5, 2024

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, using fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when the PR gets merged):
Fixes #3659
Related #3501

Special notes for your reviewer:

@codecov-commenter
Copy link

codecov-commenter commented Oct 5, 2024

Codecov Report

Attention: Patch coverage is 27.37226% with 199 lines in your changes missing coverage. Please review.

Project coverage is 47.63%. Comparing base (3350319) to head (a7ae597).
Report is 173 commits behind head on master.

Files with missing lines Patch % Lines
pkg/controller/constraint/constraint_controller.go 0.00% 117 Missing ⚠️
...onstrainttemplate/constrainttemplate_controller.go 47.77% 63 Missing and 19 partials ⚠️

❗ There is a different number of reports uploaded between BASE (3350319) and HEAD (a7ae597). Click for more details.

HEAD has 1 upload less than BASE
Flag BASE (3350319) HEAD (a7ae597)
unittests 2 1
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3573      +/-   ##
==========================================
- Coverage   54.49%   47.63%   -6.86%     
==========================================
  Files         134      236     +102     
  Lines       12329    19788    +7459     
==========================================
+ Hits         6719     9427    +2708     
- Misses       5116     9478    +4362     
- Partials      494      883     +389     
Flag Coverage Δ
unittests 47.63% <27.37%> (-6.86%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@JaydipGabani JaydipGabani changed the title [WIP] chore: adding generate operation chore: adding generate operation Oct 7, 2024
@JaydipGabani JaydipGabani marked this pull request as ready for review October 7, 2024 22:36
@JaydipGabani JaydipGabani requested a review from a team as a code owner October 7, 2024 22:36
Copy link
Contributor

@maxsmythe maxsmythe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Biggest concern is the timestamp stuff.

DefaultGenerateVAP = flag.Bool("default-create-vap-for-templates", false, "Create VAP resource for template containing VAP-style CEL source. Allowed values are false: do not create Validating Admission Policy unless generateVAP: true is set on constraint template explicitly, true: create Validating Admission Policy unless generateVAP: false is set on constraint template explicitly.")
log = logf.Log.V(logging.DebugLevel).WithName("controller").WithValues(logging.Process, "constraint_controller")
discoveryErr *apiutil.ErrResourceDiscoveryFailed
DefaultWaitForGeneration = flag.Int("default-wait-for-generation", 30, "Wait to generate ValidatingAdmissionPolicyBinding after the constraint is created. Defaults to 30 seconds.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should not include the hardcoded time in the error text -- prone to bitrot.

return reconcile.Result{}, err
}
currentVapBinding = nil

if currentVapBinding == nil && instance.GetCreationTimestamp().Add(time.Duration(*DefaultWaitForGeneration)).Before(time.Now()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not rely on the creation timestamp... that leaves us vulnerable to clock skew since that is set by an unknown machine.

Also, as discussed in the OSS mtg, the time delay should be based off of the constraint template, specifically wait time seconds after the CRD is created from the template. There is no need to wait due to the constraint.

if generateVAPB && groupVersion != nil {
currentVapBinding, err := vapBindingForVersion(*groupVersion)
if operations.IsAssigned(operations.Generate) {
err := util.ValidateEnforcementAction(enforcementAction, instance.Object)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VAP generation should probably be in its own function -- the reconcile function is unwieldy at this point.

Also, I'm not sure we want to only validate enforcement action when generate is enabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactored the code

return reconcile.Result{}, err

if operations.IsAssigned(operations.Generate) {
isVapAPIEnabled := false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably move generation lifecycle stuff to its own function.

Makefile Outdated
@@ -70,6 +70,7 @@ MANAGER_IMAGE_PATCH := "apiVersion: apps/v1\
\n - --exempt-namespace=${GATEKEEPER_NAMESPACE}\
\n - --operation=webhook\
\n - --operation=mutation-webhook\
\n - --operation=generate\
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is added to controller-manager instead of audit. was this intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was supposed to be added to audit pod. Fixed it.

@@ -171,6 +171,7 @@ information._
| enableK8sNativeValidation | Enable the K8s Native Validating driver to allow constraint templates to use rules written in VAP-style CEL (beta feature) | `true` |
| defaultCreateVAPForTemplates | (alpha) Create VAP resource for template containing VAP-style CEL source. Allowed values are false: do not create Validating Admission Policy unless generateVAP: true is set on constraint template explicitly, true: create Validating Admission Policy unless generateVAP: false is set on constraint template explicitly. | `false` |
| defaultCreateVAPBindingForConstraints | (alpha) Create VAPBinding resource for constraint of the template containing VAP-style CEL source. Allowed values are false: do not create Validating Admission Policy Binding, true: create Validating Admission Policy Binding. | `false` |
| defaultWaitForVAPBGeneration | (alpha) Wait to generate ValidatingAdmissionPolicyBinding after the constraint CRD is created. | `30` |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| defaultWaitForVAPBGeneration | (alpha) Wait to generate ValidatingAdmissionPolicyBinding after the constraint CRD is created. | `30` |
| defaultWaitForVAPBGeneration | (alpha) Wait time in seconds before generating a ValidatingAdmissionPolicyBinding after a constraint CRD is created. | `30` |

logger = log.Log.V(logging.DebugLevel).WithName("controller").WithValues("kind", "ConstraintTemplate", logging.Process, "constraint_template_controller")
discoveryErr *apiutil.ErrResourceDiscoveryFailed
logger = log.Log.V(logging.DebugLevel).WithName("controller").WithValues("kind", "ConstraintTemplate", logging.Process, "constraint_template_controller")
defaultWaitForVAPBGeneration = flag.Int("default-wait-for-vapb-generation", 30, "(alpha) Wait to generate ValidatingAdmissionPolicyBinding after the constraint CRD is created.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@@ -574,6 +483,132 @@ func (r *ReconcileConstraint) reportErrorOnConstraintStatus(ctx context.Context,
return err
}

func (r *ReconcileConstraint) generateVAPB(ctx context.Context, enforcementAction util.EnforcementAction, instance *unstructured.Unstructured, status *constraintstatusv1beta1.ConstraintPodStatus) (time.Duration, error) {
ret := time.Duration(0)
if !operations.IsAssigned(operations.Generate) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if none of the controllers has the generate operation, how do we report this issue to the user?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added log statement for now. Another option is to figure the same out by looking at all CTStatus resource in CTstatus controller - https://github.com/open-policy-agent/gatekeeper/blob/master/pkg/controller/constrainttemplatestatus/constrainttemplatestatus_controller.go#L185. However we might need one field under CT Status to post the error.

if err != nil {
return err
}
if t.Before(currentTime.Add(time.Duration(*defaultWaitForVAPBGeneration*2) * time.Second)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this * 2? can you add a comment for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified this to remove *2, added the comment as well.


// waiting for sometime before generating vapbinding, gives api-server time to cache CRDs
timestamp := ct.Annotations[BlockVAPBGenerationUntilAnnotation]
if timestamp != "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if it's not populated

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed this behavior.

func (r *ReconcileConstraintTemplate) updateTemplateWithBlockVAPBGenerationAnnotations(ctx context.Context, ct *v1beta1.ConstraintTemplate) error {
currentTime := time.Now()
switch {
case ct.Annotations == nil || ct.Annotations[constraint.BlockVAPBGenerationUntilAnnotation] == "":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this need to be a switch statement or can a if clause work better since some of the code below seems to be repeated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the code to if stmt

// currentTime := time.Now()
// if currentTime.Before(blockTime) {
// t.Fatal("VAPBinding should not be created before the timestamp", currentTime, blockTime)
// }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this commented out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed the commented block. Updated the tests as well.

return err
}
// if wait time is within the time window to generate vap binding, do not update the annotation
if t.Before(currentTime.Add(time.Duration(*constraint.DefaultWaitForVAPBGeneration) * time.Second)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if annotation is already populated, then can we always skip an update? what's the use case for when the annotation needs to be updated with a new time?

Copy link
Contributor Author

@JaydipGabani JaydipGabani Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ensures that VAPB is generated if a faulty actor has set the timestamp in the future manually—beyond the allowed delay configured by the flag.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the context and it would be useful to add that as a comment.

default:
// reconcile after default wait time for vapb generation if annotation is not set
if ct.Annotations == nil || ct.Annotations[BlockVAPBGenerationUntilAnnotation] == "" {
return time.Duration(*DefaultWaitForVAPBGeneration) * time.Second, r.reportErrorOnConstraintStatus(ctx, status, errors.New("annotation to wait for ValidatingAdmissionPolicyBinding generation not found"), "could not find annotation to wait for ValidatingAdmissionPolicyBinding generation")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this returned time duration is ignore when an error is returned. we should be consistent to always rely on the annotation for the timestamp instead of returning a duration here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose here was to requeue after user defined delay in case wanted annotations are not found (wait for annotation to be set by CT constroller if annotation was not present for any reason). I updated the code to reflect the same.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to have a user-defined delay when annotation is missing, that should be handled by controller code (user sets time delay after CRD gets created, not internal controller mechanics)

@ritazh ritazh added this to the v3.18.0 milestone Oct 30, 2024
@JaydipGabani JaydipGabani changed the title chore: adding generate operation feat: adding generate operation Oct 30, 2024
Signed-off-by: Jaydip Gabani <[email protected]>
Signed-off-by: Jaydip Gabani <[email protected]>
return reconcile.Result{}, r.reportErrorOnConstraintStatus(ctx, status, err, fmt.Sprintf("could not delete ValidatingAdmissionPolicyBinding: %s", vapBindingName))
}
}
if requeueAfter != time.Duration(0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because we are requeueing before cacheConstraint is called (which adds the constraint to the constraint framework), the constraint will not be enforced at all until the VAP objects are created. These seems unnecessary and will hurt the performance of existing G8r uses.

return reconcile.Result{}, r.reportErrorOnConstraintStatus(ctx, status, err, fmt.Sprintf("could not update ValidatingAdmissionPolicyBinding: %s", vapBindingName))
}
}
requeueAfter, err := r.generateVAPB(ctx, enforcementAction, instance, status)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the VAP generation logic is gated on reflect.DeepEqual() of the constraint as-cached by the constraint framework.

I don't think that is the right logic gate for VAP-gen logic -- it should be gated on whether the extant VAP objects have drifted from the to-be-generated VAP objects. Otherwise users could modify the generated VAP objects and G8r would not re-align them until the next time the constraint is touched.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we gate actual writes on VAP binding having a diff --- could just always call generateVAP() code. Maybe we could avoid unnecessary logic execution and therefore improve performance by refactoring more, but always calling VAP gen should be sufficient.

Bonus points for calling it after adding the constraint to the constraint framework -- that avoids blocking all enforcement on VAP succeeding as mentioned elsewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated to call generateVAP out side of reflect.DeepEqual() gate. Also calling generateVAP after constraint is marked enforced.

hasVAP, err := ShouldGenerateVAP(unversionedCT)
switch {
case errors.Is(err, celSchema.ErrCodeNotDefined):
generateVAPB = false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this error be swallowed? Or should we tell the user that the constraint template does not support VAP?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will mean when we switch the default to always generate VAP and VAPB, we will be throwing error for existing users using constraint template with Rego. I can also see the other use-case, where we would want to notify users if the intent is to use VAP but there is no CEL in the CT.

@ritazh @sozercan Any thoughts on this?

default:
// reconcile after default wait time for vapb generation if annotation is not set
if ct.Annotations == nil || ct.Annotations[BlockVAPBGenerationUntilAnnotation] == "" {
return time.Duration(*DefaultWaitForVAPBGeneration) * time.Second, r.reportErrorOnConstraintStatus(ctx, status, errors.New("annotation to wait for ValidatingAdmissionPolicyBinding generation not found"), "could not find annotation to wait for ValidatingAdmissionPolicyBinding generation")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to have a user-defined delay when annotation is missing, that should be handled by controller code (user sets time delay after CRD gets created, not internal controller mechanics)

vapBindingName := fmt.Sprintf("gatekeeper-%s", instance.GetName())
log.Info("check if vapbinding exists", "vapBindingName", vapBindingName)
if err := r.reader.Get(ctx, types.NamespacedName{Name: vapBindingName}, currentVapBinding); err != nil {
if !apierrors.IsNotFound(err) && !errors.As(err, &discoveryErr) && !meta.IsNoMatchError(err) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discovery/nomatch errors should not be a concern for bindings (which are a built-in type)

Copy link
Contributor Author

@JaydipGabani JaydipGabani Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to remove the error matching.

return reconcile.Result{}, r.reportErrorOnConstraintStatus(ctx, status, err, fmt.Sprintf("could not update ValidatingAdmissionPolicyBinding: %s", vapBindingName))
}
}
requeueAfter, err := r.generateVAPB(ctx, enforcementAction, instance, status)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we gate actual writes on VAP binding having a diff --- could just always call generateVAP() code. Maybe we could avoid unnecessary logic execution and therefore improve performance by refactoring more, but always calling VAP gen should be sufficient.

Bonus points for calling it after adding the constraint to the constraint framework -- that avoids blocking all enforcement on VAP succeeding as mentioned elsewhere.

return err
}
// if wait time is within the time window to generate vap binding, do not update the annotation
// otherwise update the annotation with the current time + wait time. This protects against manual updates on annotations with a timestamp that prevents binding from getting generated.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, more accurately, this prevents clock skew from preventing generation on task reschedule.

This design really doesn't protect against malicious users (RBAC for templates does that)

return nil
}

err := r.updateTemplateWithBlockVAPBGenerationAnnotations(ctx, ct)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably note that we are purposefully making a second call to the API server here in order to make sure the timestamp is post-CRD creation.

Otherwise the question I have is "why make two requests?"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment here.

if ct.Annotations == nil {
ct.Annotations = make(map[string]string)
}
ct.Annotations[constraint.BlockVAPBGenerationUntilAnnotation] = currentTime.Add(time.Duration(*constraint.DefaultWaitForVAPBGeneration) * time.Second).Format(time.RFC3339)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also have an annotation that is non-clock-dependent, that lets us know when generation is always allowed, otherwise we run the risk of sporadic delays when the pod reschedules.

Likely a significant change, probably not a blocker for beta (just a known risk)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once the annotations are set after CRD is created, pod reschedules won't cause the annotations to be set again. In the event of pod getting reschedule after CRD is created and before the annotations are set, annotations will get set on the restart and then it won't get updated again.

Since we only update annotations if it is not set or it is in future outside of the window defined by users, I am failing to see the need of non-clock-dependent annotation. Can you describe an example where we run the risk of sporadic delays?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add wait for vapb generation
4 participants