Difficulties with keda and helm during keda->standard hpa 'upgrade' #6250

Open
SleepyBrett opened this issue Oct 18, 2024 · 0 comments
Labels
bug Something isn't working

SleepyBrett commented Oct 18, 2024

Report

I have created a helm chart that lets users define standard hpas or opt into keda. When keda is disabled in the values we create a standard hpa; when keda is enabled we do not render the hpa and instead render a scaled object that specifies an hpa name. Because of how helm performs its install/upgrade, this is causing us some issues.
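To make the setup concrete, here is a minimal sketch of what such a toggle might look like in a chart template. It is not the actual chart: the file name, the values keys (`keda.enabled`, `autoscaling.*`) and the use of the release name for every object are assumptions made for illustration, and the cpu trigger is just a placeholder scaler.

```yaml
# templates/autoscaling.yaml (hypothetical file name and values keys)
{{- if .Values.keda.enabled }}
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: {{ .Release.Name }}
spec:
  scaleTargetRef:
    name: {{ .Release.Name }}        # Deployment name; assumed to equal the release name
  minReplicaCount: {{ .Values.autoscaling.minReplicas }}
  maxReplicaCount: {{ .Values.autoscaling.maxReplicas }}
  advanced:
    horizontalPodAutoscalerConfig:
      name: {{ .Release.Name }}      # name of the hpa that keda creates and manages
  triggers:
    - type: cpu                      # placeholder trigger for the sketch
      metricType: Utilization
      metadata:
        value: {{ .Values.autoscaling.targetCPU | quote }}
{{- else }}
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: {{ .Release.Name }}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ .Release.Name }}
  minReplicas: {{ .Values.autoscaling.minReplicas }}
  maxReplicas: {{ .Values.autoscaling.maxReplicas }}
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: {{ .Values.autoscaling.targetCPU }}
{{- end }}
```

With a layout like this, exactly one of the two objects is rendered per release revision, which is what sets up the ordering problem described in the rest of this report.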

A quick overview of how helm installs/upgrades things:

  1. Helm renders all the templates for the current values and produces a number of k8s objects.
  2. Those k8s objects are then created/updated on the cluster, as long as any existing ones are owned by helm (i.e. they carry certain labels/annotations).
  3. Helm then looks for any objects that are owned by the helm release but were not rendered in step 1, and deletes them as they are now orphaned.
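For reference, the ownership markers mentioned in step 2 are, as far as I know, the following label and annotations that Helm 3 sets on the objects it manages. The release name and namespace below are hypothetical values for illustration.

```yaml
metadata:
  labels:
    app.kubernetes.io/managed-by: Helm
  annotations:
    meta.helm.sh/release-name: kedatest          # hypothetical release name
    meta.helm.sh/release-namespace: default      # hypothetical namespace
```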

In the current chart the name of the hpa that is created when keda is disabled is the same as the hpa name we place into the scaled object when keda is enabled. We use the transfer-hpa-ownership annotation to smooth this over.
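A sketch of the relevant parts of the scaled object, assuming the object names from the error message later in this report. Only the annotation and the hpa name are shown; `scaleTargetRef` and `triggers` are omitted for brevity, so this fragment is not a complete manifest.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kedatest-microservice
  annotations:
    # lets keda take over an existing hpa instead of rejecting the scaled object
    scaledobject.keda.sh/transfer-hpa-ownership: "true"
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      name: kedatest-microservice    # same name as the hpa rendered when keda is disabled
```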

So from a helm point of view:

  1. helm renders a scaled object and not an hpa
  2. helm applies the rendered objects
     1. the keda validation webhook sees the current hpa (not owned by keda) and, because of the annotation and the matching names, does not object and moves on
     2. the scaledobject is created
  3. helm removes the current hpa
  4. the keda controller reconciles the scaled object and, since the hpa no longer exists, creates it

So far so good. We now have an hpa owned by the scaled object.

Now when we then disable keda:

  1. helm renders templates and generates an HPA object but no scaledobject
  2. apply: helm updates the hpa that keda created (we think; this is an odd one because I would expect helm to choke here on non-ownership. Perhaps helm does not remove the current hpa when 'upgrading to keda' because of the keda ownership block? Audit logs could tell us, I suppose.)
  3. helm removes the scaled object
  4. keda or the k8s controller manager removes the hpa because it was owned by the scaled object
  5. we are left with a deployment with no hpa
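Step 4 is consistent with how Kubernetes garbage collection treats dependents. As a sketch, the hpa that keda creates would be expected to carry an owner reference along these lines, so deleting the scaled object cascades to the hpa even though helm has just applied its own spec to it. The exact fields on a real cluster may differ, and the uid is a placeholder.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: kedatest-microservice
  ownerReferences:
    - apiVersion: keda.sh/v1alpha1
      kind: ScaledObject
      name: kedatest-microservice
      uid: 00000000-0000-0000-0000-000000000000   # placeholder, not a real uid
      controller: true
      blockOwnerDeletion: true
```

Checking `metadata.ownerReferences` on the live hpa after the downgrade, before helm removes the scaled object, should confirm or rule out this explanation.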

So then we thought: what if the hpa created by a non-keda install and the hpa referenced by a keda install have different names? We made that change, but found that when we upgrade from non-keda -> keda the validation webhook rejects the scaled object, because at the time we apply it the old hpa still exists (helm only removes it afterwards). The error we get is: failed to create resource: admission webhook "vscaledobject.kb.io" denied the request: the workload 'kedatest-microservice' of type 'apps/v1.Deployment' is already managed by the hpa 'kedatest-microservice'

Is there any way around this? We have fiddled a bit with the scaled object annotations, specifically validations.keda.sh/hpa-ownership, but they are, frankly, pretty poorly documented.

Expected Behavior

I expect to be able to toggle back and forth between a standard hpa and keda hpa using the standard helm upgrade -i method in a single step process.

Actual Behavior

Deleting the scaled object deletes the underlying hpa, leaving a service that has downgraded from keda to a standard hpa with no hpa at all.

Steps to Reproduce the Problem

I've mostly described the steps above, but I can provide a slimmed-down helm chart on request.

Logs from KEDA operator

logs are unimportant

KEDA Version

2.14.0

Kubernetes Version

1.29

Platform

Amazon Web Services

Scaler Details

unimportant

Anything else?

It seems to me that you could implement a new annotation that would, on scaledobject deletion, first remove the ownership claim from the hpa instead of immediately marking it for deletion, thus leaving the hpa intact. Thoughts?

We realize that this is a bit of an edge case, but it is one that would bite pretty hard and it concerns us.

@SleepyBrett SleepyBrett added the bug Something isn't working label Oct 18, 2024