Post Install Hook doesn't ever sync via ArgoCD #6765
I installed Karpenter in a separate namespace and with a different service name than the default. I can't configure the values for the conversion webhook, and it returns an error referencing a webhook service in the kube-system namespace. However, I obviously did not create a service with that name in kube-system.
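For context, the namespace in question lives in the CRD conversion stanza. A trimmed sketch of what that part of a Karpenter CRD looks like (the values here are illustrative, not the exact chart output):

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: nodepools.karpenter.sh
spec:
  conversion:
    strategy: Webhook
    webhook:
      clientConfig:
        service:
          name: karpenter          # assumed service name
          namespace: kube-system   # hardcoded namespace the thread complains about
```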
I had the same bug when I tried to install Karpenter using Argo CD.
Same here. As a temporary solution, I had to edit each CRD and change the namespace to the proper one. The chart's CRDs come from static files in the repo, and the namespace in them looks hardcoded to kube-system.
Yes, I have the same diff. Did you install the karpenter-crd chart separately from the main karpenter chart, and did you edit the CRDs manually?
I installed it from the main chart and edited the CRDs manually.
Yes, after editing the CRDs, ArgoCD now shows "out of sync". Luckily, I only updated one cluster; I'll hold off on further updates until the stable version is actually stable. ;)
I'm seeing the same behavior when upgrading to v0.35.6 (prerequisite to upgrading to v1.0.0).
Can you rely on the karpenter-crd chart?
I tried to install karpenter-crd using ArgoCD, but when I installed the main karpenter chart I still got a diff.
What namespace are you intending to run the Karpenter controller in? In …
Yes, I had the following values.yaml, but when I install karpenter it tries to change the namespace to kube-system.
Yes, the post-install hook will run and patch the CRDs so that the karpenter service reference is consistent with the Helm chart values used for the Karpenter controller. You can disable the post-install hook by setting …
But I am deploying karpenter-crd and karpenter using ArgoCD, and I can't use `--no-hooks`.
It's worth interjecting here: as far as I can tell, without making the ttlSecondsAfterFinished value larger than 0, ArgoCD can never install this chart, no matter how you fiddle with it. And even then I'm not 100% sure that will fix it.

The only other option available to me is to use a tool like Kyverno to put a mutating webhook on the Job and enforce this change in the cluster, but that's a lot of moving parts and things that can go wrong just to change an integer in a manifest. If we do want this configurable, I think wrapping the manifest in a Helm conditional controlled by a value like … would be the way to go.

This is actually an ArgoCD problem, but considering it's a multi-year issue, I have no hope it's going to be fixed any time soon.
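To make that idea concrete, here is a minimal sketch of the conditional approach, assuming a hypothetical `postInstallHook.enabled` value; the value names and the container spec are illustrative, not the chart's actual template:

```yaml
# Hypothetical gating of the hook template (not the chart's current behaviour)
{{- if .Values.postInstallHook.enabled }}
apiVersion: batch/v1
kind: Job
metadata:
  name: karpenter-post-install-hook
  annotations:
    helm.sh/hook: post-install
spec:
  # expose the TTL as a value too, so ArgoCD can observe the finished Job
  ttlSecondsAfterFinished: {{ .Values.postInstallHook.ttlSecondsAfterFinished | default 30 }}
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: post-install-hook
          image: {{ .Values.postInstallHook.image }}  # placeholder
{{- end }}
```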
I'm using the main chart (because it contains both the CRDs and the controller deployment), and ArgoCD never reaches the post-hook step (as described in the original message). Because the newly installed CRDs have a hardcoded namespace, ArgoCD gets stuck: it is unable to list the existing custom resources, such as current nodeclaims or nodepools (it's looking for the conversion webhook service in the wrong namespace), and therefore it never reaches the post-hook part.
I guess if there's a desire to force this into the kube-system namespace, it might be pertinent to swap all the …
@danielloader The ideal is to leave it dynamic so the user can choose.
Even if I install the CRDs using the …

An optional solution would be the ability to parameterize, in the values, whether the CRDs are installed via the controller Helm chart, and to disable the post-hook from the Helm template... or to put the CRDs into the templates and allow setting the namespace from values.
@snieg @engedaam Some Helm charts have an option to install the CRDs or not. I think it would be interesting to add that to the karpenter/karpenter Helm chart; it would give us an alternative. Example: see the sketch below.
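Purely as an illustration of that pattern (the `crds.install` value is hypothetical, not an existing karpenter chart option), the values side could look like:

```yaml
# values.yaml (hypothetical toggle)
crds:
  install: true  # set to false when CRDs are managed separately, e.g. via the karpenter-crd chart
```

The CRD templates would then be wrapped in a `{{- if .Values.crds.install }}` conditional.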
You have the option to do this with Argo CD by passing in the skipCrds Helm option: https://argo-cd.readthedocs.io/en/stable/user-guide/helm/#helm-skip-crds

If you do happen to be using Lovely, you can pass … to the Application. Using Lovely will also allow you to kustomize the Helm chart, so the other issue is not relevant to me.
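For reference, a minimal sketch of what that looks like on an Argo CD Application (the name, project, destination, and chart version here are placeholders):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: karpenter
  namespace: argocd
spec:
  destination:
    server: https://kubernetes.default.svc
    namespace: kube-system
  project: default
  source:
    chart: karpenter
    repoURL: public.ecr.aws/karpenter
    targetRevision: 1.0.0
    helm:
      skipCrds: true  # don't render the chart's CRDs; manage them separately
```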
Is there any reason why the karpenter chart doesn't declare karpenter-crd as a dependency? If it did, it would be straightforward to control whether to deploy the CRDs (as mentioned above by @Vinaum8) as part of the release. Additionally, it would allow customizing the values passed to the CRDs:

```yaml
crds:
  enabled: false
  # Custom values to be passed to the karpenter-crd dependency
  serviceNamespace: my-custom-namespace
```

If anyone here struggles with CRDs that are out of sync in ArgoCD (caused by the namespace being updated by the post-install hook job), you can temporarily work around this by using ignoreDifferences on the Application:
```yaml
ignoreDifferences:
  - group: apiextensions.k8s.io
    kind: CustomResourceDefinition
    name: nodepools.karpenter.sh
    jsonPointers:
      - /spec/conversion/webhook/clientConfig/service/namespace
  - group: apiextensions.k8s.io
    kind: CustomResourceDefinition
    name: ec2nodeclasses.karpenter.k8s.aws
    jsonPointers:
      - /spec/conversion/webhook/clientConfig/service/namespace
  - group: apiextensions.k8s.io
    kind: CustomResourceDefinition
    name: nodeclaims.karpenter.sh
    jsonPointers:
      - /spec/conversion/webhook/clientConfig/service/namespace
```
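A follow-up note on this approach: by default ignoreDifferences only affects the diff, so a subsequent sync can still revert the patched namespace. Argo CD's RespectIgnoreDifferences sync option tells the sync to honor the ignored paths as well; a small sketch to merge into the same Application spec:

```yaml
syncPolicy:
  syncOptions:
    - RespectIgnoreDifferences=true  # have sync (not just the diff) honor ignoreDifferences
```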
This worked, but we are still back at the first point, where ArgoCD cannot complete the post-hook (even though it has actually completed).
As a workaround / permanent fix, I used Argo CD with kustomize + Helm to patch it. To enable kustomize + Helm, add `kustomize.buildOptions: "--enable-helm"`, either via the argocd-cm ConfigMap:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  kustomize.buildOptions: "--enable-helm"
```

or via the Argo CD Helm chart values:

```yaml
configs:
  cm:
    kustomize.buildOptions: "--enable-helm"
```

If you want to test it locally, beware that you need a recent kustomize: the kustomize embedded in kubectl does not support OCI repos because of how it invokes helm pull.

kustomization.yaml:

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
helmCharts:
  - name: karpenter
    repo: oci://public.ecr.aws/karpenter
    version: 1.0.0
    releaseName: karpenter
    namespace: kube-system
    valuesFile: values.yaml
patches:
  - target:
      group: batch
      version: v1
      kind: Job
      name: karpenter-post-install-hook
    patch: |-
      - op: replace
        path: /spec/ttlSecondsAfterFinished
        value: 30
```
This trick worked for me too. I added the following to the Application spec:

```yaml
spec:
  sources:
    - chart: karpenter
      repoURL: public.ecr.aws/karpenter
      targetRevision: "{{values.version}}"
      helm:
        releaseName: karpenter
        # (...)
      kustomize:
        patches:
          - target:
              group: batch
              version: v1
              kind: Job
              name: karpenter-post-install-hook
            patch: |-
              - op: replace
                path: /spec/ttlSecondsAfterFinished
                value: 30
```
I was not aware that you could mix helm and kustomize in one application 🤯
The real wins are buried deep in the comments; this is new to me too, and might have made me considerably happier even outside of this problem. Thanks @snieg

Edit (cried):
I'm actually using an ApplicationSet:

```yaml
---
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: karpenter
spec:
  generators:
    - clusters:
        selector:
          matchExpressions:
            - key: cluster_name
              operator: In
              values:
                - clusterA
                - clusterB
                - clusterC
        values:
          version: "1.0.0"
    - clusters:
        selector:
          matchExpressions:
            - key: cluster_name
              operator: In
              values:
                - clusterD
                - cluster...
        values:
          version: "0.37.0" # this is the default version
  template:
    metadata:
      name: karpenter-{{metadata.labels.short_name}}
      labels:
        cluster_newrelic_guid: "{{metadata.labels.newrelic_guid}}"
    spec:
      destination:
        name: "{{name}}"
        namespace: kube-system
      project: infra
      sources:
        - chart: karpenter
          repoURL: public.ecr.aws/karpenter
          targetRevision: "{{values.version}}"
          helm:
            releaseName: karpenter
            valueFiles:
              - "$values/apps/karpenter/values/{{metadata.labels.environment}}/values.yaml"
              - "$values/apps/karpenter/values/{{metadata.labels.environment}}/{{metadata.labels.cluster_name}}/values.yaml"
            # skipCrds: true
        - repoURL: https://...
          targetRevision: HEAD
          ref: values
        - repoURL: https://...
          targetRevision: HEAD
          path: apps/karpenter/values/{{metadata.labels.environment}}/{{metadata.labels.cluster_name}}
          kustomize:
            patches:
              - target:
                  group: batch
                  version: v1
                  kind: Job
                  name: karpenter-post-install-hook
                patch: |-
                  - op: replace
                    path: /spec/ttlSecondsAfterFinished
                    value: 30
      info:
        - name: ApplicationSet
          value: This app belongs to the "karpenter" Application
      syncPolicy:
        syncOptions:
          - CreateNamespace=true
  syncPolicy:
    preserveResourcesOnDeletion: true # do not delete Applications if the ApplicationSet is deleted
```
I've got to the point where I just don't want this post-install hook. I have a handful of custom objects that I'm capable of migrating myself; I just want the option to not have to deal with this any more. I get the good intention, but this has completely blocked my production environment from being able to configure Karpenter, and that's not ideal.
To summarize what I tried, and my current method to update karpenter from … :

The solution provided by @jetersen doesn't quite work for me; sometimes the post-hook in ArgoCD remains stuck. As a workaround, a method that works for me is to not sync everything. This way the post-hook is not triggered and the application in ArgoCD is marked as synced without any error.

BEFORE updating to 1.0.0, I moved karpenter to the kube-system namespace. So to sum up, karpenter must be in the kube-system namespace.

I know that this is a very ugly solution, but for now it's probably the only one that works for me.
I've just forked the chart code into a repo with my PR in it and I'm pulling from there until this is done. If it ever gets merged or otherwise resolved so that I don't need it, I can revert back to the upstream chart.
Be careful upgrading with ArgoCD... The missing link for me was that Karpenter was downgrading my v1 NodePools to v1beta1 because I missed the CRDs on the production clusters: I did a helm upgrade in the staging cluster, discovered issues, converted to kustomize, and then deployed to production without the new CRDs (missing includeCRDs in kustomization.yaml). Now I've got several dead clusters... because of conflicts and leftover webhook validations. I had to delete all of the karpenter validating and mutating webhooks, and remember to install the CRDs.

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
helmCharts:
  - name: karpenter
    repo: oci://public.ecr.aws/karpenter
    version: 1.0.0
    releaseName: karpenter
    namespace: kube-system
    valuesFile: values.yaml
    includeCRDs: true
```

This is an Application inside an ApplicationSet called cluster-addons that uses Helm templating for its ArgoCD apps.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: "karpenter-{{ .Values.spec.destination.clusterName }}"
  namespace: argocd
  annotations:
    argocd.argoproj.io/manifest-generate-paths: .
  labels:
    platform: "true"
spec:
  destination:
    server: "{{ .Values.spec.destination.server }}"
    namespace: kube-system
  source:
    path: charts/karpenter
    repoURL: https://...
    targetRevision: main
    kustomize:
      patches:
        - target:
            group: batch
            version: v1
            kind: Job
            name: karpenter-post-install-hook
          patch: |-
            - op: replace
              path: /spec/ttlSecondsAfterFinished
              value: 30
        - target:
            version: v1
            kind: ServiceAccount
            name: karpenter
          patch: |-
            - op: add
              path: /metadata/annotations
              value:
                eks.amazonaws.com/role-arn: "arn:aws:iam::{{ .Values.eks.cluster.accountId }}:role/eks-{{ .Values.eks.cluster.name }}-karpenter-role"
        - target:
            version: v1
            kind: Deployment
            name: karpenter
          patch: |-
            - op: add
              path: /spec/template/spec/containers/0/env/-
              value:
                name: CLUSTER_NAME
                value: "{{ .Values.eks.cluster.name }}"
            - op: add
              path: /spec/template/spec/containers/0/env/-
              value:
                name: INTERRUPTION_QUEUE
                value: "eks-{{ .Values.eks.cluster.name }}-interruption-queue"
            - op: add
              path: /spec/template/spec/containers/0/env/-
              value:
                name: AWS_REGION
                value: "{{ .Values.eks.cluster.region }}"
```
Fixes an ArgoCD issue where Helm hooks never finish syncing when they have ttlSecondsAfterFinished set to 0. See the related ArgoCD issue: argoproj/argo-cd#6880. Suggested workaround as implemented by the ArgoCD team: argoproj/argo-helm#2861
Closing this issue tentatively now that the hook has been removed (but before a new chart version has been published).
Just a note that I opened a similar issue yesterday at #6847. In our case, the name of the Service defined in the static CRDs differed from the ArgoCD / Helm generated resource name for the service, leading to a borked migration and rollback.
The problem with the Application-wide …
Description

Observed Behavior:

The chart never finishes syncing when installed via ArgoCD. The post-install hook Job defined in karpenter-provider-aws/charts/karpenter/templates/post-install-hook.yaml (line 15 in a70356a) sets ttlSecondsAfterFinished to 0, so the completed Job is deleted immediately and ArgoCD never sees the hook finish.
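For illustration only, a trimmed sketch of the rendered hook Job (not the chart's exact manifest) showing the field in question:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: karpenter-post-install-hook
  annotations:
    helm.sh/hook: post-install
spec:
  ttlSecondsAfterFinished: 0  # the Job is garbage-collected the moment it finishes,
                              # so ArgoCD never observes the hook as completed
```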
Expected Behavior:

The chart syncs when installed via ArgoCD.

Reproduction Steps (Please include YAML):

Install the karpenter chart via ArgoCD with default values.

Versions:

- Kubernetes Version (kubectl version): 1.30.0