Apps don't sync every are in stuck in refreshing #18306

Open
3 tasks done
haooliveira84 opened this issue May 20, 2024 · 7 comments
Labels
bug Something isn't working component:application-controller component:sync version:2.11 Latest confirmed affected version is 2.11

Comments

@haooliveira84

haooliveira84 commented May 20, 2024

Checklist:

  • I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • I've included steps to reproduce the bug.
  • I've pasted the output of argocd version.

Describe the bug
I upgraded Argo CD from v2.9.5 to v2.11.0 with dynamicClusterDistribution: enabled. Since then, the Argo CD Applications no longer sync; every one of them is stuck in Refreshing. The application controller also produces very few logs.
image

To Reproduce
1 - Install Argo CD with the Helm chart, version 6.9.3
2 - Edit values.yaml and configure the controller with dynamicClusterDistribution: enabled

Expected behavior

The application controller balances the clusters across its pods, and applications no longer get stuck while syncing.

@haooliveira84 haooliveira84 added the bug Something isn't working label May 20, 2024
@haooliveira84
Author

Tagging @ishitasequeira from #15036.

@haooliveira84
Author

Is anyone looking into this?

@jenna-foghorn

jenna-foghorn commented Jun 14, 2024

Same/similar: #18467

A temporary workaround is to restart the argocd-application-controller with: kubectl rollout restart statefulset argocd-application-controller -n argocd

@oscrx
Contributor

oscrx commented Sep 18, 2024

I had this issue as well. My guess was that it happened because the cluster ownership registration in the ConfigMap was not being expired.
I didn't look into it further at the time because I was running it on prod and people started complaining :)

Maybe the application controllers could use a Kubernetes Lease for every cluster, instead of a ConfigMap, to keep track of which application controller owns which cluster.
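
To make the lease idea concrete, here is a minimal in-memory sketch of lease semantics: ownership lapses on its own once the holder stops renewing. It is not Argo CD code and does not talk to the Kubernetes API. The `Lease` class, the 15-second duration, and the controller names are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Lease:
    """Toy model of a per-cluster lease (hypothetical, in-memory only)."""
    holder: Optional[str] = None
    renew_time: Optional[datetime] = None
    duration: timedelta = timedelta(seconds=15)  # assumed value

    def expired(self, now: datetime) -> bool:
        # A lease with no holder, or one not renewed within `duration`, is free.
        if self.holder is None or self.renew_time is None:
            return True
        return now - self.renew_time > self.duration

    def try_acquire(self, candidate: str, now: datetime) -> bool:
        # The current holder renews; anyone else succeeds only after expiry.
        if self.holder == candidate or self.expired(now):
            self.holder = candidate
            self.renew_time = now
            return True
        return False


t0 = datetime(2024, 9, 18, 7, 0, 0, tzinfo=timezone.utc)
lease = Lease()
print(lease.try_acquire("controller-a", t0))                          # acquires the free lease
print(lease.try_acquire("controller-b", t0 + timedelta(seconds=5)))   # refused, still held
print(lease.try_acquire("controller-b", t0 + timedelta(seconds=20)))  # holder went silent, takes over
```

The point of the sketch is that a dead controller cannot keep a cluster indefinitely: once it stops renewing, the next controller that asks takes over.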

@oscrx
Contributor

oscrx commented Sep 18, 2024

Our ConfigMap stays healthy after restarts. I'll keep it running for a while and see where it breaks (3 instances currently).

❯ k get cm argocd-app-controller-shard-cm -o yaml | k neat
apiVersion: v1
data:
  shardControllerMapping: '[{"ShardNumber":0,"ControllerName":"argocd-application-controller-bb77ddc6-vlfhk","HeartbeatTime":"2024-09-18T07:06:37Z"},{"ShardNumber":1,"ControllerName":"argocd-application-controller-bb77ddc6-7fvrd","HeartbeatTime":"2024-09-18T07:06:36Z"},{"ShardNumber":2,"ControllerName":"argocd-application-controller-bb77ddc6-t8gpj","HeartbeatTime":"2024-09-18T07:06:35Z"}]'
kind: ConfigMap
metadata:
  name: argocd-app-controller-shard-cm
  namespace: argocd
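
One way to check whether the mapping is still "healthy" is to parse shardControllerMapping and flag entries whose heartbeat is older than some threshold. The sketch below does that for the value pasted above. The function names and the 10-second threshold are assumptions for illustration and are not taken from Argo CD.

```python
import json
from datetime import datetime, timedelta, timezone


def parse_heartbeat(value: str) -> datetime:
    """Parse a UTC timestamp such as 2024-09-18T07:06:37Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def stale_shards(mapping_json: str, now: datetime, max_age: timedelta) -> list:
    """Return the shard numbers whose heartbeat is older than max_age."""
    entries = json.loads(mapping_json)
    return [
        entry["ShardNumber"]
        for entry in entries
        if now - parse_heartbeat(entry["HeartbeatTime"]) > max_age
    ]


# Value of data.shardControllerMapping, copied from the ConfigMap above.
MAPPING = (
    '[{"ShardNumber":0,"ControllerName":"argocd-application-controller-bb77ddc6-vlfhk",'
    '"HeartbeatTime":"2024-09-18T07:06:37Z"},'
    '{"ShardNumber":1,"ControllerName":"argocd-application-controller-bb77ddc6-7fvrd",'
    '"HeartbeatTime":"2024-09-18T07:06:36Z"},'
    '{"ShardNumber":2,"ControllerName":"argocd-application-controller-bb77ddc6-t8gpj",'
    '"HeartbeatTime":"2024-09-18T07:06:35Z"}]'
)

# Assumed threshold: treat a heartbeat older than 10 seconds as stale.
max_age = timedelta(seconds=10)

just_after = datetime(2024, 9, 18, 7, 6, 40, tzinfo=timezone.utc)
print(stale_shards(MAPPING, just_after, max_age))  # → []

five_minutes_later = just_after + timedelta(minutes=5)
print(stale_shards(MAPPING, five_minutes_later, max_age))  # → [0, 1, 2]
```

If the shards reported as stale belong to controller pods that no longer exist, that would be consistent with the suspicion that ownership entries are not being expired.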

@jmmclean

I'm having to restart the application controller every day at this point. I see "cache: key is missing" log events, and then Argo CD seems to go into a deadlock.

Potentially similar issue: #18503

The CPU usage is a clear indicator of the app controller being dead:
image

The sawtooth pattern in the CPU usage also correlates with this "key is missing" error, and my workqueue depth seems to peak at 434:
image

ArgoCD version: 2.11.3

cc @crenshaw-dev

@jmmclean

Following up on the above: after configuring shards for the application controller and explicitly defining shard: n in the cluster secret, both our caching issue and our workqueue deadlock went away. We also upgraded to 2.11.5 because of a known deadlock issue, but that did not resolve our specific deadlock.
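
As a rough illustration of pinning a cluster to a shard, the sketch below builds a merge-patch body that sets the shard field on a cluster secret. The helper name `shard_patch` is hypothetical, and the sketch assumes the shard is stored as a string value in the secret, as all secret values are.

```python
import json


def shard_patch(shard: int) -> str:
    """Build a merge-patch body that sets `shard` on a cluster secret."""
    if shard < 0:
        raise ValueError("shard must be non-negative")
    # Secret values are strings, so the shard number is serialized as text.
    return json.dumps({"stringData": {"shard": str(shard)}})


print(shard_patch(1))  # → {"stringData": {"shard": "1"}}
```

The printed JSON could then be passed to kubectl patch secret <cluster-secret> -n argocd --type merge -p '<json>', where <cluster-secret> is a placeholder for the name of the cluster secret in your installation.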

@andrii-korotkov-verkada andrii-korotkov-verkada added the version:2.11 Latest confirmed affected version is 2.11 label Nov 11, 2024
6 participants