application-controller Watch failed
#15464
Comments
I am experiencing the same. Every 12 hours, I get about 40 or so errors that all say `"Watch failed" err="context canceled"`:

| Time | Host | Message |
| --- | --- | --- |
| 14:39:19 UTC | aks-general-00000-vmss000002-argocd | "Watch failed" err="context canceled" |
| 14:39:19 UTC | aks-general-00000-vmss000002-argocd | "Watch failed" err="context canceled" |
| 14:39:19 UTC | aks-general-00000-vmss000002-argocd | "Watch failed" err="context canceled" |
| 14:39:19 UTC | aks-general-00000-vmss000002-argocd | "Watch failed" err="context canceled" |
| 14:39:19 UTC | aks-general-00000-vmss000002-argocd | "Watch failed" err="context canceled" |
| 14:39:19 UTC | aks-general-00000-vmss000002-argocd | "Watch failed" err="context canceled" |
| 14:39:19 UTC | aks-general-00000-vmss000002-argocd | "Watch failed" err="context canceled" |
| 14:39:19 UTC | aks-general-00000-vmss000002-argocd | "Watch failed" err="context canceled" |
| 14:39:19 UTC | aks-general-00000-vmss000002-argocd | "Watch failed" err="context canceled" |
| 14:39:19 UTC | aks-general-00000-vmss000002-argocd | "Watch failed" err="context canceled" |
We are still seeing this issue in Argo CD 2.11.2, and it is causing deployment outages for some of our users. We have one installation with multiple controllers that manage 40+ clusters.
This might be unrelated, but if you are using a limited RBAC role for the Argo CD application controller instead of an admin role with permission to all resources on the cluster, you might want to either set resource inclusions/exclusions manually or use the respectRBAC feature, which lets Argo CD automatically figure out which resources it has access to and needs to monitor/watch. Ref:
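For reference, a minimal sketch of those two options, assuming the stock `argocd-cm` ConfigMap and the documented `resource.respectRBAC` / `resource.exclusions` keys (the excluded group/kind below is purely illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  # Option A: let the controller skip resource kinds its RBAC cannot access
  # instead of repeatedly failing watches on them.
  resource.respectRBAC: "normal"        # or "strict" for a stricter access check
  # Option B: manually exclude kinds the controller should not watch.
  resource.exclusions: |
    - apiGroups:
        - "example.com"                 # illustrative group/kind only
      kinds:
        - NoisyCustomResource
      clusters:
        - "*"
```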
We are also seeing 75-200 of these log entries from each application controller every 12 hours on v2.11.3. The timing correlates with the cluster's cache age dropping to 0. Here's a zoomed-in look at a 15-minute window:

I don't know what this correlation means, but thought it might be worth sharing.
This morning I found that the controller log had been logging this error every second for the whole night:

There are problems with my Argo CD, but this message does not help identify the cause.
Experiencing this issue as well. ArgoCD version is
I see three other similar issues, all marked as "resolved", although it is still happening on v2.13.3 for me at least once a day, sometimes multiple times a day.
Also looks like this one is related as well: #20785. And there is another one requesting more context to be added to this log entry. Anyway, to sum it up: I tried multiple sharding algorithms, my pods are not starving on resources, everything looks fine and dandy, but the controllers get deadlocked. Resource counts go down, queue depth goes to zero, and there are thousands of these "Watch failed" log entries.

Also, my cache age metric looks weird; here's an example for one of the clusters that is currently stuck. The spike is when the issue started, but how come the cache age was at 0 before the issue started while the controller was working just fine? Also see #20785 (comment)
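For anyone trying the same sharding experiments: on a multi-shard install the algorithm is usually selected via `argocd-cmd-params-cm`. A minimal sketch, assuming the documented `controller.sharding.algorithm` key (the value shown is illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
  namespace: argocd
data:
  # Selects how clusters are distributed across application controller shards;
  # "legacy" and "round-robin" are the commonly documented values.
  controller.sharding.algorithm: "round-robin"
```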
I managed to capture a goroutine pprof dump from a controller in the stuck state.
As I am continuing my troubleshooting, I've found a potentially related problem that might be triggering this bug. I filed it as a new ticket because it goes beyond the scope of the deadlock described here; imo it would still be an issue even if it didn't cause a deadlock, not to mention that the two may not be related at all: #21506
Checklist:

- [ ] I've pasted the output of `argocd version`.

Describe the bug
Hi,

I am using the `argo-cd` `5.46.2` helm chart. I have noticed that every 12 hours the `application-controller` throws the following error:

According to this discussion, some `watch` permissions are missing. Currently the role associated with the `application-controller` service account has `watch` on `secrets` and `configmaps`:

Is there something else missing?
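For comparison, a minimal sketch of the kind of cluster-wide read/watch access the controller otherwise relies on (the upstream manifests grant a full wildcard; the names below are illustrative, and the ServiceAccount name depends on your Helm release):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: argocd-application-controller-watch   # illustrative name
rules:
  # Read/list/watch on all resource kinds the controller may need to cache.
  - apiGroups: ["*"]
    resources: ["*"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: argocd-application-controller-watch
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: argocd-application-controller-watch
subjects:
  - kind: ServiceAccount
    name: argo-cd-application-controller      # adjust to the SA created by your chart release
    namespace: argocd
```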
To Reproduce
`kubectl logs argo-cd-application-controller-0 | grep Watch`
Expected behavior
No error
Version
Logs