
Reflector ListAndWatch trace error #858

Open
bygui86 opened this issue Nov 22, 2022 · 4 comments
Labels
needs-more-info Further information is needed

Comments

@bygui86

bygui86 commented Nov 22, 2022

Description

In the Botkube logs I repeatedly see the following trace message:

I1122 08:58:32.111260       1 trace.go:205] Trace[1926275561]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169 (22-Nov-2022 08:58:21.197) (total time: 10913ms):
Trace[1926275561]: ---"Objects listed" error:<nil> 10912ms (08:58:32.110)
Trace[1926275561]: [10.913396443s] [10.913396443s] END
I1122 08:58:33.410831       1 trace.go:205] Trace[2102234783]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169 (22-Nov-2022 08:58:21.197) (total time: 12212ms):
Trace[2102234783]: ---"Objects listed" error:<nil> 12212ms (08:58:33.410)
Trace[2102234783]: [12.212870039s] [12.212870039s] END

What is this related to, and how can I fix it?

There seems to be no impact on Botkube functionality, but it keeps triggering our alerts...

Expected behavior

No such trace logs should appear.

Steps to reproduce

  1. Deploy Botkube.
  2. Watch the logs.

Additional information

  • Botkube version: 0.15.0
  • K8s version: 1.24.4-gke.800
@bygui86 bygui86 added the bug Something isn't working label Nov 22, 2022
@pkosiec
Member

pkosiec commented Nov 22, 2022

Hey @bygui86,
This log comes from the Kubernetes client-go package we use. As you can see here, it is reported when listing the resources takes too long (more than 10s). Interestingly, in your case it looks like it never moves to the next step after "Objects listed", even though ListAndWatch should list all items and then establish a single connection to watch the resources (after a successful list - here).

I couldn't reproduce this behavior on a few long-running Botkube 0.15 installations (k3d, colima and Oracle Cloud clusters). I'd suspect some issue with your cluster (the K8s API server specifically), or some inability to set up a watch connection from the Botkube pod. Do you have proper RBAC configured for the Botkube pod, as in the default configuration?

And yes, we can suppress such logs (like here: https://github.com/werf/werf/pull/1754/files), but I'm not sure that's a good idea 🤔 They can sometimes help with debugging. One thing we could do is emit these traces only at the info log level.

In this case, though, I think there's an error that results in constantly re-listing all resources. Do Botkube notifications work for you as they should?

@pkosiec pkosiec added needs-more-info Further information is needed and removed bug Something isn't working labels Nov 22, 2022
@bygui86
Author

bygui86 commented Nov 22, 2022

@pkosiec thanks for your quick answer.

> I couldn't reproduce this behavior on a few long-running Botkube 0.15 installations (k3d, colima and Oracle Cloud clusters).

I run Botkube in 2 GKE clusters, one public and one private.

> I'd suspect some issue with your cluster (the K8s API server specifically), or some inability to set up a watch connection from the Botkube pod. Do you have proper RBAC configured for the Botkube pod, as in the default configuration?

I cannot grant Botkube the default set of permissions; it is too broad, especially regarding Secrets.
I configured the following RBAC and tested it:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: botkube
rules:
  # K8s
  - apiGroups:
      - ""
    resources:
      - events
      - configmaps
      - endpoints
      - limitranges
      - namespaces
      - nodes
      - persistentvolumeclaims
      - persistentvolumes
      - pods
      - resourcequotas
      - serviceaccounts
      - services
    verbs:
      - get
      - watch
      - list
  - apiGroups:
      - apps
    resources:
      - deployments
      - statefulsets
      - daemonsets
    verbs:
      - get
      - watch
      - list
  - apiGroups:
      - batch
    resources:
      - cronjobs
      - jobs
    verbs:
      - get
      - watch
      - list
  - apiGroups:
      - policy
    resources:
      - poddisruptionbudgets
      - podsecuritypolicies
    verbs:
      - get
      - watch
      - list
  - apiGroups:
      - apiextensions.k8s.io
    resources:
      - customresourcedefinitions
    verbs:
      - get
      - watch
      - list
  - apiGroups:
      - autoscaling
    resources:
      - horizontalpodautoscalers
    verbs:
      - get
      - watch
      - list
  - apiGroups:
      - rbac.authorization.k8s.io
    resources:
      - clusterrolebindings
      - clusterroles
      - rolebindings
      - roles
    verbs:
      - get
      - watch
      - list
  - apiGroups:
      - admissionregistration.k8s.io
    resources:
      - mutatingwebhookconfigurations
      - validatingwebhookconfigurations
    verbs:
      - get
      - watch
      - list
  - apiGroups:
      - networking.k8s.io
    resources:
      - networkpolicies
      - ingresses
    verbs:
      - get
      - watch
      - list
  - apiGroups:
      - scheduling.k8s.io
    resources:
      - priorityclasses
    verbs:
      - get
      - watch
      - list
  - apiGroups:
      - storage.k8s.io
    resources:
      - storageclasses
    verbs:
      - get
      - watch
      - list
  - apiGroups:
      - snapshot.storage.k8s.io
    resources:
      - volumesnapshotclasses
      - volumesnapshotcontents
      - volumesnapshots
    verbs:
      - get
      - watch
      - list
  # CRDs
  - apiGroups:
      - cloud.google.com
      - internal.autoscaling.gke.io
      - networking.gke.io
      - hub.gke.io
      - nodemanagement.gke.io
      - crd.projectcalico.org
      - rbacmanager.reactiveops.io
      - cassandraoperator.instaclustr.com
      - kafka.strimzi.io
      - monitoring.coreos.com
      - logging.banzaicloud.io
      - logging-extensions.banzaicloud.io
      - traefik.containo.us
    resources:
      - "*"
    verbs:
      - get
      - watch
      - list
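For the rules above to take effect, they also need to be bound to Botkube's ServiceAccount; a missing or mismatched binding would cause exactly the kind of failed list/watch calls discussed above. A minimal sketch of the corresponding binding, assuming a `botkube` ServiceAccount in a `botkube` namespace (adjust both names to your installation):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: botkube
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: botkube
subjects:
  - kind: ServiceAccount
    name: botkube       # assumed ServiceAccount name
    namespace: botkube  # assumed namespace
```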

> And yes, we can suppress such logs (like here: https://github.com/werf/werf/pull/1754/files), but I'm not sure that's a good idea 🤔 They can sometimes help with debugging. One thing we could do is emit these traces only at the info log level.

I see your point, but this should be a "debug"-level log, not info. I suggest not suppressing it, but lowering its level, so we could re-enable it whenever deeper debugging is required.

> In this case, though, I think there's an error that results in constantly re-listing all resources. Do Botkube notifications work for you as they should?

Notifications seem to work fine, but we have a huge configuration, so I cannot tell whether we are getting the same number of notifications as before.
I noticed that such logs appear 4-5 times after Botkube startup and then not again for hours.

@pkosiec
Member

pkosiec commented Nov 23, 2022

> I noticed that such logs appear 4-5 times after Botkube startup and then not again for hours.

Oh, okay, so it's not a big issue then. Alright, I think we can reconfigure klog based on the log configuration 👍 My only hesitation is that the traces you see are actually at klog's INFO level, so I'm not sure they should be moved to debug 🤔 @mszostok WDYT?

@bygui86
Author

bygui86 commented Nov 23, 2022

@pkosiec for now it's not a big issue, but I don't know about the long term.
Anyway, I configured Botkube to log in JSON format, so it would be really nice to have these logs emitted as JSON as well.
