
Significant CPU usage and possibly etcd usage when deploying this #447

Open
drewwells opened this issue Jun 10, 2024 · 8 comments

Comments

@drewwells

We noticed our etcd storage usage doubled after doing a production release that included deploying Reflector. Is there an architecture document describing how this service watches for object changes and decides which API calls to make to the kube-apiserver?

We have one ConfigMap that rarely changes. These are the labels and annotations on it:

metadata:
  annotations:
    checksum/configmap: 4420642124fb6c99affe13e8904ba3ede9bee1d41edc0df8a50696833fe15fca
    reflector.v1.k8s.emberstack.com/reflection-allowed: "true"
    reflector.v1.k8s.emberstack.com/reflection-auto-enabled: "true"
  creationTimestamp: "2024-04-16T20:07:50Z"
  labels:
    reflector.v1.k8s.emberstack.com/reflection-allowed: "true"

Here's the CPU and memory usage of reflector

❯ k -n reflector top po --containers
POD                          NAME        CPU(cores)   MEMORY(bytes)
reflector-5bc45489b8-k9g7f   reflector   1423m        332Mi
@drewwells
Author

I see this happening every 3 seconds. Does this service act like a watcher, watching for changes in the cluster? Can we add labels so it only looks at specific ConfigMaps instead of all of them?

[reflector-5bc45489b8-k9g7f] 2024-06-10 16:08:10.306 +00:00 [INF] (ES.Kubernetes.Reflector.Core.ConfigMapWatcher) Session closed. Duration: 00:00:03.3548677. Faulted: False.
[reflector-5bc45489b8-k9g7f] 2024-06-10 16:08:10.306 +00:00 [INF] (ES.Kubernetes.Reflector.Core.ConfigMapWatcher) Requesting V1ConfigMap resources
[reflector-5bc45489b8-k9g7f] 2024-06-10 16:08:13.336 +00:00 [INF] (ES.Kubernetes.Reflector.Core.ConfigMapMirror) Auto-reflected feature-flag/ff-feature-flag where permitted. Created 0 - Updated 0 - Deleted 0 - Validated 299.
[reflector-5bc45489b8-k9g7f] 2024-06-10 16:08:13.343 +00:00 [INF] (ES.Kubernetes.Reflector.Core.ConfigMapWatcher) Session closed. Duration: 00:00:03.0375045. Faulted: False.
[reflector-5bc45489b8-k9g7f] 2024-06-10 16:08:13.343 +00:00 [INF] (ES.Kubernetes.Reflector.Core.ConfigMapWatcher) Requesting V1ConfigMap resources
[reflector-5bc45489b8-k9g7f] 2024-06-10 16:08:15.826 +00:00 [INF] (ES.Kubernetes.Reflector.Core.ConfigMapWatcher) Session closed. Duration: 00:00:02.4830903. Faulted: False.
[reflector-5bc45489b8-k9g7f] 2024-06-10 16:08:15.826 +00:00 [INF] (ES.Kubernetes.Reflector.Core.ConfigMapWatcher) Requesting V1ConfigMap resources
[reflector-5bc45489b8-k9g7f] 2024-06-10 16:08:19.845 +00:00 [INF] (ES.Kubernetes.Reflector.Core.ConfigMapMirror) Auto-reflected feature-flag/ff-feature-flag where permitted. Created 0 - Updated 0 - Deleted 0 - Validated 299.
[reflector-5bc45489b8-k9g7f] 2024-06-10 16:08:19.859 +00:00 [INF] (ES.Kubernetes.Reflector.Core.ConfigMapWatcher) Session closed. Duration: 00:00:04.0327364. Faulted: False.
[reflector-5bc45489b8-k9g7f] 2024-06-10 16:08:19.859 +00:00 [INF] (ES.Kubernetes.Reflector.Core.ConfigMapWatcher) Requesting V1ConfigMap resources

@winromulus
Contributor

@drewwells Reflector opens a watcher with a default timeout (in k8s) of around 40 minutes. The fact that the connection closes every 3 seconds is extremely odd. I would need to know more about the setup. Also, are you sure you didn't set the timeout to something like 3 seconds in the configuration?
Please add more details about the Kubernetes host: whether it's standard k8s or some other variant, etc.
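
(For reference, a watch session can be observed directly against the API server; this is a generic sketch of the core v1 ConfigMap endpoint, not a Reflector setting. The timeoutSeconds value is only an illustration; with no explicit value the server picks a timeout on the order of the 40 minutes mentioned above and the client simply reconnects.)

# Stream ConfigMap watch events across all namespaces; the server closes
# the session once timeoutSeconds elapses (2400s here as an illustration)
# and a healthy client reconnects.
kubectl get --raw "/api/v1/configmaps?watch=true&timeoutSeconds=2400"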

@drewwells
Author

Nothing special about the cluster; it's running v1.25.10.

@winromulus
Contributor

@drewwells Is this standard k8s or another flavor (like k3s)? Also, are you self-hosting or using a cloud provider?

@drewwells
Author

It's deployed with kops and the nodes are hosted on AWS. Hmm, usage is vastly different across clusters. The only thing that is consistent is significant etcd storage usage, roughly 2x what it was before deploying the service.

# staging environment
❯ k -n reflector top po --containers                                                                                                                   
POD                         NAME        CPU(cores)   MEMORY(bytes)
reflector-c786c5fb4-jmqg9   reflector   4m           163Mi

# dev environment
❯ k -n reflector top po --containers                                                                                                                  
POD                          NAME        CPU(cores)   MEMORY(bytes)
reflector-5bc45489b8-k9g7f   reflector   1154m        322Mi
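
For anyone trying to quantify the etcd growth: one rough check (a sketch; the endpoint and certificate paths are placeholders that depend on how kops provisioned etcd) is to compare the database size etcd reports before and after deploying Reflector.

# Placeholder endpoint and cert paths -- adjust for your etcd setup.
ETCDCTL_API=3 etcdctl \
  --endpoints=https://<etcd-member>:2379 \
  --cacert=<ca.crt> --cert=<client.crt> --key=<client.key> \
  endpoint status --write-out=table
# The DB SIZE column is the on-disk database size; comparing it before
# and after deploying Reflector shows whether the ~2x growth holds.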

@zzjin

zzjin commented Jun 28, 2024

Same issue here:

# env1
POD                          NAME        CPU(cores)   MEMORY(bytes)
reflector-5dddff7688-rp6tx   reflector   1403m        205Mi

# env2
POD                          NAME        CPU(cores)   MEMORY(bytes)
reflector-64dcc58c5f-wrh8q   reflector   2636m        370Mi

What I found is that CPU usage is high when there are a lot of Secrets/ConfigMaps.

> kubectl get secrets -A | wc -l

# env1
24121

# env2
88112

Both clusters are doing one and only one thing: copying one given namespace's TLS secret to other namespaces.
Which means env1 has one base TLS secret and about 20,000+ reflected secrets, and env2 has one base secret and about 80,000+ reflected secrets.
The base secret rarely changes (roughly a 90-day renewal cycle).

  annotations:
    cert-manager.io/alt-names: "*.example.io,example.io"
    cert-manager.io/certificate-name: wildcard-example-io
    cert-manager.io/common-name: example.io
    cert-manager.io/ip-sans: ""
    cert-manager.io/issuer-group: ""
    cert-manager.io/issuer-kind: ClusterIssuer
    cert-manager.io/issuer-name: cluster-issuer-example
    cert-manager.io/uri-sans: ""
    reflector.v1.k8s.emberstack.com/reflection-allowed: "true"
    reflector.v1.k8s.emberstack.com/reflection-allowed-namespaces: \w+-system,\w+-frontend,ns-[\-a-z0-9]*
    reflector.v1.k8s.emberstack.com/reflection-auto-enabled: "true"
    reflector.v1.k8s.emberstack.com/reflection-auto-namespaces: \w+-system,\w+-frontend,ns-[\-a-z0-9]*
  labels:
    controller.cert-manager.io/fao: "true"
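
To get a feel for the fan-out, the allowed-namespaces patterns above can be checked against the cluster's namespace list (a rough sketch with plain kubectl and GNU grep; every matching namespace receives its own mirrored copy of the secret):

# Count the namespaces matching the reflection-allowed-namespaces
# patterns from the annotations above.
kubectl get namespaces -o name \
  | sed 's|^namespace/||' \
  | grep -cE '^(\w+-system|\w+-frontend|ns-[-a-z0-9]*)$'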

IMO, the reflector controller should only need to monitor the source namespace's secret and copy it to the other namespaces when it changes.
I wonder why CPU usage scales with the cluster's total secret count?

Kubernetes is a standard deployment on GCP VMs.

@drewwells
Author

An easy way to limit the watchers would be to use labels (see the sketch below). Also, usage goes up after it creates ConfigMaps or Secrets; I don't think it needs to watch the generated resources. If people change them, let it be until the next sync wipes out those changes.
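
For example, a label selector on the list/watch would shrink the working set to only opted-in objects. This is just a sketch of the idea using the reflection-allowed label from this thread; filtering by a selector is the feature being requested here, so the command only illustrates the shape of it.

# Only ConfigMaps that explicitly opt in are returned, instead of every
# ConfigMap in the cluster.
kubectl get configmaps -A \
  -l reflector.v1.k8s.emberstack.com/reflection-allowed=true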

@arjun-beathi

In my case, when the same secret name in two different namespaces tries to sync to a common namespace, that's when I see Reflector closing the connection every 4 seconds.

Example: "secretA" from "nsA" and "nsB" both syncing to "nsC".
Easily reproducible.
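
A hedged reproduction sketch of that conflict (lowercase names stand in for nsA/nsB/nsC and secretA, since Kubernetes object names must be lowercase; the annotation keys are the ones used elsewhere in this thread):

kubectl create namespace ns-a
kubectl create namespace ns-b
kubectl create namespace ns-c

# Create the same secret name in both source namespaces and point both
# at the same target namespace via the Reflector annotations.
for ns in ns-a ns-b; do
  kubectl -n "$ns" create secret generic secret-a --from-literal=key=value
  kubectl -n "$ns" annotate secret secret-a \
    reflector.v1.k8s.emberstack.com/reflection-allowed="true" \
    reflector.v1.k8s.emberstack.com/reflection-allowed-namespaces="ns-c" \
    reflector.v1.k8s.emberstack.com/reflection-auto-enabled="true" \
    reflector.v1.k8s.emberstack.com/reflection-auto-namespaces="ns-c"
done
# Both sources now compete for ns-c/secret-a, the scenario reported to
# make the watcher session close every few seconds.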
