
step-issuer being OOMKilled when there are relatively many CertificateRequests #60

Open
LarsBingBong opened this issue Jan 16, 2023 · 1 comment
@LarsBingBong

So the step-issuer workload gets OOMKilled when the CertificateRequest object count in one of our Kubernetes clusters reaches 4334. We experienced this over the past weekend.

We're on:

  • step-issuer v0.6.0
  • Kubernetes K3s v1.24.6+k3s1
  • cert-manager v1.9.1

When we troubleshot the issue, we restarted the Pod by simply deleting it. We then followed its startup flow and saw that, as it comes up healthy, it starts parsing all CertificateRequest objects on the cluster. This naturally uses memory, and apparently so much of it that the step-issuer workload is OOMKilled.


We managed to WORK AROUND it by bumping the resources that the step-issuer can use: from the default memory limit of 128Mi ( https://github.com/smallstep/helm-charts/blob/master/step-issuer/values.yaml#L34 ) to 500Mi.

This allowed the step-issuer workload to parse all the CertificateRequests and stay healthy.
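
For reference, here is roughly what our Helm values override looks like. This is only a sketch: the 128Mi default and the 500Mi limit are the figures mentioned above, the CPU values and the memory request are illustrative placeholders, and the key path assumes the chart's top-level `resources:` block from the linked values.yaml.

```yaml
# Override for the step-issuer Helm chart.
# Assumes the chart's standard top-level `resources:` block, as in the linked values.yaml.
resources:
  requests:
    cpu: 100m        # illustrative placeholder
    memory: 128Mi    # illustrative placeholder
  limits:
    cpu: 200m        # illustrative placeholder
    memory: 500Mi    # raised from the 128Mi default so the startup pass over all CertificateRequests fits in memory
```

Applied with something along the lines of `helm upgrade --reuse-values -f step-issuer-values.yaml step-issuer smallstep/step-issuer`, where the release name, values file, and chart/repo reference are placeholders for our setup.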

A more permanent and better solution would be to use cert-manager's cert-manager.io/revision-history-limit: "5" Ingress annotation, as this seriously limits the number of CertificateRequest objects kept on the cluster.
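
For illustration, here's how that could look on an Ingress handled by cert-manager's ingress-shim. The Ingress name, host, backend, and issuer reference below are placeholders and would need to be adjusted to your setup; the revision-history-limit annotation is the only part that matters here.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-app                                   # placeholder
  annotations:
    cert-manager.io/issuer: my-step-issuer            # placeholder issuer reference
    cert-manager.io/issuer-kind: StepIssuer           # assumption: issuing via a StepIssuer resource
    cert-manager.io/issuer-group: certmanager.step.sm # adjust if your issuer group differs
    cert-manager.io/revision-history-limit: "5"       # cap the CertificateRequest revisions kept per Certificate
spec:
  tls:
    - hosts:
        - app.example.com                             # placeholder host
      secretName: example-app-tls
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example-app                     # placeholder Service
                port:
                  number: 80
```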

With that somewhat long intro, here's my hot take:

  1. Why is the step-issuer parsing all the CertificateRequests on the cluster in the first place?
  2. Why not limit it to parsing only the CertificateRequest created by the event that triggered the issuance or renewal of a Certificate?

What's the reasoning? Or am I misunderstanding how things work under the hood?


Looking forward to some input and replies on this issue.

🙏🏿 you and have a ☀️ day.

@maraino
Collaborator

maraino commented Feb 24, 2023

Hi @LarsBingBong, thanks for reporting this. Right now we don't have the resources to fix this issue, but you've found a workaround by adjusting the container's resource limits.

We will investigate if there's a way to reduce memory usage in the future.
