Memory Leak using woodpecker with kubernetes #4228

Open
lara-clink opened this issue Oct 14, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@lara-clink

Component

agent

Describe the bug

I’ve been encountering what appears to be a memory leak issue when running Woodpecker CI on a Kubernetes cluster. After running pipelines over time, I noticed that the memory usage of the Woodpecker agents and server steadily increases, eventually leading to performance degradation and, in some cases, the need for manual intervention to prevent the system from becoming unresponsive.

Steps to reproduce

Deploy Woodpecker CI in a Kubernetes environment.
Run multiple pipelines continuously over an extended period.
Monitor memory usage of the Woodpecker agents and server (my Grafana memory usage graph is attached below).
Notice that memory consumption increases over time without being released after pipeline execution.
[screenshot: Grafana memory usage graph]

Expected behavior

Memory usage should stabilize after pipeline executions are completed, and unused memory should be reclaimed properly.

System Info

Woodpecker Version: 2.7.0
Kubernetes Version: v1.29.4
Environment: Running Woodpecker on a Kubernetes cluster
Number of agents: 10

Additional context

I am using Go profiling (pprof) to investigate; this is what I have found so far:
[screenshots: pprof output from an agent]
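Profiles like the ones above are produced with Go's built-in pprof tooling; below is a minimal, self-contained sketch of dumping a heap profile to a file (the file name and the dump-on-start trigger are illustrative assumptions, not necessarily how these profiles were captured):

```go
package main

import (
	"log"
	"os"
	"runtime"
	"runtime/pprof"
)

// writeHeapProfile dumps the current heap profile to the given file so it can
// later be inspected with `go tool pprof <binary> heap.pprof`.
func writeHeapProfile(path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	runtime.GC() // force a collection so the profile reflects live objects
	return pprof.WriteHeapProfile(f)
}

func main() {
	if err := writeHeapProfile("heap.pprof"); err != nil {
		log.Fatal(err)
	}
}
```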

Has anyone ever faced an issue like this?

Validations

  • Read the docs.
  • Check that there isn't already an issue that reports the same bug to avoid creating a duplicate.
  • Checked that the bug isn't fixed in the next version already [https://woodpecker-ci.org/faq#which-version-of-woodpecker-should-i-use]
@lara-clink lara-clink added the bug Something isn't working label Oct 14, 2024
@zc-devs
Contributor

zc-devs commented Oct 15, 2024

Has anyone ever faced an issue like this?

Not me. But I don't have such a load (10 agents) :)

  1. When did it start / what was the behavior on previous versions? Have you tested 2.7.1 or next?
  2. How do you gather these pprof statistics? Is there a guide? I didn't find anything in the WP docs.
  3. Nice pprof info, but these screenshots are from an agent that allocated 44.36 MB of memory, if I understand correctly. However, Grafana shows memory usage around 1 GB, and that is the issue (I suppose). It would be nice to have pprof stats from the agent in question.
  4. What is the load? I mean WOODPECKER_MAX_WORKFLOWS and how many workflows do you run simultaneously?
    Could you explain the right half of the Grafana chart? Something like:
  • at this point we run 1 pipeline with 10 workflows
  • at this point they all finished
  • at this point we run another 10 pipelines with 1 workflow
  • at this point they finished and there was no load at all for the next hour
  5. What is the config of the Server? How many instances? What about the database? What is the load on the Server and the database?
  6. Where do you store the pipeline (step) logs?

@lara-clink
Author

Hey @zc-devs, we are currently working on our migration project (automated migration from Drone CI to Woodpecker), so I have not been able to collect all of the answers for you yet. By the end of this week I should be able to come back to this.

@lara-clink
Author

Answering the questions from @zc-devs above:
  1. We started using Woodpecker at 2.3.0 and have been facing memory leak issues ever since, so we cannot tell in which version the problem first appeared. We have not tested versions later than 2.7.0;
  2. We ran a forked version of 2.7.0.
    I used this tutorial to do it: https://hackernoon.com/go-the-complete-guide-to-profiling-your-code-h51r3waz;
  3. There you go:
[screenshots: pprof output from the agent]
  4. WOODPECKER_MAX_WORKFLOWS is 10 and we have 15 pods, so that is 150 workflows simultaneously. But the Grafana chart just shows that memory usage keeps increasing for as long as we use Woodpecker; the low points only mean that we had a deployment and the pods restarted;

memory: 4Gi
requests:
  cpu: '2'
  memory: 4Gi

@zc-devs
Contributor

zc-devs commented Oct 29, 2024

  1. Thank you for the guide. Sadly, it's not so convenient to patch and build your own version. Could you make a PR with the pprof functionality? It should be optional, behind a flag like WOODPECKER_PPROF_ENABLED: true|false (a rough sketch of such a gate follows after this list). It would be helpful for all users in the future.
  2. What are the versions of
k8s.io/api
k8s.io/apimachinery
k8s.io/client-go

in your fork? Have you tried to update them?
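For illustration, a minimal sketch of such an opt-in gate, assuming a hypothetical WOODPECKER_PPROF_ENABLED environment variable and an arbitrary listen port (neither is existing Woodpecker configuration):

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
	"os"
)

// maybeStartPprof starts a pprof HTTP server only when the (hypothetical)
// WOODPECKER_PPROF_ENABLED environment variable is set to "true".
func maybeStartPprof() {
	if os.Getenv("WOODPECKER_PPROF_ENABLED") != "true" {
		return
	}
	go func() {
		// Bind to localhost so the profiling endpoints are not exposed outside the pod.
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()
}

func main() {
	maybeStartPprof()
	// ... regular agent startup would continue here ...
	select {}
}
```

Profiles could then be pulled with `go tool pprof http://localhost:6060/debug/pprof/heap`, port-forwarded from the pod.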


Entertaining discussion. Even shared informer has been mentioned.
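For context, the "shared informer" presumably refers to client-go's SharedInformerFactory, which keeps a single watch and in-memory cache per resource type for all consumers; a rough, self-contained sketch (not Woodpecker's actual backend code, and assuming in-cluster credentials) looks like this:

```go
package main

import (
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	// In-cluster credentials; outside a cluster a kubeconfig-based config would be used.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// One shared factory means one watch and one cache per resource type,
	// shared by every consumer, rather than a separate watch per pipeline step.
	factory := informers.NewSharedInformerFactory(client, 30*time.Second)
	podInformer := factory.Core().V1().Pods().Informer()
	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(_, newObj interface{}) {
			if pod, ok := newObj.(*corev1.Pod); ok {
				_ = pod.Status.Phase // react to step-pod phase changes here
			}
		},
	})

	stopCh := make(chan struct{})
	factory.Start(stopCh)
	factory.WaitForCacheSync(stopCh)
	<-stopCh
}
```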

@lara-clink
Author

Those are:
k8s.io/api v0.30.2
k8s.io/apimachinery v0.30.2
k8s.io/client-go v0.30.2

and we have not tried updating them yet.
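For reference, those three modules are normally bumped together; assuming the fork lists them in its go.mod, updating them would be a change along these lines (v0.31.1 is just an example of a newer release, not a tested recommendation):

```go
// go.mod (excerpt)
require (
	k8s.io/api v0.31.1
	k8s.io/apimachinery v0.31.1
	k8s.io/client-go v0.31.1
)
```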
