Memory Leak using woodpecker with kubernetes #4228

Open
lara-clink opened this issue Oct 14, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@lara-clink

Component

agent

Describe the bug

I’ve been encountering what appears to be a memory leak issue when running Woodpecker CI on a Kubernetes cluster. After running pipelines over time, I noticed that the memory usage of the Woodpecker agents and server steadily increases, eventually leading to performance degradation and, in some cases, the need for manual intervention to prevent the system from becoming unresponsive.

Steps to reproduce

Deploy Woodpecker CI in a Kubernetes environment.
Run multiple pipelines continuously over an extended period.
Monitor memory usage of the Woodpecker agents and server (my Grafana memory usage graph is attached below).
Notice that memory consumption increases over time without being released after pipeline execution.
[screenshot: Grafana memory usage graph]

Expected behavior

Memory usage should stabilize after pipeline executions are completed, and unused memory should be reclaimed properly.

System Info

Woodpecker Version: 2.7.0
Kubernetes Version: v1.29.4
Environment: Running Woodpecker on a Kubernetes cluster
Number of agents: 10

Additional context

I am using Go profiling (pprof) to investigate; this is what I have found so far:
[screenshots: pprof output from an agent]
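Profiles like the ones above are produced with Go's built-in pprof tooling; below is a minimal, self-contained sketch of dumping a heap profile to a file (the file name and the dump-on-start trigger are illustrative assumptions, not necessarily how these profiles were captured):

```go
package main

import (
	"log"
	"os"
	"runtime"
	"runtime/pprof"
)

// writeHeapProfile dumps the current heap profile to the given file so it can
// later be inspected with `go tool pprof <binary> heap.pprof`.
func writeHeapProfile(path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	runtime.GC() // force a collection so the profile reflects live objects
	return pprof.WriteHeapProfile(f)
}

func main() {
	if err := writeHeapProfile("heap.pprof"); err != nil {
		log.Fatal(err)
	}
}
```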

Has anyone ever faced an issue like this?

Validations

  • Read the docs.
  • Check that there isn't already an issue that reports the same bug to avoid creating a duplicate.
  • Checked that the bug isn't fixed in the next version already [https://woodpecker-ci.org/faq#which-version-of-woodpecker-should-i-use]
@lara-clink lara-clink added the bug Something isn't working label Oct 14, 2024
@zc-devs
Contributor

zc-devs commented Oct 15, 2024

Has anyone ever faced an issue like this?

Not me. But I don't have such a load (10 agents) :)

  1. When did it start / what was the behavior on previous versions? Have you tested 2.7.1 or next?
  2. How do you gather these pprof statistics? Is there a guide? I didn't find anything in the WP docs.
  3. Nice pprof info, but these screenshots are from an agent that allocated 44.36 MB of memory, if I understand correctly. However, Grafana shows memory usage around 1 GB, and that is the issue (I suppose). It would be nice to have pprof stats from the agent in question.
  4. What is the load? I mean WOODPECKER_MAX_WORKFLOWS and how many workflows do you run simultaneously?
    Could you explain the right half of the Grafana chart? Something like:
  • at this point we run 1 pipeline with 10 workflows
  • at this point they all finished
  • at this point we run another 10 pipelines with 1 workflow
  • at this point they finished and there was no load at all for the next hour
  5. What is the config of the Server? How many instances? What about the database? What is the load on the Server and the database?
  6. Where do you store the pipeline (step) logs?

@lara-clink
Author

Hey @zc-devs, we are currently working on our migration project (automated migration from Drone CI to Woodpecker), so I have not been able to collect all of the answers for you yet. By the end of this week I should be able to come back to this.

@lara-clink
Author

Answering the questions from @zc-devs above:
  1. We started using Woodpecker at 2.3.0 and have been facing memory leak issues ever since, so we cannot tell in which version the problem first appeared. We have not tested versions later than 2.7.0;
  2. We ran a forked version of 2.7.0.
    I used this tutorial to do it: https://hackernoon.com/go-the-complete-guide-to-profiling-your-code-h51r3waz;
  3. There you go:
[screenshots: pprof output from the agent]
  4. WOODPECKER_MAX_WORKFLOWS is 10 and we have 15 pods, so that is 150 workflows simultaneously. But the Grafana chart just shows that memory usage keeps increasing for as long as we use Woodpecker; the low points only mean that we had a deployment and the pods restarted;

memory: 4Gi
requests:
  cpu: '2'
  memory: 4Gi

@zc-devs
Contributor

zc-devs commented Oct 29, 2024

  1. Thank you for the guide. Sadly, it's not so convenient to patch and build your own version. Could you make a PR with the pprof functionality? It should be optional, behind a flag like WOODPECKER_PPROF_ENABLED: true|false (a rough sketch of such a gate follows after this list). It would be helpful for all users in the future.
  2. What are the versions of
k8s.io/api
k8s.io/apimachinery
k8s.io/client-go

in your fork? Have you tried to update them?
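For illustration, a minimal sketch of such an opt-in gate, assuming a hypothetical WOODPECKER_PPROF_ENABLED environment variable and an arbitrary listen port (neither is existing Woodpecker configuration):

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
	"os"
)

// maybeStartPprof starts a pprof HTTP server only when the (hypothetical)
// WOODPECKER_PPROF_ENABLED environment variable is set to "true".
func maybeStartPprof() {
	if os.Getenv("WOODPECKER_PPROF_ENABLED") != "true" {
		return
	}
	go func() {
		// Bind to localhost so the profiling endpoints are not exposed outside the pod.
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()
}

func main() {
	maybeStartPprof()
	// ... regular agent startup would continue here ...
	select {}
}
```

Profiles could then be pulled with `go tool pprof http://localhost:6060/debug/pprof/heap`, port-forwarded from the pod.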


Entertaining discussion. Even shared informer has been mentioned.
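For context, the "shared informer" presumably refers to client-go's SharedInformerFactory, which keeps a single watch and in-memory cache per resource type for all consumers; a rough, self-contained sketch (not Woodpecker's actual backend code, and assuming in-cluster credentials) looks like this:

```go
package main

import (
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	// In-cluster credentials; outside a cluster a kubeconfig-based config would be used.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// One shared factory means one watch and one cache per resource type,
	// shared by every consumer, rather than a separate watch per pipeline step.
	factory := informers.NewSharedInformerFactory(client, 30*time.Second)
	podInformer := factory.Core().V1().Pods().Informer()
	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(_, newObj interface{}) {
			if pod, ok := newObj.(*corev1.Pod); ok {
				_ = pod.Status.Phase // react to step-pod phase changes here
			}
		},
	})

	stopCh := make(chan struct{})
	factory.Start(stopCh)
	factory.WaitForCacheSync(stopCh)
	<-stopCh
}
```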

@lara-clink
Author

Those are:
k8s.io/api v0.30.2
k8s.io/apimachinery v0.30.2
k8s.io/client-go v0.30.2

and we have not tried updating them yet.
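For reference, those three modules are normally bumped together; assuming the fork lists them in its go.mod, updating them would be a change along these lines (v0.31.1 is just an example of a newer release, not a tested recommendation):

```go
// go.mod (excerpt)
require (
	k8s.io/api v0.31.1
	k8s.io/apimachinery v0.31.1
	k8s.io/client-go v0.31.1
)
```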
