Nodes unreachable under high memory; kubelet not evicting pods #11312
Comments
Does this node have swap enabled? Why aren't you setting memory limits that are at least lower than what's available on your node? Prometheus is pretty intense; I'm not sure I'd try to run it with less than 4 GB allocated just to it, let alone running it on a node with only 4 GB total.
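For context, a memory request and limit on the Prometheus container might look like the minimal sketch below. The pod name, namespace, image, and values are illustrative assumptions, not taken from this issue.

```yaml
# Illustrative sketch only: resource requests/limits for a Prometheus pod
# on a 4 GB node. Names and values are assumptions, not from the issue.
apiVersion: v1
kind: Pod
metadata:
  name: prometheus
  namespace: monitoring
spec:
  containers:
    - name: prometheus
      image: prom/prometheus
      resources:
        requests:
          memory: "2Gi"   # what the scheduler reserves for this container
        limits:
          memory: "3Gi"   # exceeding this OOM-kills the container, not the node
```

With a limit set below the node's capacity, memory pressure is contained to that container's cgroup instead of starving the kubelet and the rest of the node.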
Hi @brandond. Thanks for the prompt response. I don't want to give the impression that the problem is with Prometheus itself; the same problem happened in the past on nodes that were not running Prometheus. To be clear, Prometheus does indeed use a lot of memory, around 2.5Gi to be more precise. That said, setting a memory limit did not resolve the issue: the pod was never OOM-killed, for whatever reason. What's more, the same issue occurred on other nodes with 8Gi of memory. And no, swap is not enabled on any of our nodes.
I'd probably try to figure out why the nodes are unreachable. All "NotReady" means is that the kubelet has stopped updating the Node heartbeat timestamp. As to why that is happening, you'd have to get into the logs on the node. Is K3s crashing? Is the kernel crashing? Is the node just thrashing in OOM because you haven't set any limits?
Hello,
I’m running a Kubernetes cluster with k3s, and I’ve been experiencing intermittent issues with some of the nodes. Occasionally, nodes become unreachable, changing from “Ready” to “NotReady,” causing all workloads on them to be inaccessible. I often need to reboot the node to resolve the issue. The problem is very similar to the one described here.
In the most recent incident, I noticed that the Prometheus pod was consuming unusually high memory, which seemed to trigger the issue (no memory limits are set on the pod). My question is: why isn't memory pressure kicking in and prompting the kubelet to evict pods? I'm using the kube-hetzner project, mostly with default settings. The node in question has 3 vCPUs and 4 GB RAM. These are the kubelet args taken from /etc/rancher/k3s/config.yaml:

It's worth mentioning that I couldn't reproduce the issue. I stress-tested the node, but the pods are always killed before they make the node unstable.
Is there a configuration I can adjust to ensure the kubelet has enough headroom to start evicting pods before the node becomes completely unreachable? I've observed that, in another managed cluster we use, nodes never go down under similar conditions. Instead, the node is marked with MemoryPressure and the eviction process starts, preventing node instability.

Any insights on how to achieve similar resilience would be greatly appreciated! Thank you.
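For what it's worth, eviction headroom on a k3s node is typically tuned by passing kubelet args through the k3s config. A minimal sketch of /etc/rancher/k3s/config.yaml is shown below; the thresholds are illustrative assumptions, not values from this cluster or from kube-hetzner defaults.

```yaml
# /etc/rancher/k3s/config.yaml -- illustrative sketch, thresholds are assumptions
kubelet-arg:
  # Hard eviction: evict immediately once available memory drops below this.
  - "eviction-hard=memory.available<500Mi"
  # Soft eviction: mark the node with MemoryPressure and evict after the grace period.
  - "eviction-soft=memory.available<1Gi"
  - "eviction-soft-grace-period=memory.available=2m"
  # Reserve memory so the kubelet and system daemons stay responsive under pressure.
  - "kube-reserved=memory=512Mi"
  - "system-reserved=memory=512Mi"
```

The soft threshold plus grace period is what lets the kubelet surface MemoryPressure and start evicting before the hard limit (and the kernel OOM killer) is hit, while the reserved memory keeps the kubelet itself from being starved, which is the usual reason a node flips to NotReady instead of evicting.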