This repository has been archived by the owner on Mar 5, 2024. It is now read-only.

Adding support for GKE preemptibles nodes #30

Open · wants to merge 1 commit into master

Conversation

fallard84

Context: When a GKE node gets preempted, pods that were on the node remain there and will run when the node comes back online. It is also possible that the daemonset pod's status will be ready even when the node is not yet ready (possibly a stale cache).

Solution:

  • Added an option to configure the effect to set for the taint
  • Taint the node in case the pod is ready but the node is not

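A rough sketch of what the configurable effect could look like in the handler config; the field and helper names here are illustrative and may not match the PR exactly:

```go
package nidhogg

import corev1 "k8s.io/api/core/v1"

// Config sketches nidhogg's handler config with a hypothetical TaintEffect
// field added by this change; the real field name in the PR may differ.
type Config struct {
	Daemonsets   []Daemonset        `json:"daemonsets" yaml:"daemonsets"`
	NodeSelector []string           `json:"nodeSelector" yaml:"nodeSelector"`
	TaintEffect  corev1.TaintEffect `json:"taintEffect" yaml:"taintEffect"` // NoSchedule, PreferNoSchedule or NoExecute
}

// Daemonset identifies a daemonset whose pods must be ready before the
// nidhogg taint is removed.
type Daemonset struct {
	Name      string `json:"name" yaml:"name"`
	Namespace string `json:"namespace" yaml:"namespace"`
}

// effectOrDefault keeps today's behaviour (NoSchedule) when no effect is configured.
func effectOrDefault(c Config) corev1.TaintEffect {
	if c.TaintEffect == "" {
		return corev1.TaintEffectNoSchedule
	}
	return c.TaintEffect
}
```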
@Joseph-Irving
Contributor

Why do you need to add a taint if the node is not ready? Kubernetes by default adds a node.kubernetes.io/not-ready taint when a node isn't ready.
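For reference, the built-in taint referred to here looks roughly like this when expressed with the core/v1 types; the node lifecycle controller manages it automatically:

```go
package example

import corev1 "k8s.io/api/core/v1"

// notReadyTaint mirrors the taint Kubernetes itself puts on a NotReady node;
// it is separate from the taints nidhogg manages.
var notReadyTaint = corev1.Taint{
	Key:    "node.kubernetes.io/not-ready",
	Effect: corev1.TaintEffectNoSchedule, // a NoExecute variant is also used for taint-based eviction
}
```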

@fallard84
Author

Because we also want to wait for pods from a daemonset to be ready, not just the node.

@Joseph-Irving
Contributor

Sorry, I still don't follow. If the daemonset pods aren't ready, nidhogg will add the nidhogg taints as it normally does, so why do we need this extra check?

@fallard84
Author

> Sorry, I still don't follow. If the daemonset pods aren't ready, nidhogg will add the nidhogg taints as it normally does, so why do we need this extra check?

Sorry for the confusion. I will explain our use case and what we have experienced using Nidhogg in a bit more detail.

We are using GKE with preemptible nodes. That means our nodes live at most 24h and get replaced continuously. We have a critical networking daemonset deployed, and it absolutely must run before other pods can run. While using the current version of Nidhogg, I have seen the following happen when a node came back after being preempted:

  1. Nodes were sometimes ready before Nidhogg had time to taint them. The taint was always applied, but sometimes slightly too late. That caused pods to start running on the node before the daemonset pod was ready. While troubleshooting this issue, I could see that while the node was being initialized and not yet ready, Nidhogg would check the daemonset pod status and see the pod as ready (even though the daemonset pod hadn't even had time to start yet). I assume this was caused by a stale pod status cache. This is why I added a check that the node must also be ready, so the taint is added earlier in the process (see the sketch after this list). This extra check could also be made optional through config in case it causes issues with other setups.

  2. Pods without the toleration still ended up running, even when Nidhogg had time to taint the node. Upon investigation, I realized that when a node gets preempted, all pods that were on it before the preemption are already scheduled on it even before it becomes ready. The default NoSchedule effect only prevents new pods from being scheduled on the node; it does not prevent already-scheduled pods from starting. Hence the option to use NoExecute as the effect.
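A minimal sketch of the extra check from point 1, assuming a small helper next to nidhogg's existing pod-readiness logic (the names here are illustrative):

```go
package example

import corev1 "k8s.io/api/core/v1"

// nodeIsReady reports whether the node's Ready condition is True. Point 1
// above is about treating a not-yet-ready node as still needing the nidhogg
// taint, even if a stale cache claims the daemonset pod is ready.
func nodeIsReady(node *corev1.Node) bool {
	for _, cond := range node.Status.Conditions {
		if cond.Type == corev1.NodeReady {
			return cond.Status == corev1.ConditionTrue
		}
	}
	return false
}

// shouldTaint combines the existing pod-readiness check with the node check.
func shouldTaint(node *corev1.Node, daemonsetPodsReady bool) bool {
	return !daemonsetPodsReady || !nodeIsReady(node)
}
```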

Hopefully that explains better 😅

@fallard84
Author

@Joseph-Irving Do you have more questions/concerns?

@Joseph-Irving
Contributor

So if I understand this correctly, in GKE when your preemptible nodes get shut down they later start back up again? So the same node comes back up with the pods it previously had running on it? We use spot instances in AWS and they work in a similar way, but when they're terminated that's it, they're gone. A new node will replace them, so there's no weird stale cache thing going on.
I would rather make this ready check an optional code path, as it seems like a fairly niche edge case.
I think being able to configure taint effects is fine; I would just be cautious with them, as NoExecute can be quite disruptive. If you had some kind of cluster-wide outage of your networking daemonset, all your pods would be evicted from all of your nodes, which could potentially be far more disruption than you actually need.
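To make the trade-off concrete, the only mechanical difference on nidhogg's taint is the Effect field, but the semantics differ sharply. A sketch, with a placeholder taint key rather than whatever key nidhogg actually generates:

```go
package example

import corev1 "k8s.io/api/core/v1"

// nidhoggTaint builds nidhogg's taint with a configurable effect. With
// NoSchedule only new pods are kept off the node; with NoExecute, pods
// already bound to the node that lack a matching toleration are evicted as
// well, which is why a cluster-wide daemonset outage would evict workloads
// everywhere.
func nidhoggTaint(key string, effect corev1.TaintEffect) corev1.Taint {
	return corev1.Taint{
		Key:    key, // placeholder: the namespace.name-style key nidhogg already uses
		Effect: effect,
	}
}
```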

@universam1

universam1 commented Jun 10, 2022

I see the point of @fallard84, let me rephrase:

  1. do not consider removing the nidhogg taint before the node status is ready; it might be too early, since the daemonsets haven't been scheduled yet
  2. support NoExecute in order to be disruptive on purpose and cordon unhealthy nodes.

@Joseph-Irving would you be willing to merge this improvement? For our use case this feature is critical!

@jerkern

jerkern commented Oct 18, 2022

This PR seems very useful for being able to customize the taint effect, e.g. I have a use case for PreferNoSchedule rather than plain NoSchedule.

https://cloud.google.com/kubernetes-engine/docs/how-to/node-taints
