Adding support for GKE preemptible nodes #30
base: master
Conversation
Context: When a GKE node gets preempted, pods that were on the node remain there and will run when the node comes back online. It is also possible that the daemonset pod status will be Ready even when the node is not yet ready (possibly a stale cache).
Solution:
- Added an option to configure the effect to set for the taint
- Taint the node in case the pod is ready but the node is not
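For illustration, a minimal sketch of the two changes described above, assuming nidhogg builds its taints from the k8s.io/api/core/v1 types; the Config field and function names here are hypothetical and not taken from this PR's diff:

```go
package nidhogg

import corev1 "k8s.io/api/core/v1"

// Config carries the effect applied to nidhogg taints; valid Kubernetes
// values are NoSchedule, PreferNoSchedule and NoExecute.
type Config struct {
	TaintEffect corev1.TaintEffect `json:"taintEffect"`
}

// buildTaint builds the taint for a watched daemonset using the configured
// effect rather than a hard-coded one.
func buildTaint(cfg Config, key string) corev1.Taint {
	effect := cfg.TaintEffect
	if effect == "" {
		// Fall back to NoSchedule when no effect is configured.
		effect = corev1.TaintEffectNoSchedule
	}
	return corev1.Taint{Key: key, Effect: effect}
}
```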
Why do you need to add a taint if the node is not ready? Kubernetes by default adds a node.kubernetes.io/not-ready taint in that case.
Because we also want to wait for pods from a daemonset to be ready, not just the node.
Sorry, I still don't follow. If the daemonset pods aren't ready, nidhogg will add the nidhogg taints as it normally does, so why do we need this extra check?
Sorry for the confusion. I will explain a bit more about our use case and what we have experienced using Nidhogg. We are using GKE with preemptible nodes. That means our nodes live at most 24h and get replaced continuously. We have a critical networking daemonset deployed, and it must absolutely run before other pods can run. While using the current version of Nidhogg, I have seen the following happen when a node was coming back after being preempted:
Hopefully that explains better 😅
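To make the race described above concrete, here is a minimal sketch of the extra readiness check being proposed, under the same k8s.io/api/core/v1 assumption; the function names are hypothetical:

```go
package nidhogg

import corev1 "k8s.io/api/core/v1"

// podIsReady reports whether the daemonset pod's Ready condition is True.
func podIsReady(pod *corev1.Pod) bool {
	for _, cond := range pod.Status.Conditions {
		if cond.Type == corev1.PodReady && cond.Status == corev1.ConditionTrue {
			return true
		}
	}
	return false
}

// nodeIsReady reports whether the node's Ready condition is True.
func nodeIsReady(node *corev1.Node) bool {
	for _, cond := range node.Status.Conditions {
		if cond.Type == corev1.NodeReady && cond.Status == corev1.ConditionTrue {
			return true
		}
	}
	return false
}

// shouldRemoveTaint only clears the nidhogg taint once both the daemonset pod
// and the node report Ready.
func shouldRemoveTaint(pod *corev1.Pod, node *corev1.Node) bool {
	return podIsReady(pod) && nodeIsReady(node)
}
```

With a check like this, the taint stays on a freshly un-preempted node even if a stale cache still reports the daemonset pod as Ready, so other workloads cannot schedule before the critical daemonset is actually running.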
@Joseph-Irving Do you have more questions/concerns?
So if I understand this correctly, in GKE when your preemptible nodes get shut down, they later start back up again? So the same node comes back up with the pods it previously had running on it? We use spot instances in AWS and they work in a similar way, but when they're terminated that's it, they're gone. A new node will replace them, so there's no weird stale-cache thing going on.
I see @fallard84's point; let me rephrase:
@Joseph-Irving would you be willing to merge this improvement? For our use case this feature is critical!
This PR seems very useful for being able to customize the effect of the taint, e.g. I have a use case for https://cloud.google.com/kubernetes-engine/docs/how-to/node-taints