We're using kops `1.10.0` and k8s `1.10.11`. We're using two separate instance groups (IGs), `nodes` (on-demand) and `spots` (spot), both spread across 3 availability zones. I've applied the appropriate `nodeLabels` and have defined the following in my k8s-spot-rescheduler deployment manifest:
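(The snippet below is a rough sketch rather than a verbatim copy: the image tag and node-label values are placeholders, and the two label flags are the ones documented in the k8s-spot-rescheduler README.)

```yaml
# Sketch of the relevant part of the Deployment; values marked as placeholders
# are illustrative, not necessarily what runs in this cluster.
containers:
- name: k8s-spot-rescheduler
  image: quay.io/pusher/k8s-spot-rescheduler:v0.2.0   # placeholder tag
  args:
  - -v=2
  # Label identifying on-demand nodes to consider for draining (README default shown).
  - --on-demand-node-label=node-role.kubernetes.io/worker
  # Label identifying spot nodes that drained pods should move to (README default shown).
  - --spot-node-label=node-role.kubernetes.io/spot-worker
```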
The `nodes` IG has the `spot=false:PreferNoSchedule` taint so the `spots` IG is preferred (IG spec sketched below). I'm using the cluster autoscaler to auto-discover both IGs via `--node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,kubernetes.io/cluster/kubernetes.metis.wtf`, and these tags exist on both IGs. I've confirmed that pods on most `nodes` nodes can be drained and moved to `spots` nodes.
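For reference, the `nodes` IG looks roughly like this (machine type, sizes, and the `nodeLabels` key are placeholders; the taint and cloud labels are the ones described above):

```yaml
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  name: nodes
  labels:
    kops.k8s.io/cluster: kubernetes.metis.wtf
spec:
  role: Node
  machineType: m4.large   # placeholder
  minSize: 2              # placeholder
  maxSize: 6              # placeholder
  nodeLabels:
    node-role.kubernetes.io/worker: "true"   # placeholder on-demand label
  taints:
  - spot=false:PreferNoSchedule
  cloudLabels:
    # ASG tags used by the cluster autoscaler's tag-based auto-discovery.
    k8s.io/cluster-autoscaler/enabled: ""
    kubernetes.io/cluster/kubernetes.metis.wtf: "owned"   # placeholder value
```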
With one exception: k8s-spot-rescheduler picks a node and states

```
moved. Will drain node.
```

which isn't true. It then figures out it's unable to drain the node due to PDBs:
```
E0117 14:03:51.801764 1 rescheduler.go:302] Failed to drain node: Failed to drain node /ip-172-20-61-39.ec2.internal, due to following errors: [Failed to evict pod skafos-notebooks/hub-deployment-cf799d494-gp6z4 within allowed timeout (last error: Cannot evict pod as it would violate the pod's disruption budget.)]
```
and aborts the drain.
Now we're left with an on-demand node that has had all of its pods evicted except those covered by PDBs, leaving the on-demand node underutilized and tainted with `ToBeDeletedByClusterAutoscaler`. It seems like the rescheduler should first check whether it can drain all pods, taking PDBs into account, and if it can't, evict nothing and not apply the `ToBeDeletedByClusterAutoscaler` taint. A rough sketch of such a pre-drain check follows.
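For illustration only, a minimal pre-drain check along those lines, written against client-go from the 1.10 era (the helper name `canEvictAllPods` and the wiring are hypothetical, not the rescheduler's actual code):

```go
package predrain

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/kubernetes"
)

// canEvictAllPods is a hypothetical helper: it returns true only if every pod on the
// node is currently evictable, i.e. no PodDisruptionBudget matching a pod has zero
// disruptions left. Only if this passes would the node be tainted and drained.
func canEvictAllPods(client kubernetes.Interface, nodeName string) (bool, error) {
	// List all pods bound to the node.
	pods, err := client.CoreV1().Pods(metav1.NamespaceAll).List(metav1.ListOptions{
		FieldSelector: fields.OneTermEqualSelector("spec.nodeName", nodeName).String(),
	})
	if err != nil {
		return false, err
	}
	for _, pod := range pods.Items {
		// Check every PDB in the pod's namespace that selects this pod.
		pdbs, err := client.PolicyV1beta1().PodDisruptionBudgets(pod.Namespace).List(metav1.ListOptions{})
		if err != nil {
			return false, err
		}
		for _, pdb := range pdbs.Items {
			selector, err := metav1.LabelSelectorAsSelector(pdb.Spec.Selector)
			if err != nil {
				return false, err
			}
			if selector.Matches(labels.Set(pod.Labels)) && pdb.Status.PodDisruptionsAllowed < 1 {
				// Evicting this pod would violate its PDB; refuse to drain the node at all.
				return false, nil
			}
		}
	}
	return true, nil
}
```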