Description
Observed Behavior:
Karpenter is overestimating the memory capacity of certain node types. When this happens, pods with a certain range of memory requests can trigger Karpenter scale-ups of nodes with insufficient memory for that pending pod to be scheduled. Observing that the pending pod isn't getting scheduled on the newly started node, Karpenter repeatedly attempts to scale up similar nodes with the same result.
In addition to preventing pods from scheduling, this issue has caused us to incur additional costs from third-party integrations that charge by node count, as the repeated erroneous scale-ups inflate the node count metrics used in billing.
In our case, we noticed this with c6g.medium instances running Bottlerocket (using the AMI provided by AWS, without modification). It's possible that Karpenter overestimates the capacity of other instance types and distributions as well, but we have not confirmed this independently. We've also not yet compared the capacity values of c6g nodes running AL2 vs Bottlerocket.
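For anyone trying to observe the loop, something like the following should make it visible (the commands are illustrative; exact output depends on your Karpenter version):

```bash
# Watch Karpenter repeatedly creating NodeClaims for the same pending pod.
kubectl get nodeclaims -o wide -w

# In another terminal, confirm the triggering pod never leaves Pending.
kubectl get pods --all-namespaces --field-selector status.phase=Pending
```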
Expected Behavior:
Karpenter should never overestimate the capacity/allocatable of a node (using the default value of VM_MEMORY_OVERHEAD_PERCENT, at least across all unmodified AWS-provided non-custom AMI families).
If this type of situation does occur, Karpenter should not continuously provision new nodes.
We are aware that this risk is called out in the troubleshooting guide:
A VM_MEMORY_OVERHEAD_PERCENT which results in Karpenter overestimating the memory available on a node can result in Karpenter launching nodes which are too small for your workload.
In the worst case, this can result in an instance launch loop and your workload remaining unschedulable indefinitely.
But I think the default should be suitable for all of the AWS-supported non-custom AMI families across instance types and sizes. If this isn't feasible, then perhaps this value should not be a global setting, and should vary by AMI family and instance type/size.
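For context, the only knob available today is the global one. Roughly, the currently configured value can be read back like this (assuming Karpenter runs as the karpenter Deployment in the karpenter namespace and exposes the setting as the VM_MEMORY_OVERHEAD_PERCENT environment variable; both depend on how the chart was installed):

```bash
# Read the global memory-overhead percentage from the Karpenter controller Deployment.
kubectl -n karpenter get deployment karpenter \
  -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="VM_MEMORY_OVERHEAD_PERCENT")].value}'
```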
Reproduction Steps (Please include YAML):
The following steps do not reproduce the full scale-up loop, but they do demonstrate the underlying overestimation:
1. Create an EC2NodeClass and NodePool with c6g.medium Bottlerocket instances
2. Trigger scale-up of this NodePool
3. Compare the capacity and allocatable values of the NodeClaim vs the Node, noting that the NodeClaim has larger memory capacity/allocatable values than the Node object (see the sketch below)
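A rough way to do the comparison in step 3 (object names are placeholders; this assumes the NodeClaim status exposes capacity/allocatable, which it does on the Karpenter versions we run):

```bash
# What Karpenter predicted for the instance type:
kubectl get nodeclaim <nodeclaim-name> \
  -o jsonpath='capacity={.status.capacity.memory} allocatable={.status.allocatable.memory}{"\n"}'

# What the kubelet actually registered on the node:
kubectl get node <node-name> \
  -o jsonpath='capacity={.status.capacity.memory} allocatable={.status.allocatable.memory}{"\n"}'
```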
Example from our case:
NodeClaim memory: 1879040Ki [1835Mi]
Node memory: 1872664Ki
Note that 1879040Ki [1835Mi] (NodeClaim) > 1872664Ki (Node).
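For scale, the gap is small in absolute terms, which is presumably why only a narrow band of memory requests triggers the loop:

```bash
# Difference between Karpenter's estimate and the kubelet-reported value (numbers from above):
echo $(( 1879040 - 1872664 ))   # 6376 KiB, i.e. roughly 6 MiB
```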
The default value of VM_MEMORY_OVERHEAD_PERCENT (0.075) is in use for this example.
Versions:
Kubernetes Version (kubectl version): 1.28, 1.30