(Sorry if this isn't AWS-specific. I'm not familiar with the internals, but I'm happy to repost to kubernetes-sigs/karpenter if that would be more useful.)
Observed Behavior:
If Karpenter is limited to provisioning nodes with a certain number of GPUs (e.g. 2, 4 or 8), it refuses to create a node for a pod that requests any other number (e.g. 1, 3, 5, 6 or 7).
Note that when a suitable node already exists, scheduling a pod onto a node with more GPUs than the pod requires works as expected.
Expected Behavior:
I would expect Karpenter to over-provision GPUs when necessary, in the same way as it does for CPU and memory.
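To make the gap between the observed and expected behaviour concrete, here is a toy model of the two selection rules. It is a sketch under stated assumptions, not Karpenter's scheduler code: the `InstanceType` class, the `gpu-2x`/`gpu-4x`/`gpu-8x` names and their prices are hypothetical, and GPU counts are plain integers rather than Kubernetes resource quantities.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InstanceType:
    # Hypothetical instance type: only the fields needed for this illustration.
    name: str
    gpus: int
    hourly_price: float


# Hypothetical catalogue standing in for a node pool limited to 2-, 4- and 8-GPU types.
CATALOGUE = [
    InstanceType("gpu-2x", 2, 1.0),
    InstanceType("gpu-4x", 4, 2.0),
    InstanceType("gpu-8x", 8, 4.0),
]


def exact_match(requested: int) -> list[InstanceType]:
    # Models the behaviour reported here: only a type with exactly the requested GPU count qualifies.
    return [it for it in CATALOGUE if it.gpus == requested]


def at_least(requested: int) -> list[InstanceType]:
    # Models the expected behaviour: as with CPU and memory, any type with enough GPUs qualifies.
    return [it for it in CATALOGUE if it.gpus >= requested]


def cheapest(candidates: list[InstanceType]) -> Optional[InstanceType]:
    # Return the lowest-priced candidate, or None when no instance type fits.
    return min(candidates, key=lambda it: it.hourly_price, default=None)


for requested in (1, 3, 4, 5):
    for label, rule in (("exact", exact_match), ("at-least", at_least)):
        chosen = cheapest(rule(requested))
        print(f"{requested} GPU(s), {label}: {chosen.name if chosen else 'no instance type'}")
```

Under the exact-match model a 5-GPU request finds no candidate, which is what this report observes; under the at-least model it lands on the 8-GPU type, leaving three GPUs idle. This only models the reported behaviour and makes no claim about where in Karpenter the difference actually arises.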
Reproduction Steps (Please include YAML):
Follow the Karpenter 'getting started' guide on EKS and install the NVIDIA device plugin.
Use the following node pool and node class:
Provisioning the following pod will work and cause a 4-GPU node to be created:
The following will succeed if it is run after the previously created pod has finished but before the node is cleaned up, because it fits on the pre-existing node with room to spare. If you run it at a time that would require creating a new node, it will fail:
The following will fail, because it doesn't fit on the previously provisioned node and Karpenter won't provision a node with 8 GPUs. It fails with the following error:
Warning FailedScheduling 18s karpenter Failed to schedule pod, incompatible with nodepool "gpu", daemonset overhead={"cpu":"280m","memory":"130Mi","pods":"6"}, no instance type satisfied resources {"cpu":"280m","memory":"130Mi","nvidia.com/gpu":"5","pods":"7"} and requirements karpenter.k8s.aws/instance-category In [g], karpenter.k8s.aws/instance-generation In [4 5], karpenter.sh/capacity-type In [on-demand], karpenter.sh/nodepool In [gpu], kubernetes.io/arch In [amd64], kubernetes.io/os In [linux] (no instance type which had enough resources and the required offering met the scheduling requirements);
incompatible with nodepool "default", daemonset overhead={"cpu":"280m","memory":"130Mi","pods":"6"}, no instance type satisfied resources {"cpu":"280m","memory":"130Mi","nvidia.com/gpu":"5","pods":"7"} and requirements karpenter.k8s.aws/instance-category In [c m r], karpenter.k8s.aws/instance-generation Exists >2, karpenter.sh/capacity-type In [on-demand], karpenter.sh/nodepool In [default], kubernetes.io/arch In [amd64], kubernetes.io/os In [linux] (no instance type which had enough resources and the required offering met the scheduling requirements)
Versions:
Chart Version: 1.0.1
Kubernetes Version (kubectl version):
Client Version: v1.31.1
Kustomize Version: v5.4.2
Server Version: v1.30.4-eks-a737599