Karpenter Won't Over-Provision GPUs #7038

tcatling · 2024-09-18T12:36:46Z

Description

(Sorry if this isn't AWS-specific. I'm not familiar with the internals, but I'm happy to repost to kubernetes-sigs/karpenter if that would be more useful.)

Observed Behavior:

If Karpenter can only provision nodes with certain GPU counts (e.g. 2, 4 or 8), it refuses to create a node for a pod that requests any other number (e.g. 1, 3, 5, 6 or 7).

Note that when a node already exists, scheduling a pod onto a node with more GPUs than required works as expected.

Expected Behavior:

I would expect Karpenter to over-provision GPUs when necessary, in the same way as it does for CPU and memory.
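
To make the expectation concrete, here is a minimal sketch of a "smallest instance type that fits" selection in which a GPU request is compared with `>=` against capacity, the same way CPU and memory are. It is not Karpenter's code: the catalog, the instance names and the `fits` / `smallest_fit` helpers are all hypothetical and only illustrate the behaviour I expected.

```python
# Hypothetical catalog: names and sizes are illustrative, not real EC2 data.
CATALOG = {
    "example-1gpu": {"cpu": 4, "nvidia.com/gpu": 1},
    "example-4gpu": {"cpu": 48, "nvidia.com/gpu": 4},
    "example-8gpu": {"cpu": 192, "nvidia.com/gpu": 8},
}


def fits(request, capacity):
    """True when capacity covers every requested resource (>=, not ==)."""
    return all(capacity.get(name, 0) >= amount for name, amount in request.items())


def smallest_fit(request, catalog=CATALOG):
    """Name of the smallest fitting type (by GPU count, then CPU), or None."""
    candidates = [name for name, cap in catalog.items() if fits(request, cap)]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda name: (catalog[name]["nvidia.com/gpu"], catalog[name]["cpu"]),
    )


for gpus in (3, 4, 5, 9):
    # 3 and 4 land on the 4-GPU type, 5 on the 8-GPU type, 9 fits nowhere.
    print(gpus, smallest_fit({"nvidia.com/gpu": gpus}))
```

Under this model a 3-GPU pod gets a 4-GPU node and a 5-GPU pod gets an 8-GPU node, which is the over-provisioning described above.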

Reproduction Steps (Please include YAML):

Follow the Karpenter 'getting started' guide on EKS and install the NVIDIA device plugin.

Use the following node pool and node class:

---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu
  namespace: karpenter
spec:
  template:
    spec:
      startupTaints:
        - key: node.cilium.io/agent-not-ready
          value: "true"
          effect: NoExecute
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["g"]
        - key: karpenter.k8s.aws/instance-generation
          operator: In
          values: ["4", "5"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: gpu
      expireAfter: 720h # 30 * 24h = 720h
  limits:
    cpu: 1000
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: gpu
  namespace: karpenter
spec:
  tags:
    Name: "{{ .Values.clusterName }}-gpu-karpenter"
  amiFamily: AL2 # Amazon Linux 2
  role: "KarpenterNodeRole-{{ .Values.clusterName }}" # replace with your cluster name
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery:  "{{ .Values.clusterName }}" # replace with your cluster name
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "{{ .Values.clusterName }}" # replace with your cluster name
  amiSelectorTerms:
    # ARM_AMI_ID="$(aws ssm get-parameter --name /aws/service/eks/optimized-ami/${K8S_VERSION}/amazon-linux-2-arm64/recommended/image_id --query Parameter.Value --output text)"
    # AMD_AMI_ID="$(aws ssm get-parameter --name /aws/service/eks/optimized-ami/${K8S_VERSION}/amazon-linux-2/recommended/image_id --query Parameter.Value --output text)"
    # GPU_AMI_ID="$(aws ssm get-parameter --name /aws/service/eks/optimized-ami/${K8S_VERSION}/amazon-linux-2-gpu/recommended/image_id --query Parameter.Value --output text)"
    - id: ami-0af7fb740c9da69b3 # GPU Amazon Linux 2 18/09/2024

Provisioning the following pod works and causes a 4-GPU node to be created:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  restartPolicy: Never
  containers:
    - name: cuda-container
      image: "nvidia/cuda:11.0.3-base-ubuntu20.04"
      command: ["nvidia-smi"]
      resources:
        requests:
          nvidia.com/gpu: 4
        limits:
          nvidia.com/gpu: 4
  tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule

The following pod succeeds if it is created after the previous pod has finished but before the node is cleaned up, because it fits on the existing node with room to spare. If it is created at a time when a new node would be needed, it fails:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  restartPolicy: Never
  containers:
    - name: cuda-container
      image: "nvidia/cuda:11.0.3-base-ubuntu20.04"
      command: ["nvidia-smi"]
      resources:
        requests:
          nvidia.com/gpu: 3
        limits:
          nvidia.com/gpu: 3
  tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule

The following pod fails, because it doesn't fit on the previously provisioned node and Karpenter won't provision a node with 8 GPUs:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod3
spec:
  restartPolicy: Never
  containers:
    - name: cuda-container
      image: "nvidia/cuda:11.0.3-base-ubuntu20.04"
      command: ["nvidia-smi"]
      resources:
        requests:
          nvidia.com/gpu: 5
        limits:
          nvidia.com/gpu: 5
  tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule

It fails with the following error:

Warning  FailedScheduling  18s  karpenter  Failed to schedule pod, incompatible with nodepool "gpu", daemonset overhead={"cpu":"280m","memory":"130Mi","pods":"6"}, no instance type satisfied resources {"cpu":"280m","memory":"130Mi","nvidia.com/gpu":"5","pods":"7"} and requirements karpenter.k8s.aws/instance-category In [g], karpenter.k8s.aws/instance-generation In [4 5], karpenter.sh/capacity-type In [on-demand], karpenter.sh/nodepool In [gpu], kubernetes.io/arch In [amd64], kubernetes.io/os In [linux] (no instance type which had enough resources and the required offering met the scheduling requirements); incompatible with nodepool "default", daemonset overhead={"cpu":"280m","memory":"130Mi","pods":"6"}, no instance type satisfied resources {"cpu":"280m","memory":"130Mi","nvidia.com/gpu":"5","pods":"7"} and requirements karpenter.k8s.aws/instance-category In [c m r], karpenter.k8s.aws/instance-generation Exists >2, karpenter.sh/capacity-type In [on-demand], karpenter.sh/nodepool In [default], kubernetes.io/arch In [amd64], kubernetes.io/os In [linux] (no instance type which had enough resources and the required offering met the scheduling requirements)

Versions:

Chart Version: 1.0.1
Kubernetes Version (kubectl version):

Client Version: v1.31.1
Kustomize Version: v5.4.2
Server Version: v1.30.4-eks-a737599

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment

tcatling added the bug (Something isn't working) and needs-triage (Issues that need to be triaged) labels on Sep 18, 2024