Karpenter not consolidating to fit smaller instance #7061

Open
pincher95 opened this issue Sep 24, 2024 · 0 comments
Labels
bug (Something isn't working), needs-triage (Issues that need to be triaged)

Comments

Contributor

pincher95 commented Sep 24, 2024

Description

Observed Behavior:
We tightened the pod's requests so that they fit within a node's allocatable CPU and memory. In this case the Karpenter NodePool is constrained to the r6g instance family and to the large, xlarge, 2xlarge, 4xlarge, and 8xlarge instance sizes.
We expected Karpenter to provision an r6g.4xlarge, but an r6g.8xlarge was provisioned instead.
No errors were observed in the logs.

Name:               ip-x-x-x-x.ec2.internal
Roles:              <none>
Labels:             beta.kubernetes.io/arch=arm64
                    beta.kubernetes.io/instance-type=r6g.8xlarge
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=us-east-1
                    failure-domain.beta.kubernetes.io/zone=us-east-1a
                    k8s.io/cloud-provider-aws=fcf6c9b04a44c405d1956e77be017051
                    karpenter.k8s.aws/instance-category=r
                    karpenter.k8s.aws/instance-cpu=32
                    karpenter.k8s.aws/instance-cpu-manufacturer=aws
                    karpenter.k8s.aws/instance-ebs-bandwidth=9500
                    karpenter.k8s.aws/instance-encryption-in-transit-supported=false
                    karpenter.k8s.aws/instance-family=r6g
                    karpenter.k8s.aws/instance-generation=6
                    karpenter.k8s.aws/instance-hypervisor=nitro
                    karpenter.k8s.aws/instance-memory=262144
                    karpenter.k8s.aws/instance-network-bandwidth=12000
                    karpenter.k8s.aws/instance-size=8xlarge
                    karpenter.sh/capacity-type=on-demand
                    karpenter.sh/initialized=true
                    karpenter.sh/nodepool=k8s-staging-us-east-1
                    karpenter.sh/registered=true
                    kubernetes.io/arch=arm64
                    kubernetes.io/hostname=ip-x-x-x-x.ec2.internal
                    kubernetes.io/os=linux
                    node.kubernetes.io/instance-type=r6g.8xlarge
                    topology.ebs.csi.aws.com/zone=us-east-1a
                    topology.k8s.aws/zone-id=use1-az6
                    topology.kubernetes.io/region=us-east-1
                    topology.kubernetes.io/zone=us-east-1a
Annotations:        alpha.kubernetes.io/provided-node-ip: x.x.x.x
                    compatibility.karpenter.k8s.aws/kubelet-drift-hash: 15379597991425564585
                    csi.volume.kubernetes.io/nodeid: {"ebs.csi.aws.com":"i-someId","efs.csi.aws.com":"i-someId"}
                    karpenter.k8s.aws/ec2nodeclass-hash: 16460481617587100572
                    karpenter.k8s.aws/ec2nodeclass-hash-version: v3
                    karpenter.sh/nodepool-hash: 7616082281600894742
                    karpenter.sh/nodepool-hash-version: v3
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Tue, 24 Sep 2024 07:53:01 +0300
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  ip-x-x-x-x.ec2.internal
  AcquireTime:     <unset>
  RenewTime:       Tue, 24 Sep 2024 10:45:17 +0300
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Tue, 24 Sep 2024 10:42:33 +0300   Tue, 24 Sep 2024 07:53:00 +0300   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Tue, 24 Sep 2024 10:42:33 +0300   Tue, 24 Sep 2024 07:53:00 +0300   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Tue, 24 Sep 2024 10:42:33 +0300   Tue, 24 Sep 2024 07:53:00 +0300   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Tue, 24 Sep 2024 10:42:33 +0300   Tue, 24 Sep 2024 07:53:13 +0300   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:   x.x.x.x
  InternalDNS:  ip-x-x-x-x.ec2.internal
  Hostname:     ip-x-x-x-x.ec2.internal
Capacity:
  cpu:                32
  ephemeral-storage:  20894700Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  hugepages-32Mi:     0
  hugepages-64Ki:     0
  memory:             260794688Ki
  pods:               234
Allocatable:
  cpu:                31850m
  ephemeral-storage:  18182813665
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  hugepages-32Mi:     0
  hugepages-64Ki:     0
  memory:             257795392Ki
  pods:               234
System Info:
  Machine ID:                 ec2ec83a1f8946933cf568664375b8b5
  System UUID:                ec2ec83a-1f89-4693-3cf5-68664375b8b5
  Boot ID:                    19e4aca9-e568-435c-a875-ff67afa01017
  Kernel Version:             6.1.109-118.189.amzn2023.aarch64
  OS Image:                   Amazon Linux 2023.5.20240916
  Operating System:           linux
  Architecture:               arm64
  Container Runtime Version:  containerd://1.7.11
  Kubelet Version:            v1.28.13-eks-a737599
  Kube-Proxy Version:         v1.28.13-eks-a737599
ProviderID:                   aws:///us-east-1a/i-someId
Non-terminated Pods:          (8 in total)
  Namespace                   Name                                                           CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                                           ------------  ----------  ---------------  -------------  ---
  default                     efs-csi-node-7bjpz                                             0 (0%)        0 (0%)      0 (0%)           0 (0%)         172m
  kube-system                 aws-node-5b9nw                                                 50m (0%)      0 (0%)      0 (0%)           0 (0%)         172m
  kube-system                 ebs-csi-node-rljss                                             30m (0%)      0 (0%)      120Mi (0%)       768Mi (0%)     172m
  kube-system                 kube-proxy-k58sb                                               100m (0%)     0 (0%)      0 (0%)           0 (0%)         172m
  monitoring                  node-exporter-zcj4m                                            10m (0%)      0 (0%)      50M (0%)         50M (0%)       172m
  monitoring                  kubelet-summary-exporter-vfv9z                                 20m (0%)      20m (0%)    50Mi (0%)        50Mi (0%)      172m
  monitoring                  logs-collector-lhs8d                                           50m (0%)      0 (0%)      128Mi (0%)       128Mi (0%)     172m
  staging                     thanos-store-0-1                                               14 (43%)      14 (43%)    117Gi (47%)      117Gi (47%)    174m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests            Limits
  --------           --------            ------
  cpu                14260m (44%)        14020m (44%)
  memory             125990269056 (47%)  126669746304 (47%)
  ephemeral-storage  0 (0%)              0 (0%)
  hugepages-1Gi      0 (0%)              0 (0%)
  hugepages-2Mi      0 (0%)              0 (0%)
  hugepages-32Mi     0 (0%)              0 (0%)
  hugepages-64Ki     0 (0%)              0 (0%)
Events:
  Type    Reason            Age                   From       Message
  ----    ------            ----                  ----       -------
  Normal  Unconsolidatable  8m43s (x8 over 116m)  karpenter  Can't replace with a cheaper node
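
For anyone triaging, the totals in the node description can be cross-checked with a short script. This is only a sketch: the quantities are copied by hand from the "Non-terminated Pods", "Capacity", and "Allocatable" sections above, and the helper names (`parse_cpu`, `parse_memory`) are made up for illustration.

```python
# Multipliers for the Kubernetes quantity suffixes that appear in the output above.
SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "M": 10**6, "G": 10**9}


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '117Gi' or '50M' to bytes."""
    for suffix in ("Ki", "Mi", "Gi", "M", "G"):
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * SUFFIXES[suffix]
    return int(quantity)


def parse_cpu(quantity: str) -> int:
    """Convert a CPU quantity such as '50m' or '14' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(quantity) * 1000


# (cpu request, memory request) per pod, in the order listed in the node description.
pod_requests = [
    ("0", "0"),        # efs-csi-node
    ("50m", "0"),      # aws-node
    ("30m", "120Mi"),  # ebs-csi-node
    ("100m", "0"),     # kube-proxy
    ("10m", "50M"),    # node-exporter
    ("20m", "50Mi"),   # kubelet-summary-exporter
    ("50m", "128Mi"),  # logs-collector
    ("14", "117Gi"),   # thanos-store-0-1
]

total_cpu_m = sum(parse_cpu(cpu) for cpu, _ in pod_requests)
total_mem_bytes = sum(parse_memory(mem) for _, mem in pod_requests)

# Memory and CPU withheld from pods on this node (capacity minus allocatable).
reserved_mem_mib = (parse_memory("260794688Ki") - parse_memory("257795392Ki")) // 2**20
reserved_cpu_m = parse_cpu("32") - parse_cpu("31850m")

print(total_cpu_m, total_mem_bytes, reserved_mem_mib, reserved_cpu_m)
```

The computed request totals match the "Allocated resources" section (14260m CPU and 125990269056 bytes of memory), so the workload needs roughly 14.3 CPU and 117.3 GiB in total, including the DaemonSet pods.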

Expected Behavior:
Karpenter should provision an r6g.4xlarge.
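
A rough way to check whether the requests could fit on an r6g.4xlarge from Karpenter's point of view is sketched below. Everything marked as an assumption is not taken from this issue: the 128 GiB figure is the advertised r6g.4xlarge memory, the 7.5% value is what I believe to be the default VM memory overhead setting in the AWS provider, and the reserved amount is assumed to be the same as the one observed on the 8xlarge node. Treat the result as a hypothesis to verify, not a diagnosis.

```python
MIB = 2**20

# Values taken from the node description above.
requested_bytes = 125_990_269_056          # total memory requests on the node
observed_capacity_kib = 260_794_688        # r6g.8xlarge reported capacity
observed_allocatable_kib = 257_795_392     # r6g.8xlarge reported allocatable

# Assumptions (not from the issue).
instance_memory_mib = 128 * 1024           # advertised r6g.4xlarge memory
assumed_overhead_percent = 0.075           # assumed default VM memory overhead
advertised_8xlarge_mib = 256 * 1024        # advertised r6g.8xlarge memory

# Reserved memory on the 8xlarge, assumed to be identical on a 4xlarge.
reserved_mib = (observed_capacity_kib - observed_allocatable_kib) / 1024

# Fraction of advertised memory the 8xlarge actually lost to the VM.
observed_overhead_percent = 1 - (observed_capacity_kib / 1024) / advertised_8xlarge_mib


def estimated_allocatable_mib(memory_mib: float, overhead: float, reserved: float) -> float:
    """Advertised memory minus a percentage overhead minus reserved memory."""
    return memory_mib * (1 - overhead) - reserved


requested_mib = requested_bytes / MIB
estimate_default = estimated_allocatable_mib(instance_memory_mib, assumed_overhead_percent, reserved_mib)
estimate_observed = estimated_allocatable_mib(instance_memory_mib, observed_overhead_percent, reserved_mib)

fits_default = requested_mib <= estimate_default
fits_observed = requested_mib <= estimate_observed

print(f"requested:                  {requested_mib:.0f} MiB")
print(f"estimate at 7.5% overhead:  {estimate_default:.0f} MiB, fits: {fits_default}")
print(f"estimate at observed ratio: {estimate_observed:.0f} MiB, fits: {fits_observed}")
```

Under these assumptions the requests exceed the estimated allocatable memory at a 7.5% overhead but fit when the overhead ratio observed on the running 8xlarge is used. If that holds, the gap between the estimated and the real overhead may explain both the initial choice of r6g.8xlarge and the "Can't replace with a cheaper node" event.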

Reproduction Steps (Please include YAML):

NodePool:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: k8s-staging-us-east-1
spec:
  disruption:
    budgets:
    - nodes: 0%
      reasons:
      - Drifted
    - nodes: 60%
      reasons:
      - Empty
      - Underutilized
    consolidateAfter: 0s
    consolidationPolicy: WhenEmptyOrUnderutilized
  limits:
    cpu: 1500
    memory: 1500Gi
  template:
    metadata:
    spec:
      expireAfter: 720h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: k8s-staging-us-east-1
      requirements:
      - key: kubernetes.io/os
        operator: In
        values:
        - linux
      - key: karpenter.sh/capacity-type
        operator: In
        values:
        - on-demand
      - key: karpenter.k8s.aws/instance-family
        operator: In
        values:
        - r6g
      - key: karpenter.k8s.aws/instance-size
        operator: NotIn
        values:
        - nano
        - micro
        - small
        - medium
        - 9xlarge
        - 12xlarge
        - 16xlarge
        - 18xlarge
        - 24xlarge
        - 32xlarge
        - 48xlarge
        - metal
      - key: topology.kubernetes.io/zone
        operator: In
        values:
        - us-east-1a
      - key: kubernetes.io/arch
        operator: In
        values:
        - arm64
      taints:
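
To make the effect of the `NotIn` requirement explicit, here is a small sketch that applies the excluded sizes from the NodePool above to the r6g family. The list of r6g sizes is an assumption based on the EC2 instance type catalog, not something stated in this issue.

```python
# Sizes offered in the r6g family (assumption, taken from the EC2 catalog).
R6G_SIZES = [
    "medium", "large", "xlarge", "2xlarge", "4xlarge",
    "8xlarge", "12xlarge", "16xlarge", "metal",
]

# Values of the karpenter.k8s.aws/instance-size NotIn requirement above.
EXCLUDED_SIZES = {
    "nano", "micro", "small", "medium", "9xlarge", "12xlarge",
    "16xlarge", "18xlarge", "24xlarge", "32xlarge", "48xlarge", "metal",
}

# Sizes that remain eligible after the NotIn filter, smallest first.
allowed_sizes = [size for size in R6G_SIZES if size not in EXCLUDED_SIZES]
print(allowed_sizes)
```

The remaining sizes are large, xlarge, 2xlarge, 4xlarge, and 8xlarge, which agrees with the description, so r6g.4xlarge is not being filtered out by the NodePool requirements themselves.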

EC2NodeClass:

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  finalizers:
  - karpenter.k8s.aws/termination
  name: k8s-staging-us-east-1
spec:
  amiFamily: AL2023
  amiSelectorTerms:
  - alias: al2023@latest
  blockDeviceMappings:
  - deviceName: /dev/xvda
    ebs:
      deleteOnTermination: true
      encrypted: true
      iops: 3000
      throughput: 150
      volumeSize: 20Gi
      volumeType: gp3
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 2
    httpTokens: required
  role: some-role
  securityGroupSelectorTerms:
  - id: some-sg
  subnetSelectorTerms:
  - id: some-subnet
  tags:
    cluster-name: k8s-staging-us-east-1
  userData: |
    MIME-Version: 1.0
    Content-Type: multipart/mixed; boundary="BOUNDARY"

    --BOUNDARY
    Content-Type: application/node.eks.aws

    apiVersion: node.eks.aws/v1alpha1
    kind: NodeConfig
    spec:
      kubelet:
        config:
          shutdownGracePeriod: 30s
          featureGates:
            DisableKubeletCloudCredentialProviders: true

    --BOUNDARY--

Versions:

  • Chart Version: v1.0.2
  • Kubernetes Version (kubectl version): v1.28.12
  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
pincher95 added the bug and needs-triage labels on Sep 24, 2024