
Adding podAntiAffinity #15578

Merged
merged 1 commit into ansible:devel on Oct 8, 2024

Conversation

@jdowni000 (Contributor) commented Oct 8, 2024

SUMMARY

By default, due to low default resource requests, automation job pods are frequently scheduled on the same OCP worker nodes even when other nodes are available. One way to influence this behavior, prioritizing that jobs land as far away from other jobs as possible without blocking job pods from being scheduled, is to add pod anti-affinity rules.

We saw this issue on a very large OCP cluster with 20 or more worker nodes available, but all jobs were ending up on only 5 worker nodes. CPU utilization on those nodes was 100%, delays in job events arriving at the control plane were encountered, and in general the jobs ran slower than expected.

Steps to Reproduce

Run lots of jobs on a container group with relatively low CPU/memory requests (the default) and many large worker nodes

Actual Behavior

Jobs clump together and are not automatically spread across worker nodes

Expected Behavior

Jobs spread out across available worker nodes to evenly distribute workload

With these changes, users on small clusters will likely not notice much difference, but for users with a larger number of worker nodes this will help ensure pods are spread out further and their resources are better utilized when running in our default mode of low resource requests and no limits. Note that I used preferredDuringSchedulingIgnoredDuringExecution, which means that if there is no other host to schedule on, the pod gets scheduled anyway if it is otherwise schedulable. So this won't have a detrimental impact on small clusters. The stanza added to the pod spec is shown below.
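For reference, this is the anti-affinity stanza the change adds to the default automation job pod spec (the same rule is visible in the full pod output further below):

```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: ansible_job
            operator: Exists
        topologyKey: kubernetes.io/hostname
```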

ISSUE TYPE
  • Bug, Docs Fix or other nominal change
COMPONENT NAME

AWX default container pod spec

AWX VERSION

devel

ADDITIONAL INFORMATION

I was able to test this locally using https://github.com/ansible/awx/blob/devel/docs/development/kind.md to set up a local kind cluster with a couple of worker nodes. Dumping the job pod as YAML, I was able to verify that the affinity rules are being set.
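The exact command used isn't shown in the PR, but the output below is the kind of thing produced by something like the following (pod name and namespace taken from the output itself):

```shell
# Print the full spec and status of the automation job pod as YAML
kubectl get pod automation-job-33-nlhdm -n awx -o yaml
```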

```yaml
kind: Pod
metadata:
  creationTimestamp: "2024-10-08T18:14:08Z"
  generateName: automation-job-33-
  labels:
    ansible-awx: 35448f85-eadd-4e7c-947e-bfe17a906a12
    ansible-awx-job-id: "33"
    ansible_job: ""
  name: automation-job-33-nlhdm
  namespace: awx
  resourceVersion: "177907"
  uid: 60685262-4c89-4651-bdca-98891aaf59b7
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: ansible_job
              operator: Exists
          topologyKey: kubernetes.io/hostname
        weight: 100
  automountServiceAccountToken: false
  containers:
  - args:
    - ansible-runner
    - worker
    - --private-data-dir=/runner
    image: quay.io/ansible/awx-ee:latest
    imagePullPolicy: Always
    name: worker
    resources:
      requests:
        cpu: 250m
        memory: 100Mi
    stdin: true
    stdinOnce: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: kind-worker2
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2024-10-08T18:14:09Z"
    status: "True"
    type: PodReadyToStartContainers
  - lastProbeTime: null
    lastTransitionTime: "2024-10-08T18:14:08Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2024-10-08T18:14:09Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2024-10-08T18:14:09Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2024-10-08T18:14:08Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://364075ca228b6865b1f36c448f96cf73fed40f9b768251bf51ab0f512dda17f9
    image: quay.io/ansible/awx-ee:latest
    imageID: quay.io/ansible/awx-ee@sha256:59e8f1b6624207b2b4d17f3e9bded64232adfd08798377424af0fad6371a76ef
    lastState: {}
    name: worker
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2024-10-08T18:14:09Z"
  hostIP: 172.18.0.4
  hostIPs:
  - ip: 172.18.0.4
  phase: Running
  podIP: 10.244.2.21
  podIPs:
  - ip: 10.244.2.21
  qosClass: Burstable
  startTime: "2024-10-08T18:14:08Z"
```
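Not part of the PR itself, but a quick way to confirm the spread is to list job pods along with the worker node each one landed on, selecting on the ansible_job label shown above:

```shell
# Show each automation job pod and the node it was scheduled to
kubectl get pods -n awx -l ansible_job -o wide
```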

sonarcloud bot commented Oct 8, 2024

@TheRealHaoLiu merged commit 825a02c into ansible:devel on Oct 8, 2024
25 of 27 checks passed