
vgpu cannot perform high-priority preemption scheduling #3186

Closed
AshinWu opened this issue Nov 9, 2023 · 10 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

AshinWu commented Nov 9, 2023

What happened:

I am using the latest version of Volcano vGPU and expect high-priority tasks to be able to preempt low-priority tasks.

Node capacity information:

status:
  capacity:
    volcano.sh/vgpu-number: '2'

volcano-scheduler.conf (configmap):

actions: "reclaim, allocate, backfill, preempt"
tiers:
- plugins:
  - name: priority
- plugins:
  - name: gang
    enableJobOrder: false
    enablePreemptable: false
    enableJobStarving: false
  - name: predicates
    arguments:
      predicate.GPUSharingEnable: true # enable GPU sharing
  - name: proportion
  - name: nodeorder
  - name: binpack

priorityClass:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "high priority"
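
Note: the two low-priority jobs below do not set a priorityClassName, so (assuming no other class is marked globalDefault in the cluster) they fall back to the default priority of 0. For contrast, a hypothetical explicit low-priority class could look like the sketch below; the name low-priority and the value 100 are illustrative and not part of the setup reported here.

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority    # hypothetical, not referenced by the jobs below
value: 100
globalDefault: false
description: "low priority"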

Two low-priority tasks:

apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: job-low1
spec:
  minAvailable: 1
  schedulerName: volcano
  policies:
    - event: PodEvicted
      action: RestartJob
  tasks:
    - replicas: 1
      name: testjob
      policies:
      - event: TaskCompleted
        action: CompleteJob
      template:
        metadata:
          annotations: 
            volcano.sh/preemptable: "true"
        spec:
          containers:
            - command:
              - sleep
              - 8m
              name: cuda-container
              image: nvidia/cuda:10.1-base-ubuntu18.04
              resources:
                limits:
                  volcano.sh/vgpu-number: 1
---
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: job-low2
spec:
  minAvailable: 1
  schedulerName: volcano
  policies:
    - event: PodEvicted
      action: RestartJob
  tasks:
    - replicas: 1
      name: testjob
      policies:
      - event: TaskCompleted
        action: CompleteJob
      template:
        metadata:
          annotations: 
            volcano.sh/preemptable: "true"
        spec:
          containers:
            - command:
              - sleep
              - 10m
              name: cuda-container
              image: nvidia/cuda:10.1-base-ubuntu18.04
              resources:
                limits:
                  volcano.sh/vgpu-number: 1

One high-priority task:

apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: job-high
spec:
  minAvailable: 1
  schedulerName: volcano
  priorityClassName: high-priority
  policies:
    - event: PodEvicted
      action: RestartJob
  tasks:
    - replicas: 1
      name: testjob
      policies:
      - event: TaskCompleted
        action: CompleteJob
      template:
        spec:
          containers:
            - command:
              - sleep
              - 2m
              name: cuda-container
              image: nvidia/cuda:10.1-base-ubuntu18.04
              resources:
                limits:
                  volcano.sh/vgpu-number: 1

What you expected to happen:

The two low-priority jobs already occupy the node's entire vGPU capacity (2 vGPUs). When a high-priority job requesting 1 vGPU is created, it should evict one of the low-priority jobs so the high-priority job can run, and the evicted low-priority job should go to the Pending state. For example:

---begin---
NAME     STATUS
job-low1 Running
job-low2 Running

---wait---
NAME     STATUS
job-high Running
job-low1 Running
job-low2 Pending

---wait---
NAME     STATUS
job-high Completed
job-low1 Running
job-low2 Running

---end---
NAME     STATUS
job-high Completed
job-low1 Completed
job-low2 Completed

Is this a configuration error on my side, or a bug?

How to reproduce it (as minimally and precisely as possible):

  1. When the jobs request only CPU or memory, priority-based preemption is triggered as expected.
  2. When they request gpushare or vgpu resources, priority-based preemption never happens. The most telling part of the scheduler log is:

I1109 12:18:49.214378 1 preempt.go:43] Enter Preempt ...
I1109 12:18:49.214390 1 job_info.go:728] job job-high-14881f23-c9a4-44b9-a3cf-46e130a51b99/default actual: map[], ji.TaskMinAvailable: map[nginx:1]
I1109 12:18:49.214407 1 preempt.go:58] Job <default/job-high-14881f23-c9a4-44b9-a3cf-46e130a51b99> Queue skip preemption, reason: NotEnoughPodsOfTask, message Not enough valid pods of each task for gang-scheduling
I1109 12:18:49.214463 1 job_info.go:728] job job-low2-1d2e78fa-028d-475e-9ffc-5598d837d80b/default actual: map[nginx:1], ji.TaskMinAvailable: map[testjob:1]
I1109 12:18:49.214488 1 job_info.go:728] job job-low1-b024ff24-37f1-489b-8956-93e78c46a70c/default actual: map[nginx:1], ji.TaskMinAvailable: map[testjob:1]
I1109 12:18:49.214509 1 preempt.go:194] No Preemptors in Queue , break
I1109 12:18:49.214522 1 statement.go:378] Committing operations
I1109 12:18:49.214536 1 preempt.go:194] Leaving Preempt ...


Anything else we need to know?:

Similar problems:
#2547
#2916
...

Environment:

  • Volcano Version: 1.8.1
  • Kubernetes version (use kubectl version): 1.19
  • OS (e.g. from /etc/os-release): Ubuntu 18.04.6 LTS (image)
  • Kernel (e.g. uname -a): 5.4.0-150-generic
AshinWu added the kind/bug label on Nov 9, 2023
Monokaix (Member) commented Nov 10, 2023

Hi, please try to modify volcano-scheduler.conf's actions field to "allocate, preempt, backfill" to see the result.
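
For reference, a sketch of the full volcano-scheduler.conf with only the actions line changed as suggested; the tiers section is copied unchanged from the configmap posted above.

actions: "allocate, preempt, backfill"
tiers:
- plugins:
  - name: priority
- plugins:
  - name: gang
    enableJobOrder: false
    enablePreemptable: false
    enableJobStarving: false
  - name: predicates
    arguments:
      predicate.GPUSharingEnable: true # enable GPU sharing
  - name: proportion
  - name: nodeorder
  - name: binpack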

AshinWu (Author) commented Nov 10, 2023

Hi, please try to modify volcano-scheduler.conf's actions field to "allocate, preempt, backfill" to see the result.

@Monokaix Thank you for your reply.
Following your suggestion, it still doesn't work, and the high-priority job stays in the Pending state.
The scheduler log shows the following:

E1110 07:30:56.874510 1 device_info.go:187] deviceSharing err= not enough gpu fitted on this node
I1110 07:30:56.874524 1 predicate_helper.go:75] Predicates failed for task <default/job-testjob-nginx-0> on node : task default/job-high-testjob-0 on node node-gpu fit failed: not enough gpu fitted on this node
I1110 07:30:56.874588 1 preempt.go:108] No preemptor task in job <default/job-high-cb92f563-74f3-4912-bacd-fe230e57915a>.
I1110 07:30:56.874605 1 statement.go:352] Discarding operations ...
I1110 07:30:56.874629 1 predicates.go:384] pod(default/job--high-testjob-0) affinity require information is nil, plugin InterPodAffinity is skipped
I1110 07:30:56.874676 1 statement.go:378] Committing operations ...
I1110 07:30:56.874683 1 statement.go:378] Committing operations ...
I1110 07:30:56.885269 1 cache.go:262] Updating pod condition for default/job-high-testjob-0 to (PodScheduled==False)
I1110 07:30:56.930763 1 session.go:240] Close Session

AshinWu (Author) commented Nov 11, 2023

It seems to be the same cause as in #2916, but I noticed that that issue was fixed and the fix went into version 1.8.0.
My understanding is: because vgpu sharing is enabled, the GPU resource check in the FilterNode() function in device_info.go fails and throws "not enough gpu fitted on this node", so no candidate nodes survive the predicateNodes phase and the preemption logic is never executed. Is that correct?

Could you please provide me with some solutions or ideas? @wangyang0616 @william-wang

william-wang (Member) commented:

@archlitchi Have you encountered the same issue in your env?

Monokaix (Member) commented:

Can preemption happen for CPU/memory after modifying the scheduler config?

AshinWu (Author) commented Nov 17, 2023

Can preemption happen for CPU/memory after modifying the scheduler config?

Exactly. For CPU/memory, preemption works normally, but for vgpu/gpu-share resources preemption does not happen and the high-priority task stays in the Pending state.
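
For comparison, a minimal sketch of the kind of CPU/memory-only high-priority job that does trigger preemption here. The job name and resource amounts are illustrative; the structure mirrors the vGPU jobs above.

apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: job-high-cpu          # hypothetical name, not from the report
spec:
  minAvailable: 1
  schedulerName: volcano
  priorityClassName: high-priority
  tasks:
    - replicas: 1
      name: testjob
      template:
        spec:
          containers:
            - command:
              - sleep
              - 2m
              name: cuda-container
              image: nvidia/cuda:10.1-base-ubuntu18.04
              resources:
                limits:
                  cpu: "1"          # hypothetical CPU/memory limits in place of volcano.sh/vgpu-number
                  memory: 1Gi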

Monokaix (Member) commented:

What's the node's allocatable status?
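
(For context, this refers to the node's status.allocatable section, the counterpart of the capacity block quoted at the top of the issue. On this 2-vGPU node it would be expected to report something like the sketch below; the exact value depends on the device plugin.)

status:
  allocatable:
    volcano.sh/vgpu-number: '2'   # expected to mirror the capacity shown above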

Monokaix (Member) commented:

#3450 and #3458 can solve this; you can try them using the latest version. :)

Monokaix (Member) commented:

/close

volcano-sh-bot (Contributor) commented:

@Monokaix: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
