vgpu cannot perform high-priority preemption scheduling #3186
Comments
Hi, please try modifying the actions field in volcano-scheduler.conf to "allocate, preempt, backfill" and see the result.
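For reference, a volcano-scheduler ConfigMap with the preempt action enabled might look like the sketch below. The ConfigMap name, namespace, and plugin tiers are the usual defaults rather than this cluster's actual configuration, and `predicate.VGPUEnable` is assumed to be the switch used by the vGPU setup here.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: volcano-scheduler-configmap    # assumed default name
  namespace: volcano-system
data:
  volcano-scheduler.conf: |
    # "preempt" must appear in actions, otherwise priority-based eviction never runs.
    actions: "allocate, preempt, backfill"
    tiers:
    - plugins:
      - name: priority      # orders jobs/tasks by PriorityClass value
      - name: gang
      - name: conformance
    - plugins:
      - name: drf
      - name: predicates
        arguments:
          predicate.VGPUEnable: true   # assumption: enables vGPU-aware predicates
      - name: proportion
      - name: nodeorder
      - name: binpack
```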
@Monokaix Thank you for your reply.
It looks like the same cause as #2916, but I noticed that issue was already fixed and the fix was merged in version 1.8.0. Could you please provide me with some solutions or ideas? @wangyang0616 @william-wang
@archlitchi Have you encountered the same issue in your env?
Can preemption happen using CPU/memory after modifying the scheduler config?
That's right. For CPU/memory, preemption works normally, but for vGPU/GPU-share resources preemption does not work, and the high-priority task stays Pending.
What's the node's allocatable status?
/close
@Monokaix: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
What happened:
I am using the latest version of Volcano vGPU and expect high-priority tasks to be able to preempt low-priority tasks.
Node capacity information:
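For reference, the vGPU entry expected under the node's `.status.allocatable` looks roughly like this; the resource name is assumed to be the one exposed by the volcano vGPU device plugin, and the CPU/memory values are placeholders:

```yaml
allocatable:
  cpu: "16"                     # placeholder
  memory: 64Gi                  # placeholder
  volcano.sh/vgpu-number: "2"   # the node has 2 vGPUs in total
```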
volcano-scheduler.conf (configmap):
priorityClass:
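A pair of PriorityClasses along these lines is assumed; the names `high-priority`/`low-priority` and the values are illustrative, not the exact manifests used here:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority   # illustrative name
value: 1000000
globalDefault: false
description: "High-priority vGPU workloads that should preempt low-priority ones"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority    # illustrative name
value: 100
globalDefault: false
description: "Low-priority vGPU workloads"
```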
2 low priority tasks:
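Each low-priority job is assumed to be a Volcano Job roughly like the sketch below; the job name, image, and the `volcano.sh/vgpu-number` resource name are illustrative and depend on the installed vGPU device plugin:

```yaml
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: low-priority-vgpu-job-1        # illustrative name (the second job differs only here)
spec:
  schedulerName: volcano
  priorityClassName: low-priority      # illustrative PriorityClass from above
  minAvailable: 1
  tasks:
  - replicas: 1
    name: worker
    template:
      spec:
        restartPolicy: Never
        containers:
        - name: cuda
          image: nvidia/cuda:11.8.0-base-ubuntu22.04   # illustrative image
          command: ["sleep", "infinity"]
          resources:
            limits:
              volcano.sh/vgpu-number: 1   # assumed vGPU resource name; each job holds 1 vGPU
```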
1 High priority task:
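The high-priority job is assumed to mirror the low-priority ones, differing only in its name and `priorityClassName`; again a sketch, not the exact manifest:

```yaml
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: high-priority-vgpu-job         # illustrative name
spec:
  schedulerName: volcano
  priorityClassName: high-priority     # expected to trigger preemption of a low-priority job
  minAvailable: 1
  tasks:
  - replicas: 1
    name: worker
    template:
      spec:
        restartPolicy: Never
        containers:
        - name: cuda
          image: nvidia/cuda:11.8.0-base-ubuntu22.04   # illustrative image
          command: ["sleep", "infinity"]
          resources:
            limits:
              volcano.sh/vgpu-number: 1   # assumed vGPU resource name
```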
What you expected to happen:
The two low-priority tasks already occupy the entire node's vGPU resources (2 vGPUs). Now, creating a high-priority task that uses 1 vGPU should evict one of the low-priority tasks so the high-priority task can run, and the evicted low-priority task should end up in the Pending state. For example:
Is this a configuration error on my side, or a bug?
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Similar problems:
#2547
#2916
...
Environment:
- Kubernetes version (use `kubectl version`): 1.19
- Kernel (e.g. `uname -a`): 5.4.0-150-generic