With the preempt or reclaim plugin, a high-priority pod cannot be placed on a node that meets the conditions for preemption #3329
Comments
Could you please supply more information, such as the scheduler ConfigMap, the scheduler logs, and the job configs? |
When preempting or reclaiming, if one predicate function handler returns a status with
|
What is the error? |
Here is a scenario: an unscheduled pod with GPU resources is in the session of
volcano/pkg/scheduler/plugins/predicates/predicates.go
Lines 530 to 554 in 6e9f4f6
|
It's truly a problem in vgpu preemption. I think we should not return err when the vgpu resource is insufficient here. If you're interested, you're welcome to fix that. |
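To illustrate the suggestion above, here is a minimal sketch; it is not Volcano's actual code. It assumes a predicate that reports a status code instead of an error, so that "insufficient right now, but fixable by eviction" can be told apart from "cannot fit at all". All names (`Code`, `GPUNode`, `checkGPU`, `preemptCandidates`, and the `EvictableGPUs` field) are hypothetical.

```go
package main

import "fmt"

// Code is a hypothetical predicate result; it is not a Volcano type.
type Code int

const (
	Success                      Code = iota // the pod fits right now
	Unschedulable                            // the pod may fit after evicting lower-priority pods
	UnschedulableAndUnresolvable             // eviction cannot help
)

// GPUNode is a hypothetical, simplified view of a node's GPU state.
type GPUNode struct {
	Name          string
	FreeGPUs      int
	EvictableGPUs int // GPUs held by lower-priority pods (assumption)
}

// checkGPU returns a status code rather than an error, so the caller
// can still consider the node for preemption when GPUs are short.
func checkGPU(n GPUNode, want int) Code {
	switch {
	case n.FreeGPUs >= want:
		return Success
	case n.FreeGPUs+n.EvictableGPUs >= want:
		return Unschedulable
	default:
		return UnschedulableAndUnresolvable
	}
}

// preemptCandidates keeps every node except those that eviction cannot fix.
func preemptCandidates(nodes []GPUNode, want int) []string {
	var out []string
	for _, n := range nodes {
		if checkGPU(n, want) != UnschedulableAndUnresolvable {
			out = append(out, n.Name)
		}
	}
	return out
}

func main() {
	nodes := []GPUNode{
		{Name: "busy", FreeGPUs: 0, EvictableGPUs: 2},
		{Name: "full", FreeGPUs: 0, EvictableGPUs: 0},
	}
	fmt.Println(preemptCandidates(nodes, 1)) // prints [busy]
}
```

With this shape, the node named `busy` stays in the candidate list even though it has no free GPU, which is the outcome the preempt action needs.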
Same problem: #3186. We can fix it to resolve both of them. |
@LivingCcj @lowang-bh You're welcome to fix this: ) |
This behavior reproduces when the vgpu resource is insufficient.
Key log line: device_info.go:187] deviceSharing err= not enough gpu fitted on this node |
@archlitchi owns and is familiar with the vgpu code. @Monokaix |
I might be experiencing a similar issue. |
Maybe you can provide some logs: ) |
The logs are in the zip file. |
@dmitsh, I think your case may be different from this one. According to the log, your case seems to be preemption for gang scheduling. I'd like to have a Google doc for this case, as we have already had several discussions about it; it's time to close it :) |
Your jobs are in the same queue, and the queue is overused. The scheduler needs to preempt tasks in the same queue. Please update to the latest version on the master branch, which fixes the issue of a high-priority job preempting a low-priority job in the same queue. |
@dmitsh We are reproducing this issue and looking for the cause. |
It seems your case is a different issue, and #3230 has fixed it. Please check whether your volcano version includes this PR: ) |
/close |
@Monokaix: Closing this issue. |
When the volcano scheduler enables the preempt or reclaim plugin, a high-priority pod is unable to preempt a low-priority pod. Although some nodes meet the preemption conditions, one of the predicateFns returns an err (not nil), so the potential node is ignored.
volcano/pkg/scheduler/actions/preempt/preempt.go
Lines 211 to 221 in 94c62a4
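The behavior described above can be sketched roughly as follows. This is a simplified illustration, not the code at the referenced lines; `Node`, `predicateFn`, `gpuFits`, and `candidateNodes` are hypothetical names, and the error string is borrowed from the log line quoted in the thread.

```go
package main

import (
	"errors"
	"fmt"
)

// Node is a hypothetical stand-in for a scheduler node; it is not Volcano's NodeInfo.
type Node struct {
	Name        string
	FreeGPUs    int
	LowPrioGPUs int // GPUs held by lower-priority pods; no predicate below looks at this
}

// predicateFn is a hypothetical predicate signature: a non-nil error rejects the node.
type predicateFn func(n Node, wantGPUs int) error

// gpuFits only looks at GPUs that are free right now. It ignores
// what evicting lower-priority pods could release.
func gpuFits(n Node, wantGPUs int) error {
	if n.FreeGPUs < wantGPUs {
		return errors.New("not enough gpu fitted on this node")
	}
	return nil
}

// candidateNodes mirrors the reported behavior: if any predicate
// returns an error, the node is dropped from the preemption candidates.
func candidateNodes(nodes []Node, wantGPUs int, preds []predicateFn) []string {
	var out []string
	for _, n := range nodes {
		ok := true
		for _, p := range preds {
			if err := p(n, wantGPUs); err != nil {
				ok = false
				break
			}
		}
		if ok {
			out = append(out, n.Name)
		}
	}
	return out
}

func main() {
	// node-a has no free GPU, but two GPUs are held by low-priority pods.
	nodes := []Node{{Name: "node-a", FreeGPUs: 0, LowPrioGPUs: 2}}
	got := candidateNodes(nodes, 1, []predicateFn{gpuFits})
	fmt.Println(len(got)) // prints 0: the node is ignored although eviction would free GPUs
}
```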
Environment:
kubectl version: v1.20.15