support min-max elastic quota scheduling #3702

Open
wants to merge 1 commit into
base: master
from

Conversation

lowang-bh
Member

@lowang-bh lowang-bh commented Sep 1, 2024

Imagine this scenario: a queue has a guaranteed quota called Min. Pods can be scheduled when the queue's used quota + requests <= Min. There is also a limit called Max, which is the upper bound on the quota that a queue's tasks can use. The quota between Min and Max can only be used by preemptable tasks in the queue, because that quota is borrowed from other queues' Min and must be returned when needed. So preemptable pods can be scheduled when the queue's used quota + requests <= Max.
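The admission rule described above can be sketched in a few lines of Go. This is only an illustration for a single resource dimension (for example milli-CPU); `QueueQuota` and `Admits` are hypothetical names chosen for this sketch and are not part of Volcano's API.

```go
package main

// QueueQuota is a hypothetical single-dimension view of a queue's quota.
// The type and field names are illustrative assumptions, not Volcano types.
type QueueQuota struct {
	Min  int64 // guaranteed quota
	Max  int64 // upper limit, reachable only by preemptable tasks
	Used int64 // quota currently in use
}

// Admits reports whether a request can be scheduled in the queue.
// Non-preemptable requests must fit under Min; preemptable requests
// may borrow up to Max.
func (q QueueQuota) Admits(request int64, preemptable bool) bool {
	limit := q.Min
	if preemptable {
		limit = q.Max
	}
	return q.Used+request <= limit
}
```

With `Min: 4000`, `Max: 8000`, and `Used: 3000`, a non-preemptable request of 2000 is rejected because 5000 exceeds Min, while the same request is admitted if it is preemptable because 5000 is still under Max.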

This feature is also called Elastic Quota or Capacity Scheduling.
References:
capacity-scheduling
Elastic Quota Management

This PR does the following, based on the capacity plugin (Min corresponds to deserved and Max corresponds to capability):

  1. Add a feature switch to enable or disable this feature, so that the original behavior is not affected.
  2. Add an overused function to the capacity plugin. When scheduling a job, this function ensures that the queue's used quota stays under Min if the job's tasks are not preemptable, or under Max if the job's tasks are preemptable.
  3. Change the Preemptive function in the capacity plugin to check whether the queue's future used quota (the job's request + the queue's allocated) stays under Min. Only a job in a queue whose future used quota will not exceed its Min can preempt other victims.
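As a rough illustration of items 2 and 3, the two checks might look like the following for multi-dimensional resources. This is a sketch under stated assumptions, not the code in this PR: `Resource`, `Queue`, `Overused`, and `CanReclaim` are hypothetical names, the feature switch is not modeled, and a resource dimension that is missing from a limit is assumed to mean zero quota.

```go
package main

// Resource is a hypothetical resource vector keyed by resource name,
// e.g. {"cpu": 4000, "memory": 8}. It is not Volcano's api.Resource.
type Resource map[string]int64

// Add returns the element-wise sum of r and o without modifying either.
func (r Resource) Add(o Resource) Resource {
	sum := Resource{}
	for k, v := range r {
		sum[k] = v
	}
	for k, v := range o {
		sum[k] += v
	}
	return sum
}

// LessEqual reports whether every dimension of r is <= the same dimension
// of limit. Assumption: a dimension absent from limit counts as zero.
func (r Resource) LessEqual(limit Resource) bool {
	for k, v := range r {
		if v > limit[k] {
			return false
		}
	}
	return true
}

// Queue is a hypothetical queue with Min (Deserved) and Max (Capability).
type Queue struct {
	Deserved   Resource // Min
	Capability Resource // Max
	Allocated  Resource
}

// Overused reports whether admitting the job would push the queue past
// Deserved (non-preemptable job) or past Capability (preemptable job).
func Overused(q Queue, jobRequest Resource, preemptable bool) bool {
	limit := q.Deserved
	if preemptable {
		limit = q.Capability
	}
	return !q.Allocated.Add(jobRequest).LessEqual(limit)
}

// CanReclaim reports whether a job in q may preempt victims: the queue's
// future used (allocated + job request) must not exceed Deserved.
func CanReclaim(q Queue, jobRequest Resource) bool {
	return q.Allocated.Add(jobRequest).LessEqual(q.Deserved)
}
```

Checking the whole job's request rather than a single task keeps the decision consistent for gang-scheduled jobs, where admitting only part of a job is not useful.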

Related issues: #3537
fixes #3703

The 1st commit is based on #3649; please merge that PR first.

@volcano-sh-bot volcano-sh-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 1, 2024
@lowang-bh lowang-bh changed the title Overused check gang enable support min-max elastic quota scheduling Sep 1, 2024
@lowang-bh lowang-bh force-pushed the overusedCheckGangEnable branch 2 times, most recently from 2723632 to 760fe57 Compare September 1, 2024 02:18
@Monokaix
Member

Monokaix commented Sep 2, 2024

It seems a little complex for users to use the capacity plugin or queue capability. Is the problem in #3703 really a common case?

@lowang-bh
Member Author

It seems a little complex for users to use the capacity plugin or queue capability. Is the problem in #3703 really a common case?

Another solution is to add a min-max plugin, but that would also require modifying some code in the main actions.

@Monokaix
Member

Monokaix commented Sep 6, 2024

or queue's used will be under Max if job's tasks are preemptable when schedule a job.

ssn.Allocatable holds the capability check logic now.

@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign hwdef
You can assign the PR to them by writing /assign @hwdef in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot volcano-sh-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 19, 2024
1. If overusedCheckGangEnable is enabled for allocate, the overused check tests whether the job's request would exceed capability or deserved, depending on whether the job is preemptable or not.
2. If overusedCheckGangEnable is enabled for reclaim, the Preemptive check tests whether the job's request would exceed deserved; if it would, jobs in that queue cannot reclaim.

Signed-off-by: lowang-bh <[email protected]>
@volcano-sh-bot volcano-sh-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 9, 2024
@lowang-bh lowang-bh closed this Nov 16, 2024
@lowang-bh lowang-bh reopened this Nov 16, 2024
@hwdef
Member

hwdef commented Nov 18, 2024

Please rebase onto the latest master.

Successfully merging this pull request may close these issues.

queue's deserve can not be returned back when other queue's tasks are all not preemptable
6 participants