Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add webhook mutate and controller actions and test case for minAvailable #2896

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

renwenlong-github
Copy link

issues #2895

@volcano-sh-bot volcano-sh-bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jun 5, 2023
@lowang-bh
Copy link
Member

how about put the podtemplate/pod genaration in a function so that there are fewer code lines?

@volcano-sh-bot volcano-sh-bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 6, 2023
@renwenlong-github
Copy link
Author

how about put the podtemplate/pod genaration in a function so that there are fewer code lines?

I have updated it

@volcano-sh-bot volcano-sh-bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jun 6, 2023
@hwdef
Copy link
Member

hwdef commented Jun 6, 2023

Sorry, I don't understand the purpose of this pr and the description in the issue. Can you describe in more detail, is this PR to fix some bugs, or to enhance some functions?

@renwenlong-github
Copy link
Author

Sorry, I don't understand the purpose of this pr and the description in the issue. Can you describe in more detail, is this PR to fix some bugs, or to enhance some functions?

Sorry, I may not have made it clear. This is an optimization; I want to enhance the minAvailable function. As you know, there is a minavailable attribute in the task, and the job.spec also has the minavailable attribute.

I think users only need to write task.minavailable , Volcano's webhook calculates job.spec.minavailable;

on the other hand, I think there are bugs in the here of code that generate the pod group, especially the dependson task.

func (cc *jobcontroller) calcPGMinResources(job *batch.Job) *v1.ResourceList {
	// sort task by priorityClasses
	var tasksPriority TasksPriority
	for _, task := range job.Spec.Tasks {
		tp := TaskPriority{0, task}
		pc := task.Template.Spec.PriorityClassName

		if pc != "" {
			priorityClass, err := cc.pcLister.Get(pc)
			if err != nil || priorityClass == nil {
				klog.Warningf("Ignore task %s priority class %s: %v", task.Name, pc, err)
			} else {
				tp.priority = priorityClass.Value
			}
		}
		tasksPriority = append(tasksPriority, tp)
	}

	sort.Sort(tasksPriority)     // sort tasks

	minReq := v1.ResourceList{}
	podCnt := int32(0)
	for _, task := range tasksPriority {
		for i := int32(0); i < task.Replicas; i++ {
			if podCnt >= job.Spec.MinAvailable {
				break
			}

			podCnt++
			pod := &v1.Pod{
				Spec: task.Template.Spec,
			}
			minReq = quotav1.Add(minReq, *util.GetPodQuotaUsage(pod))
		}
	}

	return &minReq
}

It first calculates the resources of job.Spec.MinAvailable jobs with high priority. I think this is wrong in many cases, such as dependsOn task, or multi-task jobs with equal Priority levels. If job.Spec.MinAvailable < sum(tasks .MinAvailable), he will randomly calculate the minimum resource.

I think I'm going to write a detailed design document。

@hwdef
Copy link
Member

hwdef commented Jun 7, 2023

I probably see what you mean, thanks for the suggestion, and you are right.
I think it's a more complex change and it has a big impact, if you will write a detailed document, I'll be happy to review it.

@renwenlong-github
Copy link
Author

I probably see what you mean, thanks for the suggestion, and you are right. I think it's a more complex change and it has a big impact, if you will write a detailed document, I'll be happy to review it.

I know this is a very complicated issue, I may give a suggestion, the modification still requires the overall evaluation of the volcano team, and I will provide a design document. give me some time.

@volcano-sh-bot volcano-sh-bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 26, 2023
@renwenlong-github
Copy link
Author

This problem is difficult. I wrote a design document, which is currently in Chinese and can be changed to English later.

https://github.com/renwenlong-github/volcano/blob/pref-minAvailable/docs/design/task-minAvailable-dependson-design.md

If you have time, we can discuss in detail

@renwenlong-github
Copy link
Author

renwenlong-github commented Jun 26, 2023

issue #2921

@hwdef
Copy link
Member

hwdef commented Jun 26, 2023

Thanks for your design docs, I will read it later.

@hwdef
Copy link
Member

hwdef commented Jun 26, 2023

先用中文沟通了,我看了下文档,大致有下面几个问题,如果能更新下设计文档或者做出一些说明就更好了。


哪些设计是已有的,哪些是新加的,这个最好写清楚


如果有依赖则在job annotation中标注dependent-job=true

这个的目的是什么


把该运行的bind,不该运行的资源预留下来

如何预留资源?例如master先运行,worker后运行,如何保证master运行后,worker的资源不被抢走。如何兼容多调度器场景?例如在volcano做了资源预留,但是k8s原生调度器把剩余资源抢走了


注意:仅设置job.minAvailable不设置task.minAvailable,volcano会计算job.minAvailable=sum(tasks.replaces), 用户设置的job.minAvailable值会被覆盖。

这里不应该覆盖,即,仅设置job.minAvailable不设置task.minAvailable时,则应该按照job.minAvailable的设置为主,若job.minAvailable和task.minAvailable都未设置时,则job.minAvailable=sum(tasks.replaces)

@renwenlong-github
Copy link
Author

issue #2944

@lowang-bh
Copy link
Member

lowang-bh commented Jun 29, 2023

I think it should consider issue #2921 together. We need to discard this feature deeply. Nowadays there is nobady knows the design perpose which is introduced by @shinytang6 in PR #1459. I read the code and find some problems that the code logic is not self-consistent.


我觉得我们需要深度讨论一下这个功能,issue #2921 记录了需要考虑到的情况,包括目前已有的逻辑和还缺少的,自从PR#1459引入任务级别的minAvaiable之后,代码逻辑不再完备。

@renwenlong-github
Copy link
Author

I think it should consider issue #2921 together. We need to discard this feature deeply. Nowadays there is nobady knows the design perpose which is introduced by @shinytang6 in PR #1459. I read the code and find some problems that the code logic is not self-consistent.

我觉得我们需要深度讨论一下这个功能,issue #2921 记录了需要考虑到的情况,包括目前已有的逻辑和还缺少的,自从PR#1459引入任务级别的minAvaiable之后,代码逻辑不再完备。

Yeah, at first I thought it was a small bug, but I found some logic issues with both the controller and the scheduler, and like you said, we need to discuss this feature.
是的,刚开始的时候我以为是一个小bug,但是我发现无论是控制器创建podgroup还是调度器的job ready验证都有一些逻辑问题,就像你说的,我们需要讨论这个功能。

@renwenlong-github
Copy link
Author

先用中文沟通了,我看了下文档,大致有下面几个问题,如果能更新下设计文档或者做出一些说明就更好了。

哪些设计是已有的,哪些是新加的,这个最好写清楚

如果有依赖则在job annotation中标注dependent-job=true

这个的目的是什么

把该运行的bind,不该运行的资源预留下来

如何预留资源?例如master先运行,worker后运行,如何保证master运行后,worker的资源不被抢走。如何兼容多调度器场景?例如在volcano做了资源预留,但是k8s原生调度器把剩余资源抢走了

注意:仅设置job.minAvailable不设置task.minAvailable,volcano会计算job.minAvailable=sum(tasks.replaces), 用户设置的job.minAvailable值会被覆盖。

这里不应该覆盖,即,仅设置job.minAvailable不设置task.minAvailable时,则应该按照job.minAvailable的设置为主,若job.minAvailable和task.minAvailable都未设置时,则job.minAvailable=sum(tasks.replaces)

I modified the design document, help me review it, thank you

@@ -35,12 +35,15 @@ import (

var detectionPeriodOfDependsOntask time.Duration

// TaskMinAvailable todo(renwenlong) move to volcano.sh api
const TaskMinAvailable = "volcano.sh/task-min-available"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do this now.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok,I will move it

@@ -235,7 +235,10 @@ func (alloc *Action) Execute(ssn *framework.Session) {
}

if ssn.JobReady(job) && !tasks.Empty() {
jobs.Push(job)
// skip dependent job push, because dependent job has other pods will be creat
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// skip dependent job push, because dependent job has other pods will be creat
// skip dependent job push, because dependent job has other pods will be create

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get

Comment on lines +173 to +175
TaskPriorityAnnotation = "volcano.sh/task-priority"
TaskMinAvailable = "volcano.sh/task-min-available"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these need to be normalized, such as put them in the api

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok,I will move it

Comment on lines +214 to +217
if len(job.Spec.Tasks) > 1 && job.Spec.MinAvailable != totalReplicas {
msg += " Multi-tasks job 'minAvailable' needs to set task 'minAvailable', " +
"otherwise job 'minAvailable' must be equal to the sum of task 'replicas';"
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is needed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If
job.Spec.MinAvailable == 5 , totalReplicas =6 ,I think this is OK

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not explain my idea. If multi-task does not set task.minAvailable, it cannot explain that those tasks are equal to job.minAvailable.

apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: minavailable-job
spec:
  schedulerName: volcano
  minAvailable: 2
  tasks:
    - replicas: 3
      name: "master"
      template:
        metadata:
          name: master
        spec:
          containers:
            - image: nginx
              name: nginx
              resources:
                requests:
                  cpu: "1"
                  memory: "1Gi"
          restartPolicy: OnFailure
    - replicas: 3
      name: "work"
      template:
        metadata:
          name: web
        spec:
          containers:
            - image: nginx
              name: nginx
              resources:
                requests:
                  cpu: "1"
                  memory: "1Gi"
          restartPolicy: OnFailure

master Running 2 ? work Running 2 ? also master 1 and work 1 is ok?

Comment on lines +296 to +299
if len(new.Spec.Tasks) > 1 && new.Spec.MinAvailable != totalReplicas {
return fmt.Errorf(" Multi-tasks job 'minAvailable' needs to set task 'minAvailable', " +
"otherwise job 'minAvailable' must be equal to total task 'replicas'")
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is needed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If
job.Spec.MinAvailable == 5 , totalReplicas =6 ,I think this is OK.

# user-guide

When the user uses the vcjob minAvailable feature, only should set each `task.minAvailable` value, if not set `task.minAvailable`,
the default `task.minAvailable=task.replaces`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

task.minavailable is not equal to task.replaces if job.minavailable is set

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is only one task in job, job.minavailable or task.minavailable set any one is ok, but Multitasking is ambiguous.


<img src="images/minAvailable-dependsOn-3.png" alt="dependsOn1" width="700"><br>

### 2、task gang
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicated with line 218?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will fix it

```

# TODO
How to optimize dependsOn job to meet gang scheduling?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is most important that we deal with resource reservation, including resource reservation in the case of multiple schedulers, please add this information

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a complicated issue, I can't fix it in this pr, which means that dependency tasks don't support gang like before, but this pr solves the problem of scheduling deadlock

@hwdef
Copy link
Member

hwdef commented Jul 6, 2023

@lowang-bh
If you can, please also review this PR and let's improve the related features together.

@lowang-bh
Copy link
Member

@lowang-bh If you can, please also review this PR and let's improve the related features together.

I just find the others can not see my comments in code review... >_<

@hwdef
Copy link
Member

hwdef commented Jul 14, 2023

@lowang-bh If you can, please also review this PR and let's improve the related features together.

I just find the others can not see my comments in code review... >_<

you should click submit click
image

break
}
if task.MinAvailable == nil {
task.MinAvailable = &task.Replicas
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it reasonable to modify task's attribute here?

And, what's more, it dosn't cover all the cases.

if task.DependsOn == nil {
tmpMinReq := v1.ResourceList{}
// only calc task minAvailable
for i := int32(0); i < *task.MinAvailable; i++ {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we refact those duplicate codes? eg, define in a function.

Pods: work-0、work-1、work-2

the right pods order should be:
master-0、master-1、master-2、work-0、work-1、master-3、master-4、work-2、work-3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should put in gang plugin.

@@ -188,6 +213,11 @@ func mutateSpec(tasks []v1alpha1.TaskSpec, basePath string, job *v1alpha1.Job) *
// mpi.AddDependsOn(job)
// }
patched := false
if len(tasks) == 1 && tasks[0].MinAvailable == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is special code, I don't think it is a good change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agren with you .

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That means i was wrong,I'm going to rethink this question🤔️。

@lowang-bh
Copy link
Member

I just find the others can not see my comments in code review... >_<

you should click submit click

Thanks.

@stale
Copy link

stale bot commented Oct 15, 2023

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 15, 2023
@stale stale bot closed this Dec 15, 2023
@hwdef
Copy link
Member

hwdef commented Dec 15, 2023

/reopen

@volcano-sh-bot
Copy link
Contributor

@hwdef: Reopened this PR.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@stale stale bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 15, 2023
@volcano-sh-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign william-wang
You can assign the PR to them by writing /assign @william-wang in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link

stale bot commented Mar 17, 2024

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 17, 2024
@volcano-sh-bot volcano-sh-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 9, 2024
@stale stale bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 9, 2024
@volcano-sh-bot
Copy link
Contributor

@renwenlong-github: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants