fix: schedule message burst with node number increase #3051
Conversation
Signed-off-by: lowang-bh <[email protected]>
e442c01 to cf25a10
/priority critical-urgent
/assign @wangyang0616 @hwdef @Thor-wl @Yikun @william-wang
/lgtm
Thanks
/assign @Thor-wl @william-wang
1. Improve performance. 2. Fix pod messages containing many ','. Signed-off-by: lowang-bh <[email protected]>
A successfully predicated node has an empty reason, so the joined string contains empty strings between ','. Signed-off-by: lowang-bh <[email protected]>
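As a rough illustration of this second fix (dropping the empty reasons of nodes that passed a predicate before joining the messages), here is a minimal, self-contained Go sketch; the helper name joinNonEmpty is hypothetical and is not the code in this PR.

package main

import (
	"fmt"
	"strings"
)

// joinNonEmpty drops empty reasons before joining, so predicates that
// succeeded (and therefore produced an empty reason) no longer leave
// consecutive ',' in the aggregated failure message.
func joinNonEmpty(reasons []string) string {
	nonEmpty := make([]string, 0, len(reasons))
	for _, r := range reasons {
		if r != "" {
			nonEmpty = append(nonEmpty, r)
		}
	}
	return strings.Join(nonEmpty, ", ")
}

func main() {
	// The "" entries stand for predicates that succeeded on the node.
	reasons := []string{"", "Insufficient cpu", "", "node(s) didn't match pod affinity rules"}
	fmt.Println(joinNonEmpty(reasons))
	// Prints: Insufficient cpu, node(s) didn't match pod affinity rules
}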
Hi @hwdef, I updated this PR. Please help to review. Thanks.
if err != nil {
	return predicateStatus, fmt.Errorf("plugin %s predicates failed %s", nodeports.Name, status.Message())
}
if nodePortStatus.Code != api.Success {
I think there will be problems in judging the return of api.Success, for example the following scenario:
When preempt filters candidate nodes, the nodeports plugin filter returns Unschedulable while the podAffinity filter returns UnschedulableAndUnresolvable. If you simply check for non-api.Success and return, the node will be mistakenly added to the candidate node list for preemption, even though it does not actually meet the preemption conditions.
What's the root cause? Allocate, preempt, and backfill each have their own function to check the returned predicateStatus set, and the preempt action does not check the Unschedulable reason.
volcano/pkg/scheduler/actions/preempt/preempt.go
Lines 211 to 225 in 258ad60
predicateFn := func(task *api.TaskInfo, node *api.NodeInfo) ([]*api.Status, error) {
	// Allows scheduling to nodes that are in Success or Unschedulable state after filtering by predicate.
	var statusSets util.StatusSets
	statusSets, err := ssn.PredicateFn(task, node)
	if err != nil {
		return nil, fmt.Errorf("preempt predicates failed for task <%s/%s> on node <%s>: %v",
			task.Namespace, task.Name, node.Name, err)
	}
	if statusSets.ContainsUnschedulableAndUnresolvable() || statusSets.ContainsErrorSkipOrWait() {
		return nil, fmt.Errorf("predicates failed in preempt for task <%s/%s> on node <%s>, status is not success or unschedulable",
			task.Namespace, task.Name, node.Name)
	}
	return nil, nil
}
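For readers unfamiliar with the helpers used above, here is a self-contained sketch of what a StatusSets-style check might look like. The Status and Code types and the exact set of codes are stand-ins modelled on how api.Status is used in this diff, not the actual definitions in the scheduler packages.

package main

import "fmt"

// Stand-in types for volcano's api.Status and its status codes; the real
// definitions live in pkg/scheduler/api and may differ in detail.
type Code int

const (
	Success Code = iota
	Error
	Unschedulable
	UnschedulableAndUnresolvable
	Wait
	Skip
)

type Status struct {
	Code   Code
	Reason string
}

// StatusSets mirrors the checks in the preempt predicateFn above: a node stays
// a preemption candidate only if every status is Success or plain Unschedulable.
type StatusSets []*Status

func (s StatusSets) ContainsUnschedulableAndUnresolvable() bool {
	for _, st := range s {
		if st != nil && st.Code == UnschedulableAndUnresolvable {
			return true
		}
	}
	return false
}

func (s StatusSets) ContainsErrorSkipOrWait() bool {
	for _, st := range s {
		if st != nil && (st.Code == Error || st.Code == Skip || st.Code == Wait) {
			return true
		}
	}
	return false
}

func main() {
	statuses := StatusSets{
		{Code: Unschedulable, Reason: "Insufficient cpu"},                             // preemption could free cpu
		{Code: UnschedulableAndUnresolvable, Reason: "node(s) had untolerated taint"}, // eviction cannot fix a taint
	}
	// true: this node must not be treated as a preemption candidate.
	fmt.Println(statuses.ContainsUnschedulableAndUnresolvable())
}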
@wangyang0616 I know what you mean; the root cause is that some predicate plugins do not need to run during preemption, right? There are two solutions:
- keep the original changes: put all predicate results into the final result and filter out those with reason == "" when joining them, although continuing on the same node with the other predicate plugins has some performance cost.
- add an additional plugin-enable switch, independent of each action, to control which plugins each action should use (a rough sketch of this idea follows below).
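A very rough sketch of what that second option could look like; the per-action switch below is purely hypothetical (it is not an existing Volcano configuration field), and the specific plugin entries are placeholders rather than a recommendation.

package main

import "fmt"

// Hypothetical per-action predicate-plugin switch: each action only runs the
// plugins enabled for it.
var enabledPredicatePlugins = map[string]map[string]bool{
	"allocate": {"nodeports": true, "interpodaffinity": true, "tainttoleration": true},
	"preempt":  {"interpodaffinity": true, "tainttoleration": true},
	"backfill": {"nodeports": true},
}

func pluginEnabled(action, plugin string) bool {
	return enabledPredicatePlugins[action][plugin]
}

func main() {
	fmt.Println(pluginEnabled("preempt", "nodeports"))        // false: preempt would skip this predicate entirely
	fmt.Println(pluginEnabled("preempt", "interpodaffinity")) // true
}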
For resource-quota-style plugins, the predicate returns an Unschedulable status, indicating that the current node does not allow allocation but can be preempted.
For plugins tied to inherent properties of nodes (such as taints or affinity), UnschedulableAndUnresolvable is returned, indicating that neither allocate nor preempt is possible.
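To make that convention concrete, a small illustration reusing the stand-in Status and Code types from the sketch above (again, not the actual plugin code): a resource-style predicate reports plain Unschedulable, which preemption may still resolve, while a node-property predicate such as taints reports UnschedulableAndUnresolvable, which eviction cannot fix.

// Illustrative only, built on the stand-in Status/Code types from the earlier sketch.
func quotaLikePredicate(freeCPU, requestedCPU int64) *Status {
	if requestedCPU > freeCPU {
		// Evicting pods can free cpu, so preemption may still succeed here.
		return &Status{Code: Unschedulable, Reason: "Insufficient cpu"}
	}
	return &Status{Code: Success}
}

func taintLikePredicate(hasUntoleratedTaint bool) *Status {
	if hasUntoleratedTaint {
		// A taint is a property of the node; evicting pods cannot remove it,
		// so neither allocate nor preempt can place the pod on this node.
		return &Status{Code: UnschedulableAndUnresolvable, Reason: "node(s) had untolerated taint"}
	}
	return &Status{Code: Success}
}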
I think plan 1 will be better. The action formulates the predicate rules, and each plugin implements its own capabilities according to those rules; it is best for the action to be unaware of each plugin's capabilities.
What do the other reviewers think? @hwdef @Thor-wl @william-wang
I looked at the processing of preempt. During preemption, pods on the node keep being evicted until certain conditions are met, and then eviction stops.
The condition for stopping eviction is that cpu, memory, and extended resources such as gpu meet the preemptor's requirements. There is no check of the corresponding affinity, antiAffinity, topologySpread, or other conditions, so preempt thinks the node already meets the preemptor's scheduling conditions, but the actual allocation will still be blocked by plugins such as affinity and antiAffinity, resulting in preemption failure.
volcano/pkg/scheduler/actions/preempt/preempt.go
Lines 274 to 279 in 36abf1b
for !victimsQueue.Empty() {
	// If reclaimed enough resources, break loop to avoid Sub panic.
	// If preemptor's queue is overused, it means the preemptor can not be allocated, so no need to care about the node's idle resource.
	if !ssn.Overused(currentQueue) && preemptor.InitResreq.LessEqual(node.FutureIdle(), api.Zero) {
		break
	}
For example, issue #3068 describes this problem.
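A self-contained toy model of the gap being described (names and checks are illustrative, not Volcano code): the eviction loop stops once resources suffice, so a node can look preemptable even though an affinity, anti-affinity, or topology-spread predicate would still reject the pod at allocation time.

package main

import "fmt"

type victim struct {
	name string
	cpu  int64
}

// evictUntilFit models the loop above: keep "evicting" victims until the idle
// cpu covers the preemptor's request, then stop without any further checks.
func evictUntilFit(idleCPU, requestCPU int64, victims []victim) (int64, []string) {
	evicted := []string{}
	for _, v := range victims {
		if idleCPU >= requestCPU {
			break
		}
		idleCPU += v.cpu
		evicted = append(evicted, v.name)
	}
	return idleCPU, evicted
}

func main() {
	idle, evicted := evictUntilFit(1, 4, []victim{{"pod-a", 2}, {"pod-b", 2}})
	resourcesFit := idle >= 4
	affinityStillFails := true // e.g. a required anti-affinity rule that eviction cannot change
	fmt.Println(resourcesFit, evicted)
	// Preempt declares success based on resourcesFit alone, but allocation is
	// later rejected because affinityStillFails — the scenario in #3068.
	fmt.Println("wasted preemption:", resourcesFit && affinityStillFails)
}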
> I think there will be problems in judging the return of api.Success, for example the following scenario:
> When preempt filters candidate nodes, the nodeports plugin filter returns Unschedulable while the podAffinity filter returns UnschedulableAndUnresolvable. If you simply check for non-api.Success and return, the node will be mistakenly added to the candidate node list for preemption, even though it does not actually meet the preemption conditions.
Preemption should be rejected in this scenario. Currently, preemption for this kind of constraint is not supported; preemption is only effective for resource usage.
I agree with your current modification plan.
/lgtm
if err != nil {
	return predicateStatus, fmt.Errorf("plugin %s predicates failed %s", interpodaffinity.Name, status.Message())
}
if podAffinityStatus.Code != api.Success {
	predicateStatus = append(predicateStatus, podAffinityStatus)
	return predicateStatus, nil
According to the analysis of issue #3068, I think it is better to return an error here. The reason is as described in the comments: currently, Volcano only supports resource preemption and does not support preemption on algorithmic constraints such as affinity or topology.
If it is convenient, you can update it in your PR, or I can open another PR to fix it.
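As I read the suggestion, the change would look roughly like the fragment below, continuing the excerpt shown above; the Reason field and the exact error wording are assumptions, and whether this lands in this PR or a follow-up is left open in the thread.

if podAffinityStatus.Code != api.Success {
	predicateStatus = append(predicateStatus, podAffinityStatus)
	// Suggested: fail hard instead of returning nil, because Volcano's preemption
	// currently cannot resolve affinity/topology constraints, so this node must
	// not be kept as a preemption candidate.
	return predicateStatus, fmt.Errorf("plugin %s predicates failed: %s",
		interpodaffinity.Name, podAffinityStatus.Reason)
}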
Please help to review. Thanks. @william-wang
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: william-wang.
Fixes #3049, #2975.
Fixes #3050 (by the second commit).
Signed-off-by: lowang-bh [email protected]