fix: schedule message burst with nodes number increasion #3051

Merged: 3 commits merged into volcano-sh:master on Aug 26, 2023

Conversation

lowang-bh
Member

@lowang-bh lowang-bh commented Aug 13, 2023

Fixes #3049 and #2975.
Fixes #3050 in the second commit.

Signed-off-by: lowang-bh [email protected]

@volcano-sh-bot volcano-sh-bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Aug 13, 2023
@lowang-bh
Member Author

lowang-bh commented Aug 13, 2023

Without the second commit, which returns as soon as a predicate finds that the node does not fit:
image

allocate.go handles the predicate result as an error:
image

Finally, with this PR:
image

@lowang-bh lowang-bh force-pushed the Hotfix branch 2 times, most recently from e442c01 to cf25a10 Compare August 13, 2023 15:18
@lowang-bh
Member Author

lowang-bh commented Aug 14, 2023

/priority critical-urgent

@volcano-sh-bot
Contributor

@lowang-bh: The label(s) priority/ cannot be applied. These labels are supported: ``

In response to this:

/priority/critical-urgent

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@volcano-sh-bot volcano-sh-bot added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Aug 14, 2023
@volcano-sh-bot
Contributor

@lowang-bh: The label(s) priority/ cannot be applied. These labels are supported: ``

In response to this:

/priority critical-urgent

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@lowang-bh
Member Author

lowang-bh commented Aug 14, 2023

Member

@hwdef hwdef left a comment

/lgtm
Thanks

@volcano-sh-bot volcano-sh-bot added the lgtm Indicates that a PR is ready to be merged. label Aug 14, 2023
@lowang-bh
Member Author

/assign @Thor-wl @william-wang

1. improve performance
2. fix pod messages that contain many ','

Signed-off-by: lowang-bh <[email protected]>
@volcano-sh-bot volcano-sh-bot removed the lgtm Indicates that a PR is ready to be merged. label Aug 18, 2023
@lowang-bh
Member Author

lowang-bh commented Aug 18, 2023

Commit 3 fixes this issue: the results of nodes that pass the predicates should not be appended to the final status,
because a node that passes has an empty reason, and the joined string would then contain empty strings between the ','.

before fixed:
image

After fixed:
image
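
The effect described in this comment can be sketched as follows. This is a minimal illustration, not the PR's actual code: `joinReasons` is a hypothetical helper, and the reason strings are made-up examples. It shows how joining reasons from every node, including nodes that passed and therefore have an empty reason, yields stray separators, and how skipping empty reasons avoids that.

```go
package main

import (
	"fmt"
	"strings"
)

// joinReasons is a hypothetical helper (not Volcano's API).
// It joins predicate failure reasons and skips empty ones, because a
// node that passed a predicate reports an empty reason.
func joinReasons(reasons []string) string {
	nonEmpty := make([]string, 0, len(reasons))
	for _, r := range reasons {
		if r != "" {
			nonEmpty = append(nonEmpty, r)
		}
	}
	return strings.Join(nonEmpty, ", ")
}

func main() {
	// Example reasons; the empty entries stand for nodes that passed.
	reasons := []string{"", "node(s) had taints", "", "Insufficient cpu"}

	// A naive join keeps the empty entries and produces extra separators.
	fmt.Printf("%q\n", strings.Join(reasons, ", "))

	// Filtering first gives a clean message.
	fmt.Printf("%q\n", joinReasons(reasons))
}
```

With many nodes, the naive join grows with the node count even when most nodes pass, which matches the message burst named in the PR title.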

A node that passes the predicates has an empty reason, so the joined string would contain empty strings between the ','

Signed-off-by: lowang-bh <[email protected]>
@lowang-bh
Member Author

Hi @hwdef, I updated this PR. Please help review it. Thanks.

if err != nil {
	return predicateStatus, fmt.Errorf("plugin %s predicates failed %s", nodeports.Name, status.Message())
}
if nodePortStatus.Code != api.Success {
Member

I think returning as soon as the status is not api.Success will cause problems, for example in the following scenario:

When preempt filters candidate nodes, the nodeport plug-in filter returns Unschedulable and the podAffinity filter returns UnschedulableAndUnresolvable. If you return directly on the first non-api.Success status, the node will be mistakenly added to the list of candidate nodes for preemption, although it does not actually meet the preemption conditions.
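
The scenario in this comment can be sketched roughly as below. All types and functions here (`Code`, `Status`, `predicate`, `runAll`, `runUntilFailure`, `preemptable`) are simplified stand-ins invented for illustration; they are assumptions, not Volcano's real definitions. The sketch compares collecting every non-success status with returning at the first one.

```go
package main

import "fmt"

// Code is a simplified stand-in for the scheduler's status code (assumption).
type Code int

const (
	Success Code = iota
	Unschedulable
	UnschedulableAndUnresolvable
)

// Status is a simplified predicate result (assumption).
type Status struct {
	Code   Code
	Reason string
}

// predicate is a hypothetical plugin filter evaluated for one node.
type predicate func() Status

// runAll runs every predicate and collects all non-success statuses.
func runAll(preds []predicate) []Status {
	var out []Status
	for _, p := range preds {
		if s := p(); s.Code != Success {
			out = append(out, s)
		}
	}
	return out
}

// runUntilFailure returns at the first non-success status.
func runUntilFailure(preds []predicate) []Status {
	for _, p := range preds {
		if s := p(); s.Code != Success {
			return []Status{s}
		}
	}
	return nil
}

// preemptable rejects a node only if some status is unresolvable.
func preemptable(statuses []Status) bool {
	for _, s := range statuses {
		if s.Code == UnschedulableAndUnresolvable {
			return false
		}
	}
	return true
}

func main() {
	preds := []predicate{
		func() Status { return Status{Code: Unschedulable, Reason: "node port conflict"} },
		func() Status { return Status{Code: UnschedulableAndUnresolvable, Reason: "pod affinity mismatch"} },
	}
	fmt.Println("collect all, preemptable:", preemptable(runAll(preds)))
	fmt.Println("early return, preemptable:", preemptable(runUntilFailure(preds)))
}
```

In this toy model the early return never sees the second plugin's UnschedulableAndUnresolvable result, so the node is kept as a preemption candidate, which is the concern raised above.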

Member Author

@lowang-bh lowang-bh Aug 19, 2023

What's the root cause? Allocate, preempt, and backfill each have their own function to check the returned predicateStatus set. The preempt action does not check for the Unschedulable reason.

predicateFn := func(task *api.TaskInfo, node *api.NodeInfo) ([]*api.Status, error) {
	// Allows scheduling to nodes that are in Success or Unschedulable state after filtering by predicate.
	var statusSets util.StatusSets
	statusSets, err := ssn.PredicateFn(task, node)
	if err != nil {
		return nil, fmt.Errorf("preempt predicates failed for task <%s/%s> on node <%s>: %v",
			task.Namespace, task.Name, node.Name, err)
	}
	if statusSets.ContainsUnschedulableAndUnresolvable() || statusSets.ContainsErrorSkipOrWait() {
		return nil, fmt.Errorf("predicates failed in preempt for task <%s/%s> on node <%s>, status is not success or unschedulable",
			task.Namespace, task.Name, node.Name)
	}
	return nil, nil
}

Member Author

@lowang-bh lowang-bh Aug 20, 2023

@wangyang0616 I know what you mean. The root cause is that some predicate plugins do not need to run in preemption, right? There are two solutions:

  1. Keep the original changes: put all predicate results into the final results, then filter out those with reason == "" when joining them. However, continuing to run the other predicate plugins on the same node has some performance cost.
  2. Add an additional plugin-enable switch that is independent for each action, so that each action controls which plugins it uses.

Member

For resource quota plug-ins, the predicate returns the Unschedulable status, indicating that the current node does not allow allocation but can be preempted.

For plug-ins that check inherent properties of nodes (such as taint, pro-core), UnschedulableAndUnresolvable is returned, indicating that neither allocation nor preemption is possible.
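
The rule described in this comment could be written down as a small table of checks. The sketch below is an assumption-laden illustration: `Code`, `canAllocate`, and `canPreempt` are hypothetical names, and the real scheduler has more status codes than the three modeled here.

```go
package main

import "fmt"

// Code is a simplified stand-in for the scheduler's status code (assumption).
type Code int

const (
	Success Code = iota
	Unschedulable
	UnschedulableAndUnresolvable
)

// canAllocate reports whether a task may be placed on the node right away.
// Only a successful predicate result allows allocation.
func canAllocate(c Code) bool {
	return c == Success
}

// canPreempt reports whether the node may stay a preemption candidate.
// Unschedulable is tolerated because evicting pods may resolve it;
// UnschedulableAndUnresolvable is not, because eviction cannot help.
func canPreempt(c Code) bool {
	return c == Success || c == Unschedulable
}

func main() {
	names := map[Code]string{
		Success:                      "Success",
		Unschedulable:                "Unschedulable",
		UnschedulableAndUnresolvable: "UnschedulableAndUnresolvable",
	}
	for _, c := range []Code{Success, Unschedulable, UnschedulableAndUnresolvable} {
		fmt.Printf("%-30s allocate=%t preempt=%t\n", names[c], canAllocate(c), canPreempt(c))
	}
}
```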

Member

I think plan 1 will be better. The action part formulates predicate rules, and each plug-in realizes its own capabilities according to the rules. It is best for the action to be unaware of the capabilities of each plug-in.

Member Author

What do the other reviewers think? @hwdef @Thor-wl @william-wang

Member

I looked at how preempt is processed. During preemption, pods on the node continue to be evicted until certain conditions are met, and then the eviction stops.

The condition for stopping eviction is that resources such as cpu, memory, and gpu meet the requirements of the preemptor. Conditions such as affinity, antiAffinity, and topologyspread are not actually checked, so preempt considers the node to satisfy the preemptor's scheduling conditions, but the actual allocation is still blocked by plugins such as affinity and antiAffinity, and the preemption fails.

for !victimsQueue.Empty() {
	// If reclaimed enough resources, break loop to avoid Sub panic.
	// If preemptor's queue is overused, it means preemptor can not be allocated. So no need to care about the node idle resource
	if !ssn.Overused(currentQueue) && preemptor.InitResreq.LessEqual(node.FutureIdle(), api.Zero) {
		break
	}

Issue #3068 is an example of this problem.
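
A stripped-down model of the eviction loop may make the point clearer. Everything below is a hypothetical simplification: `Resource` and `evictUntilFits` are invented for illustration, and the queue-overuse check from the quoted snippet is left out. The stop condition looks only at resources, so nothing in the loop would notice an affinity or topology mismatch.

```go
package main

import "fmt"

// Resource is a simplified stand-in for the scheduler's resource type (assumption).
type Resource struct {
	MilliCPU int64
	MemoryGi int64
}

// LessEqual reports whether r fits within o in every dimension.
func (r Resource) LessEqual(o Resource) bool {
	return r.MilliCPU <= o.MilliCPU && r.MemoryGi <= o.MemoryGi
}

// Add returns the sum of two resources.
func (r Resource) Add(o Resource) Resource {
	return Resource{MilliCPU: r.MilliCPU + o.MilliCPU, MemoryGi: r.MemoryGi + o.MemoryGi}
}

// evictUntilFits evicts victims in order until the request fits into the
// idle resources. It returns how many victims were evicted and whether the
// request fits afterwards. Only resources are checked here.
func evictUntilFits(request, idle Resource, victims []Resource) (int, bool) {
	evicted := 0
	for _, v := range victims {
		if request.LessEqual(idle) {
			break // enough resources reclaimed, stop evicting
		}
		idle = idle.Add(v)
		evicted++
	}
	return evicted, request.LessEqual(idle)
}

func main() {
	request := Resource{MilliCPU: 2000, MemoryGi: 4}
	idle := Resource{MilliCPU: 500, MemoryGi: 1}
	victims := []Resource{
		{MilliCPU: 1000, MemoryGi: 2},
		{MilliCPU: 1000, MemoryGi: 2},
		{MilliCPU: 1000, MemoryGi: 2},
	}
	evicted, fits := evictUntilFits(request, idle, victims)
	fmt.Printf("evicted=%d fits=%t\n", evicted, fits)
}
```

In this model the loop reports success once cpu and memory fit, even if a later affinity predicate would still reject the node, which is the failure mode described above.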

Member

I think returning as soon as the status is not api.Success will cause problems, for example in the following scenario:

When preempt filters candidate nodes, the nodeport plug-in filter returns Unschedulable and the podAffinity filter returns UnschedulableAndUnresolvable. If you return directly on the first non-api.Success status, the node will be mistakenly added to the list of candidate nodes for preemption, although it does not actually meet the preemption conditions.

Preemption should be rejected in this scenario. Preemption on this kind of field is currently not supported. Preemption is only effective for resource usage.

@wangyang0616
Member

I agree with your current modification plan.

@wangyang0616
Member

/lgtm

@volcano-sh-bot volcano-sh-bot added the lgtm Indicates that a PR is ready to be merged. label Aug 26, 2023
if err != nil {
	return predicateStatus, fmt.Errorf("plugin %s predicates failed %s", interpodaffinity.Name, status.Message())
}
if podAffinityStatus.Code != api.Success {
	predicateStatus = append(predicateStatus, podAffinityStatus)
	return predicateStatus, nil
Member

Based on the analysis of issue #3068, I think it is better to return an error here. The reason is as described in the comments: currently, Volcano only supports resource preemption and does not support algorithm-based preemption such as affinity and topology.

If it is convenient, you can update it in your PR, or I can open another PR to fix it.

@wangyang0616
Member

wangyang0616 commented Aug 26, 2023

Please help to review. Thanks. @william-wang

Member

@william-wang william-wang left a comment

/lgtm

@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: william-wang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot volcano-sh-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 26, 2023
@volcano-sh-bot volcano-sh-bot merged commit c91eb07 into volcano-sh:master Aug 26, 2023
21 of 22 checks passed