Any way to know result of other plugins' predicateFn results within another plugin? #3736

Open
qGentry opened this issue Sep 19, 2024 · 4 comments
Labels
kind/question Categorizes issue related to a new question

Comments

@qGentry

qGentry commented Sep 19, 2024

Please describe your problem in detail

Hi guys, I recently opened issue #3712 about how to handle multiple InfiniBand (IB) clusters within a single K8s cluster. Basically, I want all pods in a podgroup to be scheduled within a single IB cluster.

I've implemented such a plugin, and it seems to work properly for the cases I described here:
#3712 (comment)

I'm new to the Volcano codebase and have limited experience with Go, so it's probably far from ideal. Here it is:
https://github.com/qGentry/volcano/blob/feature-ibclusters-plugin/pkg/scheduler/plugins/clusters/clusters.go#L53

During implementation I had to use a weird hack that I don't really like.

Let me first explain how the plugin works. It just implements predicateFn as follows:

  1. Get from ssn the minAvailable replica count for this task's job, plus all nodes in the cluster.
  2. Based on the nodes' labels, build a map {cluster_name: []node}.
  3. Filter the clusters so that only those with len(nodes) >= minAvailable are left.
  4. Randomly select one cluster.
  5. For the given node, check whether it belongs to the randomly selected cluster: if it does, allow scheduling; otherwise, don't.
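The five steps above can be sketched roughly as follows. This is a simplified, self-contained illustration, not the code in the linked branch: `Node`, the label key `example.com/ib-cluster`, and the helper names are hypothetical, and Volcano's session and node types are replaced with plain Go values.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// Node is a hypothetical, simplified stand-in for Volcano's node info.
type Node struct {
	Name   string
	Labels map[string]string
}

// clusterLabel is a hypothetical label key naming a node's IB cluster.
const clusterLabel = "example.com/ib-cluster"

// groupByCluster builds the {cluster_name: []node} map from node labels (step 2).
// Nodes without the label are skipped.
func groupByCluster(nodes []Node) map[string][]Node {
	groups := make(map[string][]Node)
	for _, n := range nodes {
		if c, ok := n.Labels[clusterLabel]; ok {
			groups[c] = append(groups[c], n)
		}
	}
	return groups
}

// bigEnough keeps only clusters with at least minAvailable nodes (step 3),
// sorted so the result does not depend on map iteration order.
func bigEnough(groups map[string][]Node, minAvailable int) []string {
	var names []string
	for name, members := range groups {
		if len(members) >= minAvailable {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

// allowed implements step 5: the node passes only if it is in the chosen cluster.
func allowed(node Node, chosen string) bool {
	return node.Labels[clusterLabel] == chosen
}

func main() {
	nodes := []Node{
		{Name: "n1", Labels: map[string]string{clusterLabel: "ib-a"}},
		{Name: "n2", Labels: map[string]string{clusterLabel: "ib-a"}},
		{Name: "n3", Labels: map[string]string{clusterLabel: "ib-b"}},
	}
	// minAvailable would come from the job in step 1; hard-coded to 2 here.
	candidates := bigEnough(groupByCluster(nodes), 2)
	if len(candidates) == 0 {
		fmt.Println("no cluster is large enough")
		return
	}
	chosen := candidates[rand.Intn(len(candidates))] // step 4: random pick
	fmt.Println(chosen, allowed(nodes[2], chosen))    // prints "ib-a false"
}
```

Note that step 3 counts every labeled node, including ones other plugins will later reject, which is exactly the gap the random pick in step 4 works around.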

The weird part here is the random cluster selection. It works for me because I have only 3 or 4 clusters, but it would really slow down scheduling if I had 10+ clusters.
If I instead selected a cluster deterministically (for example, the first one in a sorted list), this wouldn't work, because other plugins might have filtered out nodes in that cluster, so the actual number of available nodes could be smaller than minAvailable.

The ideal implementation would be for my plugin to already know which nodes have not been filtered out by other plugins/taints/labels/resource selectors, and to use this information to drop the filtered-out nodes when building the map {cluster_name: []node}.
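A hypothetical sketch of both the problem and the "ideal" version: it assumes the plugin somehow has a `usable` check for nodes that passed the other filters, which the current plugin API does not provide. Counting raw nodes, the first sorted cluster looks big enough even though most of its nodes were rejected; counting only usable nodes makes a deterministic pick safe.

```go
package main

import (
	"fmt"
	"sort"
)

// pickFirst returns the first cluster (in sorted order) with at least
// minAvailable nodes, counting only nodes for which usable returns true.
// It returns "" if no cluster qualifies.
func pickFirst(clusterOf map[string]string, usable func(node string) bool, minAvailable int) string {
	counts := make(map[string]int)
	for node, cluster := range clusterOf {
		if usable(node) {
			counts[cluster]++
		}
	}
	names := make([]string, 0, len(counts))
	for name := range counts {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		if counts[name] >= minAvailable {
			return name
		}
	}
	return ""
}

func main() {
	// Hypothetical node -> IB cluster mapping taken from node labels.
	clusterOf := map[string]string{
		"a1": "ib-a", "a2": "ib-a", "a3": "ib-a",
		"b1": "ib-b", "b2": "ib-b", "b3": "ib-b",
	}
	// Hypothetical: other plugins (taints, selectors, resources) rejected a2 and a3.
	feasible := map[string]bool{"a1": true, "b1": true, "b2": true, "b3": true}

	all := func(string) bool { return true }
	onlyFeasible := func(n string) bool { return feasible[n] }

	fmt.Println(pickFirst(clusterOf, all, 3))          // ib-a: looks big enough, but only a1 is usable
	fmt.Println(pickFirst(clusterOf, onlyFeasible, 3)) // ib-b
}
```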

Do you guys know how I can improve this plugin, and maybe even upstream it?

Any other relevant information

No response

@qGentry qGentry added the kind/question Categorizes issue related to a new question label Sep 19, 2024
@JesseStutler
Member

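Maybe you can add your logic in `Session.PredicateFn`:

```go
// PredicateFn runs each enabled plugin's predicate for the (task, node) pair,
// tier by tier, and returns the first error it encounters.
func (ssn *Session) PredicateFn(task *api.TaskInfo, node *api.NodeInfo) error {
	for _, tier := range ssn.Tiers {
		for _, plugin := range tier.Plugins {
			// Skip plugins whose predicate is disabled in the scheduler config.
			if !isEnabled(plugin.EnabledPredicate) {
				continue
			}
			// Skip plugins that did not register a predicate function.
			pfn, found := ssn.predicateFns[plugin.Name]
			if !found {
				continue
			}
			// The first failing predicate rejects the node; later plugins are not called.
			err := pfn(task, node)
			if err != nil {
				return err
			}
		}
	}
	return nil
}
```

to record some extra field in the session, and then read it from your plugin. Alternatively, you can modify other plugins' predicateFn to implement your feature.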

@qGentry
Author

qGentry commented Sep 19, 2024

I believe that would impose a specific ordering on plugins' predicateFn calls, which might be unwanted: if we allow it for one plugin, we have to allow it for all plugins, and if two plugins both ask to run "very last", the order becomes ambiguous.

Another approach I was considering:
In my plugin's predicateFn, I could access ssn.predicateFns and iterate through them just like in the snippet you provided, to check whether the node was already filtered out by other plugins (basically, run every predicateFn except my own). The problem is that if we allow this and another plugin does the same, we get infinite recursion.
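A small hypothetical demonstration of that recursion: two plugins each call `runOthers`, which evaluates every registered predicate except the caller's own. The `active` set exists only so the demo stops and reports the cycle instead of overflowing the stack; `registry`, `active`, and `runOthers` are illustrative names, not Volcano APIs.

```go
package main

import (
	"fmt"
	"sort"
)

type predicate func(node string) error

// registry holds each plugin's predicate, keyed by plugin name (hypothetical).
var registry = map[string]predicate{}

// active tracks plugins currently on the call stack, only so this demo
// terminates instead of recursing forever.
var active = map[string]bool{}

// runOthers evaluates every registered predicate except self.
func runOthers(self, node string) error {
	if active[self] {
		return fmt.Errorf("cycle: %s re-entered", self)
	}
	active[self] = true
	defer delete(active, self)

	names := make([]string, 0, len(registry))
	for name := range registry {
		names = append(names, name)
	}
	sort.Strings(names) // fixed order for reproducible output
	for _, name := range names {
		if name == self {
			continue
		}
		if err := registry[name](node); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// Two plugins that both adopt the "run everyone else first" approach.
	registry["clusters"] = func(node string) error { return runOthers("clusters", node) }
	registry["other"] = func(node string) error { return runOthers("other", node) }

	fmt.Println(registry["clusters"]("n1")) // cycle: clusters re-entered
}
```

With only one plugin using `runOthers`, the call chain terminates; as soon as a second one does, clusters calls other, which calls clusters again.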

@Monokaix
Member

Monokaix commented Sep 20, 2024

Hi @qGentry, thanks for your feedback.
The best way is to modify the logic in the allocate action: allocate can traverse all node groups and then select the best one that can fit all tasks, because allocate knows all the final predicate results. We plan to implement that based on #3388 in the next version (ideally).
The hacky way is to put your plugin at the end of the scheduler configuration in the ConfigMap: if the scheduler runs your plugin, it means the other plugins returned success and won't interfere with you. Letting a plugin be aware of other plugins' behavior is not a good approach :)
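A hypothetical sketch of the allocate-level idea: once every node's final predicate result is known, each node group can be judged as a whole. The assumptions here are not from #3388: each task needs its own node, tasks are interchangeable, and "best" means the fitting group with the fewest feasible nodes left over.

```go
package main

import (
	"fmt"
	"sort"
)

// bestGroup returns the node group that can host all tasks, given the final
// predicate result for every node. ok is false if no group fits.
// Assumed "best": fewest feasible nodes left over after placing all tasks.
func bestGroup(groups map[string][]string, feasible map[string]bool, tasks int) (best string, ok bool) {
	names := make([]string, 0, len(groups))
	for name := range groups {
		names = append(names, name)
	}
	sort.Strings(names) // break ties deterministically by name
	leftover := 0
	for _, name := range names {
		n := 0
		for _, node := range groups[name] {
			if feasible[node] {
				n++
			}
		}
		if n < tasks {
			continue // not enough nodes passed every predicate
		}
		if !ok || n-tasks < leftover {
			best, leftover, ok = name, n-tasks, true
		}
	}
	return best, ok
}

func main() {
	groups := map[string][]string{
		"ib-a": {"a1", "a2", "a3", "a4"},
		"ib-b": {"b1", "b2", "b3"},
		"ib-c": {"c1", "c2", "c3", "c4"},
	}
	// Made-up final predicate results after every plugin has run.
	feasible := map[string]bool{
		"a1": true, "a2": true, "a3": true, "a4": true,
		"b1": true, "b2": true, "b3": true,
		"c1": true,
	}
	fmt.Println(bestGroup(groups, feasible, 3)) // ib-b true
}
```

Because the choice is made after all predicates, no plugin needs to know about any other plugin, which is the property the allocate-based approach relies on.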

@Monokaix
Member

cc @lowang-bh


4 participants
@Monokaix @JesseStutler @qGentry and others