[BUG] A Pod with Reservation Affinity passes the scheduler's Filter phase even though no Reservation on the node can actually satisfy its resource requirements #2111

Open
ZiMengSheng opened this issue Jun 18, 2024 · 0 comments
Labels
area/koord-scheduler kind/bug Create a report to help us improve
Milestone

Comments

@ZiMengSheng
Contributor

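In the current design, each plugin checks independently in the Filter phase whether some Reservation on the node can satisfy the resource request. It is entirely possible that one Reservation satisfies the CPU request but not the GPU request, while another satisfies GPU but not CPU. Such a node still passes the Filter phase, so this part needs to be redesigned.
The scheduler already supports the ReservationNominator interface, which is implemented by the Reservation plugin. The NominateReservation implementation calls every plugin that implements the ReservationFilterPlugin or ReservationScorePlugin interface and tries to pick the Reservation on the node that best satisfies the resource conditions.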
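The mismatch described above can be reproduced with a small standalone model. This is only a sketch: `Resources`, `perPluginFilter`, and `singleReservationFilter` are hypothetical names invented for illustration and are not Koordinator types or functions.

```go
package main

import "fmt"

// Resources is a toy resource vector (resource name -> quantity).
// It is a hypothetical stand-in, not a Koordinator type.
type Resources map[string]int64

// perPluginFilter models the current behaviour: each resource is checked
// on its own, and a different Reservation may satisfy each resource.
func perPluginFilter(req Resources, reservations []Resources) bool {
	for name, want := range req {
		satisfied := false
		for _, r := range reservations {
			if r[name] >= want {
				satisfied = true
				break
			}
		}
		if !satisfied {
			return false
		}
	}
	return true
}

// singleReservationFilter requires one Reservation to cover the whole request.
func singleReservationFilter(req Resources, reservations []Resources) bool {
	for _, r := range reservations {
		covers := true
		for name, want := range req {
			if r[name] < want {
				covers = false
				break
			}
		}
		if covers {
			return true
		}
	}
	return false
}

func main() {
	req := Resources{"cpu": 4, "gpu": 1}
	reservations := []Resources{
		{"cpu": 8, "gpu": 0}, // enough CPU, no GPU
		{"cpu": 1, "gpu": 2}, // enough GPU, not enough CPU
	}
	fmt.Println(perPluginFilter(req, reservations))         // true: the node passes Filter
	fmt.Println(singleReservationFilter(req, reservations)) // false: no Reservation is usable
}
```

The first check passes because CPU and GPU are each covered by some Reservation, while the second fails because no single Reservation covers both.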

// ReservationNominator nominates a more suitable Reservation in the Reserve stage and Pod will bind this Reservation.
// The Reservation will be recorded in CycleState through SetNominatedReservation.
// When executing Reserve, each plugin will obtain the currently used Reservation through GetNominatedReservation,
// and locate the previously returned reusable resources for Pod allocation.
type ReservationNominator interface {
	framework.Plugin
	NominateReservation(ctx context.Context, cycleState *framework.CycleState, pod *corev1.Pod, nodeName string) (*ReservationInfo, *framework.Status)
	AddNominatedReservation(pod *corev1.Pod, nodeName string, rInfo *ReservationInfo)
	RemoveNominatedReservations(pod *corev1.Pod)
	GetNominatedReservation(pod *corev1.Pod, nodeName string) *ReservationInfo
}

// ReservationFilterPlugin is an interface for Filter Reservation plugins.
// These plugins will be called during the Reserve phase to determine whether the Reservation can participate in the Reserve.
type ReservationFilterPlugin interface {
	framework.Plugin
	FilterReservation(ctx context.Context, cycleState *framework.CycleState, pod *corev1.Pod, reservationInfo *ReservationInfo, nodeName string) *framework.Status
}

// ReservationScorePlugin is an interface that must be implemented by "ScoreReservation" plugins to rank
// reservations that passed the reserve phase.
type ReservationScorePlugin interface {
	framework.Plugin
	ScoreReservation(ctx context.Context, cycleState *framework.CycleState, pod *corev1.Pod, reservationInfo *ReservationInfo, nodeName string) (int64, *framework.Status)
	// ReservationScoreExtensions returns a ReservationScoreExtensions interface if it implements one, or nil if does not.
	ReservationScoreExtensions() ReservationScoreExtensions
}

// ReservationScoreExtensions is an interface for Score extended functionality.
type ReservationScoreExtensions interface {
	// NormalizeReservationScore is called for all node scores produced by the same plugin's "ScoreReservation"
	// method. A successful run of NormalizeReservationScore will update the scores list and return
	// a success status.
	NormalizeReservationScore(ctx context.Context, cycleState *framework.CycleState, pod *corev1.Pod, scores ReservationScoreList) *framework.Status
}
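
To make the filter-then-score idea concrete, here is a simplified model of how a nominator might combine the two kinds of plugins. It is a sketch under assumptions: `Reservation`, `FilterFunc`, `ScoreFunc`, `nominate`, and the best-fit scorer are hypothetical and do not reflect the actual NominateReservation implementation.

```go
package main

import "fmt"

// Reservation is a toy model: a name plus the free resources it holds.
type Reservation struct {
	Name string
	Free map[string]int64
}

// FilterFunc and ScoreFunc are hypothetical stand-ins for the
// FilterReservation and ScoreReservation plugin methods.
type FilterFunc func(req map[string]int64, r *Reservation) bool
type ScoreFunc func(req map[string]int64, r *Reservation) int64

// resourceFilter builds a filter that checks a single resource,
// the way one plugin might own CPU and another GPU.
func resourceFilter(name string) FilterFunc {
	return func(req map[string]int64, r *Reservation) bool {
		return r.Free[name] >= req[name]
	}
}

// bestFit scores higher when less capacity is left over after allocation.
func bestFit(req map[string]int64, r *Reservation) int64 {
	var leftover int64
	for name, want := range req {
		leftover += r.Free[name] - want
	}
	return -leftover
}

// nominate returns the highest-scoring Reservation that passes every filter,
// or nil when no single Reservation satisfies all of them.
func nominate(req map[string]int64, candidates []*Reservation, filters []FilterFunc, scorers []ScoreFunc) *Reservation {
	var best *Reservation
	var bestScore int64
candidateLoop:
	for _, r := range candidates {
		for _, f := range filters {
			if !f(req, r) {
				continue candidateLoop
			}
		}
		var total int64
		for _, s := range scorers {
			total += s(req, r)
		}
		if best == nil || total > bestScore {
			best, bestScore = r, total
		}
	}
	return best
}

func main() {
	req := map[string]int64{"cpu": 4, "gpu": 1}
	candidates := []*Reservation{
		{Name: "A", Free: map[string]int64{"cpu": 8, "gpu": 0}},
		{Name: "B", Free: map[string]int64{"cpu": 1, "gpu": 2}},
		{Name: "C", Free: map[string]int64{"cpu": 16, "gpu": 4}},
		{Name: "D", Free: map[string]int64{"cpu": 4, "gpu": 1}},
	}
	filters := []FilterFunc{resourceFilter("cpu"), resourceFilter("gpu")}
	if r := nominate(req, candidates, filters, []ScoreFunc{bestFit}); r != nil {
		fmt.Println(r.Name) // D: passes both filters with the least leftover
	}
}
```

Because every filter is applied to the same candidate before scoring, a nil result means exactly what the issue needs to detect: no single Reservation on the node covers the whole request.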

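So calling NominateReservation is enough to determine whether a node has a Reservation that satisfies all of the resource requirements. The proposal is to change the Filter flow as follows:

  1. First, check whether all plugins that do not need to evaluate Reservations one by one pass.
  2. Then use NominateReservation to determine whether any Reservation satisfies the conditions. If ReservationAffinity is set and no Reservation is found, return a failure.
  3. Finally, check whether the resource requirements can be met when no Reservation is available.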
// RunFilterPluginsWithNominatedPods transforms the Filter phase of framework with filter transformers.
// We don't transform RunFilterPlugins since framework's RunFilterPluginsWithNominatedPods just calls its RunFilterPlugins.
func (ext *frameworkExtenderImpl) RunFilterPluginsWithNominatedPods(ctx context.Context, cycleState *framework.CycleState, pod *corev1.Pod, nodeInfo *framework.NodeInfo) *framework.Status {
	for _, pl := range ext.configuredPlugins.Filter.Enabled {
		transformer := ext.filterTransformers[pl.Name]
		if transformer == nil {
			continue
		}
		newPod, newNodeInfo, transformed, status := transformer.BeforeFilter(ctx, cycleState, pod, nodeInfo)
		if !status.IsSuccess() {
			klog.ErrorS(status.AsError(), "Failed to run BeforeFilter", "pod", klog.KObj(pod), "plugin", transformer.Name())
			return status
		}
		if transformed {
			klog.V(5).InfoS("BeforeFilter transformed", "transformer", transformer.Name(), "pod", klog.KObj(pod))
			pod = newPod
			nodeInfo = newNodeInfo
		}
	}

	// TODO: run only the plugins that are unrelated to Reservations.
	status := ext.Framework.RunFilterPluginsWithNominatedPods(ctx, cycleState, pod, nodeInfo)
	if !status.IsSuccess() {
		if debugFilterFailure {
			klog.Infof("Failed to filter for Pod %q on Node %q, failedPlugin: %s, reason: %s", klog.KObj(pod), klog.KObj(nodeInfo.Node()), status.FailedPlugin(), status.Message())
		}
		return status
	}

	// TODO: call NominateReservation to determine whether a Reservation satisfies all of the resource requirements.

	// TODO: when no Reservation is usable, run the plugins that check whether the node's remaining resources are sufficient.
	// Assign with "=" here: status is already declared above, so a second ":=" would not compile.
	status = ext.Framework.RunFilterPluginsWithNominatedPods(ctx, cycleState, pod, nodeInfo)
	if !status.IsSuccess() && debugFilterFailure {
		klog.Infof("Failed to filter for Pod %q on Node %q, failedPlugin: %s, reason: %s", klog.KObj(pod), klog.KObj(nodeInfo.Node()), status.FailedPlugin(), status.Message())
	}
	return status
}
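
The three TODO steps can be illustrated end to end with a toy model. This is a sketch under stated assumptions, not the proposed implementation: `Pod`, `Node`, `filterNode`, the `Schedulable` flag (standing in for all reservation-independent Filter plugins), and `RequireReservation` (standing in for ReservationAffinity) are all hypothetical.

```go
package main

import "fmt"

// Resources is a toy resource vector (resource name -> quantity).
type Resources map[string]int64

// Pod and Node are hypothetical, heavily simplified models.
type Pod struct {
	Request            Resources
	RequireReservation bool // stands in for ReservationAffinity
}

type Node struct {
	Schedulable  bool        // stands in for every reservation-independent Filter plugin
	Free         Resources   // free resources outside any Reservation
	Reservations []Resources // free resources held by each Reservation
}

// fits reports whether free covers every resource in req.
func fits(req, free Resources) bool {
	for name, want := range req {
		if free[name] < want {
			return false
		}
	}
	return true
}

// filterNode applies the three steps in order and returns the verdict with a reason.
func filterNode(pod Pod, node Node) (bool, string) {
	// Step 1: checks that do not depend on any particular Reservation.
	if !node.Schedulable {
		return false, "reservation-independent filter failed"
	}
	// Step 2: look for one Reservation that covers the whole request.
	for _, r := range node.Reservations {
		if fits(pod.Request, r) {
			return true, "matched a reservation"
		}
	}
	if pod.RequireReservation {
		return false, "no reservation satisfies all requested resources"
	}
	// Step 3: no usable Reservation, so fall back to the node's own free resources.
	if fits(pod.Request, node.Free) {
		return true, "fits node free resources"
	}
	return false, "insufficient node resources"
}

func main() {
	node := Node{
		Schedulable:  true,
		Free:         Resources{"cpu": 8, "gpu": 1},
		Reservations: []Resources{{"cpu": 8, "gpu": 0}, {"cpu": 1, "gpu": 2}},
	}
	req := Resources{"cpu": 4, "gpu": 1}
	fmt.Println(filterNode(Pod{Request: req, RequireReservation: true}, node))
	fmt.Println(filterNode(Pod{Request: req, RequireReservation: false}, node))
}
```

With the split Reservations from the bug description, the Pod that requires a Reservation is rejected at step 2, while the Pod without that requirement falls through to step 3 and is admitted on the node's free resources.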
@ZiMengSheng ZiMengSheng added the kind/bug Create a report to help us improve label Jun 18, 2024
@zwzhang0107 zwzhang0107 added this to the v1.6 milestone Jul 30, 2024