dra-evolution: quota mechanism #24

pohly · 2024-06-05T15:31:18Z

The "future extension" proposal in #14 for quota was to check quota at allocation time.

Pros:

Can limit resource consumption based on what actually gets made available to a user, compared to basing it on what is requested (might be a lower limit).
Supports creating more claims and pods than can run at the moment ("batching" - might not be relevant).
Can support "one of" (if X exceeds quota, use Y).

Cons:

All schedulers need to also consider the ResourceQuota when checking devices.
Exceeding quota has to be reported as part of scheduling failures. OTOH, users typically also don't create ResourceClaims manually, so there is some indirection with admission checks, too.

pohly · 2024-06-05T17:00:38Z

@johnbelamaric: let's continue here.

You wrote:

Whether it's memory or devices, as a workload author, I should be notified at admission time that I am asking for more than allowed.

Note that workload authors typically don't create claims and pods directly. They create an app object (Deployment) and the app controller creates the pods. Such a user doesn't get feedback at admission time, do they?

With DRA, such a user creates a ResourceClaimTemplate or uses some future "create claim for me" syntactic sugar. Then the resource claim controller tries to create the ResourceClaim and gets an admission error which needs to be propagated back to the user. Currently that is with an event for the pod.

johnbelamaric · 2024-06-05T17:13:00Z

Note that workload authors typically don't create claims and pods directly. They create an app object (Deployment) and the app controller creates the pods. Such a user doesn't get feedback at admission time, do they?

Hmm. Good point...

pohly · 2024-06-06T06:00:32Z

The alternative for "check at allocation time" is "check at admission time". What exactly should be checked and how would need to be defined further. It cannot rely on ResourcePool information (might not be available yet), so simulating the actual allocation isn't feasible. Counting the usage of a (then mandatory) DeviceClass was mentioned as a possible check.

I already implemented that for the "class per claim" approach from classic DRA, see kubernetes/kubernetes#120611. We haven't merged it because it wasn't clear whether it's the right approach.

Besides not being precise when trying to limit "total GPU memory used by a user" (admission only sees what was requested, not what was actually given to the user) I also don't see how we can support future use cases like "give me all GPUs on a node". At admission time it is unknown how many GPUs that will be, so should such a claim be allowed or denied when there is a maximum for "number of GPUs"?

If it's allowed, we would still need to check the actual number at allocation time, which defeats the purpose of checking at admission time that "different schedulers don't need to know about quota".

pohly · 2024-06-11T08:48:32Z

We now support "give me all GPUs on a node" in #24, so we have to figure out how to do quota for such requests.

Another example where "quota by class at admission time" breaks down is the future "give me device X, otherwise Y" use case.

Suppose at admission time, X is not allowed, but Y is. Should the request be allowed or denied? If it's allowed, the scheduler has to replicate the quota checks, which is what "check at admission time" was meant to avoid. If it's denied, it's a false negative because the claim could have been satisfied.

This is the "allocation time" proposal from kubernetes-sigs/wg-device-management#24.

pohly · 2024-06-11T19:01:03Z

More complete proposal for allocation-time quota checking:

// Quota controls whether a ResourceClaim may get allocated.
// Quota is namespaced and applies to claims within the same namespace.
type Quota struct {
    metav1.TypeMeta
    // Standard object metadata.
    metav1.ObjectMeta

    Spec QuotaSpec
}

type QuotaSpec struct {
    // Controls whether devices may get allocated with admin access
    // (concurrent with normal use, potentially privileged access permissions
    // depending on the driver). If multiple quota objects exist and at least one
    // has a true value, access will be allowed. The default to deny such access.
    //
    // +optional
    AllowAdminAccess bool `json:"allowManagementAccess,omitempty"`

    // The total number of allocated devices matching these selectors must not be
    // exceeded. This has to be checked in addition to other claim constraints
    // when checking whether a device can be allocated for a claim.
    MaxDeviceCounts []MaxDeviceCount

    // The sum of some quantity attribute of allocated devices must not
    // exceed a maximum value. This has to be checked in addition to other claim constraints
    // when checking whether a device can be allocated for a claim.
    MaxQuantity []MaxQuantity

    // Other useful future extensions (>= 1.32):

    // DeviceLimits is a CEL expression which take the currently allocated
    // devices and their attributes and some new allocations as input and
    // returns false if those allocations together are not permitted in the
    // namespace.
    //
    // DeviceLimits string

    // A class listed in DeviceClassDenyList must not be used in this
    // namespace. This can be useful for classes which contain
    // configuration parameters that a user in this namespace should not have
    // access to.
    //
    // DeviceClassDenyList []string

    // A class listed in ResourceClassAllowList may be used in this namespace
    // even when that class is marked as "privileged". Normally classes
    // are not privileged and using them does not require explicit listing
    // here, but some classes may contain more sensitive configuration parameters
    // that not every user should have access to.
    //
    // DeviceClassAllowList []string
}

type MaxDeviceCount struct {
    // The maximum number of allocated devices. May be zero to prevent using
    // certain devices.
    Maximum resource.Quantity

    // Only devices matching all selectors are counted.
    //
    // +listType=atomic
    Selectors []Selector
}

type MaxQuantity struct {
    // The maximum sum of a certain quantity attribute. Only allocated devices which
    // have this attribute as a quantity contribute towards the sum.
    Maximum resource.Quantity

    // The fully-qualified attribute name ("<domain>/<identifier>").
    AttributeName string
}

As it stands now, a single ResourceQuota object per namespace can contain all restrictions.
If we add a new field to the type (like the DeviceLimits or class allow/deny list),
old schedulers will ignore them and thus not apply the new restrictions.

Should we make QuotaSpec a one-of? Exactly one field would need to be set (true
or non-empty) per object. This prevents putting all restrictions into a single
object.

Or shall we nest a one-of inside a slice called "rules" or "settings"?

The semantic in both cases then is that if an old scheduler encounters a
ResourceQuota object with empty one-of, it knows that it cannot handle
allocation inside the namespace.

pohly · 2024-06-12T04:15:08Z

We now support "give me all GPUs on a node" in #24, so we have to figure out how to do quota for such requests.

There are two (crude) options:

Deny admission of claims which use this feature.
Do a worst-case analysis and assume that "all" = as many as allowed per claim = 32 (current limit for allocation result).

For a future "device X or Y" request the admission would have to be denied unless both X and Y are under quota.

I still don't think that we should do quota that way, but at least there would be a way to adapt kubernetes/kubernetes#120611 to per-request device classes.

As I mentioned in kubernetes/enhancements#4709 (comment), that admission check can also enforce that a device class name is set in namespaces where the check is configured. Elsewhere the name can remain optional.

k8s-triage-robot · 2024-09-10T04:32:30Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

pohly · 2024-09-11T13:33:55Z

/close

Let's plan for an allocation-time quota mechanism in kubernetes/enhancements#4840.

k8s-ci-robot · 2024-09-11T13:34:01Z

@pohly: Closing this issue.

In response to this:

/close

Let's plan for an allocation-time quota mechanism in kubernetes/enhancements#4840.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

pohly mentioned this issue Jun 10, 2024

KEP-4381: DRA update for 1.31 kubernetes/enhancements#4709

Merged

4 tasks

pohly added a commit to pohly/enhancements that referenced this issue Jun 11, 2024

DRA: include additional ResourceQuota fields

b244dda

This is the "allocation time" proposal from kubernetes-sigs/wg-device-management#24.

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 10, 2024

klueska mentioned this issue Sep 11, 2024

DRA: Improved Quota mechanism for DRA resources kubernetes/enhancements#4840

Open

4 tasks

k8s-ci-robot closed this as completed Sep 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dra-evolution: quota mechanism #24

dra-evolution: quota mechanism #24

pohly commented Jun 5, 2024

pohly commented Jun 5, 2024

johnbelamaric commented Jun 5, 2024

pohly commented Jun 6, 2024

pohly commented Jun 11, 2024

pohly commented Jun 11, 2024

pohly commented Jun 12, 2024

k8s-triage-robot commented Sep 10, 2024

pohly commented Sep 11, 2024

k8s-ci-robot commented Sep 11, 2024

dra-evolution: quota mechanism #24

dra-evolution: quota mechanism #24

Comments

pohly commented Jun 5, 2024

pohly commented Jun 5, 2024

johnbelamaric commented Jun 5, 2024

pohly commented Jun 6, 2024

pohly commented Jun 11, 2024

pohly commented Jun 11, 2024

pohly commented Jun 12, 2024

k8s-triage-robot commented Sep 10, 2024

pohly commented Sep 11, 2024

k8s-ci-robot commented Sep 11, 2024