dra-evolution: quota mechanism #24
@johnbelamaric: let's continue here. You wrote:
Note that workload authors typically don't create claims and pods directly. They create an app object (Deployment) and the app controller creates the pods. Such a user doesn't get feedback at admission time, do they? With DRA, such a user creates a ResourceClaimTemplate or uses some future "create claim for me" syntactic sugar. Then the resource claim controller tries to create the ResourceClaim and gets an admission error which needs to be propagated back to the user. Currently that is done with an event for the pod.
Hmm. Good point...
The alternative to "check at allocation time" is "check at admission time". What exactly should be checked, and how, would need to be defined further. It cannot rely on ResourcePool information (which might not be available yet), so simulating the actual allocation isn't feasible. Counting the usage of a (then mandatory) DeviceClass was mentioned as a possible check. I already implemented that for the "class per claim" approach from classic DRA, see kubernetes/kubernetes#120611. We haven't merged it because it wasn't clear whether it's the right approach. Besides not being precise when trying to limit "total GPU memory used by a user" (admission only sees what was requested, not what was actually given to the user), I also don't see how we can support future use cases like "give me all GPUs on a node". At admission time it is unknown how many GPUs that will be, so should such a claim be allowed or denied when there is a maximum for "number of GPUs"? If it's allowed, we would still need to check the actual number at allocation time, which defeats the purpose of checking at admission time, namely that "different schedulers don't need to know about quota".
We now support "give me all GPUs on a node" in #24, so we have to figure out how to do quota for such requests. Another example where "quota by class at admission time" breaks down is the future "give me device X, otherwise Y" use case. Suppose that at admission time, X is not allowed, but Y is. Should the request be allowed or denied? If it's allowed, the scheduler has to replicate the quota checks, which is what "check at admission time" was meant to avoid. If it's denied, it's a false negative because the claim could have been satisfied.
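To make the admission-time counting approach concrete, here is a minimal sketch of such a per-class check. All names are hypothetical and the actual implementation in kubernetes/kubernetes#120611 differs in its details:

package quotasketch

import "fmt"

// classUsage tracks, for one namespace, how many claims reference each
// device class and what the configured maximum per class is.
type classUsage struct {
    used map[string]int64 // current number of claims per class name
    max  map[string]int64 // configured maximum per class name
}

// admitClaim decides at admission time whether a new ResourceClaim
// referencing the given class may be created. It only sees what was
// requested, never what actually gets allocated, so it cannot enforce
// limits like "total GPU memory used" or handle requests such as
// "all GPUs on a node" whose device count is unknown at this point.
func (u *classUsage) admitClaim(className string) error {
    limit, limited := u.max[className]
    if !limited {
        return nil // no quota configured for this class
    }
    if u.used[className] >= limit {
        return fmt.Errorf("quota exceeded for device class %q: %d of %d claims used",
            className, u.used[className], limit)
    }
    u.used[className]++
    return nil
}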
This is the "allocation time" proposal from kubernetes-sigs/wg-device-management#24.
More complete proposal for allocation-time quota checking:

// Quota controls whether a ResourceClaim may get allocated.
// Quota is namespaced and applies to claims within the same namespace.
type Quota struct {
    metav1.TypeMeta
    // Standard object metadata.
    metav1.ObjectMeta

    Spec QuotaSpec
}

type QuotaSpec struct {
    // Controls whether devices may get allocated with admin access
    // (concurrent with normal use, potentially privileged access permissions
    // depending on the driver). If multiple quota objects exist and at least one
    // has a true value, access will be allowed. The default is to deny such access.
    //
    // +optional
    AllowAdminAccess bool `json:"allowAdminAccess,omitempty"`

    // The total number of allocated devices matching these selectors must not be
    // exceeded. This has to be checked in addition to other claim constraints
    // when checking whether a device can be allocated for a claim.
    MaxDeviceCounts []MaxDeviceCount

    // The sum of some quantity attribute of allocated devices must not
    // exceed a maximum value. This has to be checked in addition to other
    // claim constraints when checking whether a device can be allocated
    // for a claim.
    MaxQuantity []MaxQuantity

    // Other useful future extensions (>= 1.32):
    //
    // DeviceLimits is a CEL expression which takes the currently allocated
    // devices and their attributes and some new allocations as input and
    // returns false if those allocations together are not permitted in the
    // namespace.
    //
    // DeviceLimits string
    //
    // A class listed in DeviceClassDenyList must not be used in this
    // namespace. This can be useful for classes which contain
    // configuration parameters that a user in this namespace should not have
    // access to.
    //
    // DeviceClassDenyList []string
    //
    // A class listed in DeviceClassAllowList may be used in this namespace
    // even when that class is marked as "privileged". Normally classes
    // are not privileged and using them does not require explicit listing
    // here, but some classes may contain more sensitive configuration parameters
    // that not every user should have access to.
    //
    // DeviceClassAllowList []string
}

type MaxDeviceCount struct {
    // The maximum number of allocated devices. May be zero to prevent using
    // certain devices.
    Maximum resource.Quantity

    // Only devices matching all selectors are counted.
    //
    // +listType=atomic
    Selectors []Selector
}

type MaxQuantity struct {
    // The maximum sum of a certain quantity attribute. Only allocated devices which
    // have this attribute as a quantity contribute towards the sum.
    Maximum resource.Quantity

    // The fully-qualified attribute name ("<domain>/<identifier>").
    AttributeName string
}

As it stands now, a single Quota object per namespace can contain all restrictions. Should we make QuotaSpec a one-of, so that exactly one field would have to be set? Or shall we nest a one-of inside a slice called "rules" or "settings"? The semantic in both cases then is that an old scheduler which encounters a rule it does not recognize must not allocate.
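For illustration, a single Quota object combining both kinds of restrictions might look like this. The driver name and attribute name are invented for the example, and the Selector content is left as a placeholder because its fields are not spelled out in the snippet above:

import (
    "k8s.io/apimachinery/pkg/api/resource"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// exampleQuota uses the Quota, QuotaSpec, MaxDeviceCount, and
// MaxQuantity types proposed above.
var exampleQuota = Quota{
    ObjectMeta: metav1.ObjectMeta{
        Name:      "team-a-gpu-quota",
        Namespace: "team-a",
    },
    Spec: QuotaSpec{
        // Leave admin access denied (the default).
        AllowAdminAccess: false,
        // At most 8 devices may be allocated in this namespace at any
        // one time. A real rule would scope this with a selector.
        MaxDeviceCounts: []MaxDeviceCount{{
            Maximum: resource.MustParse("8"),
            // Placeholder: a real selector would restrict this to,
            // e.g., devices of one hypothetical GPU driver.
            Selectors: []Selector{},
        }},
        // The memory attributes of all allocated devices may sum up to
        // at most 640Gi.
        MaxQuantity: []MaxQuantity{{
            Maximum:       resource.MustParse("640Gi"),
            AttributeName: "gpu.example.com/memory",
        }},
    },
}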
There are two (crude) options:
For a future "device X or Y" request, admission would have to be denied unless both X and Y are under quota. I still don't think that we should do quota that way, but at least there would be a way to adapt kubernetes/kubernetes#120611 to per-request device classes. As I mentioned in kubernetes/enhancements#4709 (comment), that admission check can also enforce that a device class name is set in namespaces where the check is configured. Elsewhere the name can remain optional.
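For contrast with the admission-time check, here is a minimal sketch of the allocation-time check a scheduler would have to run for one MaxQuantity rule, assuming it can list the devices already allocated in the namespace. The allocatedDevice type is a simplification invented for this sketch:

import (
    "fmt"

    "k8s.io/apimachinery/pkg/api/resource"
)

// allocatedDevice is a simplified stand-in for a device in the
// namespace, reduced to its quantity attributes keyed by
// fully-qualified attribute name.
type allocatedDevice struct {
    attributes map[string]resource.Quantity
}

// checkMaxQuantity verifies that allocating the candidate device keeps
// the sum of the limited attribute at or below the maximum. A scheduler
// would run this in addition to the normal claim constraints before
// committing an allocation.
func checkMaxQuantity(limit MaxQuantity, allocated []allocatedDevice, candidate allocatedDevice) error {
    var sum resource.Quantity
    for _, dev := range allocated {
        // Only devices which have the attribute as a quantity
        // contribute towards the sum.
        if q, ok := dev.attributes[limit.AttributeName]; ok {
            sum.Add(q)
        }
    }
    if q, ok := candidate.attributes[limit.AttributeName]; ok {
        sum.Add(q)
    }
    if sum.Cmp(limit.Maximum) > 0 {
        return fmt.Errorf("quota exceeded for attribute %s: sum %s > maximum %s",
            limit.AttributeName, sum.String(), limit.Maximum.String())
    }
    return nil
}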
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
/close

Let's plan for an allocation-time quota mechanism in kubernetes/enhancements#4840.
@pohly: Closing this issue. In response to this:

> /close
> Let's plan for an allocation-time quota mechanism in kubernetes/enhancements#4840.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
The "future extension" proposal in #14 for quota was to check quota at allocation time.
Pros:
- Quota gets enforced against what is actually allocated, not merely what was requested, so limits like "total GPU memory used by a user" are precise.
- Requests whose outcome is unknown at admission time, like "give me all GPUs on a node" or "give me device X, otherwise Y", can be handled correctly.

Cons:
- Every scheduler which allocates devices has to know about and enforce quota.
- Users only find out that quota is exceeded when allocation fails, not already at admission time.