Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update API for 1.31 without partitionable model #36

Closed
wants to merge 3 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
53 changes: 18 additions & 35 deletions dra-evolution/pkg/api/capacity_types.go
Original file line number Diff line number Diff line change
Expand Up @@ -59,14 +59,13 @@ type ResourcePoolSpec struct {
// vendor of the driver.
DriverName string `json:"driverName" protobuf:"bytes,3,name=driverName"`

// SharedCapacity defines the set of shared capacity consumable by
// devices in this ResourceSlice.
// SliceContent represents the type of data in this slice. Currently the only
// valid values is `Devices`. As we add features, new values will be added
// to this discriminator. This allows older schedulers to know if they should
// ignore the slice, due to it being of a type they do not understand.
//
// Must not have more than 128 entries.
//
// +listType=atomic
// +optional
SharedCapacity []SharedCapacity `json:"sharedCapacity,omitempty"`
// +required
SliceContent string `json:"sliceContent"`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be an enum instead, @pohly ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the scheduler will decode this object, read this field, and then decide if it knows how to work with it or not? an enum seems appropriate then, since this should only be allowed to have a specific set of values.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this?

If a current-rev scheduler only knows how to work with "devices", it uses the slice.devices field to find those.

If a future API adds "gizmos" or some variant of "devices" (remember, I wanted to keep the current field named as "namedDevices"...), then it adds a slice.gizmos field. A current-rev scheduler will see an empty slice and ignore it because it contains no devices.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or is the intent to add additional fields to the Device struct? In that case such a SliceContent discriminator would be useful because an unknown value tells old clients that there is something in the slice.devices that they don't know about, but should know.

I just wouldn't call it SliceContent, but rather DeviceType: it's a description of the slice.devices.

If we add slice.gizmos (separate field), then they can have a slice.gizmoType.

Copy link
Contributor Author

@johnbelamaric johnbelamaric Jun 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We likely would add another field in addition to "Devices" to qualify or add onto the devices.

If we want to avoid the discriminator, I would avoid the term "Devices". But I don't think "NamedDevices" works because "named" isn't a differentiating feature. Maybe "SimpleDevices" or something?

If we keep the discriminatory, I do not think "DeviceType" is right - because it wouldn't make sense for "gizmos", for example.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having said that, I am okay with adding the discriminator. It may be redundant when adding a second slice with a different type, but for the case where the same type gets extended (https://github.com/kubernetes-sigs/wg-device-management/pull/36/files/3a7aa4f9789c7979426ea3c65497545136428f55#r1646662809) it is useful.

My question about validation after a downgrade applies also here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the bullet on "enumerated types" here, we should be OK as long as we document that schedulers that see unknown slice types should ignore those slices. I think older clients doing a write would be subject to the API validation of the API they are using, meaning they would not be able to use the new enum. I think if there is a field associated with the new enum value, we should clear it (the client doesn't know about the new enum, but we do). I think that's consistent with the logic described here, but we should discuss with api reviewers.

One question: Should (can?) we ensure all slices of a pool are of the same slice type? If not, old schedulers would have to ignore the whole pool, at least for "all", if ANY slice is of an unknown type...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would seem to me that adding an enum value should require a bump in the API version.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we have reached v1, we cannot bump it any further?

I'm still not exactly clear on how to implement validation, but there must have been precedences, so I guess it is okay.

Copy link

@thockin thockin Jun 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A discriminator is required when you want the absence of config to imply a default, so that version skew isn't perceived as the default case.

For cases that are truly one-of (exactly one field, empty lists don't count), then a discriminator is not needed, because any actor can perceive the lack of a value, and choose not to act. A discriminator can make for better error messages, but that's about all.

Whether we extend Device to encompass partitions (assuming we can do that safely) or we add PartitionableDevice here, it shouldn't matter -- both are viable. Likewise, whether we want devices/gizmos/widgets/partitionables to co-exist in a slice or to be mutually exclusive, I think both are viable. There may be other things to consider to make those decisions, but I don't think this aspect matters.


// Devices lists all available devices in this pool.
//
Expand All @@ -78,7 +77,7 @@ type ResourcePoolSpec struct {
// them) empty pool.
}

const ResourcePoolMaxSharedCapacity = 128
const ResourcePoolMaxCommonData = 16
const ResourcePoolMaxDevices = 128

// Device represents one individual hardware instance that can be selected based
Expand All @@ -95,20 +94,22 @@ type Device struct {
//
// +listType=atomic
// +optional
//
Attributes []DeviceAttribute `json:"attributes,omitempty" protobuf:"bytes,3,opt,name=attributes"`

// SharedCapacityConsumed defines the set of shared capacity consumed by
// this device.
// Capacity defines the capacity values for this device.
// The name of each capacity must be unique.
//
// Must not have more than 32 entries.
//
// +listType=atomic
// +optional
SharedCapacityConsumed []SharedCapacity `json:"sharedCapacityConsumed,omitempty"`
//
Capacity []DeviceCapacity `json:"capacity,omitempty"`
}

const ResourcePoolMaxAttributesPerDevice = 32
const ResourcePoolMaxSharedCapacityConsumedPerDevice = 32
const ResourcePoolMaxCapacityPerDevice = 32

// ResourcePoolMaxDevices and ResourcePoolMaxAttributesPerDevice where chosen
// so that with the maximum attribute length of 96 characters the total size of
Expand Down Expand Up @@ -143,8 +144,6 @@ type DeviceAttribute struct {
// field "String" and the corresponding method. That method is required.
// The Kubernetes API is defined without that suffix to keep it more natural.

// QuantityValue is a quantity.
QuantityValue *resource.Quantity `json:"quantity,omitempty" protobuf:"bytes,2,opt,name=quantity"`
// BoolValue is a true/false value.
BoolValue *bool `json:"bool,omitempty" protobuf:"bytes,3,opt,name=bool"`
// StringValue is a string. Must not be longer than 64 characters.
Expand All @@ -154,40 +153,24 @@ type DeviceAttribute struct {
VersionValue *string `json:"version,omitempty" protobuf:"bytes,5,opt,name=version"`
}

type SharedCapacity struct {
// Name is a unique identifier among all shared capacities managed by the
type DeviceCapacity struct {
// Name is a unique identifier among all capacities managed by the
// driver in the pool.
//
// It is referenced both when defining the total amount of shared capacity
// that is available, as well as by individual devices when declaring
// how much of this shared capacity they consume.
//
// SharedCapacity names must be a C-style identifier (e.g. "the_name") with
// a maximum length of 32.
//
// By limiting these names to a C-style identifier, the same validation can
// be used for both these names and the identifier portion of a
// DeviceAttribute name.
//
// +required
Name string `json:"name"`

// Capacity is the total capacity of the named resource.
// This can either represent the total *available* capacity, or the total
// capacity *consumed*, depending on the context where it is referenced.
//
// +required
Capacity resource.Quantity `json:"capacity"`
johnbelamaric marked this conversation as resolved.
Show resolved Hide resolved
}

// CStyleIdentifierMaxLength is the maximum length of a c-style identifier used for naming.
const CStyleIdentifierMaxLength = 32

// DeviceAttributeMaxIDLength is the maximum length of the identifier in a device attribute name (`<domain>/<ID>`).
const DeviceAttributeMaxIDLength = CStyleIdentifierMaxLength
const DeviceAttributeMaxIDLength = 32

// DeviceAttributeMaxValueLength is the maximum length of a string or version attribute value.
const DeviceAttributeMaxValueLength = 64

// SharedCapacityMaxNameLength is the maximum length of a shared capacity name.
const SharedCapacityMaxNameLength = CStyleIdentifierMaxLength
// DeviceCapacityMaxNameLength is the maximum length of a shared capacity name.
const DeviceCapacityMaxNameLength = DeviceAttributeMaxIDLength