Update API for 1.31 without partitionable model #36

johnbelamaric · 2024-06-18T16:28:47Z

This attempts to update the API to:

Remove partitionable model
Allow extension points for future addition of partitionable model, in line with the discussions in dra-evolution: partitioning of devices #20.

This is rough, not exactly what would go into k/k. But I think it has what we need. It allows us to do common attrs and common capacity (common, not shared). It looks little like some of Kevin's earlier prototypes and a sort of simplified version of #31 and #35.

cc @pohly @klueska

k8s-ci-robot · 2024-06-18T16:28:51Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johnbelamaric

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [johnbelamaric]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

dra-evolution/pkg/api/capacity_types.go

klueska · 2024-06-18T16:56:37Z

dra-evolution/pkg/api/capacity_types.go

-	// SharedCapacity defines the set of shared capacity consumable by
-	// devices in this ResourceSlice.
+	// CommonData represents information that is common across various devices,
+	// to reduce redundancy in the devices list.
 	//
-	// Must not have more than 128 entries.
-	//
-	// +listType=atomic
 	// +optional
-	SharedCapacity []SharedCapacity `json:"sharedCapacity,omitempty"`
+	CommonData []CommonDeviceData `json:"commonData,omitempty"`


I wouldn't include this. If we are ripping out the model that supports partitioning, I think we should should just go back to an explicit list of attributes per device.

We need an extension point to insert the things we are debating in #20. That's the primary reason that is here. Adding the common things was just a way to make it have some meaning and not be a stub. With this, an old scheduler an see that there is an entry in this list, but when it unmarshals it, all the fields the scheduler is aware of are nil, so it knows "ooo, I should ignore this slice, I don't know what to do with it".

If we don't have this, then we need some other way for an older scheduler to know that "it doesn't understand this object".

The alternative would be to have a discriminator field, like SliceContent: "SimpleDevices" or something. Then we would add new values to that so old schedulers know if they should handle the slice or not.

My current thinking is that we would not have any extension points directly in the device at this point. Instead every device will have the ability to point to one (and exactly one) DeviceShape/DeviceTemplate. That is what will have all of these extra bells and whistles. The device itself is still just a named list of attributes (and now capacities).

Ok, that works for me. The thing is, we still need a way for old schedulers to know "this slice is not something I understand". So a "SliceType" discriminator perhaps? Or we have to have a field like I have done here, that has a one-of.

dra-evolution/pkg/api/capacity_types.go

johnbelamaric · 2024-06-18T18:25:38Z

dra-evolution/pkg/api/capacity_types.go

-	// +optional
-	SharedCapacity []SharedCapacity `json:"sharedCapacity,omitempty"`
+	// +required
+	SliceContent string `json:"sliceContent"`


Should this be an enum instead, @pohly ?

So the scheduler will decode this object, read this field, and then decide if it knows how to work with it or not? an enum seems appropriate then, since this should only be allowed to have a specific set of values.

Do we need this?

If a current-rev scheduler only knows how to work with "devices", it uses the slice.devices field to find those.

If a future API adds "gizmos" or some variant of "devices" (remember, I wanted to keep the current field named as "namedDevices"...), then it adds a slice.gizmos field. A current-rev scheduler will see an empty slice and ignore it because it contains no devices.

Or is the intent to add additional fields to the Device struct? In that case such a SliceContent discriminator would be useful because an unknown value tells old clients that there is something in the slice.devices that they don't know about, but should know.

I just wouldn't call it SliceContent, but rather DeviceType: it's a description of the slice.devices.

If we add slice.gizmos (separate field), then they can have a slice.gizmoType.

We likely would add another field in addition to "Devices" to qualify or add onto the devices.

If we want to avoid the discriminator, I would avoid the term "Devices". But I don't think "NamedDevices" works because "named" isn't a differentiating feature. Maybe "SimpleDevices" or something?

If we keep the discriminatory, I do not think "DeviceType" is right - because it wouldn't make sense for "gizmos", for example.

Having said that, I am okay with adding the discriminator. It may be redundant when adding a second slice with a different type, but for the case where the same type gets extended (https://github.com/kubernetes-sigs/wg-device-management/pull/36/files/3a7aa4f9789c7979426ea3c65497545136428f55#r1646662809) it is useful.

My question about validation after a downgrade applies also here.

According to the bullet on "enumerated types" here, we should be OK as long as we document that schedulers that see unknown slice types should ignore those slices. I think older clients doing a write would be subject to the API validation of the API they are using, meaning they would not be able to use the new enum. I think if there is a field associated with the new enum value, we should clear it (the client doesn't know about the new enum, but we do). I think that's consistent with the logic described here, but we should discuss with api reviewers.

One question: Should (can?) we ensure all slices of a pool are of the same slice type? If not, old schedulers would have to ignore the whole pool, at least for "all", if ANY slice is of an unknown type...

It would seem to me that adding an enum value should require a bump in the API version.

Once we have reached v1, we cannot bump it any further?

I'm still not exactly clear on how to implement validation, but there must have been precedences, so I guess it is okay.

A discriminator is required when you want the absence of config to imply a default, so that version skew isn't perceived as the default case.

For cases that are truly one-of (exactly one field, empty lists don't count), then a discriminator is not needed, because any actor can perceive the lack of a value, and choose not to act. A discriminator can make for better error messages, but that's about all.

Whether we extend Device to encompass partitions (assuming we can do that safely) or we add PartitionableDevice here, it shouldn't matter -- both are viable. Likewise, whether we want devices/gizmos/widgets/partitionables to co-exist in a slice or to be mutually exclusive, I think both are viable. There may be other things to consider to make those decisions, but I don't think this aspect matters.

johnbelamaric · 2024-06-18T18:26:24Z

Ok, I simplifed it and now it's basically just a discriminator with a single valid value in 1.31, but more will come.

dra-evolution/pkg/api/capacity_types.go

Based on kubernetes-sigs/wg-device-management#30. Instead of a discriminator (kubernetes-sigs/wg-device-management#36 (comment)), a versioning scheme gets added.

k8s-triage-robot · 2024-09-26T20:39:23Z

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

Mark this PR as fresh with /remove-lifecycle stale
Close this PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · 2024-10-26T21:22:49Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

Mark this PR as fresh with /remove-lifecycle rotten
Close this PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

Update API for 1.31 without partitionable model

fbc7e07

k8s-ci-robot requested review from klueska and pohly June 18, 2024 16:28

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. approved Indicates a PR has been approved by an approver from all required OWNERS files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 18, 2024

johnbelamaric commented Jun 18, 2024

View reviewed changes

dra-evolution/pkg/api/capacity_types.go Outdated Show resolved Hide resolved

klueska reviewed Jun 18, 2024

View reviewed changes

dra-evolution/pkg/api/capacity_types.go Show resolved Hide resolved

klueska reviewed Jun 18, 2024

View reviewed changes

dra-evolution/pkg/api/capacity_types.go Outdated Show resolved Hide resolved

Replace all common things with a single discriminator

3a7aa4f

johnbelamaric commented Jun 18, 2024

View reviewed changes

klueska reviewed Jun 18, 2024

View reviewed changes

dra-evolution/pkg/api/capacity_types.go Outdated Show resolved Hide resolved

klueska reviewed Jun 18, 2024

View reviewed changes

dra-evolution/pkg/api/capacity_types.go Outdated Show resolved Hide resolved

Remove left over comments

6187d27

pohly mentioned this pull request Jul 8, 2024

DRA for 1.31 kubernetes/kubernetes#125488

Merged

9 tasks

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 26, 2024

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update API for 1.31 without partitionable model #36

Update API for 1.31 without partitionable model #36

johnbelamaric commented Jun 18, 2024

k8s-ci-robot commented Jun 18, 2024

klueska Jun 18, 2024

johnbelamaric Jun 18, 2024

johnbelamaric Jun 18, 2024

klueska Jun 18, 2024

johnbelamaric Jun 18, 2024

johnbelamaric Jun 18, 2024

klueska Jun 19, 2024

pohly Jun 19, 2024

pohly Jun 19, 2024

johnbelamaric Jun 20, 2024 •

edited

Loading

pohly Jun 20, 2024

johnbelamaric Jun 20, 2024

johnbelamaric Jun 20, 2024

pohly Jun 20, 2024

thockin Jun 28, 2024 •

edited

Loading

johnbelamaric commented Jun 18, 2024

k8s-triage-robot commented Sep 26, 2024

k8s-triage-robot commented Oct 26, 2024

Update API for 1.31 without partitionable model #36

Are you sure you want to change the base?

Update API for 1.31 without partitionable model #36

Conversation

johnbelamaric commented Jun 18, 2024

k8s-ci-robot commented Jun 18, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

johnbelamaric Jun 20, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

thockin Jun 28, 2024 • edited Loading

Choose a reason for hiding this comment

johnbelamaric commented Jun 18, 2024

k8s-triage-robot commented Sep 26, 2024

k8s-triage-robot commented Oct 26, 2024

johnbelamaric Jun 20, 2024 •

edited

Loading

thockin Jun 28, 2024 •

edited

Loading