-
Notifications
You must be signed in to change notification settings - Fork 614
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Docs] Add catalog of labels to README #928
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Chip Zoller <[email protected]>
Signed-off-by: Chip Zoller <[email protected]>
Signed-off-by: Chip Zoller <[email protected]>
Signed-off-by: Chip Zoller <[email protected]>
Signed-off-by: chipzoller <[email protected]>
Signed-off-by: chipzoller <[email protected]>
Signed-off-by: chipzoller <[email protected]>
Signed-off-by: chipzoller <[email protected]>
Signed-off-by: chipzoller <[email protected]>
Signed-off-by: chipzoller <[email protected]>
Signed-off-by: chipzoller <[email protected]>
Signed-off-by: chipzoller <[email protected]>
Signed-off-by: chipzoller <[email protected]>
Signed-off-by: chipzoller <[email protected]>
Signed-off-by: chipzoller <[email protected]>
Signed-off-by: chipzoller <[email protected]>
Signed-off-by: chipzoller <[email protected]>
Signed-off-by: chipzoller <[email protected]>
Signed-off-by: chipzoller <[email protected]>
Could I get a review from you please, @mikemckiernan? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you again for the assist! PLMK what I can clarify.
@@ -148,7 +160,7 @@ With the daemonset deployed, NVIDIA GPUs can now be requested by a container | |||
using the `nvidia.com/gpu` resource type: | |||
|
|||
```yaml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically shell
, I think.
@@ -570,34 +586,56 @@ total memory and compute resources of the GPU. | |||
**Note**: As of now, the only supported resource available for MPS are `nvidia.com/gpu` | |||
resources and only with full GPUs. | |||
|
|||
## Catalog of Labels | |||
|
|||
The NVIDIA device plugin reads and writes a number of different labels which it uses as either |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not the grammar nerd that I should be, but I think s/which/that/ in this case. (It's a lazy guideline, but "that" is almost never wrong.)
configuration elements or informational elements. The below table documents and describes each | ||
along with their use. See the related table [here](/docs/gpu-feature-discovery/README.md#generated-labels) for GFD labels. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pls: "The following table identifies and describes each label."
- "following" is preferred instead of "below." (I can't recall when I learned that or why, it just is.)
- Restate the noun, "label," for clarity.
Please move (and tweak) the last sentence after the table. I'd prefer that readers read this table and then be presented with the option to read the GFD labels rather than have the teensy cognitive load to decide whether to read the GFD labels before reading the table.
Maybe: "GPU Feature Discovery also adds labels. Refer to generated labels in the GFD README for more information."
> [!NOTE] | ||
> Label values in Kubernetes are always of type string. The table's value type describes the type within string formatting. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My pref is to avoid the NOTE to reduce visual busyness. After scrutinizing the message a bit, I think we can remove the this para and the Value Type column (to sidestep explaining a concept that is rarely an obstacle) and indicate in the description something like "Values are true
or false
."
> [!NOTE] | ||
> Label values in Kubernetes are always of type string. The table's value type describes the type within string formatting. | ||
|
||
| Label Name | Value Type | Meaning | Example | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls s/Meaning/Description/
| ----------------------------------| ---------- |----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -------------- | | ||
| nvidia.com/device-plugin.config | String | The name of the configuration to apply to the node. Not automatically written by the device plugin; must be manually assigned by the user with the specified config. See [here](#updating-per-node-configuration-with-a-node-label) for details. | my-mps-config | | ||
| nvidia.com/gpu.sharing-strategy | String | The sharing strategy in use. Will be set to `none` if not sharing a GPU. Additional values are `mps` and `time-slicing`. | time-slicing | | ||
| nvidia.com/mig.capable | Boolean | If a device is currently in MIG mode. | false | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is true. Need SME verification.
| nvidia.com/mig.capable | Boolean | If a device is currently in MIG mode. | false | | |
| nvidia.com/mig.capable | Boolean | Specifies if any device on the node supports MIG. | false | |
| nvidia.com/device-plugin.config | String | The name of the configuration to apply to the node. Not automatically written by the device plugin; must be manually assigned by the user with the specified config. See [here](#updating-per-node-configuration-with-a-node-label) for details. | my-mps-config | | ||
| nvidia.com/gpu.sharing-strategy | String | The sharing strategy in use. Will be set to `none` if not sharing a GPU. Additional values are `mps` and `time-slicing`. | time-slicing | | ||
| nvidia.com/mig.capable | Boolean | If a device is currently in MIG mode. | false | | ||
| nvidia.com/mps.capable | Boolean | If a device is currently in MPS mode. | false | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto above, need SME review.
| nvidia.com/mps.capable | Boolean | If a device is currently in MPS mode. | false | | |
| nvidia.com/mps.capable | Boolean | Specifies if devices on the node are configured for MPS. | false | |
| nvidia.com/gpu.sharing-strategy | String | The sharing strategy in use. Will be set to `none` if not sharing a GPU. Additional values are `mps` and `time-slicing`. | time-slicing | | ||
| nvidia.com/mig.capable | Boolean | If a device is currently in MIG mode. | false | | ||
| nvidia.com/mps.capable | Boolean | If a device is currently in MPS mode. | false | | ||
| nvidia.com/vgpu.present | Boolean | If vGPU is in use. | false | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| nvidia.com/vgpu.present | Boolean | If vGPU is in use. | false | | |
| nvidia.com/vgpu.present | Boolean | Specifies if devices on the node use vGPU. | false | |
| nvidia.com/mig.capable | Boolean | If a device is currently in MIG mode. | false | | ||
| nvidia.com/mps.capable | Boolean | If a device is currently in MPS mode. | false | | ||
| nvidia.com/vgpu.present | Boolean | If vGPU is in use. | false | | ||
| nvidia.com/vgpu.host-driver-version | String | Version of the vGPU host driver in use on the underlying hypervisor. | 10.11.12 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| nvidia.com/vgpu.host-driver-version | String | Version of the vGPU host driver in use on the underlying hypervisor. | 10.11.12 | | |
| nvidia.com/vgpu.host-driver-version | String | Specifies the vGPU host driver version on the underlying hypervisor. | 550.54.16 | |
| nvidia.com/mps.capable | Boolean | If a device is currently in MPS mode. | false | | ||
| nvidia.com/vgpu.present | Boolean | If vGPU is in use. | false | | ||
| nvidia.com/vgpu.host-driver-version | String | Version of the vGPU host driver in use on the underlying hypervisor. | 10.11.12 | | ||
| nvidia.com/vgpu.host-driver-branch | String | Branch of the vGPU host driver in use on the underlying hypervisor. | main | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| nvidia.com/vgpu.host-driver-branch | String | Branch of the vGPU host driver in use on the underlying hypervisor. | main | | |
| nvidia.com/vgpu.host-driver-branch | String | Specifies the vGPU host driver branch on the underlying hypervisor. | r550_40 | |
Thank you for your review, @mikemckiernan. Could we get technical review here first so we ensure the content is accurate before we do any word smithing? |
Requesting technical review, please. |
@chipzoller Could you also document if it is possible to not deploy GFD, and the minimum set of labels you would have to apply manually (for example using Karpenter)? |
The docs here already cover opting in to deploying GFD. As for the other part, not sure I follow. |
@chipzoller I wonder if it would be possible to not deploy GFD - but that requires knowing what labels you are required to set manually. |
GFD is an optional component hence why it is an opt-in setting. There are no "required" labels here unless you have a specific use case or other piece of software which depends on one or more labels set by GFD. That seems like it's out of scope for what I'm trying to accomplish in this PR. |
@chipzoller OK, thanks, then we probably misunderstood something while trying to upgrade to 0.15 (which failed), we'll go back to debugging some more, sorry for the confusion! |
Signed-off-by: chipzoller [email protected]
Closes #846
This PR adds a new "catalog of labels" section to the device plugin's README and updates the same table for GFD. It also makes a few minor improvements: