Support device mapping in kube play similar to Kubernetes device plugins #17833
Comments
@umohnani8 PTAL
A friendly reminder that this issue had no activity for 30 days.
Just want to note that this support would be welcomed; I ran into this issue as noted here: #18266
Hello, I would like to voice support in favor of this feature as well. (damn, I wish I knew enough about Go and podman's codebase to implement this 😢)
@ygalblum @umohnani8 Anyone else PTAL
I'm trying to understand this issue to see if I would be up to the task, and I have a few questions to make sure I understand what is going on here. Basically, we want `podman kube play` to honor device requests expressed in the pod spec. One point of confusion I have is that the original comment says that mapping devices in via volumes requires privileged containers.

Is this accurate, or does it actually depend on the device itself and whether or not elevated permissions are required to interact with it? When tracing through the code I wasn't really seeing where that restriction is enforced.
`--device` works much differently in rootless and rootful containers. In rootful containers, it actually creates the device within the container's mount namespace with the correct major/minor numbers and labels the new device with the correct SELinux label. This requires CAP_MKNOD, so it is only available to rootful containers. In rootless containers we just bind mount the device from the host into the container, which is all we can do. In some cases the device has an SELinux label that would prevent it from being used in the container, so you need to disable SELinux enforcement for that device. So in rootless mode `--device` is little different from a volume bind mount.

In certain cases the UID/GID access to the device is not available to processes inside of the container. For example, if your access to the device is based on membership in a supplemental group on the host, that group access is not available inside the container by default.
Okay, I see. Here is how I understand this issue so far; I'm still tracing through the code, so let me know if I am missing something as I explain my thoughts. It seems to me that for this change I wouldn't actually need to worry too much about how the devices are set up by the underlying runtimes; instead, the task mostly consists of getting the requested device information out of the kube YAML and into the SpecGen.

As long as the information correctly makes it into the SpecGen, I think all of the privileged/non-privileged device mapping magic is already taken care of for me. My next question would be: since the K8s device plugins work by registering a name with the kubelet, and podman has neither a kubelet nor the RPC plugin provider, how should podman resolve a requested resource name to actual devices on the host?
I started to try to implement this, but I do think I need some clarification on my last question; without that information I feel it will be hard to modify the ResourceLimits in a way that makes sense. In order to hit the `resources.limits` part of this issue, the resource limit type, specifically the resource name part, will need to carry extra meaning, so that an entry such as `hardware-vendor.example/foo: 1` gives the podman runtime enough information when generating the container spec to resolve the name to devices on the system. My thought would be to allow the podman user to define a device map of some sort in the system podman configuration so that such names can be resolved to host devices.

Also, from what I understand, podman's configuration files are parsed with code from other repositories (https://github.com/containers/podman/blob/9c954739e9555c0940238f71ba3cc205deaa0e5e/docs/tutorials/rootless_tutorial.md?plain=1#L139C58-L139C94), so to me that means that in order to add the extra device mapping information to podman, the configuration file structure would have to be modified in those other repos first. Maybe I'm way off here, but that's why I'm seeking clarification :)
@bblenard I think the CDI configuration would contain the necessary info to map the name to the device. For example, with the CDI support from Nvidia, a fully-qualified device name such as `nvidia.com/gpu=0` identifies a concrete device.

What's not yet entirely clear to me is how the CDI devices should be reflected in the `resources` section. If we say that we allow directly specifying the unique CDI name, as in the example above, we can map it 1:1 to the `--device` option.
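For context, a CDI spec on the host is what makes such a name resolvable. A minimal sketch, assuming the spec file lives under the usual CDI directory; the vendor/class pair `example.com/device` is illustrative, not from this thread:

```yaml
# Illustrative CDI spec, e.g. dropped under /etc/cdi/
# The kind "example.com/device" is a made-up vendor/class pair
cdiVersion: "0.6.0"
kind: "example.com/device"
devices:
  - name: kmsg                  # requested as example.com/device=kmsg
    containerEdits:
      deviceNodes:
        - path: /dev/kmsg       # device node injected into the container
```

With such a spec in place, a runtime can translate the unique name into the concrete device-node edits without any kubelet-style daemon.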
@bachp -- I need to familiarize myself with the things you referenced; just letting you know that I saw your message (finally ;) )
Okay, so an update on what I currently understand about how these things relate. As @bachp pointed out, the CDI configuration seemingly contains all the information the runtime would need to take a CDI kind and map it to a device path on the system. I also think I see what @bachp is saying about it not quite fitting into the limits section.

I could be wrong, but it looks like podman's type for PodSpec.Containers[*].Resources has diverged from Kubernetes' Container.Resources type. Kubernetes' ResourceClaim has an additional field that seems to contain the "name" described in the KEP-4009 referenced above. I've tried to confirm my understanding by tracing how I believe Kubernetes handles these claims.

So currently I would potentially propose re-syncing podman's Resources type with the upstream Kubernetes type. I'm not sure if additional support needs to be added so podman can handle CDI things (parsing the CDI configuration). Phew... with all that being said, I'd love some input on all of that.
@bblenard Podman already supports CDI devices. You can pass a CDI device name via the `--device` option. It also works in podman-compose via the `devices:` section of a service.
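As a sketch of the podman-compose route (the service name and image are placeholders; the CDI name assumes the Nvidia CDI specs are installed on the host):

```yaml
# compose.yaml fragment; "app" and the image are placeholders
services:
  app:
    image: registry.example/cuda-app:latest
    devices:
      - nvidia.com/gpu=all   # fully-qualified CDI device name
```

The same string passed on the CLI (`--device nvidia.com/gpu=all`) takes the identical path, which is why a 1:1 mapping from a kube resource name looks attractive.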
I'm going to ping @rhatdan because he is one of the Containers folks who had some activity on this issue previously. Before I dig in and start on this issue properly, I just want to make sure someone official is able to add their 2 cents. Now that we've sorted out the podman support side, the major question (as I see it) is just how to fit this into the podman resource spec. Assuming I understand everything correctly, I figure we can either map CDI device names directly through `resources.limits`, or re-sync podman's Resources type with Kubernetes' and implement the ResourceClaims flow.
This SGTM. |
Okay I have a way forward I think. I'll give this feature my best shot :) |
Update: still working on this, I've just been busy lately :) I have code that I think works in a way that makes sense; I want to clean it up a bit first, though. Once it's in a halfway decent state I'll probably push it up to my fork of podman.
@bblenard If you'd like to push something early, I would love to review and give some feedback.
@bachp -- I added some code here: https://github.com/bblenard/podman/tree/issues-17833-draft

The bulk of the code I wrote is in that branch. My idea was to make the resource claims in the pod YAML resolve to host devices. My "work around" for podman having no kubelet plumbing was to add an additional argument to the relevant spec-generation call. Let me know what you think; I was finding it hard to keep things straight in my mind while reviewing.

PS: One big thing I have to do still is write tests, but I have to figure that out still.
Any planned movement on this? I recently started down the path of setting up a Jellyfin server with HW acceleration (via Nvidia and CDI) and prefer the descriptiveness of the kube files for my deployments.
@bblenard I just took a look at your code and rebased it onto main; feel free to just take the commits from here: https://github.com/robertgzr/podman/tree/issues-17833-draft--main

If I understand correctly, you implemented support for DRA as documented here: https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/, which looks like this when expressed in YAML:

```yaml
apiVersion: simpledevice.resource.podman.io/v1
kind: ResourceClaimParameters
metadata:
  name: kmsg-parameters
spec:
  hostpath: /dev/kmsg
---
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaimTemplate
metadata:
  name: kmsg-template
spec:
  resourceClassName: PodmanResourceClass
  parametersRef: kmsg-parameters
---
apiVersion: v1
kind: Pod
[...]
      resources:
        claims:
          - name: kmsg
  resourceClaims:
    - name: kmsg
      resourceClaimTemplateName: kmsg-template
```

That is quite verbose compared to what I imagined (using a podman-specific CDI string):

```yaml
resources:
  limits:
    io.podman.device/kmsg: 1
```

The bits that support the configuration as shared by @bachp in #17833 (comment) are also still missing, correct?
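To make the comparison concrete, the compact form above would read like this in a full pod spec. This is hypothetical: the `io.podman.device/` prefix is this thread's proposal, not an implemented podman feature, and the pod/image names are placeholders:

```yaml
# Hypothetical pod using the proposed podman-specific CDI limit key
apiVersion: v1
kind: Pod
metadata:
  name: kmsg-reader
spec:
  containers:
    - name: reader
      image: docker.io/library/alpine:latest
      command: ["cat", "/dev/kmsg"]
      resources:
        limits:
          io.podman.device/kmsg: 1   # resolved to a device via CDI, per the proposal
```

A single limits entry replaces the three DRA objects above, at the cost of the name resolution being podman-specific.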
As a workaround for this and future similar cases, would it be acceptable to have a podman-specific annotation that would allow specifying arguments directly to podman? Basically, an annotation that could be set on the pod and would allow passing parameters like `--device` through. This would provide an escape hatch until features are properly implemented in the YAML.
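Such an escape hatch might look like the following sketch. The annotation key `io.podman.annotations.devices` is purely illustrative; no such key is defined in this thread:

```yaml
# Hypothetical annotation-based escape hatch; the key name is made up
apiVersion: v1
kind: Pod
metadata:
  name: kmsg-reader
  annotations:
    io.podman.annotations.devices: "/dev/kmsg"   # would translate to --device /dev/kmsg
spec:
  containers:
    - name: reader
      image: docker.io/library/alpine:latest
```

The appeal is that nothing new is needed in the resource model; the drawback is that such a pod spec is no longer portable to Kubernetes, which schedulers would have to ignore the annotation.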
Discussed in #14934
Originally posted by bachp July 14, 2022
Kubernetes provides ways to map devices into pods and containers. One way is to map devices into containers via volumes, but this requires privileged containers. The more flexible way to add devices is via Kubernetes Device Plugins. A device plugin abstracts the provisioning and mapping of devices to containers, so when specifying a Pod you only need to give an abstract name for the hardware, e.g. `hardware-vendor.example/foo`, and the amount of it under `resources.limits`; the device plugin then takes care of making the hardware available to containers.

For `podman kube play` it would be useful to also support `resources.limits` for mapping devices, as this would allow deploying the same pod spec on both Kubernetes and podman, transparently to the user. However, I don't think it makes sense to implement the full device plugin interface, as it assumes a daemon (the kubelet) running. So for podman I think a better option is to implement a kind of hook mechanism that takes `resources.limits` and outputs the required `--device` parameters to be applied before the pod is run.
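The device-plugin request described above takes the following standard Kubernetes form in the pod YAML (the vendor name `hardware-vendor.example/foo` is the placeholder used in the text; pod and image names are illustrative):

```yaml
# Standard Kubernetes extended-resource request, using the
# placeholder vendor name from the issue text
apiVersion: v1
kind: Pod
metadata:
  name: device-demo
spec:
  containers:
    - name: demo
      image: docker.io/library/busybox:latest
      resources:
        limits:
          hardware-vendor.example/foo: 2   # request two of the abstract devices
```

On Kubernetes the kubelet asks the registered device plugin to allocate two matching devices; the proposal here is for podman to resolve the same key locally instead.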