WIP: Use intel-gpu-plugin with intel-gpu-fakedev generated devices #1118

Open
wants to merge 1 commit into
base: main
@@ -0,0 +1,30 @@
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: intel-gpu-plugin
spec:
  template:
    spec:
      containers:
      - name: intel-gpu-nfd
        # convert generated sysfs content to NFD feature labels file
        image: intel/intel-gpu-initcontainer:devel
        imagePullPolicy: IfNotPresent
        securityContext:
          readOnlyRootFilesystem: true
          allowPrivilegeEscalation: false
          capabilities:
            drop: [ "ALL" ]
        volumeMounts:
        - name: nfd-features
          mountPath: /nfd
          readOnly: false
        workingDir: /usr/local/bin/gpu-sw
        # needed until GPU plugin drops NFD hook usage due to:
        # https://github.com/kubernetes-sigs/node-feature-discovery/issues/856
        command: ["sh", "-c", "while true; do ./intel-gpu-nfdhook | tee /nfd/fake-gpu; sleep 99999; done"]
      volumes:
      - name: nfd-features
        hostPath:
          path: /etc/kubernetes/node-feature-discovery/features.d/
          type: DirectoryOrCreate
@@ -0,0 +1,10 @@
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: intel-gpu-plugin
spec:
  template:
    spec:
      initContainers:
      - name: intel-gpu-initcontainer
        $patch: delete
@@ -0,0 +1,49 @@
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: intel-gpu-plugin
spec:
  template:
    spec:
      initContainers:
      - name: fakedev-generator
        # container runtime prevents writing to /sys & /dev,
        # so volumes need to be mounted elsewhere
        volumeMounts:
        - name: devfs
          mountPath: /tmp/fakedev/dev
          readOnly: false
        - name: sysfs
          mountPath: /tmp/fakedev/sys
          readOnly: false
        # files are generated under CWD
        workingDir: /tmp/fakedev
      containers:
      - name: intel-gpu-nfd
        # expects sysfs here
        volumeMounts:
        - name: sysfs
          mountPath: /host-sys
          readOnly: true
      - name: intel-gpu-plugin
        args: [
          "-prefix=/tmp/fakedev",
          "-shared-dev-num=2",
          "-enable-monitoring",
          "-resource-manager"
          ]
        # devfs host & container paths must match for everything to work
        volumeMounts:
        - name: devfs
          mountPath: /tmp/fakedev/dev
          readOnly: true
        - name: sysfs
          mountPath: /tmp/fakedev/sys
          readOnly: true
      volumes:
      - name: devfs
        hostPath:
          path: /tmp/fakedev/dev
          type: DirectoryOrCreate
      - name: sysfs
        emptyDir: {}
@@ -0,0 +1,8 @@
{
  "Info": "8x 4 GiB DG1 [Iris Xe MAX Graphics] GPUs",
  "DevCount": 8,
  "DevMemSize": 4294967296,
  "Capabilities": {
    "platform": "fake_DG1"
  }
}
@@ -0,0 +1,24 @@
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: intel-gpu-plugin
spec:
  template:
    spec:
      volumes:
      - name: fake-conf
        configMap:
          name: fakedev-config
      initContainers:
      - name: fakedev-generator
        image: intel/intel-gpu-fakedev:devel
        securityContext:
          runAsUser: 0
          readOnlyRootFilesystem: false
          allowPrivilegeEscalation: false
        volumeMounts:
        - name: fake-conf
          mountPath: /config
          readOnly: true
        # generate fake sysfs / devfs files for GPU plugin based on config
        command: ["/generator", "-json", "/config/fakedev.json", "-verbose"]
15 changes: 15 additions & 0 deletions deployments/gpu_plugin/overlays/fake_devices/kustomization.yaml
@@ -0,0 +1,15 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
Contributor

Suggested change
- kind: Kustomization
+ kind: Kustomization
+ nameSuffix: -fake

Contributor Author

@mythi I'm not really sure about adding the -fake suffix to all objects: service, serviceAccount, clusterRole, clusterRoleBinding, configMap, daemonSet.

I think it would be rare to run both the fake and the real GPU plugin at the same time in the same cluster, but even if you did:

  • Only the daemonSet would need a different name
  • The generator configMap is unique to the fake device plugin deployment, so it does not need a new name
  • The rest can be shared between the real and fake GPU plugins

What if it used a different namespace instead?

E.g. if one wanted the operator to also support the fake plugin, wouldn't a different namespace be easier than giving every object a different name?

Contributor

I added the suggestion because the "standalone" daemonset was named with that -fake suffix. I'm OK with not using this suggestion if you think it makes more sense as it is.

Contributor Author

Is there a way to change just the deployment name with kustomize?

Even if one used a separate namespace for the whole thing, it may still make sense to have a -fake suffix on the deployment name, so that nobody confuses it with the real thing, e.g. in kubectl get pods -A output.

In my own fake GPU plugin tests, I've used a "validation" namespace for it, but something like "fake-validation" would make it clearer that it's not the real thing...
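
On renaming just the DaemonSet: a minimal sketch, assuming a kustomize version that accepts inline `patches` entries with a `target` selector and JSON 6902 operations. The name `intel-gpu-plugin-fake` is hypothetical and not part of this PR. The idea is that a JSON patch can rewrite `metadata.name` of one selected object, while the service, serviceAccount, roles and configMap keep their names.

```yaml
# Hypothetical kustomization.yaml fragment: rename only the DaemonSet.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
- ../fractional_resources
patches:
- target:
    kind: DaemonSet
    name: intel-gpu-plugin
  patch: |-
    - op: replace
      path: /metadata/name
      value: intel-gpu-plugin-fake  # assumed name, not from the PR
```

Whether the other patches in this overlay, which select the DaemonSet by its original name, still apply alongside such a rename may depend on the kustomize version and patch ordering, so this would need testing.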

Contributor

On the namespace topic, one idea would be to add namespace: somefake to kustomization.yaml.
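
As a rough sketch of that idea, the top of the overlay's kustomization.yaml could look like the fragment below. `somefake` is only the placeholder name from the comment above, and the rest of the file is assumed to stay as in the PR.

```yaml
# Sketch only: overlay header with a dedicated namespace.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: somefake   # placeholder from the review comment
bases:
- ../fractional_resources
```

The `namespace` field only sets the namespace on the namespaced objects in the overlay; it does not create the Namespace object itself. Cluster-scoped objects such as the clusterRole and clusterRoleBinding are not namespaced, so they would still share their names with a real GPU plugin deployment.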

bases:
- ../fractional_resources
configMapGenerator:
- name: fakedev-config
  files:
  - fakedev-config.json
patches:
- fake-device-volumes.yaml
- generate-fake-devices.yaml
# NFD feature file changes become obsolete once GPU plugin moves away from NFD hooks
# https://github.com/kubernetes-sigs/node-feature-discovery/issues/856
- del-intel-gpu-initcontainer.yaml
- add-nfd-feature-file.yaml