Hey,
Going through a live system configuration, I noticed that network-operator-node-feature-discovery-worker-conf contains an incorrect device class whitelist:
According to the PCI-SIG specifications, base class 03 is Display controller, subclass 00 of class 03 is VGA-compatible controller, and subclass 02 of class 03 is 3D controller. So the configuration provided with the operator translates to:
With such filters, it seems that network-operator-node-feature-discovery is configured to gather GPU data (that should be done with, e.g., https://github.com/NVIDIA/gpu-feature-discovery, which has a similar configuration issue; I will link it here once it's created). In my opinion, deviceClassWhitelist should contain only entries from class 02 (Network).
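To make the class codes discussed above easier to check, here is a minimal Python sketch that decodes a four-digit PCI class code into its base class and subclass names and filters a whitelist down to Network (02) entries. The lookup tables are only a small excerpt of the public PCI class list, the helper names `describe_class` and `network_only` are hypothetical, and the sample whitelist is illustrative rather than a copy of the operator's actual values.

```python
# Small excerpt of PCI class codes (base class, then base class + subclass).
PCI_BASE_CLASSES = {
    "02": "Network controller",
    "03": "Display controller",
}
PCI_SUBCLASSES = {
    "0200": "Ethernet controller",
    "0207": "InfiniBand controller",
    "0300": "VGA compatible controller",
    "0302": "3D controller",
}


def describe_class(code: str) -> str:
    """Return a readable name for a 2-digit or 4-digit PCI class code."""
    code = code.lower()
    base = PCI_BASE_CLASSES.get(code[:2], "unknown base class")
    if len(code) == 2:
        return base
    sub = PCI_SUBCLASSES.get(code, "unknown subclass")
    return f"{base} / {sub}"


def network_only(whitelist):
    """Keep only entries whose base class is 02 (Network controller)."""
    return [code for code in whitelist if code.startswith("02")]


if __name__ == "__main__":
    # Illustrative whitelist, not taken from the operator's values.yaml.
    sample = ["0200", "0207", "0300", "0302"]
    for code in sample:
        print(code, "->", describe_class(code))
    print("network only:", network_only(sample))
```

Running it on the sample list shows that the 03xx entries resolve to display (GPU) classes, while only 0200 and 0207 survive the Network filter.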
Hi @fprzewozny, we use the NFD (Node Feature Discovery) NodeFeature API[1] and deploy a NodeFeatureRule[2][3] object that triggers NFD to label the node with the labels required by network-operator (feature.node.kubernetes.io/pci-15b3.present).
We keep GPU classes in deviceClassWhitelist so that NFD exposes its default GPU-related labels. That's needed when using the NVIDIA GPU Operator.
The reason is that we expect only one instance of NFD to be deployed in the cluster.
In the code repo, the whitelist can be found here:
network-operator/hack/templates/values/values.template
Line 49 in 17d04f5
network-operator/deployment/network-operator/values.yaml
Line 49 in 17d04f5
In my opinion, deviceClassWhitelist for network-operator should contain only the 0200 and 0207 entries.
Thank you,
Franciszek