
still need help install sriov-network-operator #672

Open

hymgg opened this issue Apr 2, 2024 · 57 comments

@hymgg commented Apr 2, 2024

Continuing from issue #584,

@adrianchiris Sorry for the late followup.

Installing with helm was much easier than following the quick start steps. However, it only brought up the sriov-network-operator pod; according to the quick start guide, there should be a sriov-network-config-daemon too?

$ ls
Chart.yaml crds README.md templates values.yaml

$ helm3 install -n sriov-network-operator --create-namespace --wait sriov-network-operator ./

$ kubectl get all -n sriov-network-operator
NAME READY STATUS RESTARTS AGE
pod/sriov-network-operator-845dc5dffc-4hvsb 1/1 Running 0 20m

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/sriov-network-operator 1/1 1 1 20m

NAME DESIRED CURRENT READY AGE
replicaset.apps/sriov-network-operator-845dc5dffc 1 1 1 20m

$ kubectl logs deployment.apps/sriov-network-operator -n sriov-network-operator|tail -5
2024-03-29T05:02:53.668128868Z INFO controller/controller.go:119 default SriovOperatorConfig object not found, cannot reconcile SriovNetworkNodePolicies. Requeue. {"controller": "sriovnetworknodepolicy", "controllerGroup": "sriovnetwork.openshift.io", "controllerKind": "SriovNetworkNodePolicy", "SriovNetworkNodePolicy": {"name":"node-policy-sync-event"}, "namespace": "", "name": "node-policy-sync-event", "reconcileID": "ed902977-3a07-4cea-bb20-0cefbff5ea9e"}
2024-03-29T05:02:58.668612364Z INFO controller/controller.go:119 Reconciling {"controller": "sriovnetworknodepolicy", "controllerGroup": "sriovnetwork.openshift.io", "controllerKind": "SriovNetworkNodePolicy", "SriovNetworkNodePolicy": {"name":"node-policy-sync-event"}, "namespace": "", "name": "node-policy-sync-event", "reconcileID": "98591413-4718-4d3c-abaf-14d3dcf1c43c"}
2024-03-29T05:02:58.668676704Z INFO controller/controller.go:119 default SriovOperatorConfig object not found, cannot reconcile SriovNetworkNodePolicies. Requeue. {"controller": "sriovnetworknodepolicy", "controllerGroup": "sriovnetwork.openshift.io", "controllerKind": "SriovNetworkNodePolicy", "SriovNetworkNodePolicy": {"name":"node-policy-sync-event"}, "namespace": "", "name": "node-policy-sync-event", "reconcileID": "98591413-4718-4d3c-abaf-14d3dcf1c43c"}
2024-03-29T05:03:03.669236989Z INFO controller/controller.go:119 Reconciling {"controller": "sriovnetworknodepolicy", "controllerGroup": "sriovnetwork.openshift.io", "controllerKind": "SriovNetworkNodePolicy", "SriovNetworkNodePolicy": {"name":"node-policy-sync-event"}, "namespace": "", "name": "node-policy-sync-event", "reconcileID": "2a0835ad-a117-4caa-8ace-9afc525b6d70"}
2024-03-29T05:03:03.669309844Z INFO controller/controller.go:119 default SriovOperatorConfig object not found, cannot reconcile SriovNetworkNodePolicies. Requeue. {"controller": "sriovnetworknodepolicy", "controllerGroup": "sriovnetwork.openshift.io", "controllerKind": "SriovNetworkNodePolicy", "SriovNetworkNodePolicy": {"name":"node-policy-sync-event"}, "namespace": "", "name": "node-policy-sync-event", "reconcileID": "2a0835ad-a117-4caa-8ace-9afc525b6d70"}

Additional info, may not be relevant.

$ kubectl label ns sriov-network-operator pod-security.kubernetes.io/enforce=privileged
$ kubectl get node -l node-role.kubernetes.io/worker
NAME STATUS ROLES AGE VERSION
mtx-dell4-bld01.dc1.matrixxsw.com Ready worker 264d v1.26.6
mtx-dell4-bld02.dc1.matrixxsw.com Ready worker 264d v1.26.6
mtx-dell4-bld03.dc1.matrixxsw.com Ready worker 264d v1.26.6

Should we, and how do we, get the sriov-network-config-daemon installed?
Thanks. -Jessica

Originally posted by @hymgg in #584 (comment)

@hymgg (Author) commented Apr 8, 2024

Hello,
Can somebody help complete the sriov-network-operator installation?
Is there another way?

@rollandf (Contributor) commented Apr 9, 2024

Hey, it seems that you don't have the required SriovOperatorConfig named default.

It can be created with helm using the following parameters:
https://github.com/k8snetworkplumbingwg/sriov-network-operator/tree/master/deployment/sriov-network-operator#sr-iov-operator-configuration-parameters
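For example (a minimal sketch; sriovOperatorConfig.deploy is the chart value referenced in the link above):

$ helm upgrade -n sriov-network-operator sriov-network-operator ./ --set sriovOperatorConfig.deploy=true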

@hymgg (Author) commented Apr 9, 2024

@rollandf Thank you. I set sriovOperatorConfig.deploy to true in the default values.yaml and ran helm upgrade; the config daemon is up.

Compared to the example in the quick start, we're still missing the service object. Is that expected? Shall we create the svc manually?

$ kubectl --context dell4 get all -n sriov-network-operator
NAME READY STATUS RESTARTS AGE
pod/sriov-network-config-daemon-sxf4b 1/1 Running 0 25s
pod/sriov-network-config-daemon-vzzg2 1/1 Running 0 25s
pod/sriov-network-config-daemon-xn9rq 1/1 Running 0 25s
pod/sriov-network-operator-845dc5dffc-4hvsb 1/1 Running 0 11d

NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/sriov-network-config-daemon 3 3 3 3 3 kubernetes.io/os=linux,node-role.kubernetes.io/worker= 25s

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/sriov-network-operator 1/1 1 1 11d

NAME DESIRED CURRENT READY AGE
replicaset.apps/sriov-network-operator-845dc5dffc 1 1 1 11d

@rollandf (Contributor)

I don't think the service is needed. It seems to be an issue in the doc, actually.

@hymgg (Author) commented Apr 11, 2024

@rollandf Thank you.

Next, with initial sriovnetworknodestates.sriovnetwork.openshift.io as:
spec:
  dpConfigVersion: 2ea02bc305b6b7849ae7535c713eeb8e
status:
  interfaces:
  - deviceID: 158a
    driver: i40e
    linkSpeed: 25000 Mb/s
    linkType: ETH
    mac: "12:21:04:20:01:02"
    mtu: 1500
    name: p1p1
    pciAddress: 0000:3b:00.0
    vendor: "8086"
  - deviceID: 158a
    driver: i40e
    linkSpeed: 25000 Mb/s
    linkType: ETH
    mac: "12:21:04:20:01:03"
    mtu: 1500
    name: ens1f1
    pciAddress: 0000:3b:00.1
    vendor: "8086"

I created a SriovNetworkNodePolicy,

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-ens1f1
  namespace: sriov-network-operator
spec:
  nodeSelector:
    node-role.kubernetes.io/worker:
    #feature.node.kubernetes.io/network-sriov.capable: "true"
  resourceName: ens1f1
  priority: 99
  #mtu: 9000
  numVfs: 8
  nicSelector:
    deviceID: "158a"
    rootDevices:
    - 0000:3b:00.1
    vendor: "8086"
  deviceType: netdevice

It triggered creation of the sriov-device-plugin, but the operator pod went into CrashLoopBackOff state; its logs reported "panic: runtime error: invalid memory address or nil pointer dereference" and "[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1a7004d]".

How do we fix this?

[mtx@mtx-dell4-bld08 sriov-network-operator]$ kubectl get all -n sriov-network-operator
NAME READY STATUS RESTARTS AGE
pod/sriov-device-plugin-mqr84 1/1 Running 0 13m
pod/sriov-device-plugin-rc5jh 1/1 Running 0 13m
pod/sriov-device-plugin-zl5m6 1/1 Running 0 13m
pod/sriov-network-config-daemon-sxf4b 1/1 Running 0 27h
pod/sriov-network-config-daemon-vzzg2 1/1 Running 0 27h
pod/sriov-network-config-daemon-xn9rq 1/1 Running 0 27h
pod/sriov-network-operator-845dc5dffc-4hvsb 0/1 CrashLoopBackOff 8 (3m37s ago) 12d

NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/sriov-device-plugin 3 3 3 3 3 kubernetes.io/os=linux,node-role.kubernetes.io/worker= 13m
daemonset.apps/sriov-network-config-daemon 3 3 3 3 3 kubernetes.io/os=linux,node-role.kubernetes.io/worker= 27h

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/sriov-network-operator 0/1 1 0 12d

NAME DESIRED CURRENT READY AGE
replicaset.apps/sriov-network-operator-845dc5dffc 1 1 0 12d

New state of sriovnetworknodestates.sriovnetwork.openshift.io:

spec:
  interfaces:
  - name: ens1f1
    numVfs: 8
    pciAddress: 0000:3b:00.1
    vfGroups:
    - deviceType: netdevice
      policyName: policy-ens1f1
      resourceName: ens1f1
      vfRange: 0-7
status:
  interfaces:
  - deviceID: 158a
    driver: i40e
    linkSpeed: 25000 Mb/s
    linkType: ETH
    mac: "12:21:04:20:01:02"
    mtu: 1500
    name: p1p1
    pciAddress: 0000:3b:00.0
    vendor: "8086"
  - deviceID: 158a
    driver: i40e
    linkSpeed: 25000 Mb/s
    linkType: ETH
    mac: "12:21:04:20:01:03"
    mtu: 1500
    name: ens1f1
    pciAddress: 0000:3b:00.1
    vendor: "8086"
  lastSyncError: cannot configure sriov interfaces
  syncStatus: Failed

sriov-network-operator-845dc5dffc-4hvsb.log

Thanks. -Jessica

@SchSeba (Collaborator) commented Apr 11, 2024

Hi @hymgg, there is a bug; see PR #679.

In general, please check your nodeSelector in the SriovNetworkNodePolicy. Can you run

kubectl -n sriov-network-operator get sriovnetworknodepolicy -oyaml

I think your node selector is not right, and it ends up as an empty selector, which triggers the bug.

@hymgg (Author) commented Apr 11, 2024

@SchSeba The SriovNetworkNodePolicy spec was pasted in my last comment.
The original nodeSelector from the quickstart example was feature.node.kubernetes.io/network-sriov.capable: "true", but I didn't find any node with that label, so I changed it to node-role.kubernetes.io/worker:. After that, daemonset.apps/sriov-device-plugin found 3 nodes and started 3 pods. Shall I instead keep the original nodeSelector and label the nodes accordingly?

Thanks. -Jessica
policy-ens1f1.yaml.txt

@SchSeba (Collaborator) commented Apr 11, 2024

The yaml you shared is from a local file; I want to see the one that is in the k8s API server.

Please run kubectl -n sriov-network-operator get sriovnetworknodepolicy -oyaml and show me the output.

@hymgg (Author) commented Apr 11, 2024

sriovnetworknodepolicy.yaml.txt

The nodeSelector is empty in the attached output.

@SchSeba (Collaborator) commented Apr 11, 2024

Yep, that was my expectation.

I think the label you wanted is something like:

nodeSelector:
  node-role.kubernetes.io/worker: ""

(Note that in YAML a key with no value parses as null, so the original node-role.kubernetes.io/worker: produced the empty selector, while "" is an explicit empty-string value that matches the role label.)

@hymgg (Author) commented Apr 11, 2024

@SchSeba Thank you. I corrected the nodeSelector in the SriovNetworkNodePolicy and the operator pod is back in the Running state, but the sriovnetworknodestates still report "cannot configure sriov interfaces" and there are no VFs. Anything to check on the hardware side?

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodeState
metadata:
  annotations:
    sriovnetwork.openshift.io/current-state: Idle
    sriovnetwork.openshift.io/desired-state: Idle
  creationTimestamp: "2024-04-09T22:46:21Z"
  generation: 7
  name: mtx-dell4-bld01.dc1.matrixxsw.com
  namespace: sriov-network-operator
  ownerReferences:
  - apiVersion: sriovnetwork.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: SriovOperatorConfig
    name: default
    uid: ea87a10e-5906-4d3b-a8b0-f67783bc36b6
  resourceVersion: "172487551"
  uid: 52ebc9f9-d110-4915-ba9c-65b53b79c4b0
spec:
  interfaces:
  - name: ens1f1
    numVfs: 8
    pciAddress: 0000:3b:00.1
    vfGroups:
    - deviceType: netdevice
      policyName: policy-ens1f1
      resourceName: ens1f1
      vfRange: 0-7
status:
  interfaces:
  - deviceID: 158a
    driver: i40e
    linkSpeed: 25000 Mb/s
    linkType: ETH
    mac: "12:21:04:20:01:02"
    mtu: 1500
    name: p1p1
    pciAddress: 0000:3b:00.0
    vendor: "8086"
  - deviceID: 158a
    driver: i40e
    linkSpeed: 25000 Mb/s
    linkType: ETH
    mac: "12:21:04:20:01:03"
    mtu: 1500
    name: ens1f1
    pciAddress: 0000:3b:00.1
    vendor: "8086"
  lastSyncError: cannot configure sriov interfaces
  syncStatus: Failed

One of the sriov-device-plugin pod log (the other 2 are similar):

I0411 19:59:15.007473 1 manager.go:57] Using Kubelet Plugin Registry Mode
I0411 19:59:15.007514 1 main.go:46] resource manager reading configs
I0411 19:59:15.007539 1 manager.go:86] raw ResourceList: {"resourceList":[{"resourceName":"ens1f1","selectors":{"vendors":["8086"],"devices":["154c"],"rootDevices":["0000:3b:00.1"],"IsRdma":false,"NeedVhostNet":false},"SelectorObj":null}]}
I0411 19:59:15.007632 1 factory.go:211] *types.NetDeviceSelectors for resource ens1f1 is [0xc0004ad7a0]
I0411 19:59:15.007641 1 manager.go:106] unmarshalled ResourceList: [{ResourcePrefix: ResourceName:ens1f1 DeviceType:netDevice ExcludeTopology:false Selectors:0xc0004b3350 AdditionalInfo:map[] SelectorObjs:[0xc0004ad7a0]}]
I0411 19:59:15.007677 1 manager.go:217] validating resource name "openshift.io/ens1f1"
I0411 19:59:15.007682 1 main.go:62] Discovering host devices
I0411 19:59:15.087670 1 netDeviceProvider.go:67] netdevice AddTargetDevices(): device found: 0000:3b:00.0 02 Intel Corporation Ethernet Controller XXV710 for 25GbE ...
I0411 19:59:15.088179 1 netDeviceProvider.go:67] netdevice AddTargetDevices(): device found: 0000:3b:00.1 02 Intel Corporation Ethernet Controller XXV710 for 25GbE ...
I0411 19:59:15.088509 1 auxNetDeviceProvider.go:84] auxnetdevice AddTargetDevices(): device found: 0000:3b:00.0 02 Intel Corporation Ethernet Controller XXV710 for 25GbE ...
I0411 19:59:15.088540 1 auxNetDeviceProvider.go:84] auxnetdevice AddTargetDevices(): device found: 0000:3b:00.1 02 Intel Corporation Ethernet Controller XXV710 for 25GbE ...
I0411 19:59:15.088549 1 main.go:68] Initializing resource servers
I0411 19:59:15.088555 1 manager.go:117] number of config: 1
I0411 19:59:15.088560 1 manager.go:121] Creating new ResourcePool: ens1f1
I0411 19:59:15.088565 1 manager.go:122] DeviceType: netDevice
I0411 19:59:15.088807 1 manager.go:138] initServers(): selector index 0 will register 0 devices
I0411 19:59:15.088820 1 manager.go:142] no devices in device pool, skipping creating resource server for ens1f1
I0411 19:59:15.088826 1 main.go:74] Starting all servers...
I0411 19:59:15.088832 1 main.go:79] All servers started.
I0411 19:59:15.088839 1 main.go:80] Listening for term signals

@hymgg (Author) commented Apr 11, 2024

sriov-network-operator-845dc5dffc-4hvsb (2).log

Operator pod log attached.

@hymgg (Author) commented Apr 15, 2024

Is there a way to debug this issue? It still fails to configure sriov on the interface. The worker nodes are running k8s 1.26 on RHEL 8.6.

@SchSeba (Collaborator) commented Apr 16, 2024

Hi, as you can see it's an Intel NIC, and in the status of the SriovNetworkNodeState there is no totalvfs (max VFs) reported, which points to SR-IOV not being enabled in the BIOS of the machine.
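(A quick host-side check, using the PF address from the earlier output: the sriov_totalvfs sysfs file should report a non-zero value once SR-IOV is enabled in the firmware.)

cat /sys/bus/pci/devices/0000:3b:00.1/sriov_totalvfs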

@hymgg (Author) commented Apr 16, 2024

@SchSeba Thanks! Checking with the lab on this.

@hymgg (Author) commented Apr 17, 2024

The lab team enabled SR-IOV on the NICs. SriovNetworkNodeState now reports totalvfs: 64, but still "cannot configure sriov interfaces". I tried deleting and re-applying the same SriovNetworkNodePolicy; it didn't help.

$ kubectl get sriovnetworknodestates.sriovnetwork.openshift.io -n sriov-network-operator mtx-dell4-bld01.dc1.matrixxsw.com -o yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodeState
metadata:
  annotations:
    sriovnetwork.openshift.io/current-state: Idle
    sriovnetwork.openshift.io/desired-state: Idle
  creationTimestamp: "2024-04-09T22:46:21Z"
  generation: 9
  name: mtx-dell4-bld01.dc1.matrixxsw.com
  namespace: sriov-network-operator
  ownerReferences:
  - apiVersion: sriovnetwork.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: SriovOperatorConfig
    name: default
    uid: ea87a10e-5906-4d3b-a8b0-f67783bc36b6
  resourceVersion: "178734156"
  uid: 52ebc9f9-d110-4915-ba9c-65b53b79c4b0
spec:
  interfaces:
  - name: ens1f1
    numVfs: 8
    pciAddress: 0000:3b:00.1
    vfGroups:
    - deviceType: netdevice
      policyName: policy-ens1f1
      resourceName: ens1f1
      vfRange: 0-7
status:
  interfaces:
  - deviceID: 158a
    driver: i40e
    eSwitchMode: legacy
    linkSpeed: 25000 Mb/s
    linkType: ETH
    mac: "12:21:04:20:01:02"
    mtu: 1500
    name: p1p1
    pciAddress: 0000:3b:00.0
    totalvfs: 64
    vendor: "8086"
  - deviceID: 158a
    driver: i40e
    eSwitchMode: legacy
    linkSpeed: 25000 Mb/s
    linkType: ETH
    mac: "12:21:04:20:01:03"
    mtu: 1500
    name: ens1f1
    pciAddress: 0000:3b:00.1
    totalvfs: 64
    vendor: "8086"
  lastSyncError: cannot configure sriov interfaces
  syncStatus: Failed

@hymgg (Author) commented Apr 19, 2024

Anything else we should check?

@rollandf (Contributor)

Can you provide new logs from the config daemon?

@hymgg (Author) commented Apr 22, 2024

The config daemon says it cannot allocate memory.
Uploading logs from the config daemon, device plugin, and operator.

2024-04-22T20:43:36.502722528Z ERROR sriov/sriov.go:992 SetSriovNumVfs(): fail to set NumVfs file {"path": "/sys/bus/pci/devices/0000:3b:00.1/sriov_numvfs", "error": "write /sys/bus/pci/devices/0000:3b:00.1/sriov_numvfs: cannot allocate memory"}
2024-04-22T20:43:36.502748646Z ERROR sriov/sriov.go:545 configSriovPFDevice(): fail to set NumVfs for device {"device": "0000:3b:00.1", "error": "write /sys/bus/pci/devices/0000:3b:00.1/sriov_numvfs: cannot allocate memory"}
2024-04-22T20:43:36.502758263Z ERROR sriov/sriov.go:594 configSriovInterfaces(): fail to configure sriov interface. resetting interface. {"address": "0000:3b:00.1", "error": "write /sys/bus/pci/devices/0000:3b:00.1/sriov_numvfs: cannot allocate memory"}
2024-04-22T20:43:36.503045188Z ERROR generic/generic_plugin.go:183 cannot configure sriov interfaces {"error": "write /sys/bus/pci/devices/0000:3b:00.1/sriov_numvfs: cannot allocate memory"}
2024-04-22T20:43:36.503061016Z ERROR daemon/daemon.go:259 nodeStateSyncHandler(): generic plugin fail to apply {"error": "cannot configure sriov interfaces"}
sriov-network-operator-845dc5dffc-4hvsb.log
sriov-device-plugin-pnld9.log
sriov-network-config-daemon-sxf4b.log

@rollandf (Contributor)

It seems that you need to add the following kernel arg to your server: pci=realloc.
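(On RHEL 8 this can be done with grubby followed by a reboot; a sketch, assuming you want the argument on all installed kernels:)

grubby --update-kernel=ALL --args="pci=realloc"
reboot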

@SchSeba (Collaborator) commented Apr 24, 2024

@hymgg can you please check? I think there is a BIOS setting called something like memory mapped I/O above 4G.

@hymgg (Author) commented Apr 25, 2024

@rollandf @SchSeba The VFs showed up after adding pci=realloc to the kernel args. Thanks!
But according to the quickstart guide, the VFs should next be reported as node allocatable resources, and that didn't happen. The device-plugin pods also take turns terminating and being recreated; their logs report "error creating new device".

$ kubectl get sriovnetworknodestates.sriovnetwork.openshift.io -n sriov-network-operator mtx-dell4-bld01.dc1.matrixxsw.com -o yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodeState
metadata:
  annotations:
    sriovnetwork.openshift.io/current-state: Idle
    sriovnetwork.openshift.io/desired-state: Drain_Required
  creationTimestamp: "2024-04-09T22:46:21Z"
  generation: 9
  name: mtx-dell4-bld01.dc1.matrixxsw.com
  namespace: sriov-network-operator
  ownerReferences:
  - apiVersion: sriovnetwork.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: SriovOperatorConfig
    name: default
    uid: ea87a10e-5906-4d3b-a8b0-f67783bc36b6
  resourceVersion: "186573701"
  uid: 52ebc9f9-d110-4915-ba9c-65b53b79c4b0
spec:
  interfaces:
  - name: ens1f1
    numVfs: 8
    pciAddress: 0000:3b:00.1
    vfGroups:
    - deviceType: netdevice
      policyName: policy-ens1f1
      resourceName: ens1f1
      vfRange: 0-7
status:
  interfaces:
  - deviceID: 158a
    driver: i40e
    eSwitchMode: legacy
    linkSpeed: 25000 Mb/s
    linkType: ETH
    mac: "12:21:04:20:01:02"
    mtu: 1500
    name: p1p1
    pciAddress: 0000:3b:00.0
    totalvfs: 64
    vendor: "8086"
  - Vfs:
    - deviceID: 154c
      pciAddress: 0000:3b:0a.0
      vendor: "8086"
      vfID: 0
    - deviceID: 154c
      pciAddress: 0000:3b:0a.1
      vendor: "8086"
      vfID: 1
    - deviceID: 154c
      pciAddress: 0000:3b:0a.2
      vendor: "8086"
      vfID: 2
    - deviceID: 154c
      pciAddress: 0000:3b:0a.3
      vendor: "8086"
      vfID: 3
    - deviceID: 154c
      pciAddress: 0000:3b:0a.4
      vendor: "8086"
      vfID: 4
    - deviceID: 154c
      pciAddress: 0000:3b:0a.5
      vendor: "8086"
      vfID: 5
    - deviceID: 154c
      pciAddress: 0000:3b:0a.6
      vendor: "8086"
      vfID: 6
    - deviceID: 154c
      pciAddress: 0000:3b:0a.7
      vendor: "8086"
      vfID: 7
    deviceID: 158a
    driver: i40e
    eSwitchMode: legacy
    linkSpeed: 25000 Mb/s
    linkType: ETH
    mac: "12:21:04:20:01:03"
    mtu: 1500
    name: ens1f1
    numVfs: 8
    pciAddress: 0000:3b:00.1
    totalvfs: 64
    vendor: "8086"
  syncStatus: InProgress

$ kubectl get no -o json | jq -r '[.items[] | {name:.metadata.name, allocable:.status.allocatable}]'
[
  {
    "name": "mtx-dell4-bld01.dc1.matrixxsw.com",
    "allocable": {
      "cpu": "64",
      "ephemeral-storage": "213255452729",
      "hugepages-1Gi": "0",
      "hugepages-2Mi": "0",
      "memory": "394453236Ki",
      "pods": "110"
    }
  },
...

$ kubectl get all -n sriov-network-operator
NAME READY STATUS RESTARTS AGE
pod/sriov-device-plugin-kczqc 1/1 Terminating 0 11s
pod/sriov-device-plugin-pq2xz 1/1 Running 0 10m
pod/sriov-device-plugin-txb5j 1/1 Running 0 1s
...

$ kubectl logs sriov-device-plugin-4mdbw -n sriov-network-operator
I0425 02:48:15.236595 1 manager.go:57] Using Kubelet Plugin Registry Mode
I0425 02:48:15.236654 1 main.go:46] resource manager reading configs
I0425 02:48:15.236679 1 manager.go:86] raw ResourceList: {"resourceList":[{"resourceName":"ens1f1","selectors":{"vendors":["8086"],"devices":["154c"],"rootDevices":["0000:3b:00.1"],"IsRdma":false,"NeedVhostNet":false},"SelectorObj":null}]}
I0425 02:48:15.236752 1 factory.go:211] *types.NetDeviceSelectors for resource ens1f1 is [0xc0001eaa20]
I0425 02:48:15.236770 1 manager.go:106] unmarshalled ResourceList: [{ResourcePrefix: ResourceName:ens1f1 DeviceType:netDevice ExcludeTopology:false Selectors:0xc00053a180 AdditionalInfo:map[] SelectorObjs:[0xc0001eaa20]}]
I0425 02:48:15.236799 1 manager.go:217] validating resource name "openshift.io/ens1f1"
I0425 02:48:15.236807 1 main.go:62] Discovering host devices
I0425 02:48:15.312076 1 netDeviceProvider.go:67] netdevice AddTargetDevices(): device found: 0000:3b:00.0 02 Intel Corporation Ethernet Controller XXV710 for 25GbE ...
I0425 02:48:15.312341 1 netDeviceProvider.go:67] netdevice AddTargetDevices(): device found: 0000:3b:00.1 02 Intel Corporation Ethernet Controller XXV710 for 25GbE ...
I0425 02:48:15.312536 1 netDeviceProvider.go:67] netdevice AddTargetDevices(): device found: 0000:3b:0a.0 02 Intel Corporation Ethernet Virtual Function 700 Series
I0425 02:48:15.312560 1 netDeviceProvider.go:67] netdevice AddTargetDevices(): device found: 0000:3b:0a.1 02 Intel Corporation Ethernet Virtual Function 700 Series
I0425 02:48:15.312574 1 netDeviceProvider.go:67] netdevice AddTargetDevices(): device found: 0000:3b:0a.2 02 Intel Corporation Ethernet Virtual Function 700 Series
I0425 02:48:15.312589 1 netDeviceProvider.go:67] netdevice AddTargetDevices(): device found: 0000:3b:0a.3 02 Intel Corporation Ethernet Virtual Function 700 Series
I0425 02:48:15.312602 1 netDeviceProvider.go:67] netdevice AddTargetDevices(): device found: 0000:3b:0a.4 02 Intel Corporation Ethernet Virtual Function 700 Series
I0425 02:48:15.312615 1 netDeviceProvider.go:67] netdevice AddTargetDevices(): device found: 0000:3b:0a.5 02 Intel Corporation Ethernet Virtual Function 700 Series
I0425 02:48:15.312628 1 netDeviceProvider.go:67] netdevice AddTargetDevices(): device found: 0000:3b:0a.6 02 Intel Corporation Ethernet Virtual Function 700 Series
I0425 02:48:15.312642 1 netDeviceProvider.go:67] netdevice AddTargetDevices(): device found: 0000:3b:0a.7 02 Intel Corporation Ethernet Virtual Function 700 Series
I0425 02:48:15.312673 1 auxNetDeviceProvider.go:84] auxnetdevice AddTargetDevices(): device found: 0000:3b:00.0 02 Intel Corporation Ethernet Controller XXV710 for 25GbE ...
I0425 02:48:15.312690 1 auxNetDeviceProvider.go:84] auxnetdevice AddTargetDevices(): device found: 0000:3b:00.1 02 Intel Corporation Ethernet Controller XXV710 for 25GbE ...
I0425 02:48:15.312695 1 auxNetDeviceProvider.go:84] auxnetdevice AddTargetDevices(): device found: 0000:3b:0a.0 02 Intel Corporation Ethernet Virtual Function 700 Series
I0425 02:48:15.312699 1 auxNetDeviceProvider.go:84] auxnetdevice AddTargetDevices(): device found: 0000:3b:0a.1 02 Intel Corporation Ethernet Virtual Function 700 Series
I0425 02:48:15.312704 1 auxNetDeviceProvider.go:84] auxnetdevice AddTargetDevices(): device found: 0000:3b:0a.2 02 Intel Corporation Ethernet Virtual Function 700 Series
I0425 02:48:15.312707 1 auxNetDeviceProvider.go:84] auxnetdevice AddTargetDevices(): device found: 0000:3b:0a.3 02 Intel Corporation Ethernet Virtual Function 700 Series
I0425 02:48:15.312711 1 auxNetDeviceProvider.go:84] auxnetdevice AddTargetDevices(): device found: 0000:3b:0a.4 02 Intel Corporation Ethernet Virtual Function 700 Series
I0425 02:48:15.312714 1 auxNetDeviceProvider.go:84] auxnetdevice AddTargetDevices(): device found: 0000:3b:0a.5 02 Intel Corporation Ethernet Virtual Function 700 Series
I0425 02:48:15.312717 1 auxNetDeviceProvider.go:84] auxnetdevice AddTargetDevices(): device found: 0000:3b:0a.6 02 Intel Corporation Ethernet Virtual Function 700 Series
I0425 02:48:15.312722 1 auxNetDeviceProvider.go:84] auxnetdevice AddTargetDevices(): device found: 0000:3b:0a.7 02 Intel Corporation Ethernet Virtual Function 700 Series
I0425 02:48:15.312727 1 main.go:68] Initializing resource servers
I0425 02:48:15.312734 1 manager.go:117] number of config: 1
I0425 02:48:15.312739 1 manager.go:121] Creating new ResourcePool: ens1f1
I0425 02:48:15.312742 1 manager.go:122] DeviceType: netDevice
E0425 02:48:15.312843 1 netDeviceProvider.go:50] netdevice GetDevices(): error creating new device: "error getting driver info for device 0000:3b:0a.0 readlink /sys/bus/pci/devices/0000:3b:0a.0/driver: no such file or directory"
E0425 02:48:15.312854 1 netDeviceProvider.go:50] netdevice GetDevices(): error creating new device: "error getting driver info for device 0000:3b:0a.1 readlink /sys/bus/pci/devices/0000:3b:0a.1/driver: no such file or directory"
E0425 02:48:15.312863 1 netDeviceProvider.go:50] netdevice GetDevices(): error creating new device: "error getting driver info for device 0000:3b:0a.2 readlink /sys/bus/pci/devices/0000:3b:0a.2/driver: no such file or directory"
E0425 02:48:15.312870 1 netDeviceProvider.go:50] netdevice GetDevices(): error creating new device: "error getting driver info for device 0000:3b:0a.3 readlink /sys/bus/pci/devices/0000:3b:0a.3/driver: no such file or directory"
E0425 02:48:15.312878 1 netDeviceProvider.go:50] netdevice GetDevices(): error creating new device: "error getting driver info for device 0000:3b:0a.4 readlink /sys/bus/pci/devices/0000:3b:0a.4/driver: no such file or directory"
E0425 02:48:15.312885 1 netDeviceProvider.go:50] netdevice GetDevices(): error creating new device: "error getting driver info for device 0000:3b:0a.5 readlink /sys/bus/pci/devices/0000:3b:0a.5/driver: no such file or directory"
E0425 02:48:15.312893 1 netDeviceProvider.go:50] netdevice GetDevices(): error creating new device: "error getting driver info for device 0000:3b:0a.6 readlink /sys/bus/pci/devices/0000:3b:0a.6/driver: no such file or directory"
E0425 02:48:15.312902 1 netDeviceProvider.go:50] netdevice GetDevices(): error creating new device: "error getting driver info for device 0000:3b:0a.7 readlink /sys/bus/pci/devices/0000:3b:0a.7/driver: no such file or directory"
I0425 02:48:15.312910 1 manager.go:138] initServers(): selector index 0 will register 0 devices
I0425 02:48:15.312916 1 manager.go:142] no devices in device pool, skipping creating resource server for ens1f1
I0425 02:48:15.312931 1 main.go:74] Starting all servers...
I0425 02:48:15.312935 1 main.go:79] All servers started.
I0425 02:48:15.312939 1 main.go:80] Listening for term signals

@hymgg (Author) commented Apr 26, 2024

Forgot to mention: while the device plugin pods terminate and get recreated, the nodes also take turns going into SchedulingDisabled state.

$ kubectl get node
NAME STATUS ROLES AGE VERSION
mtx-dell4-bld01.dc1.matrixxsw.com Ready worker 291d v1.26.6
mtx-dell4-bld02.dc1.matrixxsw.com Ready,SchedulingDisabled worker 291d v1.26.6
mtx-dell4-bld03.dc1.matrixxsw.com Ready worker 291d v1.26.6

@hymgg (Author) commented Apr 29, 2024

Hello, any other ideas to investigate?

@hymgg (Author) commented May 2, 2024

Still looking for a remedy to the situation...

@ns-rlewkowicz

@hymgg I ran into this almost 8 months ago; almost everything in your post. On a single-node, clean test cluster this thing works, but our nodes have hundreds of labels.

Are your clusters rke?

Generally this project did not seem great. Even the fact that the labels are hard-coded is absolutely terrible.

I'm already not looking forward to trudging down this path again.

@hymgg (Author) commented May 3, 2024

@ns-rlewkowicz Thanks for sharing your experience. Is there an alternative that works better?

This is vanilla k8s on bare-metal RHEL 8 nodes, installed with kubeadm. Just the essentials, nothing fancy.

@adrianchiris (Collaborator)

GetDevices(): error creating new device: "error getting driver info for device 0000:3b:0a.0 readlink /sys/bus/pci/devices/0000:3b:0a.0/driver: no such file or directory"

Are the created SR-IOV virtual functions bound to the Intel VF driver? From the logs it doesn't seem so.
Is the VF driver installed in your OS?
Does cat /sys/bus/pci/devices/0000\:3b\:00.0/sriov_drivers_autoprobe return 1?
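(Two quick host-side checks; the VF address 0000:3b:0a.0 is taken from the device-plugin logs above. "Kernel driver in use:" should list the VF driver, and modinfo verifies the module exists on the host:)

lspci -k -s 0000:3b:0a.0
modinfo iavf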

@SchSeba (Collaborator) commented May 6, 2024

@hymgg can you please run

lspci -v -nn -mm -k -s 0000:3b:00.0
lspci -vvv -s 0000:3b:00.0

and also please check again in the BIOS configuration for

Memory Mapped I/O above 4GB : enable

@SchSeba (Collaborator) commented May 6, 2024

Hi @ns-rlewkowicz, is there any specific issue the community can help with?
We have a large number of users running the operator on large clusters.

@ns-rlewkowicz commented May 6, 2024

@SchSeba One of my biggest issues was that labels were hardcoded in the code, with no configurable option. I can hack around it; I'm a good dev when I need to be. It was just a pain.

Then I hit the segfaults, same as this post, and I called it. I just took a look at the issues before I started again. For our current configurations we laid down the plugin manually. We have new node configurations coming, as well as some other requirements, so it was either more manual config or trying this.

I should add too: same node, minikube, no segfaults. Production cluster config, segfaults.

I'll track it down, I'm just not excited about it.

#507

I'm sorry I've not been good with responses. I'm a little selfish in this. I at least do give some break-fix.

edit:
There's a helm chart: https://github.com/k8snetworkplumbingwg/sriov-network-operator/tree/master/deployment/sriov-network-operator#sr-iov-operator-configuration-parameters

Idk why the quick start doesn't lead with this and instead uses a custom make system. Seems like it's just a docs gap.

@ns-rlewkowicz

@hymgg You can lay the plugin down manually if you have limited node configurations.

@hymgg (Author) commented May 6, 2024

@adrianchiris autoprobe is enabled. Is it possibly something with this (version of the) driver? Is there an alternative?

[root@mtx-dell4-bld01 ~]# cat /sys/bus/pci/devices/0000:3b:00.0/sriov_drivers_autoprobe
1

@hymgg (Author) commented May 6, 2024

@SchSeba this is set to enable in the BIOS:
Memory Mapped I/O above 4GB : enable

lspci -v -nn -mm -k -s 0000:3b:00.0

Slot: 3b:00.0
Class: Ethernet controller [0200]
Vendor: Intel Corporation [8086]
Device: Ethernet Controller XXV710 for 25GbE backplane [158a]
SVendor: Intel Corporation [8086]
SDevice: Ethernet 25G 2P XXV710 Mezz [000a]
Rev: 02
Driver: i40e
Module: i40e
NUMANode: 0

lspci -vvv -s 0000:3b:00.0

3b:00.0 Ethernet controller: Intel Corporation Ethernet Controller XXV710 for 25GbE backplane (rev 02)
Subsystem: Intel Corporation Ethernet 25G 2P XXV710 Mezz
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 35
NUMA node: 0
Region 0: Memory at 388000000000 (64-bit, prefetchable) [size=16M]
Region 3: Memory at 388002800000 (64-bit, prefetchable) [size=32K]
Expansion ROM at ad200000 [disabled] [size=512K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
Address: 0000000000000000 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [70] MSI-X: Enable+ Count=129 Masked-
Vector table: BAR=3 offset=00000000
PBA: BAR=3 offset=00001000
Capabilities: [a0] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 2048 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
DevCtl: CorrErr- NonFatalErr+ FatalErr+ UnsupReq+
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop- FLReset-
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L1, Exit Latency L1 <16us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s (ok), Width x8 (ok)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR-
10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis- LTR- OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+ EqualizationPhase1+
EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [e0] Vital Product Data
Product Name: XXV710 25GbE Controller\x00
Read-only fields:
[V0] Vendor specific: FFV22.5.7
[PN] Part number: H9NTY
[MN] Manufacture ID: 1028
[V1] Vendor specific: DSV1028VPDR.VER2.1
[V3] Vendor specific: DTINIC
[V4] Vendor specific: DCM1001FFFFFF2101FFFFFF
[V5] Vendor specific: NPY2
[V6] Vendor specific: PMTC
[V7] Vendor specific: NMVIntel Corp
[V8] Vendor specific: L1D0
[RV] Reserved: checksum good, 5 byte(s) reserved
Read/write fields:
[Y1] System specific: CCF1
End
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt+ UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
CEMsk: RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn+ ECRCChkCap+ ECRCChkEn+
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [140 v1] Device Serial Number 3e-b4-1c-ff-ff-44-ac-78
Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 1
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
IOVCap: Migration-, Interrupt Message Number: 000
IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+
IOVSta: Migration-
Initial VFs: 64, Total VFs: 64, Number of VFs: 0, Function Dependency Link: 00
VF offset: 16, stride: 1, Device ID: 154c
Supported Page Size: 00000553, System Page Size: 00000001
Region 0: Memory at 0000388002000000 (64-bit, prefetchable)
Region 3: Memory at 0000388002810000 (64-bit, prefetchable)
VF Migration: offset: 00000000, BIR: 0
Capabilities: [1a0 v1] Transaction Processing Hints
Device specific mode supported
No steering table available
Capabilities: [1b0 v1] Access Control Services
ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
Capabilities: [1d0 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
LaneErrStat: 0
Kernel driver in use: i40e
Kernel modules: i40e

@hymgg (Author) commented May 6, 2024

@ns-rlewkowicz Thank you. How do I lay the plugin down manually?

@hymgg (Author) commented May 10, 2024

Any other ideas?

@SchSeba (Collaborator) commented May 22, 2024

Hi @hymgg,

Sorry for the late response.
To be sure there is no problem from the operating-system point of view, please try

echo 5 > /sys/bus/pci/devices/0000\:3b\:00.0/sriov_numvfs

If that returns an error, then the problem is something in the operating system/drivers, and the operator will also not be able to configure the virtual functions.

You can check dmesg after the echo command to see if the drivers print any error.

@hymgg (Author) commented May 24, 2024

@SchSeba The echo came back successfully. Not sure what to look for in the dmesg output? Please help review the attached dmesg.log.

cat /sys/bus/pci/devices/0000:3b:00.0/sriov_numvfs

0

echo 5 > /sys/bus/pci/devices/0000:3b:00.0/sriov_numvfs

cat /sys/bus/pci/devices/0000:3b:00.0/sriov_numvfs

5
dmesg.log

@hymgg (Author) commented May 30, 2024

Anything else to check / try?

@zeeke (Member) commented May 31, 2024

Hi @hymgg, looking at the dmesg.log you shared, it looks like the driver correctly creates the VFs, but the network devices do not spawn.

[201505.484417] i40e 0000:3b:00.0: FW LLDP is enabled
[201505.484423] i40e 0000:3b:00.0: Allocating 5 VFs.
[201505.591061] pci 0000:3b:02.0: [8086:154c] type 00 class 0x020000
[201505.591073] pci 0000:3b:02.0: enabling Extended Tags
[201505.591254] pci 0000:3b:02.1: [8086:154c] type 00 class 0x020000
[201505.591267] pci 0000:3b:02.1: enabling Extended Tags
[201505.591364] pci 0000:3b:02.2: [8086:154c] type 00 class 0x020000
[201505.591373] pci 0000:3b:02.2: enabling Extended Tags
[201505.591462] pci 0000:3b:02.3: [8086:154c] type 00 class 0x020000
[201505.591471] pci 0000:3b:02.3: enabling Extended Tags
[201505.591561] pci 0000:3b:02.4: [8086:154c] type 00 class 0x020000
[201505.591570] pci 0000:3b:02.4: enabling Extended Tags

I would double-check whether following the Intel guide produces a correct system configuration, as the problem seems to be out of the operator's scope.

Can you also share a kubectl cluster-info dump -n <operator-namespace> --output-directory <xxx>?
I think you have made some relevant progress since the latest pod logs.

@hymgg (Author) commented Jun 3, 2024

@zeeke after manually running echo 5 > .../sriov_numvfs, the operator pod has been crashing:

[mtx@mtx-dell4-bld08 sriov-network-operator]$ kubectl get all -n sriov-network-operator
NAME                                          READY   STATUS             RESTARTS        AGE
pod/sriov-network-config-daemon-xn9rq         1/1     Running            5 (12d ago)     54d
pod/sriov-network-operator-845dc5dffc-zmlgj   0/1     CrashLoopBackOff   8 (2m52s ago)   19m

NAME                                         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                            AGE
daemonset.apps/sriov-network-config-daemon   1         1         1       1            1           kubernetes.io/os=linux,node-role.kubernetes.io/worker=   54d

NAME                                     READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/sriov-network-operator   0/1     1            0           66d

NAME                                                DESIRED   CURRENT   READY   AGE
replicaset.apps/sriov-network-operator-845dc5dffc   1         1         0       66d

[mtx@mtx-dell4-bld08 sriov-network-operator]$ kubectl logs deployment.apps/sriov-network-operator -n sriov-network-operator
2024-06-03T05:56:54.03918188Z   ERROR   setup   runtime/proc.go:267     unable to create index field for cache  {"error": "no matches for kind \"OVSNetwork\" in version \"sriovnetwork.openshift.io/v1\""}

Attached is the current ns dump without any SriovNetworkNodePolicy.

cluster-info-dump.tar.gz

@zeeke (Member) commented Jun 3, 2024

The operator's pod is crashing due to a missing CRD in the cluster:

2024-06-03T06:17:20.02177684Z	ERROR	setup	runtime/proc.go:267	unable to create index field for cache	{"error": "no matches for kind \"OVSNetwork\" in version \"sriovnetwork.openshift.io/v1\""}

It sounds like the installation is not 100% clean. Can you try uninstalling the operator completely and deploying it again?
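(A clean-reinstall sketch with Helm, matching the install commands used earlier in this thread; the last command verifies that the sriovnetwork.openshift.io CRDs, including OVSNetwork, exist after the install:)

helm uninstall -n sriov-network-operator sriov-network-operator
kubectl delete ns sriov-network-operator
helm install -n sriov-network-operator --create-namespace --wait sriov-network-operator ./
kubectl get crd | grep sriovnetwork.openshift.io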

@hymgg (Author) commented Jun 4, 2024

@zeeke sure.
I ran helm uninstall, deleted the ns sriov-network-operator, pulled the latest source, and ran helm install successfully; no pods crashing.

Tried to create the policy again; the device plugin pod soon went into Terminating state, and the node into SchedulingDisabled.

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-ens1f1
  namespace: sriov-network-operator
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ""
    #feature.node.kubernetes.io/network-sriov.capable: "true"
  resourceName: ens1f1
  priority: 99
  #mtu: 9000
  numVfs: 8
  nicSelector:
      deviceID: "158a"
      rootDevices:
      - 0000:3b:00.1
      vendor: "8086"
  deviceType: netdevice

$ kubectl get node -l node-role.kubernetes.io/worker; kubectl --context dell4 get all -n sriov-network-operator
NAME                                STATUS                     ROLES    AGE    VERSION
mtx-dell4-bld01.dc1.matrixxsw.com   Ready,SchedulingDisabled   worker   331d   v1.26.6

NAME                                          READY   STATUS        RESTARTS   AGE
pod/sriov-device-plugin-wdvqb                 1/1     Terminating   0          3s
pod/sriov-network-config-daemon-8s8z5         1/1     Running       0          5m33s
pod/sriov-network-operator-7c897b487b-dhfch   1/1     Running       0          5m34s

NAME                                         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                            AGE
daemonset.apps/sriov-device-plugin           1         1         1       1            1           kubernetes.io/os=linux,node-role.kubernetes.io/worker=   40s
daemonset.apps/sriov-network-config-daemon   1         1         1       1            1           kubernetes.io/os=linux,node-role.kubernetes.io/worker=   5m33s

NAME                                     READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/sriov-network-operator   1/1     1            1           5m34s

NAME                                                DESIRED   CURRENT   READY   AGE
replicaset.apps/sriov-network-operator-7c897b487b   1         1         1       5m34s

latest cluster-info dump,
cluster-info-dump.tar.gz

@zeeke (Member) commented Jun 4, 2024

Hi @hymgg, looking at the archive you shared, I see the config daemon is reporting the error:


2024-06-04T00:16:38.820984334Z	LEVEL(-2)	sriov/sriov.go:436	BindDefaultDriver(): bind device to default driver	{"device": "0000:3b:0a.0"}
2024-06-04T00:16:38.82100001Z	LEVEL(-2)	kernel/kernel.go:142	getDriverByBusAndDevice(): driver path for device not exist	{"bus": "pci", "device": "0000:3b:0a.0", "driver": ""}
2024-06-04T00:16:38.821009457Z	LEVEL(-2)	kernel/kernel.go:156	setDriverOverride(): reset driver override for device	{"bus": "pci", "device": "0000:3b:0a.0"}
2024-06-04T00:16:38.821029785Z	LEVEL(-2)	kernel/kernel.go:159	probeDriver(): drivers probe	{"bus": "pci", "device": "0000:3b:0a.0"}
2024-06-04T00:16:38.821220047Z	LEVEL(-2)	kernel/kernel.go:241	getDriverByBusAndDevice(): driver path for device not exist	{"bus": "pci", "device": "0000:3b:0a.0", "driver": ""}
...
2024-06-04T00:16:41.932597014Z	ERROR	sriov/sriov.go:296	getVfInfo(): unable to parse device driver	{"device": "0000:3b:0a.0", "error": "error getting driver info for device 0000:3b:0a.0 readlink /sys/bus/pci/devices/0000:3b:0a.0/driver: no such file or directory"}

And I think the device-plugin failure depends on the virtual functions not having a driver bound.

So, the problem here is with the VF driver. Can you check that when you manually create VFs (echo 5 > /sys/bus/pci/devices/0000:3b:00.0/sriov_numvfs) the network devices appear in the system (i.e. you can see them via ip link show)?

If they don't spawn, there is a problem with the drivers and we have to check the kernel journal.

@hymgg (Author) commented Jun 4, 2024

@zeeke thanks for the followup.

I removed the SriovNetworkNodePolicy to restore the node/pods to a healthy state.

5 VFs spawned OK.

[root@mtx-dell4-bld01 ~]# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: p1p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 12:21:04:20:01:02 brd ff:ff:ff:ff:ff:ff
3: ens1f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 12:21:04:20:01:03 brd ff:ff:ff:ff:ff:ff
4: vlan236@p1p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 12:21:04:20:01:02 brd ff:ff:ff:ff:ff:ff
5: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
48: calie21016d8cbd@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-28eb33f7-3e2b-a7af-3a7c-09f18cca9f8c
[root@mtx-dell4-bld01 ~]# cat /sys/bus/pci/devices/0000:3b:00.0/sriov_numvfs
0
[root@mtx-dell4-bld01 ~]# echo 5 > /sys/bus/pci/devices/0000:3b:00.0/sriov_numvfs
[root@mtx-dell4-bld01 ~]# cat /sys/bus/pci/devices/0000:3b:00.0/sriov_numvfs
5
[root@mtx-dell4-bld01 ~]# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: p1p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 12:21:04:20:01:02 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 1     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 2     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 3     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 4     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
3: ens1f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 12:21:04:20:01:03 brd ff:ff:ff:ff:ff:ff
4: vlan236@p1p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 12:21:04:20:01:02 brd ff:ff:ff:ff:ff:ff
5: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
48: calie21016d8cbd@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-28eb33f7-3e2b-a7af-3a7c-09f18cca9f8c

@zeeke (Member) commented Jun 5, 2024

Your command echo 5 > /sys/bus/pci/devices/0000:3b:00.0/sriov_numvfs created 5 VFs, but they are not bound to a VF driver (for an XXV710 NIC with the i40e PF driver, it should be iavf). AFAIK the driver should be bound to the created VFs by default, and in that case you would see 5 more network devices in the ip link output.

Please try running

$ modinfo iavf

If you can't configure SR-IOV devices manually, I'm afraid the problem is out of the operator's scope.

@hymgg (Author) commented Jun 5, 2024

@zeeke Does the modinfo output look OK? What else do we need to do to enable SR-IOV?

[root@mtx-dell4-bld01 ~]# modinfo iavf
filename:       /lib/modules/4.18.0-372.9.1.el8.x86_64/kernel/drivers/net/ethernet/intel/iavf/iavf.ko.xz
version:        4.18.0-372.9.1.el8.x86_64
license:        GPL v2
description:    Intel(R) Ethernet Adaptive Virtual Function Network Driver
author:         Intel Corporation, <[email protected]>
alias:          i40evf
rhelversion:    8.6
srcversion:     0BCAA13A5A41977F55708FC
alias:          pci:v00008086d00001889sv*sd*bc*sc*i*
alias:          pci:v00008086d000037CDsv*sd*bc*sc*i*
alias:          pci:v00008086d00001571sv*sd*bc*sc*i*
alias:          pci:v00008086d0000154Csv*sd*bc*sc*i*
depends:
intree:         Y
name:           iavf
vermagic:       4.18.0-372.9.1.el8.x86_64 SMP mod_unload modversions
sig_id:         PKCS#7
signer:         Red Hat Enterprise Linux kernel signing key
sig_key:        34:77:C7:8A:A3:9A:7F:6E:A2:A0:7A:AA:0F:54:4A:0F:44:76:FF:1F
sig_hashalgo:   sha256
signature:      A8:2A:C7:03:E9:80:ED:65:66:FC:73:1C:10:1A:7E:7D:02:DD:96:FB:
                0D:3E:D6:22:89:EF:FD:D2:9C:35:EE:B9:F3:E8:0B:71:91:19:8E:7A:
                B9:2D:76:A4:22:AD:C9:81:D3:BE:69:01:20:D5:BD:CA:4F:A1:DA:A5:
                2C:B4:33:6A:55:38:50:E1:1A:7F:37:79:F0:47:1B:C5:73:0D:96:F6:
                AD:D6:11:3A:97:F7:6A:8E:0A:D1:15:23:75:54:B7:89:63:71:EC:13:
                9E:76:94:F6:A8:51:6D:5E:7A:C0:82:B6:10:C8:4A:0F:84:84:F2:26:
                AA:58:3A:7B:F4:7C:A5:7A:69:33:30:61:69:0E:FE:23:E7:70:FA:5B:
                0B:B7:C1:70:D0:76:00:75:67:70:71:AF:16:9D:0C:A4:D6:4A:AA:49:
                9E:11:E3:6B:05:BF:59:DE:61:99:72:7F:2A:E7:7F:2A:58:08:AC:2A:
                97:CB:FC:13:25:2C:16:89:B3:CD:57:34:D7:93:93:B1:B9:D7:06:35:
                68:C9:24:7C:1E:9D:F7:48:C8:13:98:05:1D:BF:4E:B4:0E:64:D2:ED:
                6F:9E:60:8D:BD:A3:DD:5E:09:E9:B4:3D:74:7A:E7:6D:B0:93:02:A0:
                81:62:15:89:8A:61:5E:CF:C6:F7:BB:27:19:29:8D:68:6F:DA:4F:C0:
                CA:8C:6C:1A:AC:AA:ED:93:8C:93:AB:4B:DC:79:6B:ED:45:27:47:AB:
                2A:1E:3E:74:8F:71:C9:0A:45:DA:33:B8:82:26:29:98:2E:77:37:93:
                8F:BB:D2:16:45:A5:62:DD:82:19:5F:03:A3:31:50:64:85:40:6A:F7:
                57:85:B1:6C:1F:DB:0A:C6:99:DF:23:09:6E:5A:FA:67:56:C7:8C:F0:
                35:15:82:F1:C5:28:B7:6F:8E:15:FA:95:44:67:06:03:4A:4D:F9:7A:
                57:1C:60:AF:45:91:2F:7D:2A:E9:2C:10:27:2D:60:A2:57:30:02:98:
                57:D6:8F:EC

@hymgg (Author) commented Jun 5, 2024

Is it because we didn't add this to the kernel args? We don't have VMs in this lab.
intel_iommu=on iommu=pt

@zeeke (Member) commented Jun 6, 2024

The intel_iommu=on iommu=pt kernel args are supposed to be added when using vfio-pci devices:

func (p *GenericPlugin) addVfioDesiredKernelArg(state *sriovnetworkv1.SriovNetworkNodeState) {

BTW, as the Intel guide [1] suggests adding them, please give it a try.
[1] https://www.intel.com/content/www/us/en/developer/articles/technical/using-sr-iov-to-share-an-ethernet-port-among-multiple-vms.html
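(Same grubby approach as for pci=realloc earlier; a sketch for RHEL 8, followed by a reboot:)

grubby --update-kernel=ALL --args="intel_iommu=on iommu=pt"
reboot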

@hymgg (Author) commented Jun 11, 2024

intel_iommu=on iommu=pt didn't help; I added it to the kernel args and rebooted the node.

[root@mtx-dell4-bld01 ~]# cat /boot/grub2/grubenv
# GRUB Environment Block
saved_entry=84ccff2f178d4fe0928beb7c962530f5-4.18.0-372.9.1.el8.x86_64
kernelopts=root=/dev/mapper/vg_mtx_rhel8-lv_root ro crashkernel=auto resume=/dev/mapper/vg_mtx_rhel8-lv_swap rd.lvm.lv=vg_mtx_rhel8/lv_root rd.lvm.lv=vg_mtx_rhel8/lv_swap nompath rhgb quiet pci=realloc intel_iommu=on iommu=pt
boot_success=0

[mtx@mtx-dell4-bld08 sriov-network-operator]$ kubectl get node -l node-role.kubernetes.io/worker;kubectl --context dell4 get all -n sriov-network-operator
NAME                                STATUS                     ROLES    AGE    VERSION
mtx-dell4-bld01.dc1.matrixxsw.com   Ready,SchedulingDisabled   worker   339d   v1.26.6

NAME                                          READY   STATUS        RESTARTS      AGE
pod/sriov-device-plugin-ztwhp                 0/1     Terminating   0             36s
pod/sriov-network-config-daemon-8s8z5         1/1     Running       1 (11m ago)   7d22h
pod/sriov-network-operator-7c897b487b-dhfch   1/1     Running       0             7d22h

NAME                                         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                            AGE
daemonset.apps/sriov-network-config-daemon   1         1         1       1            1           kubernetes.io/os=linux,node-role.kubernetes.io/worker=   7d22h

NAME                                     READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/sriov-network-operator   1/1     1            1           7d22h

NAME                                                DESIRED   CURRENT   READY   AGE
replicaset.apps/sriov-network-operator-7c897b487b   1         1         1       7d22h

cluster-info-dump uploaded.
cluster-info-dump.iommu.tar.gz

@SchSeba (Collaborator) commented Aug 19, 2024

Hi @hymgg, can you please run lspci to find the virtual functions, then run lspci -vv -nn -mm -k -s <vf-pci-addr>? And can you check that you didn't disable the iavf kernel module with a blacklist or something like that?

@hymgg (Author) commented Aug 26, 2024

@SchSeba thanks for the followup; I will reinstall the operator and check with lspci.

@SchSeba (Collaborator) commented Aug 27, 2024

Great, I will wait for an update :)

@hymgg (Author) commented Aug 29, 2024

@SchSeba Found iavf in a blacklist.conf; talking to the lab team about this.

$ grep iavf /etc/modprobe.d/*
/etc/modprobe.d/anaconda-blacklist.conf:blacklist iavf

$ lspci | grep "Virtual Function"
3b:0a.0 Ethernet controller: Intel Corporation Ethernet Virtual Function 700 Series (rev 02)
3b:0a.1 Ethernet controller: Intel Corporation Ethernet Virtual Function 700 Series (rev 02)
3b:0a.2 Ethernet controller: Intel Corporation Ethernet Virtual Function 700 Series (rev 02)
3b:0a.3 Ethernet controller: Intel Corporation Ethernet Virtual Function 700 Series (rev 02)
3b:0a.4 Ethernet controller: Intel Corporation Ethernet Virtual Function 700 Series (rev 02)
3b:0a.5 Ethernet controller: Intel Corporation Ethernet Virtual Function 700 Series (rev 02)
3b:0a.6 Ethernet controller: Intel Corporation Ethernet Virtual Function 700 Series (rev 02)
3b:0a.7 Ethernet controller: Intel Corporation Ethernet Virtual Function 700 Series (rev 02)

$ lspci -vv -nn -mm -k -s 3b:0a.0
Slot: 3b:0a.0
Class: Ethernet controller [0200]
Vendor: Intel Corporation [8086]
Device: Ethernet Virtual Function 700 Series [154c]
SVendor: Intel Corporation [8086]
SDevice: Device [0000]
Rev: 02
Module: iavf
NUMANode: 0
IOMMUGroup: 152

$ lspci -vv -nn -mm -k -s 3b:0a.1
Slot: 3b:0a.1
Class: Ethernet controller [0200]
Vendor: Intel Corporation [8086]
Device: Ethernet Virtual Function 700 Series [154c]
SVendor: Intel Corporation [8086]
SDevice: Device [0000]
Rev: 02
Module: iavf
NUMANode: 0
IOMMUGroup: 153

@hymgg (Author) commented Sep 6, 2024

Removed iavf from the blacklist. After re-applying the SriovNetworkNodePolicy, the pods/node stay healthy and the node allocatable resource list has "openshift.io/ens1f1": "8", so it's good.
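(For reference, the fix amounts to deleting the blacklist entry and loading the module; a sketch based on the file found earlier:)

sed -i '/^blacklist iavf/d' /etc/modprobe.d/anaconda-blacklist.conf
modprobe iavf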

$ cat policy-ens1f1.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-ens1f1
  namespace: sriov-network-operator
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ""
    #feature.node.kubernetes.io/network-sriov.capable: "true"
  resourceName: ens1f1
  priority: 99
  #mtu: 9000
  numVfs: 8
  nicSelector:
      deviceID: "158a"
      rootDevices:
      - 0000:3b:00.1
      vendor: "8086"
  deviceType: netdevice

$ kubectl get node -l node-role.kubernetes.io/worker;kubectl --context dell4 get all -n sriov-network-operator
NAME                                STATUS   ROLES    AGE   VERSION
mtx-dell4-bld01.dc1.matrixxsw.com   Ready    worker   50d   v1.29.6
NAME                                          READY   STATUS    RESTARTS        AGE
pod/sriov-device-plugin-z7qxr                 1/1     Running   0               15s
pod/sriov-network-config-daemon-td8h8         1/1     Running   1 (7m25s ago)   4d20h
pod/sriov-network-operator-55dbb4c9df-q48f4   1/1     Running   0               4d20h

NAME                                         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                            AGE
daemonset.apps/sriov-device-plugin           1         1         1       1            1           kubernetes.io/os=linux,node-role.kubernetes.io/worker=   19s
daemonset.apps/sriov-network-config-daemon   1         1         1       1            1           kubernetes.io/os=linux,node-role.kubernetes.io/worker=   4d20h

NAME                                     READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/sriov-network-operator   1/1     1            1           4d20h

NAME                                                DESIRED   CURRENT   READY   AGE
replicaset.apps/sriov-network-operator-55dbb4c9df   1         1         1       4d20h

$ kubectl get no -o json | jq -r '[.items[] | {name:.metadata.name, allocable:.status.allocatable}]'
[
  {
    "name": "mtx-dell4-bld01.dc1.matrixxsw.com",
    "allocable": {
      "cpu": "64",
      "ephemeral-storage": "213255452729",
      "hugepages-1Gi": "0",
      "hugepages-2Mi": "0",
      "memory": "394187256Ki",
      "openshift.io/ens1f1": "8",
      "pods": "110"
    }
  },
...

Created a SriovNetwork sriovnetwork-ens1f1 using host-local ipam,
verified that a NetworkAttachmentDefinition with the same name was auto-created,
then created a pod with the annotation k8s.v1.cni.cncf.io/networks: sriovnetwork-ens1f1; the pod started OK too (a test-pod sketch follows the SriovNetwork below).

$ cat sriovnetwork-ens1f1.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: sriovnetwork-ens1f1
  namespace: sriov-network-operator
spec:
  ipam: |
    {
      "type": "host-local",
      "subnet": "100.100.20.0/24",
      "rangeStart": "100.100.20.100",
      "rangeEnd": "100.100.20.200",
      "routes": [{
        "dst": "0.0.0.0/0"
      }],
      "gateway": "100.100.20.1"
    }
  vlan: 20
  resourceName: ens1f1
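
(A sketch of the test pod mentioned above; the image is illustrative, and the resource request uses the policy's resourceName with the operator's default openshift.io prefix:)

apiVersion: v1
kind: Pod
metadata:
  name: test1
  namespace: sriov-network-operator
  annotations:
    k8s.v1.cni.cncf.io/networks: sriovnetwork-ens1f1
spec:
  containers:
  - name: app
    image: registry.access.redhat.com/ubi8/ubi-minimal
    command: ["sleep", "infinity"]
    resources:
      requests:
        openshift.io/ens1f1: "1"
      limits:
        openshift.io/ens1f1: "1"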

@hymgg (Author) commented Sep 6, 2024

Next, 2 questions:

1.) Do we support whereabouts ipam? Or what ipam should we use so pods on the same sriov network can talk to each other? (See the sketch after question 2.)

After the above success, I deleted the test pod and the SriovNetwork, changed its ipam from host-local to whereabouts, and recreated it. But the pod failed to create; the error from describe pod:

ERRORED: error configuring pod [sriov-network-operator/test1] networking: [sriov-network-operator/test1/44964362-090f-4ed3-aff6-21d42757a3aa:sriovnetwork-ens1f1]: error adding container to network "sriovnetwork-ens1f1": IPAM plugin returned missing IP config

2.) How do I create a SriovNetwork in a different namespace? I tried modifying the namespace in the above SriovNetwork yaml and applying it, but found nothing in the new ns.
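
(On question 1: whereabouts uses a "range" key rather than host-local's "subnet", so a minimal ipam block would look roughly like the following, assuming the whereabouts CNI plugin and its CRDs are installed on the cluster:)

  ipam: |
    {
      "type": "whereabouts",
      "range": "100.100.20.0/24"
    }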

Thanks. -Jessica

@hymgg (Author) commented Sep 17, 2024

@SchSeba Could you guide us on the 2 questions above?
