- Prerequisites
- Creating SR-IOV Networks for Worker Nodes
- Creating SR-IOV Worker Nodes in IPI
- Install the SRIOV Network Operator and configure a network device
- Attach the OVS HW offload network
- Deploy a testpmd pod
- Deploy a testpmd pod with OVS Hardware Offload
- Creating SR-IOV Worker Nodes in UPI
Single Root I/O Virtualization (SR-IOV) networking in OpenShift can benefit applications that require high bandwidth and low latency. To plan an OpenStack deployment that uses SR-IOV network interface cards (NICs), refer to the OSP 16.1 installation documentation. Before you install an OpenShift cluster on OpenStack, make sure that the NICs that your OpenStack nodes use are supported for use with SR-IOV in OpenShift, and that your tenant has access to them. Your OpenStack cluster must meet the following quota requirements for each OpenShift node that has an attached SR-IOV NIC:
- One instance from the RHOSP quota
- One port attached to the machines subnet
- One port for each SR-IOV Virtual Function
- A flavor with at least 16 GB memory, 4 vCPUs, and 100 GB storage space
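As a rough sizing aid, the per-node requirements above can be multiplied out for a planned number of SR-IOV workers. The following is a minimal sketch only: the function name sriov_quota and its two arguments (node count and VFs per node) are assumptions for illustration and are not part of any OpenShift or OpenStack tooling.

```shell
# Hypothetical helper: multiply the per-node minimums listed above by a
# planned node count. The flavor figures are the documented minimums
# (16 GB memory, 4 vCPUs, 100 GB storage); real flavors may be larger.
sriov_quota() {
  q_nodes="$1"
  q_vfs="$2"
  # One port on the machines subnet plus one port per VF, for every node.
  q_ports=$(( q_nodes * (1 + q_vfs) ))
  echo "instances=${q_nodes} ports=${q_ports} ram_gb=$(( q_nodes * 16 )) vcpus=$(( q_nodes * 4 )) disk_gb=$(( q_nodes * 100 ))"
}

# Example: three workers with two VFs each.
sriov_quota 3 2
```

The result is a lower bound to compare against the tenant's quota; it does not account for resources already consumed by the control plane or other workloads.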
For all clusters that use single-root input/output virtualization (SR-IOV), RHOSP compute nodes require a flavor that supports huge pages. Deploying worker nodes with SR-IOV networks is supported as a post-install operation for both IPI and UPI workflows. After you verify that your OpenStack cluster can support SR-IOV in OpenShift and you install an OpenShift cluster that meets the minimum requirements, use the following steps and examples to create worker nodes with SR-IOV NICs.
If you need to configure your deployment for real-time or low latency workloads, you'll need to create a PerformanceProfile.
After your OpenShift control plane is running, you must install the SR-IOV Network Operator. To install the Operator, you will need access to an account on your OpenShift cluster that has cluster-admin privileges. After you log in to the account, install the Operator. Then, configure your SR-IOV network device.
You must create SR-IOV networks to attach to worker nodes before you create the nodes. Reference the following example of how to create radio and uplink provider networks in OpenStack:
# Create Networks
openstack network create radio --provider-physical-network radio --provider-network-type vlan --provider-segment 120
openstack network create uplink --provider-physical-network uplink --provider-network-type vlan --external
# Create Subnets
openstack subnet create --network radio --subnet-range <radio_network_subnet_range> radio
openstack subnet create --network uplink --subnet-range <uplink_network_subnet_range> uplink
You can create worker nodes as a post-IPI-install operation by using the machine API. To create a new set of worker nodes, create a new machineSet in OpenShift.
oc get machineset -n openshift-machine-api <machineset_name> -o yaml > sriov_machineset.yaml
When editing an existing machineSet (or a copy of one) to create SR-IOV worker nodes, add each subnet that is configured for SR-IOV to the machineSet's providerSpec. The following example attaches ports from the radio and uplink subnets, which were created in the previous example, to all of the worker nodes in the machineSet. For all SR-IOV ports, you must set the following parameters:
vnicType: direct
portSecurity: false
Note that security groups and allowedAddressPairs cannot be set on a port if portSecurity is disabled. If you are using a network with port security disabled, allowed address pairs and security groups cannot be used for any port in that network. Setting security groups on the instance applies them to all ports attached to it; be aware of this when using networks with port security disabled. Currently, trunking is not enabled on ports defined in the ports list; it is enabled only on ports created by entries in the networks or subnets lists. The name of each port is <machine-name>-<nameSuffix>, and nameSuffix is a required field in the port definition. Optionally, you can add tags to ports by adding them to the tags list. The following example shows a machineSet that creates SR-IOV capable ports on the radio and uplink networks and subnets that were defined in the previous example:
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  labels:
    machine.openshift.io/cluster-api-cluster: <infrastructure_ID>
    machine.openshift.io/cluster-api-machine-role: <node_role>
    machine.openshift.io/cluster-api-machine-type: <node_role>
  name: <infrastructure_ID>-<node_role>
  namespace: openshift-machine-api
spec:
  replicas: <number_of_replicas>
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: <infrastructure_ID>
      machine.openshift.io/cluster-api-machineset: <infrastructure_ID>-<node_role>
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: <infrastructure_ID>
        machine.openshift.io/cluster-api-machine-role: <node_role>
        machine.openshift.io/cluster-api-machine-type: <node_role>
        machine.openshift.io/cluster-api-machineset: <infrastructure_ID>-<node_role>
    spec:
      metadata:
      providerSpec:
        value:
          apiVersion: openstackproviderconfig.openshift.io/v1alpha1
          cloudName: openstack
          cloudsSecret:
            name: openstack-cloud-credentials
            namespace: openshift-machine-api
          flavor: <nova_flavor>
          image: <glance_image_name_or_location>
          serverGroupID: <optional_UUID_of_server_group>
          kind: OpenstackProviderSpec
          networks:
            - subnets:
              - uuid: <machines_subnet_uuid>
          ports:
            - networkID: <radio_network_uuid>
              nameSuffix: radio
              fixedIPs:
                - subnetID: <radio_subnet_uuid>
              tags:
                - sriov
                - radio
              vnicType: direct
              portSecurity: false
            - networkID: <uplink_network_uuid>
              nameSuffix: uplink
              fixedIPs:
                - subnetID: <uplink_subnet_uuid>
              tags:
                - sriov
                - uplink
              vnicType: direct
              portSecurity: false
          primarySubnet: <machines_subnet_uuid>
          securityGroups:
          - filter: {}
            name: <infrastructure_ID>-<node_role>
          serverMetadata:
            Name: <infrastructure_ID>-<node_role>
            openshiftClusterID: <infrastructure_ID>
          tags:
          - openshiftClusterID=<infrastructure_ID>
          trunk: true
          userDataSecret:
            name: <node_role>-user-data
          availabilityZone: <optional_openstack_availability_zone>
If your port uses OVS Hardware Offload, configure it as follows so that the port in Neutron is created with the right capabilities:
(...)
ports:
- fixedIPs:
  - subnetID: <radio_subnet_uuid>
  nameSuffix: sriov
  networkID: <radio_network_uuid>
  portSecurity: false
  profile:
    capabilities: '[switchdev]'
  tags:
  - sriov
  - radio
  vnicType: direct
(...)
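Because every SR-IOV port must set both vnicType: direct and portSecurity: false, a quick text check of the edited file can catch a port that sets only one of them. The following is a minimal sketch under stated assumptions: check_sriov_ports is a hypothetical helper, it relies on plain grep rather than a YAML parser, and it only compares line counts, so it cannot tell which port is misconfigured.

```shell
# Hypothetical helper: compare how many ports request a direct vNIC with
# how many disable port security. This is a text match, not YAML validation.
check_sriov_ports() {
  c_file="$1"
  # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
  c_direct=$(grep -cE '^[[:space:]-]*vnicType:[[:space:]]*direct[[:space:]]*$' "$c_file" || true)
  c_nosec=$(grep -ciE '^[[:space:]-]*portSecurity:[[:space:]]*false[[:space:]]*$' "$c_file" || true)
  if [ "$c_direct" -gt 0 ] && [ "$c_direct" -eq "$c_nosec" ]; then
    echo "ok: $c_direct SR-IOV port(s)"
  else
    echo "mismatch: $c_direct direct vNIC(s), $c_nosec port(s) without port security"
    return 1
  fi
}

# Usage (file name taken from the export step earlier in this document):
#   check_sriov_ports sriov_machineset.yaml
```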
After you finish editing your machineSet, upload it to your OpenShift cluster:
oc create -f sriov_machineset.yaml
To create SR-IOV ports on a network with port security disabled, you need to make additional changes to your machineSet. By default, security groups are set on the instance, and allowed address pairs are automatically added to ports created through the networks and subnets interfaces. The solution is to define all of your ports with the ports interface in your machineSet. Remember that the port for the machines subnet needs:
- allowed address pairs for your API and ingress VIP ports
- the worker security group
- to be attached to the machines network and subnet
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  labels:
    machine.openshift.io/cluster-api-cluster: <infrastructure_ID>
    machine.openshift.io/cluster-api-machine-role: <node_role>
    machine.openshift.io/cluster-api-machine-type: <node_role>
  name: <infrastructure_ID>-<node_role>
  namespace: openshift-machine-api
spec:
  replicas: <number_of_replicas>
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: <infrastructure_ID>
      machine.openshift.io/cluster-api-machineset: <infrastructure_ID>-<node_role>
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: <infrastructure_ID>
        machine.openshift.io/cluster-api-machine-role: <node_role>
        machine.openshift.io/cluster-api-machine-type: <node_role>
        machine.openshift.io/cluster-api-machineset: <infrastructure_ID>-<node_role>
    spec:
      metadata: {}
      providerSpec:
        value:
          apiVersion: openstackproviderconfig.openshift.io/v1alpha1
          cloudName: openstack
          cloudsSecret:
            name: openstack-cloud-credentials
            namespace: openshift-machine-api
          flavor: <nova_flavor>
          image: <glance_image_name_or_location>
          kind: OpenstackProviderSpec
          ports:
            - allowedAddressPairs:
                - ipAddress: <api_vip_port_IP>
                - ipAddress: <ingress_vip_port_IP>
              fixedIPs:
                - subnetID: <machines_subnet_UUID>
              nameSuffix: nodes
              networkID: <machines_network_UUID>
              securityGroups:
                - <worker_security_group_UUID>
            - networkID: <sriov_network_UUID>
              nameSuffix: sriov
              fixedIPs:
                - subnetID: <sriov_subnet_UUID>
              tags:
                - sriov
              vnicType: direct
              portSecurity: false
          primarySubnet: <machines_subnet_UUID>
          serverMetadata:
            Name: <infrastructure_ID>-<node_role>
            openshiftClusterID: <infrastructure_ID>
          tags:
          - openshiftClusterID=<infrastructure_ID>
          trunk: false
          userDataSecret:
            name: worker-user-data
Once the workers are deployed, you must label them as SR-IOV capable:
oc label node <node-name> feature.node.kubernetes.io/network-sriov.capable="true"
You must install the SR-IOV Network Operator. To install the Operator, you will need access to an account on your OpenShift cluster that has cluster-admin privileges. After you log in to the account, install the Operator.
Then, configure your SR-IOV network device. Note that only netFilter needs to be set in the nicSelector, because it carries the ID of the Neutron network used for SR-IOV traffic.
Example of a SriovNetworkNodePolicy named sriov1:
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriov1
  namespace: openshift-sriov-network-operator
spec:
  deviceType: vfio-pci
  isRdma: false
  nicSelector:
    netFilter: openstack/NetworkID:9144121f-bf90-4891-b061-323e4cd990ed
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: 'true'
  numVfs: 1
  priority: 99
  resourceName: sriov1
Note: If the network device attached to the network is from Mellanox rather than Intel, deviceType must be set to netdevice and isRdma must be set to true.
The SR-IOV network operator will automatically discover the devices connected on that network for each worker, and make them available for use by the CNF pods later.
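The two policy fields that vary between deployments, netFilter and the deviceType/isRdma pair, can be derived mechanically. The following is a minimal sketch: net_filter and policy_device_settings are hypothetical helper names, and the Intel settings are simply the values used in the example policy above.

```shell
# Hypothetical helpers for filling in a SriovNetworkNodePolicy.

# Build the nicSelector.netFilter value from a Neutron network UUID.
net_filter() {
  printf 'openstack/NetworkID:%s\n' "$1"
}

# Print the deviceType/isRdma pair described above for a NIC vendor.
# Only the two vendors named in this document are handled.
policy_device_settings() {
  case "$1" in
    intel)    echo "deviceType=vfio-pci isRdma=false" ;;
    mellanox) echo "deviceType=netdevice isRdma=true" ;;
    *)        echo "unknown vendor: $1" >&2; return 1 ;;
  esac
}

net_filter 9144121f-bf90-4891-b061-323e4cd990ed
policy_device_settings mellanox

# Against a live cloud, the UUID could be looked up with:
#   openstack network show radio -f value -c id
```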
This step can be skipped if you are not using OVS Hardware Offload. For OVS Hardware Offload, the network has to be attached via a host-device.
Create a file named network.yaml:
spec:
  additionalNetworks:
  - name: hwoffload1
    namespace: cnf
    rawCNIConfig: '{ "cniVersion": "0.3.1", "name": "hwoffload1", "type": "host-device","pciBusId": "0000:00:05.0", "ipam": {}}'
    type: Raw
And then run:
oc patch network.operator cluster --patch "$(cat network.yaml)" --type=merge
It usually takes about 15 seconds to apply the configuration.
Note: 0000:00:05.0 is the PCI bus ID of the device connected to OVS HW Offload. You can discover it by running oc describe SriovNetworkNodeState -n openshift-sriov-network-operator.
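Since the PCI bus ID is embedded in a JSON string inside network.yaml, a typo there is easy to miss. The following is a minimal sketch that checks the address shape (domain:bus:device.function) and prints a rawCNIConfig value with the same keys as the example above. The helper names is_pci_address and raw_cni_config are hypothetical, and the check is purely syntactic: it does not confirm that the device exists on the node.

```shell
# Hypothetical helpers for preparing network.yaml.

# Succeeds when the argument looks like a full PCI address,
# for example 0000:00:05.0 (domain:bus:device.function).
is_pci_address() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$'
}

# Print a host-device rawCNIConfig string for a network name and PCI address.
raw_cni_config() {
  r_name="$1"
  r_pci="$2"
  is_pci_address "$r_pci" || { echo "invalid PCI address: $r_pci" >&2; return 1; }
  printf '{ "cniVersion": "0.3.1", "name": "%s", "type": "host-device", "pciBusId": "%s", "ipam": {} }\n' "$r_name" "$r_pci"
}

raw_cni_config hwoffload1 0000:00:05.0
```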
This pod is an example of how we can create a container that uses the hugepages, the reserved CPUs and the SR-IOV port:
apiVersion: v1
kind: Pod
metadata:
  name: testpmd-sriov
  namespace: mynamespace
spec:
  containers:
  - name: testpmd
    command: ["sleep", "99999"]
    image: registry.redhat.io/openshift4/dpdk-base-rhel8:v4.9
    securityContext:
      capabilities:
        add: ["IPC_LOCK","SYS_ADMIN"]
      privileged: true
      runAsUser: 0
    resources:
      requests:
        memory: 1000Mi
        hugepages-1Gi: 1Gi
        cpu: '2'
        openshift.io/sriov1: 1
      limits:
        hugepages-1Gi: 1Gi
        cpu: '2'
        memory: 1000Mi
        openshift.io/sriov1: 1
    volumeMounts:
      - mountPath: /dev/hugepages
        name: hugepage
        readOnly: False
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages
More examples are documented here.
The same example as before, except this time we use the network for OVS Hardware Offload:
apiVersion: v1
kind: Pod
metadata:
  name: testpmd-sriov
  namespace: mynamespace
  annotations:
    k8s.v1.cni.cncf.io/networks: hwoffload1
spec:
  containers:
  - name: testpmd
    command: ["sleep", "99999"]
    image: registry.redhat.io/openshift4/dpdk-base-rhel8:v4.9
    securityContext:
      capabilities:
        add: ["IPC_LOCK","SYS_ADMIN"]
      privileged: true
      runAsUser: 0
    resources:
      requests:
        memory: 1000Mi
        hugepages-1Gi: 1Gi
        cpu: '2'
      limits:
        hugepages-1Gi: 1Gi
        cpu: '2'
        memory: 1000Mi
    volumeMounts:
      - mountPath: /dev/hugepages
        name: hugepage
        readOnly: False
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages
Because UPI implementation depends largely on your deployment environment and requirements, there is no official script for deploying SR-IOV worker nodes. However, we can share a verified example that is based on the compute-nodes.yaml script to help you understand the process. To use the script, open a terminal in the directory that contains the inventory.yaml and common.yaml UPI Ansible scripts. In the following example, we add provider networks named radio and uplink to the inventory.yaml file. Note that the count parameter specifies the number of virtual functions (VFs) to attach to each worker node. This code can also be found on GitHub.
....
# If this value is non-empty, the corresponding floating IP will be
# attached to the bootstrap machine. This is needed for collecting logs
# in case of install failure.
os_bootstrap_fip: '203.0.113.20'
additionalNetworks:
- id: radio
  count: 4
  type: direct
  port_security_enabled: no
- id: uplink
  count: 4
  type: direct
  port_security_enabled: no
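To see how the count values translate into Neutron ports, it can help to total them per worker. The following is a minimal sketch: ports_per_worker is a hypothetical helper, and it assumes one primary port per worker in addition to the ports requested through additionalNetworks.

```shell
# Hypothetical helper: total Neutron ports per worker node, given the count
# value of each additionalNetworks entry as arguments.
ports_per_worker() {
  p_total=1                      # the primary (machines network) port
  for p_count in "$@"; do
    p_total=$(( p_total + p_count ))
  done
  echo "$p_total"
}

# radio and uplink from the inventory excerpt above, each with count: 4
ports_per_worker 4 4
```

Multiplying the result by the number of compute nodes gives the port quota that the playbook run will consume.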
Next, create a file called compute-nodes.yaml with this Ansible script:
- import_playbook: common.yaml
- hosts: all
  gather_facts: no
  vars:
    worker_list: []
    port_name_list: []
    nic_list: []
  tasks:
  # Create the SDN/primary port for each worker node
  - name: 'Create the Compute ports'
    os_port:
      name: "{{ item.1 }}-{{ item.0 }}"
      network: "{{ os_network }}"
      security_groups:
      - "{{ os_sg_worker }}"
      allowed_address_pairs:
      - ip_address: "{{ os_ingressVIP }}"
    with_indexed_items: "{{ [os_port_worker] * os_compute_nodes_number }}"
    register: ports
  # Tag each SDN/primary port with cluster name
  - name: 'Set Compute ports tag'
    command:
      cmd: "openstack port set --tag {{ cluster_id_tag }} {{ item.1 }}-{{ item.0 }}"
    with_indexed_items: "{{ [os_port_worker] * os_compute_nodes_number }}"
  - name: 'List the Compute Trunks'
    command:
      cmd: "openstack network trunk list"
    when: os_networking_type == "Kuryr"
    register: compute_trunks
  - name: 'Create the Compute trunks'
    command:
      cmd: "openstack network trunk create --parent-port {{ item.1.id }} {{ os_compute_trunk_name }}-{{ item.0 }}"
    with_indexed_items: "{{ ports.results }}"
    when:
    - os_networking_type == "Kuryr"
    - "os_compute_trunk_name|string not in compute_trunks.stdout"
  - name: 'Call additional-port processing'
    include_tasks: additional-ports.yaml
  # Create additional ports in OpenStack
  - name: 'Create additionalNetworks ports'
    os_port:
      name: "{{ item.0 }}-{{ item.1.name }}"
      vnic_type: "{{ item.1.type }}"
      network: "{{ item.1.uuid }}"
      port_security_enabled: "{{ item.1.port_security_enabled|default(omit) }}"
      no_security_groups: "{{ 'true' if item.1.security_groups is not defined else omit }}"
      security_groups: "{{ item.1.security_groups | default(omit) }}"
    with_nested:
    - "{{ worker_list }}"
    - "{{ port_name_list }}"
  # Tag the ports with the cluster info
  - name: 'Set additionalNetworks ports tag'
    command:
      cmd: "openstack port set --tag {{ cluster_id_tag }} {{ item.0 }}-{{ item.1.name }}"
    with_nested:
    - "{{ worker_list }}"
    - "{{ port_name_list }}"
  # Build the nic list to use for server create
  - name: Build nic list
    set_fact:
      nic_list: "{{ nic_list | default([]) + [ item.name ] }}"
    with_items: "{{ port_name_list }}"
  # Create the servers
  - name: 'Create the Compute servers'
    vars:
      worker_nics: "{{ [ item.1 ] | product(nic_list) | map('join','-') | map('regex_replace', '(.*)', 'port-name=\\1') | list }}"
    os_server:
      name: "{{ item.1 }}"
      image: "{{ os_image_rhcos }}"
      flavor: "{{ os_flavor_worker }}"
      auto_ip: no
      userdata: "{{ lookup('file', 'worker.ign') | string }}"
      security_groups: []
      nics: "{{ [ 'port-name=' + os_port_worker + '-' + item.0|string ] + worker_nics }}"
    with_indexed_items: "{{ worker_list }}"
Create a new Ansible script named additional-ports.yaml:
# Build a list of worker nodes with indexes
- name: 'Build worker list'
  set_fact:
    worker_list: "{{ worker_list | default([]) + [ item.1 + '-' + item.0 | string ] }}"
  with_indexed_items: "{{ [ os_compute_server_name ] * os_compute_nodes_number }}"
# Ensure that each network specified in additionalNetworks exists
- name: 'Verify additionalNetworks'
  os_networks_info:
    name: "{{ item.id }}"
  with_items: "{{ additionalNetworks }}"
  register: network_info
# Expand additionalNetworks by the count parameter in each network definition
- name: 'Build port and port index list for additionalNetworks'
  set_fact:
    port_list: "{{ port_list | default([]) + [ {
                    'net_name' : item.1.id,
                    'uuid' : network_info.results[item.0].openstack_networks[0].id,
                    'type' : item.1.type|default('normal'),
                    'security_groups' : item.1.security_groups|default(omit),
                    'port_security_enabled' : item.1.port_security_enabled|default(omit)
                    } ] * item.1.count|default(1) }}"
    index_list: "{{ index_list | default([]) + range(item.1.count|default(1)) | list }}"
  with_indexed_items: "{{ additionalNetworks }}"
# Calculate and save the name of the port
# The format of the name is cluster_name-worker-workerID-networkUUID(partial)-count
# i.e. fdp-nz995-worker-1-99bcd111-1
- name: 'Calculate port name'
  set_fact:
    port_name_list: "{{ port_name_list | default([]) + [ item.1 | combine( {'name' : item.1.uuid | regex_search('([^-]+)') + '-' + index_list[item.0]|string } ) ] }}"
  with_indexed_items: "{{ port_list }}"
  when: port_list is defined
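The port naming rule described in the comments of the last task can be reproduced outside Ansible, which is handy when searching for a specific port with the OpenStack CLI. The following is a minimal sketch: additional_port_name is a hypothetical helper, and the UUID in the example is a made-up placeholder whose first segment matches the name shown in the comment above.

```shell
# Hypothetical re-implementation of the naming rule:
# <worker name>-<first segment of the network UUID>-<index>
additional_port_name() {
  n_worker="$1"
  n_uuid="$2"
  n_index="$3"
  # ${n_uuid%%-*} keeps everything before the first dash of the UUID.
  printf '%s-%s-%s\n' "$n_worker" "${n_uuid%%-*}" "$n_index"
}

# Placeholder UUID for illustration only.
additional_port_name fdp-nz995-worker-1 99bcd111-0000-0000-0000-000000000000 1
```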
Finally, run the compute-nodes.yaml script as you normally would:
ansible-playbook -i inventory.yaml compute-nodes.yaml
Make sure to follow the documentation to approve the CSRs for your worker nodes, and to wait for the installation to complete to finalize your deployment.