Handle mdev devices in libvirt 7.7+ #59

Draft: wants to merge 6 commits into base: stackhpc/wallaby

1 change: 1 addition & 0 deletions doc/source/admin/index.rst
@@ -112,6 +112,7 @@ instance for these kind of workloads.
   virtual-gpu
   file-backed-memory
   ports-with-resource-requests
   vdpa
   virtual-persistent-memory
   emulated-tpm
   uefi

92 changes: 92 additions & 0 deletions doc/source/admin/vdpa.rst
@@ -0,0 +1,92 @@
============================
Using ports vnic_type='vdpa'
============================
.. versionadded:: 23.0.0 (Wallaby)

   Introduced support for vDPA.

.. important::

   The functionality described below is only supported by the
   libvirt/KVM virt driver.

The kernel vDPA (virtio Data Path Acceleration) framework
provides a vendor-independent framework for offloading data-plane
processing to software or hardware virtio device backends.
While the kernel vDPA framework supports many types of vDPA devices,
at this time nova only supports ``virtio-net`` devices
using the ``vhost-vdpa`` front-end driver. Support for ``virtio-blk`` or
``virtio-gpu`` may be added in the future but is not currently planned
for any specific release.

vDPA device tracking
~~~~~~~~~~~~~~~~~~~~
When implementing support for vDPA-based neutron ports, one of the first
decisions nova had to make was how to model the availability of vDPA devices
and the capability to virtualize them. As the initial use case for this
technology was offloading networking to hardware-offloaded OVS via neutron
ports, the decision was made to extend the existing PCI tracker, which is
used for SR-IOV and PCI passthrough, to support vDPA devices. A simplifying
assumption was made that the parent device of a vDPA device is an SR-IOV
Virtual Function (VF); as a result, software-only vDPA devices such as those
created by the kernel ``vdpa-sim`` sample module are not supported.

To make vDPA devices available to be scheduled to guests, the operator should
include the device, via either the PCI address or the vendor ID and product ID
of the parent VF, in the PCI ``device_spec``.
See :nova-doc:`pci-passthrough <admin/pci-passthrough>` for details.
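
As a minimal sketch, such an entry in ``nova.conf`` might look like the
following. The PCI address and ``physical_network`` name are illustrative
assumptions; note also that on Wallaby this option is spelled
``passthrough_whitelist``, with ``device_spec`` being the newer name.

.. code-block:: ini

   [pci]
   # Match the VFs of the parent PF at 0000:06:00 and tag them with the
   # physical network they are wired to (both values are assumptions).
   passthrough_whitelist = { "address": "0000:06:00.*", "physical_network": "physnet1" }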

Nova will not create the VFs or vDPA devices automatically. It is expected
that the operator will allocate them before starting the nova-compute agent.
While no specific mechanism is prescribed for doing this, udev rules or
systemd service files are generally the recommended approach to ensure the
devices are created consistently across reboots.
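
As a rough sketch, assuming the parent PF's netdev is ``enp6s0f0`` and its
first VF lands at PCI address ``0000:06:00.2`` (both names are assumptions
that will differ per system), the devices could be created via sysfs and the
iproute2 ``vdpa`` tool:

.. code-block:: bash

   # Create the VFs on the parent PF (netdev name is an assumption).
   echo 4 > /sys/class/net/enp6s0f0/device/sriov_numvfs
   # Create a vDPA device backed by one of the resulting VFs.
   vdpa dev add name vdpa0 mgmtdev pci/0000:06:00.2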

.. note::

   As vDPA is an offload only for the data plane and not the control plane, a
   vDPA control plane is required to properly support vDPA device passthrough.
   At the time of writing only hardware-offloaded OVS is supported when using
   vDPA with nova. Because of this, vDPA devices cannot be requested using the
   PCI alias. While nova could allow vDPA devices to be requested by the
   flavor using a PCI alias, we would not be able to correctly configure the
   device as there would be no suitable control plane. For this reason vDPA
   devices are currently only consumable via neutron ports.

Virt driver support
~~~~~~~~~~~~~~~~~~~

Supporting neutron ports with ``vnic_type=vdpa`` depends on the capability
of the virt driver. At this time only the ``libvirt`` virt driver with KVM
is fully supported. QEMU may also work but is untested.

vDPA support depends on kernel 5.7+, libvirt 6.9.0+ and QEMU 5.1+.
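
These minimums can be checked on the compute host; ``virsh version`` reports
both the libvirt and the running hypervisor (QEMU) versions:

.. code-block:: bash

   uname -r       # kernel, should be 5.7 or newer
   virsh version  # reports libvirt and the running QEMU versions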

vDPA lifecycle operations
~~~~~~~~~~~~~~~~~~~~~~~~~

At this time vDPA ports can only be added to a VM when it is first created.
To do this, the normal SR-IOV workflow is used, whereby the port is first
created in neutron and passed into nova as part of the server create request.

.. code-block:: bash

   openstack port create --network <my network> --vnic-type vdpa vdpa-port
   openstack server create --flavor <my-flavor> --image <my-image> --port <vdpa-port uuid> vdpa-vm
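
Once the server is active, the binding details that nova writes back to the
port, including the ``pci_slot`` and ``pci_vendor_info`` seen in the code
changes below, can be inspected on the port itself:

.. code-block:: bash

   openstack port show vdpa-port -c binding_vnic_type -c binding_profile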

When vDPA support was first introduced, no move operations were supported.
As this documentation was added in the change that enabled some move
operations, the following should be read as both a retrospective and a
forward-looking view, and treated as a living document which will be updated
as functionality evolves.

* 23.0.0: initial support is added for creating a VM with vDPA ports; move
  operations are blocked in the API but implemented in code.
* 26.0.0: support for all move operations except live migration is tested and
  the API blocks are removed.
* 25.x.y: (planned) API block removal backported to stable/yoga.
* 24.x.y: (planned) API block removal backported to stable/xena.
* 23.x.y: (planned) API block removal backported to stable/wallaby.
* 26.0.0: (in progress) interface attach/detach, suspend/resume and hot plug
  live migration are implemented to fully support all lifecycle operations on
  instances with vDPA ports.

.. note::

   The ``(planned)`` and ``(in progress)`` qualifiers will be removed when
   those items are completed. If your current version of this document still
   contains those qualifiers then those lifecycle operations are unsupported.
8 changes: 0 additions & 8 deletions nova/compute/api.py
@@ -3969,9 +3969,6 @@ def _validate_host_for_cold_migrate(

    # TODO(stephenfin): This logic would be so much easier to grok if we
    # finally split resize and cold migration into separate code paths
    # FIXME(sean-k-mooney): Cold migrate and resize to different hosts
    # probably works but they have not been tested so block them for now
    @reject_vdpa_instances(instance_actions.RESIZE)
    @block_accelerators()
    @check_instance_lock
    @check_instance_state(vm_state=[vm_states.ACTIVE, vm_states.STOPPED])
@@ -4190,9 +4187,6 @@ def _allow_resize_to_same_host(self, cold_migrate, instance):
        allow_same_host = CONF.allow_resize_to_same_host
        return allow_same_host

    # FIXME(sean-k-mooney): Shelve works but unshelve does not due to bug
    # #1851545, so block it for now
    @reject_vdpa_instances(instance_actions.SHELVE)
    @reject_vtpm_instances(instance_actions.SHELVE)
    @block_accelerators(until_service=54)
    @check_instance_lock
@@ -5273,8 +5267,6 @@ def live_migrate_abort(self, context, instance, migration_id,
        self.compute_rpcapi.live_migration_abort(context,
                                                 instance, migration.id)

    # FIXME(sean-k-mooney): rebuild works but we have not tested evacuate yet
    @reject_vdpa_instances(instance_actions.EVACUATE)
    @reject_vtpm_instances(instance_actions.EVACUATE)
    @block_accelerators(until_service=SUPPORT_ACCELERATOR_SERVICE_FOR_REBUILD)
    @check_instance_state(vm_state=[vm_states.ACTIVE, vm_states.STOPPED,
2 changes: 2 additions & 0 deletions nova/compute/manager.py
@@ -10808,6 +10808,8 @@ def _update_migrate_vifs_profile_with_pci(self,
            profile['pci_slot'] = pci_dev.address
            profile['pci_vendor_info'] = ':'.join([pci_dev.vendor_id,
                                                   pci_dev.product_id])
            if pci_dev.mac_address:
                profile['device_mac_address'] = pci_dev.mac_address
            mig_vif.profile = profile
            LOG.debug("Updating migrate VIF profile for port %(port_id)s:"
                      "%(profile)s", {'port_id': port_id,
7 changes: 6 additions & 1 deletion nova/compute/resource_tracker.py
@@ -326,7 +326,12 @@ def _move_claim(self, context, instance, new_instance_type, nodename,
            migration_id=migration.id,
            old_numa_topology=instance.numa_topology,
            new_numa_topology=claim.claimed_numa_topology,
            old_pci_devices=instance.pci_devices,
            # NOTE(gibi): the _update_usage_from_migration call below appends
            # the newly claimed pci devices to the instance.pci_devices list.
            # To keep the migration context independent we need to make a copy
            # of that list here. We need a deep copy as we need to duplicate
            # the instance.pci_devices.objects list.
            old_pci_devices=copy.deepcopy(instance.pci_devices),
            new_pci_devices=claimed_pci_devices,
            old_pci_requests=instance.pci_requests,
            new_pci_requests=new_pci_requests,
60 changes: 41 additions & 19 deletions nova/network/neutron.py
@@ -666,8 +666,10 @@ def _unbind_ports(self, context, ports,
            # NOTE: We're doing this to remove the binding information
            # for the physical device but don't want to overwrite the other
            # information in the binding profile.
            for profile_key in ('pci_vendor_info', 'pci_slot',
                                constants.ALLOCATION):
            for profile_key in (
                'pci_vendor_info', 'pci_slot',
                constants.ALLOCATION, 'device_mac_address'
            ):
                if profile_key in port_profile:
                    del port_profile[profile_key]
            port_req_body['port'][constants.BINDING_PROFILE] = port_profile
@@ -1181,6 +1183,11 @@ def _update_ports_for_instance(self, context, instance, neutron,
                context, instance, request.pci_request_id, port_req_body,
                network=network, neutron=neutron,
                bind_host_id=bind_host_id)

            # NOTE(gibi): Remove this once we are sure that the fix for
            # bug 1942329 is always present in the deployed neutron. The
            # _populate_neutron_extension_values() call above already
            # populated this MAC to the binding profile instead.
            self._populate_pci_mac_address(instance,
                request.pci_request_id, port_req_body)

@@ -1496,11 +1503,27 @@ def _get_port_binding(context, client, port_id, host):
    def _get_pci_device_profile(self, pci_dev):
        dev_spec = self.pci_whitelist.get_devspec(pci_dev)
        if dev_spec:
            return {'pci_vendor_info': "%s:%s" %
                        (pci_dev.vendor_id, pci_dev.product_id),
                    'pci_slot': pci_dev.address,
                    'physical_network':
                        dev_spec.get_tags().get('physical_network')}
            dev_profile = {
                'pci_vendor_info': "%s:%s"
                % (pci_dev.vendor_id, pci_dev.product_id),
                'pci_slot': pci_dev.address,
                'physical_network': dev_spec.get_tags().get(
                    'physical_network'
                ),
            }
            if pci_dev.dev_type == obj_fields.PciDeviceType.SRIOV_PF:
                # In general the MAC address information flows from the
                # neutron port to the device in the backend, except for
                # direct-physical ports. In that case the MAC address flows
                # from the physical device, the PF, to the neutron port. So
                # when such a port is being bound to a host the port's MAC
                # address needs to be updated. Nova needs to put the new MAC
                # into the binding profile.
                if pci_dev.mac_address:
                    dev_profile['device_mac_address'] = pci_dev.mac_address

            return dev_profile

        raise exception.PciDeviceNotFound(node_id=pci_dev.compute_node_id,
                                          address=pci_dev.address)


Expand Down Expand Up @@ -3367,15 +3390,14 @@ def _get_pci_mapping_for_migration(self, instance, migration):
migration.get('status') == 'reverted')
return instance.migration_context.get_pci_mapping_for_migration(revert)

def _get_port_pci_slot(self, context, instance, port):
"""Find the PCI address of the device corresponding to the port.
def _get_port_pci_dev(self, instance, port):
"""Find the PCI device corresponding to the port.
Assumes the port is an SRIOV one.

:param context: The request context.
:param instance: The instance to which the port is attached.
:param port: The Neutron port, as obtained from the Neutron API
JSON form.
:return: The PCI address as a string, or None if unable to find.
:return: The PciDevice object, or None if unable to find.
"""
# Find the port's PCIRequest, or return None
for r in instance.pci_requests.requests:
Expand All @@ -3395,8 +3417,7 @@ def _get_port_pci_slot(self, context, instance, port):
LOG.debug('No PCI device found for request %s',
request.request_id, instance=instance)
return None
# Return the device's PCI address
return device.address
return device

    def _update_port_binding_for_instance(
            self, context, instance, host, migration=None,
@@ -3460,13 +3481,14 @@ def _update_port_binding_for_instance(
                    raise exception.PortUpdateFailed(port_id=p['id'],
                        reason=_("Unable to correlate PCI slot %s") %
                        pci_slot)
                # NOTE(artom) If migration is None, this is an unshevle, and we
                # need to figure out the pci_slot from the InstancePCIRequest
                # and PciDevice objects.
                # NOTE(artom) If migration is None, this is an unshelve, and we
                # need to figure out the pci related binding information from
                # the InstancePCIRequest and PciDevice objects.
                else:
                    pci_slot = self._get_port_pci_slot(context, instance, p)
                    if pci_slot:
                        binding_profile.update({'pci_slot': pci_slot})
                    pci_dev = self._get_port_pci_dev(instance, p)
                    if pci_dev:
                        binding_profile.update(
                            self._get_pci_device_profile(pci_dev))
                updates[constants.BINDING_PROFILE] = binding_profile

            # NOTE(gibi): during live migration the conductor already sets the
19 changes: 19 additions & 0 deletions nova/objects/pci_device.py
@@ -149,6 +149,12 @@ def obj_make_compatible(self, primitive, target_version):
                reason='dev_type=%s not supported in version %s' % (
                    dev_type, target_version))

    def __repr__(self):
        return (
            f'PciDevice(address={self.address}, '
            f'compute_node_id={self.compute_node_id})'
        )

    def update_device(self, dev_dict):
        """Sync the content from device dictionary to device object.

@@ -176,6 +182,9 @@ def update_device(self, dev_dict):
            # NOTE(ralonsoh): list of parameters currently added to
            # "extra_info" dict:
            #  - "capabilities": dict of (strings/list of strings)
            #  - "parent_ifname": the netdev name of the parent (PF)
            #    device of a VF
            #  - "mac_address": the MAC address of the PF
            extra_info = self.extra_info
            data = v if isinstance(v, str) else jsonutils.dumps(v)
            extra_info.update({k: data})
@@ -512,6 +521,13 @@ def free(self, instance=None):
    def is_available(self):
        return self.status == fields.PciDeviceStatus.AVAILABLE

    @property
    def mac_address(self):
        """The MAC address of the PF physical device or None if the device is
        not a PF or if the MAC is not available.
        """
        return self.extra_info.get('mac_address')


@base.NovaObjectRegistry.register
class PciDeviceList(base.ObjectListBase, base.NovaObject):
@@ -551,3 +567,6 @@ def get_by_parent_address(cls, context, node_id, parent_addr):
                                                           parent_addr)
        return base.obj_make_list(context, cls(context), objects.PciDevice,
                                  db_dev_list)

    def __repr__(self):
        return f"PciDeviceList(objects={[repr(obj) for obj in self.objects]})"