WIP: Hypervisor per ESXi #233

Closed
wants to merge 7,522 commits into from

Conversation

fwiesel
Member

@fwiesel fwiesel commented Aug 26, 2021

Currently, the code only collects stats for the different nodes... Now I have to look into how to represent that to placement.
After that, the question is how to migrate the VMs between the hypervisor nodes.

imitevm and others added 14 commits August 8, 2022 15:46
While nova already cleaned up its original image-cache, our patches
also added images as templates into the mix. These were not cleaned up
prior to this patch and thus filled up the datastores.

Change-Id: I2fc631d6ce0a9339be9237d63ae7e86d94779dcc
(cherry picked from commit 97780bb)
(cherry picked from commit ce5f026)
If we sort the tasks by creation time, we can stop looping over them
once we find an event that's too old. All other VMs we didn't find an
event for yet must be expired anyway.

Change-Id: I196bb9ab48867f314d9a5f3b7566384fa72778df
(cherry picked from commit 642bcc7)
(cherry picked from commit 2190382)
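A minimal sketch of the early-exit pattern described above (the helper name and the (vm_name, created_at) tuple shape are made up, not the driver's actual data structures):

    from datetime import datetime, timedelta

    def find_recently_used_vms(events, max_age=timedelta(days=7), now=None):
        """events: iterable of (vm_name, created_at) tuples."""
        now = now or datetime.utcnow()
        recently_used = set()
        for vm_name, created_at in sorted(events, key=lambda e: e[1], reverse=True):
            if now - created_at > max_age:
                break  # sorted newest-first: everything after this is older still
            recently_used.add(vm_name)
        return recently_used
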
Since we want to keep images in the owner's folder again instead of
uploading them to all projects using them, we also need to make sure we
clean up everywhere - even in projects that have no instances.

Change-Id: I9344a9bb8c0436cb5c87f3328fa3b9d31dedbdbc
(cherry picked from commit 943babe)
(cherry picked from commit b1af943)
We need to support Python3 in the future and this is a low-hanging
fruit.

Change-Id: I86804af29f4c6aff14fa5495e5344377488ba8fe
(cherry picked from commit 95c177f)
(cherry picked from commit bce80c6)
We want to make sure we only clean up image-cache templates and no other
VMs that might happen to lie in the "Images" folder.

Change-Id: I55b7fa7ebbe14b13f579ebc39dcae2549ddedc9a
(cherry picked from commit 67fda8f)
(cherry picked from commit d18fb11)
Since every nova-compute node will be running the image-cache cleanup,
we have to make sure this works in parallel. Therefore, we limit the
cleanup of image-cache templates to the datastores the nova-compute node
is responsible for - the ones configured in the datastore_regex config
option.

Change-Id: I5fae822a08bcc06565c64f959553cf7082bb2423
(cherry picked from commit 0d8d760)
(cherry picked from commit 5f21c7d)
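For illustration, the existing `datastore_regex` option in the [vmware] section of nova.conf scopes the cleanup to the node's own datastores (the regex value here is made up):

    [vmware]
    # only clean up image-cache templates on datastores this node manages
    datastore_regex = ^eph-bb091.*
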
Even with image-as-template disabled, we have to run the image-template
cleanup to ensure removal after the setting is switched off, and because
we switched all our image-uploads to using those image-templates.

Change-Id: I924fbc42f014ecf4c0342246d48f27b6ba1d5c77
(cherry picked from commit 44b8620)
(cherry picked from commit 7331446)
Otherwise, we might copy the same image to the datastore multiple times,
just because VMs from multiple projects are deployed from that (shared)
image.

Change-Id: I02721da655ec505d7c3c5d3c9faca1be77dce813
(cherry picked from commit 95b6176)
(cherry picked from commit aea1268)
Every image we upload automatically ends up as a template-VM. We can
leverage this in normal deployments and copy the VMDK from the
image-template, if it still exists. If it doesn't exist anymore, we can
try to copy an image-template from another datastore and use its
VMDK instead - that should still be faster than copying stuff from
glance again.

If all fails, we turn back to the original approach of loading the image
from glance.

Change-Id: I659d22e494a86fe4e567306062784313432d11ee
(cherry picked from commit cfa3eb8)
(cherry picked from commit a0d9db9)
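A rough sketch (not the driver's actual code) of the fallback order described above - try each source in turn and fall through on failure:

    def first_working_source(sources, image_id):
        # sources: ordered callables, e.g. [copy_from_local_template,
        #          copy_from_other_datastore_template, download_from_glance]
        # (the callable names are hypothetical)
        for source in sources:
            try:
                return source(image_id)
            except Exception:
                continue  # template missing/broken: try the next source
        raise LookupError('no usable source for image %s' % image_id)
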
Until now, the logs during a clean-up showed "Destroyed VM" messages,
but one could not find out which VM was destroyed. Therefore, we add an
additional log message stating which image-template VM we're going to
destroy.

Change-Id: I7429fca0175ec4593689466be6dcc0cb2482cb9f
(cherry picked from commit 557e34e)
(cherry picked from commit 2d6f122)
There are cases where we don't want to blindly try to delete a VM, but
instead want to know the outcome, at least for certain types of errors.
To support this, vm_util.destroy_vm() has to re-raise certain types
of exceptions. In the case we're looking for, that's the
"InvalidArgument" fault. For this to work like before, code calling
destroy_vm() needs to catch these exceptions, too.

Change-Id: I28d2f1b94b8439cfea88146746ae6e59d61f087c
(cherry picked from commit e112c5e)
(cherry picked from commit 73a9a27)
While we don't know why yet, we've seen image-cache templates missing
their directory on the datastore.

For the image-template cleanup this means that Nova cannot destroy
those templates, as this raises an InvalidArgument fault on VMware's
side, telling us that ConfigSpec.files.vmPathName is invalid. Since we
need to clean those templates up anyway to create a new, usable one, we
catch those errors and call UnregisterVM as a fallback.

For getting templates from other datastores or re-using the already
existing template to copy its disk, we have to catch
FileNotFoundException. If this situation occurs, we have to clean that
broken template up and also let nova continue the search for a source of
the disk for the new VM.

Change-Id: Id6d1cc1cd7a50958c77a1417e3f2aed7b9672e15
(cherry picked from commit 1b342cd)
(cherry picked from commit 13940c5)
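A hedged sketch of that fallback, with plain stand-ins for the vSphere fault and API calls rather than the actual nova helpers:

    class InvalidArgument(Exception):
        """Stand-in for the vSphere InvalidArgument fault."""

    def cleanup_image_template(destroy_vm, unregister_vm, template_ref):
        # destroy_vm/unregister_vm are injected callables standing in for
        # the Destroy_Task and UnregisterVM API calls.
        try:
            destroy_vm(template_ref)
        except InvalidArgument:
            # the template lost its files on the datastore, so destroying it
            # fails; dropping it from the inventory is enough
            unregister_vm(template_ref)
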
If the DRS policy is not set to "fullyAutomated", i.e. if DRS will not
take action on existing, running VMs, we have no chance to free up a
host and thus don't have to try to find one. Instead, we error out to
tell nova-bigvm to search for another cluster.

Change-Id: Idbdfe82b4057844401e710fb9d87141478bb3353
(cherry picked from commit e98d6ae)
(cherry picked from commit 0e8f17e)
Support for the E1000 driver is being phased out in guest operating
systems, so we switch to E1000e, which is also faster and has more
hardware offloading capabilities.

Change-Id: I08ac32f914a57d3eb7328351a07a20a2ef212cb8
(cherry picked from commit 5e7556d)

fix unit tests for vmware: Switch default VIF model

(cherry picked from commit 425070f)
(cherry picked from commit a3b173a)
grandchild and others added 25 commits January 26, 2023 12:01
Original code can be found at: https://opendev.org/x/nova-mksproxy

Integrate the code into nova as a builtin command.

Subclass `base.SecurityProxy` into `MksSecurityProxy`. Because MKS'
`connect_info.internal_access_path` contains more than just a `path`
(namely a JSON object with "ticket", "cfgFile" & "thumbprint"), add a
new `parse_internal_access_path()` method in
`NovaProxyRequestHandler`. This method tries to parse
`internal_access_path` as JSON and, if that fails, puts the contents in
the "path" key of a new `internal_access_path_data` dict.

Co-authored-by: Johannes Kulik <[email protected]>
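A sketch of the parsing behaviour described above (not the actual method body):

    import json

    def parse_internal_access_path(internal_access_path):
        try:
            data = json.loads(internal_access_path)
            if isinstance(data, dict):
                # e.g. {"ticket": ..., "cfgFile": ..., "thumbprint": ...}
                return data
        except (TypeError, ValueError):
            pass
        # plain path (the non-MKS case): wrap it under a "path" key
        return {'path': internal_access_path}
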
Has to be "nova-{NAME}proxy".
The old script had a --verbose flag that we are using in the k8s
deployment definition for all regions. This will include both Xena
and older Rocky deployments. The new name of this flag is
--mks-verbose.

Instead of adding an image version check in the helm-chart, add an
alias for the flag for the duration of the update, and revert this
after the Xena upgrade is through everywhere.
Physical CPUs are faster than hyperthreaded CPUs. So by default VMware
spreads a VM's CPU cores onto physical cores, spanning multiple NUMA
nodes if needed, even though the vCPUs would fit onto a smaller number
of NUMA nodes.

HANA VMs prefer low latency (and thus NUMA locality) over raw
performance, so they set the VMware config `preferHT` to not get
spread out. Before, this setting was only applied if the VM's flavor
qualified as a "Big VM" flavor (i.e. >1GiB in size). This excluded
smaller HANA flavors, and a single-NUMA-node VM got spread over two
nodes.

Make `preferHT` depend on whether the flavor requires one of our
`CUSTOM_NUMASIZE_*` traits instead.
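A hypothetical sketch of the new condition; only the `CUSTOM_NUMASIZE_` prefix is from this change, the concrete trait name in the comment is made up:

    def wants_prefer_ht(flavor_extra_specs):
        # e.g. {'trait:CUSTOM_NUMASIZE_C48_M729': 'required'} -> True
        return any(key.startswith('trait:CUSTOM_NUMASIZE_') and value == 'required'
                   for key, value in flavor_extra_specs.items())
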
Another best-practice recommendation for HANA-on-VMware.

The code pretty much duplicates the one for setting the DRS partially-
automated behavior. I decided against deduplication and abstraction in
favor of readability and at the cost of LOC.
Testing showed that single- and half-NUMA-node VMs get spawned across
multiple NUMA nodes due to `numa.autosize.vcpu.maxPerVirtualNode`
being too low. Set the value explicitly to the same value that's used
for the `CoresPerSocket` cluster-vm setting.
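Illustrative extraConfig value - the 8 below is only an example; the real value follows the cluster's CoresPerSocket setting:

    cores_per_socket = 8  # example only; taken from the CoresPerSocket cluster-vm setting
    extra_config = {'numa.autosize.vcpu.maxPerVirtualNode': str(cores_per_socket)}
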
Introduced in 'vmware: Restart BigVMs with "high" priority', the
function "update_cluster_das_vm_override()" did not work, because it
created a "ClusterDasVmConfigInfo" instead of a "ClusterDasVmConfigSpec"
object. This led to us not being able to spawn BigVMs with the
following error:

    Exception in ReconfigureComputeResource_Task.
    Cause: Type not found: 'operation'

Change-Id: If9acf9ee07e373b7b24c14c642d0d99fe2a41db1
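An illustrative sketch of the structural difference, with SimpleNamespace standing in for the suds-generated vSphere types:

    from types import SimpleNamespace

    das_vm_info = SimpleNamespace(key='vm-123',  # the VM's managed object reference
                                  dasSettings=SimpleNamespace(restartPriority='high'))

    # The bug: the ClusterDasVmConfigInfo was attached directly. The fix: wrap
    # it in a ClusterDasVmConfigSpec-like object that also carries the
    # 'operation' field.
    das_vm_spec = SimpleNamespace(operation='add', info=das_vm_info)
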
Copy the relevant code from the libvirt driver.

This reintroduces the code that was changed (presumably by accident) in
  fd4f43b VmWare: Refactored resource and inventory collection
Fixup that commit with this one when forward porting!
`remove_unused_original_minimum_age_seconds` along with several other
config options related to image cache management were moved to their
own config group `image_cache` in Ussuri (nova 21), but the cherry-
pick in
    9de18bd "vmware: Update expired image-template-VM handling"
failed to account for this.
Was removed in
    f3cc311 "vmware: Remove vestigial nova-network support"
but not taken into account by
    aba0dbb "VMware: Image as VM template".
In
    daeeafd "baremetal quota handling"
the `_build_per_flavor_limits()` method is incorrectly
rewritten to match surrounding code, esp. `_build_absolute_limits()`,
and uses not only the limit, but both limit and in_use as the value
for the resulting limit dict.

The original commit,
    16857ed "reintroduce baremetal quota handling"
used `absolute_limits` directly as method parameter.
For historic reasons, there is a mapping between the volume-uuid and
the device[].backing.uuid in vmware through extraConfig. In most cases
they are the same, but sometimes they are not. And apparently, the
extraConfig can even hold wrong information.

Since the risk of a collision of uuids is quite low, and the lookup is
an iteration over a small list of volumes, we first try it that way,
avoiding possibly incorrect information in the extraConfig. Failing
that, we still do the additional API call to get the mapping, and try
that.

Work around type errors mypy raises in the changed file

Change-Id: Ifcdf96cfc6d00473299c1f2a7cb9d23d03294027
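A rough sketch of the lookup order described above (names are illustrative, not the actual nova code):

    def find_disk_for_volume(disks, volume_id, get_extraconfig_mapping):
        # first pass: match the volume uuid directly against the disk backings
        for disk in disks:
            if getattr(disk.backing, 'uuid', None) == volume_id:
                return disk
        # fallback: extra API call for the extraConfig mapping, then retry
        mapped_uuid = get_extraconfig_mapping(volume_id)
        for disk in disks:
            if getattr(disk.backing, 'uuid', None) == mapped_uuid:
                return disk
        return None
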
The ironic-python-agent uses qemu-img to convert various image formats
to the raw disk format. QCow2 is already correctly marked accordingly,
but vmdk was missing.

Change-Id: Ifd868e6951452b291184bd848d4244d37649f1ed
We need to separate the arguments for tox from the ones for the linter.

Change-Id: I6d7ee1d95f0ca17fa6c04a3546a7f907cc6393f1
Nested `ClusterDasVmSettings` inside new `ClusterDasVmConfigInfo`
objects don't get created properly. This seems to have worked on Rocky
but fails on Xena.
Otherwise, it will try to decode as UTF-8-encoded text during read().
We have added custom filters and weighers for our common use-case
VMware, and already excluded baremetal there.

For libvirt as an additional hypervisor, we also want to skip those
filters/weighers, so we now test whether the request is scheduled for
vmware, either due to a flavor extra_spec or an image property.
If nothing is set, we default to vmware for backwards compatibility.

Change-Id: I3223aee433ba9009d2cd6387eeda8e70dbbb3cde
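A hedged sketch of that check; the exact extra_spec and image-property keys used by the patch may differ:

    def is_vmware_request(flavor_extra_specs, image_properties):
        hv_type = (flavor_extra_specs.get('capabilities:hypervisor_type')
                   or image_properties.get('img_hv_type'))
        # nothing set: default to vmware for backwards compatibility
        return hv_type is None or hv_type.lower() == 'vmware'
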
The code assumes that the libvirt version reported by the hypervisor
matches the version of the python library installed. Since we run the
agent in a container, that is not correct, and the attribute will not
be in the library, as it has been built for an older version.

Change-Id: I156047b1f9829a49b429d51ca7f7777606b10a56
Change-Id: I3edf0d3c6deb8385adc1095c7e3c09526e2d4d65
This command detects changes in the flavors and syncs the instances'
flavor-info with the new changes.

`nova-manage sap sync_instances_flavor`
The vcenter doesn't like a reconfiguration call with an empty spec and
raises an error. So we skip that and save ourselves some work on top.

Change-Id: I1da3309500a2cd384c5d7cd431e71297ef5d5a3a
You can now configure

    [vmware]
    hypervisor_mode=<cluster,esxi_to_cluster,cluster_to_esxi>

to choose to expose the individual esxi hosts instead of the cluster.
It will only show the esxi host as hypervisor node, but will not
honour the node parameter for placing the instance during instance
creation, migration, etc...

Change-Id: I264fd242c0de01ae8442c03bc726a0abfbe176ef
The code assumes that there is a single compute-node per host,
which is incorrect for ironic. By summing over all compute-nodes of a
host, we also get a correct response for compute-hosts with more than
one compute-node.

Change-Id: Iaf6e2a72f6649e234660de47bec8b1da1ea1571e
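A minimal sketch of summing over all compute-nodes of a host instead of assuming a 1:1 mapping (field names are illustrative):

    from collections import defaultdict

    def totals_per_host(compute_nodes):
        # compute_nodes: iterable of dicts with 'host', 'vcpus', 'memory_mb'
        totals = defaultdict(lambda: {'vcpus': 0, 'memory_mb': 0})
        for node in compute_nodes:
            totals[node['host']]['vcpus'] += node['vcpus']
            totals[node['host']]['memory_mb'] += node['memory_mb']
        return dict(totals)
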
@grandchild

that's quite a few commits. is this meant for xena?

@fwiesel fwiesel closed this May 2, 2023
@fwiesel
Member Author

fwiesel commented May 2, 2023

The PR was meant for rocky, but I've updated the branch for xena.
I've closed this in favour of #423

@grandchild

fyi, it should be possible to change the target branch within the existing PR -- but all good, let's continue in #423.
