
Port Rocky patches onto Xena #350

Merged
merged 284 commits into stable/xena-m3 on Aug 8, 2022
Conversation

joker-at-work

This includes some master fixes, because the tests wouldn't run through otherwise.

notandy and others added 30 commits August 8, 2022 15:40
(cherry picked from commit bd338c3)
(cherry picked from commit 7976a56)
(e.g. when all hosts are in maintenance)

(cherry picked from commit 4a32b79)
(cherry picked from commit fa208ae)
(cherry picked from commit a78c03e)
(cherry picked from commit 70e9676)
…onsole - raising exception.ConsoleTypeUnavailable if the AcquireTicket operation throws InvalidPowerState

(cherry picked from commit 7c6c326)
(cherry picked from commit 248ab92)
This fix adds a configuration flag for avoiding migrations across
clusters. Setting the `always_resize_on_same_host` config option
puts the instance's host into `force_hosts`, resulting in the
instance being resized on the same host.

(cherry picked from commit 935060e)
(cherry picked from commit 1d38c7f)
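
Purely as an illustration of the mechanism described above (the function and field names here are assumptions, not the patch's actual code), a flag-controlled resize could pin the scheduler to the current host like this:

```python
# Hypothetical sketch: pin a resize to the instance's current host when the
# (assumed) always_resize_on_same_host option is enabled.
def apply_same_host_resize(request_spec, instance, always_resize_on_same_host):
    """Populate force_hosts so the scheduler can only pick the current host."""
    if always_resize_on_same_host and instance.get("host"):
        request_spec["force_hosts"] = [instance["host"]]
    return request_spec

# Example: the resize stays on the host the instance already runs on.
spec = apply_same_host_resize({}, {"host": "nova-compute-bb001"}, True)
assert spec["force_hosts"] == ["nova-compute-bb001"]
```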
Nova regularly checks the power state of every VM on the hypervisor. The
default behaviour for VMs not matching the requested power state is to
call the stop API. But VMware hypervisors are actually a cluster of
hypervisors that do HA on their own. While an HA failover is in
progress, especially if that failover is delayed for some time, e.g.
because of a still-held lock, VMs are reported as SHUTDOWN by the API.
If we call the stop API then, Nova will shut down the VM once it's
RUNNING again, because the stop API sets the expected/user-wanted VM
state to SHUTDOWN. This would effectively disable VMware HA.

Therefore, we're introducing a new setting
`sync_power_state_unexpected_call_stop` which can be used to disable the
call to the stop API.

(cherry picked from commit d915a1b)
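
A minimal sketch of the gating logic described here, with assumed names (the real code lives in the compute manager's power-state sync task):

```python
# Hedged sketch: only call the stop API for an unexpectedly-SHUTDOWN VM when
# the (assumed) sync_power_state_unexpected_call_stop option is enabled.
def handle_unexpected_shutdown(instance, hypervisor_power_state, conf, stop_api):
    if instance["vm_state"] == "active" and hypervisor_power_state == "SHUTDOWN":
        if not conf.get("sync_power_state_unexpected_call_stop", True):
            # Leave the VM alone; the VMware cluster's own HA will restart it.
            return "skipped"
        stop_api(instance)  # would set the expected state to SHUTDOWN
        return "stopped"
    return "in-sync"
```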

Sync `vm_state` on power sync, too

This ensures the customer can call "server start" again if they manually
shut down the VM from inside. To handle this, we also have to disable
calling the stop API on finding a VM unexpectedly running.

(cherry picked from commit 525380a)

Sync SUSPENDED/PAUSED state into `vm_state`, too

Since we handle `CONF.sync_power_state_unexpected_call_stop` in all the
other places, we should be consistent and sync PAUSED and SUSPENDED,
too, even though those states should only occur if someone clicks
around in the vCenter.

(cherry picked from commit beef5df)

Add option to reserve all memory

If we don't reserve all memory, there's a swap file pre-created for
every VM on the ephemeral storage - even if it's unused.

But this option comes at the price of not being able to over-commit
memory.

Related issue: CCM-9589

(cherry picked from commit a435012)

Fix fetch image incorrectly returning the folder path instead of the VMDK path, to address an OVA image deploy error for incorrect parameter fileType

(cherry picked from commit f93a4b3)

Handle VM name duplication for OVA image import

(cherry picked from commit 90f1629)

tests: Fix `test_sync_power_states_instance_not_found`

This test fails from time to time because we added a random sleep time
in commit
	> flatten power_sync_state to intervall duration

(cherry picked from commit 977552b)
(cherry picked from commit 5569c64)
… instead of from the shadow VM

(cherry picked from commit 18b4d49)

fix detaching volume created from image

(cherry picked from commit 446ac66)

fix unit tests and feedback for attach/detach volume created from image

(cherry picked from commit c71d177)
(cherry picked from commit ed19142)
Add global setting `bigvm_mb`

This is necessary in multiple places e.g. the scheduler and
compute-driver to handle big VMs differently. We cannot currently use
flavors for that, because there are still custom flavors in the system
that we don't control.

(cherry picked from commit 881889d)

[scheduler] Add BigVmFilter

It filters out hosts having more than `bigvm_memory_utilization_max`
memory utilization. This is necessary because otherwise it takes too
long to find free space for the big VM on a host.

(cherry picked from commit d284fc3)

[vmware] Add DRS override for "big" VMs

Since DRS doesn't play well with large differences in VM size and HANA
doesn't play well with vMotion, we're disabling automatic re-balancing
actions for "big" VMs by the DRS. On spawn, we add a
`ClusterDrsVmConfigSpec` for instances having more than 1 TB of memory
by default - configurable via the `bigvm_mb` configuration variable. The
vCenter deletes the overrides automatically when a VM is deleted.

One can view current overrides in the vCenter UI via $cluster (e.g.
productionbb098) -> Configure -> VM Overrides.

We're always adding the rule in `update_cluster_drs_vm_override`,
because we only call it in the driver's `spawn` method which seems to be
called only when a new vm object is created. Therefore, there should not
be any existing rule for the VM.

(cherry picked from commit ae68374)

[vmware] Always reserve all memory for big VMs

Since there are still custom flavors in use for big VMs, we cannot rely
on the flavors to set the reserved memory for big VMs, which is
necessary because otherwise small VMs will compete with the big VMs'
HANA, which does not work well.

We're using `CONF.bigvm_mb` here to identify a big VM and reserve all
memory regardless of `CONF.reserve_all_memory`.

(cherry picked from commit 706bcd8)

[scheduler] Make BigVmFilter more dynamic

Instead of having a static threshold for the `BigVmFilter`, we compute
the threshold depending on the hypervisor size and the requested RAM of
the big VM. The reasoning behind this is that on bigger hypervisors a
small big VM doesn't matter as much as a huge one, e.g. 1.4 TB on a 6
TB hypervisor vs. 5.9 TB on a 6 TB hypervisor. We basically aim at 50 %
of the requested RAM being free on average over all hosts in the cluster.

The filter thus currently only works if a compute-node is in a host
aggregate defining the `_AGGREGATE_KEY` (currently "hv_size_mb") having
an integer defining the hypervisor's RAM size as a value.

(cherry picked from commit 6ad6a2c)
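
To make the 50 % rule concrete, here is a hedged sketch of how such a dynamic utilization threshold could be computed (function names and the percentage form are assumptions, not the filter's actual code):

```python
# Sketch: allow a host only if the cluster's average memory utilization still
# leaves roughly half of the requested RAM free per host.
def dynamic_utilization_threshold(hv_size_mb, requested_ram_mb, target_free_ratio=0.5):
    """Return the maximum allowed memory-utilization percentage."""
    free_needed_mb = target_free_ratio * requested_ram_mb
    return max(0.0, (hv_size_mb - free_needed_mb) / hv_size_mb * 100.0)

def host_passes(avg_utilization_pct, hv_size_mb, requested_ram_mb):
    return avg_utilization_pct <= dynamic_utilization_threshold(hv_size_mb, requested_ram_mb)

# A 1.4 TB request on a 6 TB hypervisor tolerates far more utilization
# than a 5.9 TB request on the same hypervisor.
print(dynamic_utilization_threshold(6 * 1024 * 1024, 1.4 * 1024 * 1024))  # ~88.3
print(dynamic_utilization_threshold(6 * 1024 * 1024, 5.9 * 1024 * 1024))  # ~50.8
```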

[scheduler] Add BigVmHypervisorRamFilter

Since we've got multiple BigVM-filters now, this also renames
`BigVmFilter` to `BigVmClusterUtilizationFilter` for clarity.

With more diversity in hypervisor sizes (e.g. 6 TB hypervisors
right around the corner), we have to make sure big VMs actually fit the
hypervisor. For that, we introduce a new scheduler filter -
`BigVmHypervisorRamFilter` - that prohibits scheduling of big VMs not
fitting the hypervisor.

In KVM-deployments, this would be the job of the `RamFilter`, but since
we run VMware and the scheduler only sees the cluster's and not a single
host's resources, we use a special host aggregate attribute -
"hv_size_mb" - to retrieve the hypervisor size as already done for
`BigVmClusterUtilizationFilter`.

(cherry picked from commit 4b6867a)

[scheduler] pep8 cleanup

(cherry picked from commit 39231e4)

Add `is_big_vm` utility function

This function should be used to check whether one is handling a big VM.
All necessary conditions are gathered in one place so one cannot
forget a condition. This also makes it easy to change the conditions
later on.

(cherry picked from commit cc46cf4)
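
As a hedged illustration of the idea (threshold and names assumed from the `bigvm_mb` default mentioned above, not copied from the helper itself):

```python
# Sketch: one central predicate for "is this a big VM?" so no caller can
# forget a condition; here the only condition shown is the memory threshold.
BIGVM_MB = 1024 * 1024  # assumed default of CONF.bigvm_mb: 1 TiB in MiB

def is_big_vm(memory_mb, bigvm_mb=BIGVM_MB):
    return memory_mb >= bigvm_mb

assert is_big_vm(1024 * 1024) and not is_big_vm(512 * 1024)
```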

[scheduler] Remove BigVmHypervisorRamFilter

We actually don't need it, because our filter queries the placement API
for resource providers capable of fulfilling the request. With the
vmwareapi driver setting the `max_unit` value for `MEMORY_MB` to the
cluster's maximum host size, we don't have to add another filter into
the scheduler, as too small hosts won't even end up as scheduling
options.

(cherry picked from commit c2fc281)

[scheduler] Use placement API to get HV size

We recently discovered that the hypervisor size is actually available
in the placement API. Every compute host has a resource provider with an
inventory. There, the vmwareapi virt driver sets the `max_unit` for the
`MEMORY_MB` resource class to the biggest host found in the cluster.
This means we don't have to use aggregates to get the hypervisor size
in the BigVm filters, but can retrieve it from the placement API.

We're building a cache of the value that expires after 10 min. This
guards against the case where a lot of VMs are scheduled in a short
period of time - even though tests showed that querying the inventory of
all hosts in QA did not take longer than a second. Cache expiration also
guards against stale values for newly-created hypervisors that have not
set their inventory yet.

(cherry picked from commit 965d3b4)
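
A rough sketch of the caching behaviour described above, with assumed names; the fetch callback stands in for the actual placement query:

```python
# Hedged sketch: remember the hypervisor size read from placement's MEMORY_MB
# max_unit for 10 minutes per compute node.
import time

_HV_SIZE_CACHE = {}        # compute-node UUID -> (value_mb, fetched_at)
_HV_SIZE_CACHE_TTL = 600   # seconds

def get_hv_size_mb(compute_node_uuid, fetch_from_placement):
    """fetch_from_placement(uuid) must return the MEMORY_MB max_unit."""
    now = time.time()
    cached = _HV_SIZE_CACHE.get(compute_node_uuid)
    if cached and now - cached[1] < _HV_SIZE_CACHE_TTL:
        return cached[0]
    value = fetch_from_placement(compute_node_uuid)
    _HV_SIZE_CACHE[compute_node_uuid] = (value, now)
    return value
```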

[vmware] Make `partiallyAutomated` a constant

We usually use variables as constants instead of directly embedding
strings.

(cherry picked from commit 81097b0)
(cherry picked from commit fbf0a2d)
Fix for bug where the ClusterInfoEx has no group attribute despite
the fact that the VM should be in a server group

(cherry picked from commit da99dc3)
(cherry picked from commit a784bd2)
This allows the cloud-provider to give users a way to identify
the cloud platform they are running on. We want this so AS ABAP can do
group licensing for CCloud.

(cherry picked from commit 044d78b)
(cherry picked from commit 58ed345)
We have different datastores for VM/ephemeral and volume storage. Moving
the volume to the VM's datastore would defeat the purpose of having
different datastores and would make a Cinder volume end up on ephemeral
storage. This cannot be intended behaviour and thus we don't do it.

(cherry picked from commit 69f2fd2)
(cherry picked from commit 8a6a814)
When booting from volume, we want the resize to work regardless of the
new flavor's root disk size, as we don't touch the root disk anyway.

(cherry picked from commit ddfcfc2)
(cherry picked from commit 11b49cd)

[vmware] Skip disk resize for boot from volume
When booting from volume, the flavor's root disk size doesn't matter, so
we don't have to resize the volume.

(cherry picked from commit ddfcfc2)
(cherry picked from commit 11b49cd)
When booting from volume, the driver should ignore the volume size and
not error out if the root disk is bigger than what the flavor would have
created as an ephemeral disk.

(cherry picked from commit 533c503)
(cherry picked from commit bae6465)
When retrieving a volume from cinder, we translate the object to a dict.
This dict now also includes an `owner` attribute read from
`os-vol-tenant-attr:tenant_id` - if it exists. It might not exist if
the Cinder policy does not allow the user to see that extension
attribute.

We need that attribute for boot-from-volume where the image-metadata is
read from the volume instead of the image. Image metadata contains an
owner attribute.

(cherry picked from commit 63af5a0)
(cherry picked from commit 154acce)
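
Illustration only (simplified; the attribute name is taken from the message above, the helper name is assumed): the translation could look roughly like this:

```python
# Sketch: copy the tenant id into an `owner` key if the cinder policy exposed it.
def volume_to_dict(cinder_volume):
    vol = {"id": cinder_volume.get("id"), "size": cinder_volume.get("size")}
    owner = cinder_volume.get("os-vol-tenant-attr:tenant_id")
    if owner is not None:
        vol["owner"] = owner  # needed to build image metadata for boot-from-volume
    return vol
```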
When reporting the memory and cpu resources of a cluster, we must not
include failover hosts as they can only be used by the HA process to
start VMs from failing hosts.

(cherry picked from commit 1be80cb)
(cherry picked from commit d62ed1a)
This can be used to have different CPU/memory reservations for hosts
depending on the hostgroup they're in. We can also set defaults in
percent with this instead of only static values as supported by
CONF.reserved_host_memory_mb and CONF.reserved_host_cpus.

NOTE: Setting these values too differently per hostgroup for the same
cluster might result in the placement API getting confused and
scheduling becoming impossible, as the `max_unit` is still set to the
maximum per cluster. In case of our HANA HVs, placement would therefore
always consider the HANA HVs' max unit. That should be no problem as
smaller VMs should always fit.

(cherry picked from commit 5250436)
(cherry picked from commit 28f4eb8)
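
A hedged sketch of the percent-or-absolute reservation idea (the option format shown is an assumption, not the actual config schema):

```python
# Sketch: a per-hostgroup reservation like "10%" or "4096" (MB); fall back to
# the static CONF.reserved_host_memory_mb-style value when nothing is set.
def effective_reserved_memory_mb(host_memory_mb, hostgroup_value, static_reserved_mb):
    if hostgroup_value is None:
        return static_reserved_mb
    value = str(hostgroup_value).strip()
    if value.endswith("%"):
        return int(host_memory_mb * float(value[:-1]) / 100.0)
    return int(value)

assert effective_reserved_memory_mb(1048576, "10%", 8192) == 104857
assert effective_reserved_memory_mb(1048576, None, 8192) == 8192
```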
While the placement-api already adhered to the reservations set per
hostgroup, the os-hypervisors/details endpoint still only used the
reserved values from the config file. With this commit Nova also uses
the reservations to compute a hypervisor's used cpus/memory in the
hypervisor-view. This is important to us, as we get the quota capacity
from there.

(cherry picked from commit e1c4020)

Fix hostgroup reservations for production configurationEx

The tests worked fine, but production actually has a configurationEx
that does not support `.get()` which thus raises an AttributeError on
nova-compute start.

(cherry picked from commit 927cb88)

Fix tests for "tests: Set more fake HostSystem attributes"

We now actually return stuff in the tests and thus comparing to "null"
is not correct anymore.

(cherry picked from commit fa46579)
(cherry picked from commit 2329086)
As the comment states, it's only relevant for XML and should be removed
in the future. The future is now, as we want fully colored logs :D

(cherry picked from commit b332328)

pep8 for "[api] Don't remove ANSI from console output"

(cherry picked from commit c71cab2)
(cherry picked from commit 7c70c9f)
Instead of checking for the flavor attribute `quota:separate` we now use
the same flavor attribute as the scheduler: `capabilities:cpu_arch`.
This is done for consistency reasons and to make sure we can also use
separate quota for the big VMs if we want to.

(cherry picked from commit 551a457)
(cherry picked from commit e3ba555)
We don't show the total values for RAM/CPU anymore, but only the ones
reduced by the reserved resources. We need this, because limes retrieves
the full quota capacity from the total values.

I opted for changing the resources (for that single call, hence
copy.deepcopy) instead of subtracting the reserved values from the
totals afterwards, because if an attribute is unset on the `ComputeNode`
object, we cannot access it - not even with `getattr`. It just raises a
`NotImplementedError` because it cannot fetch the attribute from
somewhere (e.g. the DB). But since we need to subtract from the total,
we would have to access the attribute. Catching the
`NotImplementedError` didn't seem like a good option to me.

(cherry picked from commit 9ca7315)
(cherry picked from commit fd2c02e)
The function could not handle deleted flavors and thus prohibited quota
handling for certain projects by raising an exception. Additionally, it
was inefficiently using multiple queries to retrieve missing flavors.

The new function now retrieves the flavor information from the
`InstanceExtra` object, which saves the previous and current flavor in
JSON format, in case the flavor gets deleted. If it cannot find that
information - which should basically never happen, as that information
has been saved there basically forever - it falls back to the old
behaviour of querying the cell DB.

(cherry picked from commit 8e377a6)
(cherry picked from commit 0a18154)
This filter is supposed to only let full- or half-node-size big VMs onto
a host. This is our quickfix to get some NUMA-alignment for big VMs going
until the vmwareapi driver properly supports NUMA-aware scheduling.

(cherry picked from commit a320feb)
(cherry picked from commit fc725d8)

[scheduler] Make big VM host fractions configurable

Since the original implementation only supported full- and half-node
sizes hard-coded in the code and we now want only full-node size to be
supported, we make the host fractions configurable via settings.
The value set in the settings for a host fraction will be multiplied by
the hypervisor's memory.

(cherry picked from commit f4978ab)
(cherry picked from commit 5a7fd2a)

[scheduler] Support multiple extra_specs values in big VM filter

We want to define flavors that can be used both for a full and a half
hypervisor, so customers don't have to juggle too many flavors.

(cherry picked from commit b19e37c)
(cherry picked from commit 347d7c8)

[scheduler] Fix host_fractions setting in big VM filter

As the ini-style config file only allows the user to provide
string values, we need to convert the values to floats explicitly. The
tests could not catch that case, because we set a dict there and could
use floats directly.

(cherry picked from commit 6e3692a)
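
For illustration (names assumed), the conversion plus the multiplication against the hypervisor memory could look like this:

```python
# Sketch: ini values arrive as strings, so convert explicitly before using them.
def parse_host_fractions(raw):
    """raw e.g. {'full': '1.0', 'half': '0.5'} as read from the config file."""
    return {name: float(value) for name, value in raw.items()}

def allowed_memory_mb(hv_size_mb, fractions):
    return {name: hv_size_mb * fraction for name, fraction in fractions.items()}

print(allowed_memory_mb(3 * 1024 * 1024,
                        parse_host_fractions({"full": "1.0", "half": "0.5"})))
# {'full': 3145728.0, 'half': 1572864.0}
```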

[nova][scheduler] Add logging for no matching host fraction

That way we can see more easily whether everything is configured correctly.

(cherry picked from commit b788650)
(cherry picked from commit 9082948)
It's possible to create a boot-from-volume instance with
instance.image_ref set to something - even something different than the
actually booted volume's image according to
https://bugs.launchpad.net/nova/+bug/1648501. Therefore, we cannot rely
on an empty image_ref for BFV detection and instead use
nova.compute.utils.is_volume_backed_instance now. This prohibits the
vmware driver from creating a disk on ephemeral store in addition to
using the volume and booting from the volume.

(cherry picked from commit 395c76d)
(cherry picked from commit b08bf77)
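
A simplified sketch of the detection change (not the actual `is_volume_backed_instance` implementation): decide from the block-device mappings rather than from `image_ref`:

```python
# Sketch: an instance is volume-backed if its root device (boot_index 0) is a volume.
def is_volume_backed(block_device_mappings):
    for bdm in block_device_mappings:
        if bdm.get("boot_index") == 0:
            return bdm.get("destination_type") == "volume"
    return False

assert is_volume_backed([{"boot_index": 0, "destination_type": "volume"}])
assert not is_volume_backed([{"boot_index": 0, "destination_type": "local"}])
```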
Instead of returning an error when a duplicate is found with pagination
enabled, we fetch the full amount of instances from the cells regardless
of open build requests and slice the result at the end to the requested
length.

(cherry picked from commit 841e2ce)
(cherry picked from commit 8b7183b)
We cannot use `orig_limit` when applying the limit to the list of
instances, because using the IP filter needs the full list. The code
handled this before by setting `limit` to `None`. Since we removed the
code subtracting the already found instances from `limit`, we can safely
use `limit` instead of `orig_limit` to make sure we return an unfiltered
list if IP-filtering is requested.

(cherry picked from commit e8862db)
(cherry picked from commit eef803e)
(cherry picked from commit dac4481)

[vmware] finish_revert_migration: relocate after the disks are updated

The relocate should happen after the _revert_migration_update_disks()
has been executed, otherwise reverting to the initial disk size could fail.

(cherry picked from commit 4a1957a)

[vmware] resize/migrate: fix RelocateVM api calls

- fix network devices backing while performing RelocateVM
- use migration.dest_host and migration.source_host to determine
  if we should move the instance to another cluster
- make use of resize_instance parameter passed to the driver

(cherry picked from commit 4e7a3de)

[vmware] _resize_disk: use instance.old_flavor for comparison

Since _resize_disk is executed on the finish_migration step,
we must use instance.old_flavor.root_gb to see whether we are
actually increasing the size of the disk.

(cherry picked from commit c8f96d0)

fix pep8 warnings and remove some unused code

(cherry picked from commit ed2db87)

fixes for pr comments

(cherry picked from commit 7b525c3)

fail if a vif is not found on the vm while relocating

(cherry picked from commit 5472f5e)

create an extra step for attaching volumes after relocating

RESIZE_TOTAL_STEPS is now 7

(cherry picked from commit 53c241c)

add log for Relocated VM on finish_revert_migration

(cherry picked from commit 458376c)

fix tests for '[vmware] allow to relocate a VM to another cluster'

add assertion for update_cluster_placement call

(cherry picked from commit 81f1070)

[vmware] attach the block devices by the boot_index order

When attaching the volumes after a migration, we must guarantee
the order of the boot devices, otherwise the instance may not
boot because the boot disk is not attached first.
Therefore, we first attach the volumes that specify a boot_index,
ordered by that boot_index. The volumes with the default (-1) or
without a boot_index are attached afterwards in the order they came
from the manager.

(cherry picked from commit adff90c)
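
A small sketch of the described ordering rule (field names assumed to follow the usual BDM layout):

```python
# Sketch: volumes with a boot_index come first, sorted by that index; the rest
# keep the order they came from the manager.
def order_block_devices(bdms):
    indexed = [b for b in bdms if b.get("boot_index", -1) >= 0]
    rest = [b for b in bdms if b.get("boot_index", -1) < 0]
    return sorted(indexed, key=lambda b: b["boot_index"]) + rest

bdms = [{"id": "data", "boot_index": -1}, {"id": "root", "boot_index": 0}]
assert [b["id"] for b in order_block_devices(bdms)] == ["root", "data"]
```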

[vmware] handle relocate failure and rollback the detached volumes (#143)

If a relocate fails, we must ensure the detached volumes
aren't left hanging, so we make sure they get reattached after an
exception is thrown.

(cherry picked from commit 748d0d9)
(cherry picked from commit f4b0146)
Backport of fix that can be found here:
https://review.opendev.org/#/c/678958/

ATTENTION: There are more changes in that fix that we didn't have to
backport for Queens. Please check the other changes when re-applying to
newer versions of Nova.

(cherry picked from commit 4790b8f)
(cherry picked from commit a3b78af)
This patch enables us to add more DRS groups to a VM than only the
affinity/anti-affinity groups. This is supposed to be extended in future
commits to return more than one group as configured on certain
conditions, e.g. big VMs.

(cherry picked from commit 6d4ae3b)
(cherry picked from commit 61a6562)
Some of our VMs have special spawning needs, which means they need to
spawn on their own hypervisor at first. This function helps us define
these VMs in a central place.

(cherry picked from commit e9829a7)
(cherry picked from commit 6f4ed6a)
fwiesel and others added 29 commits August 8, 2022 15:40
We changed the code to ignore the file name,
as a vMotion will result in renaming of the files,
breaking the heuristic to detect the root disk.
Instead we were taking the first disk
when the uuid parameter was set.

The uuid parameter is not set when working with shadow-VMs
and VMs for image import. So no special handling is
needed; we always want the first disk in those cases too,
and so we can scrap the uuid argument.

Change-Id: Ib3088cfce4f7a0b24f05d45e7830b011c4a39f42
(cherry picked from commit bd7925e)
VMwareAPISession has been moved to its own module,
and this change should reflect that in the test case.

Change-Id: Ie0878986db41887f9f0de0bc820135d5284df403
(cherry picked from commit 9854168)
The vmwareapi driver uses Managed-Object references throughout
the code with the assumption that they are stable. They are, however,
database ids, which may change during the runtime of the compute node.
If an instance is unregistered and re-registered in the vCenter,
the moref will change.

By wrapping a moref in a proxy object, with an additional method
to resolve the openstack object to a moref, we can hide those changes
from a caller.

For that the initial search/resolution needs to wrap the resulting
moref in such a proxy.

Change-Id: I40568d365e98359dbe90663c400e87be024df2eb
(cherry picked from commit 89b5c6e)

Vmware: MoRef implementation with closure

This should ease the transition to stable mo-refs.
One simply has to pass the search function as a closure
to the MoRef instance, and the very same method will
be called when an exception is raised for the stored
reference.

Change-Id: I98b59603a8ef3b91114f378d82cd7418d26a1c52
(cherry picked from commit c854d41)
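
Roughly, the closure idea can be pictured like this (a simplified stand-in, not the driver's actual MoRef class):

```python
# Sketch: keep the search function alongside the stored reference so the
# reference can be re-resolved after a ManagedObjectNotFound-style failure.
class MoRefSketch:
    def __init__(self, search_fn):
        self._search_fn = search_fn   # closure returning a fresh moref
        self._ref = search_fn()

    @property
    def ref(self):
        return self._ref

    def resolve_again(self):
        """Called when the stored reference turned out to be stale."""
        self._ref = self._search_fn()
        return self._ref
```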

Vmware: Implement StableMoRefProxy for VM references

By encapsulating all the parameters for searching for
the vm-ref again, we can move the retry logic to the
session object, where we can try to recover the vm-ref
should it result in a ManagedObjectNotFound exception

Change-Id: Id382cadd685a635cc7a4a83f69b58075521c8771
(cherry picked from commit bc23e94)

Vmwareapi: Move equality test to tests

The equality test is only used by the tests
so it is better implemented there.

Change-Id: I51ee54265c4cc2b4f40c0b83f785a49f8a8ebce4
(cherry picked from commit 84f3e06)

Vmwareapi: Stable Volume refs

The connection_info['data'] contains the managed-object
reference (moref) as well as the uuid of the volume.

Should the moref become invalid for some reason,
we can recover it by searching for the volume-uuid
as the `config.instanceUuid` attribute of the shadow-vm.

Change-Id: I0ae008fa15a7894e485370e7b585821eeb389a93
(cherry picked from commit a71ddf0)
The clone created in a snapshot would also contain
the nvp.vm-uuid field in the extra-config.
If we then delete the original VM, the fallback mechanism
of searching for the VM by extra-config would trigger,
find the snapshot and delete that instead.

Change-Id: I6a66fa07dfe864ad4deedc1cafe537959cd969f4
(cherry picked from commit 90a9f4e)
Remove datastore_regex from VMWareLiveMigrateData

This was a leftover of some part of the development process and never
used. Thus, we remove it again.

Change-Id: I37ce67b4773375e31f18ac809a6029aa41702a3b
(cherry picked from commit 17928f7)

vmware: ds_util.get_datastore() supports hagroups

We're going to implement hagroups of datastores, and for that we need to
be able to select a datastore from a specified hagroup. This is
currently planned via matching the name of the datastore against a
regex that can extract the hagroup from the name.

This commit adds retrieving the hagroup and checking it against the
requested one to ds_util.get_datastore().

Change-Id: Ie3432a8e0b020ca9bf41abc098c0fac059af0df9
(cherry picked from commit f8e452a)
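
For illustration only - the datastore naming scheme and regex below are assumptions, not the deployment's actual values:

```python
# Sketch: extract the hagroup from a datastore name and compare it with the
# requested one.
import re

DATASTORE_HAGROUP_REGEX = re.compile(r"^eph-(?P<hagroup>[ab])-\d+$", re.I)

def matches_hagroup(datastore_name, wanted_hagroup):
    m = DATASTORE_HAGROUP_REGEX.match(datastore_name)
    return bool(m) and m.group("hagroup").lower() == wanted_hagroup.lower()

assert matches_hagroup("eph-a-001", "a")
assert not matches_hagroup("eph-b-001", "a")
```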

vmware: Add setting datastore_hagroup_regex

This setting will be used to enable distribution of ephemeral root disks
between hagroups of datastores. The hagroups are found by applying this
regex onto the found datastore names and should be named "a" or "b".

Change-Id: I45da5dd5c46a4ba64ea521a0e0975f133b5801f1
(cherry picked from commit c10d4e8)

vmware: Distribute VM root disks via hagroups

We want to distribute the ephemeral root-disk of VMs belonging to the
same server-group between groups of datastores (hagroups). This commit
adds the mentioned functionality for spawning new VMs, offline and
online migration.

Change-Id: I889514432f491bac7f7b6dccc4683f414baac167
(cherry picked from commit 6feb47d)

vmware: Add method to svMotion config/root-disk

For distributing ephemeral root disks of VMs belonging to the same
server-group between 2 hagroups, we need to be able to move the
disk/config of a VM to another ephemeral datastore.

This method will do an svMotion by specifying a datastore for all disks
and the VMs. The ephemeral disks - found by using the datastore_regex -
receive the target datastore while all other disks, which should be
volumes, receive their current datastore as target.

Change-Id: Iac9f2a2e35571bef3a58a22f6d96608f2b0bf343
(cherry picked from commit 01b9876)

vmware: Ignore bfv instances for hagroups

Boot-from-volume instances do not matter for our ephemeral-root-disk
anti-affinity as Cinder manages anti-affinity for volumes and
config-files going down with a datastore do not bring the instance down,
but only make it inaccessible / unmanageable. The swap file could become
a problem if it lives on the same datastore as the config-files, but
newer compute-nodes store the swap files on node-local NVMe swap
datastores in our environment, so we ignore this for now. We could solve
this by passing in a config option that determines whether we should
ignore bfv instances or not depending on if we detect node-local swap
datastores or not.

We move the generation of hagroup-relevant members of a server-group
into its own function.

Change-Id: Id7a7186909e236b7c81b4b8c8489e84f1067f2d4
(cherry picked from commit 2c7e2cc)

vmware: Add hagroup disk placement remediation

Every time a server-group is updated through the API, we call this
method to verify and remedy the disk-placement of VMs in the
server-group according to their hagroups.

Change-Id: I7ba6b14f5c969fb77dc5ce0fed63a6d9251f556e
(cherry picked from commit cc50e0d)

vmware: Validate hagroup disk placement in server-group sync-loop

This replaces adding an additional nanny to catch when Nova missed an
update to a server-group e.g. because of a restart.

Change-Id: I9aa516bfe6be127a011539d9d22a78d1f38aba13
(cherry picked from commit 09a32e2)

vmware: Use instance lock for ephemeral svMotion

When moving the ephemeral root-disk and the VM's config files, we take
the instance lock to serialize changes to the VM. This makes sure
that we don't squeeze our task between other tasks in the vCenter, which
would make us read an inconsistent state of the VM.

Change-Id: I04fc39bd48896bfd8010f17baa934f6f828edcef
(cherry picked from commit 4f5eda3)

vmware: Place VMs to hagroups more randomly

The previous implementation of placing a VM onto an hagroup based on its
index in the server-group has a big disadvantage for the common
use-case of replacing instances during upgrades one by one: every VM
added to the end would end up on the same hagroup.

To work against this, we put VMs onto hagroups randomly by taking the
first character of their UUID and using it modulo 2 as the deciding
factor. Since these UUIDs are already generated randomly, we don't need
to hash them or anything.

Change-Id: Ib0d9f24ae7d5e0d4e2dceeb77a1513a8657976d2
(cherry picked from commit 52b5d4b)
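
The described rule is small enough to show directly; a hedged sketch (group labels "a"/"b" as in the hagroup commits above):

```python
# Sketch: the first hex character of the instance UUID, modulo 2, picks the hagroup.
def hagroup_for_instance(instance_uuid):
    return "ab"[int(instance_uuid[0], 16) % 2]

assert hagroup_for_instance("4a1f0d9e-1111-4a2b-8c3d-5e6f7a8b9c0d") == "a"  # 4 % 2 == 0
assert hagroup_for_instance("52b5d4b0-2222-4a2b-8c3d-5e6f7a8b9c0d") == "b"  # 5 % 2 == 1
```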

vmware: datastore_hagroup_regex ignores case

When finding hagroups in datastores with the regex from
datastore_hagroup_regex, we use re.I to ignore the case so that an error
made by an operator in naming the datastore does not break the feature.

Change-Id: I4de760d99513abc9977f698aaba85b6456709ca6
Prior to this, the driver was performing migration/resize in a
way that could lead a VM into an inconsistent state and was not
following the way nova does the allocations during a migration.
Nova expects the driver to do the following steps:
* migrate_disk_and_power_off() - copies the disk to the dest compute
* finish_resize() - powers up the VM on the dest compute
This change removes the RelocateVM_Task and introduces a new
CloneVM_Task instead, in migrate_disk_and_power_off().

The CloneVM_Task also allows now cross-vcenter migrations.

Co-Authored-by: Marius Leustean <[email protected]>
Change-Id: I9d6f715faecc6782f93a3cd7f83f85f5ece02e60
(cherry picked from commit 95f9036)
If we attach a volume to a VM, we have to set the storage-profile.
Otherwise, the VM will not be compliant to the profile and - especially
on VMFS datastores - cannot be storage-vMotioned around if the
storage-profile includes storage-IO control. With setting the profile
for each disk-attachment, the VM also shows compliant to these profiles
in the UI.

Change-Id: Idad6293dc7dfdf46fed584b9c116c03f928d44fe
(cherry picked from commit dabcbca)
If a shadow-vm is missing, we raise an AttributeError,
which does not clearly identify the reason for the failure.
We better re-raise the original ManagedObjectNotFound exception,
so it is more clearly identifiable.

Change-Id: I954c57e97961833208743bc88e3ce75ad23cfe8c
(cherry picked from commit a5a9dd9)
If multiple NICs are attached, they need different device-keys,
otherwise the vmwareapi will reject the request.

Change-Id: I0aa58ad11c499e9423c7ecc7998325b05dd9147e
(cherry picked from commit 8ba8b32)
When spawning a VM with more than 128 cores, we set numCoresPerSocket
and some flags, e.g. vvtdEnabled. We missed adding the same flags when
resizing to a VM with more than 128 cores. This patch remedies that.

Change-Id: I381a413ecf80af14dd4bf1dfde2d070976b6477a
(cherry picked from commit cfd906b)
When simply cloning the original VM, the size might not
fit on the target hypervisor.
Resizing it to the target size might not fit on the
source hypervisor.
So we simply scale it to minimal size, as we are going
to reconfigure it to the proper size on the target
hypervisor anyway.

Change-Id: Ia05e5b3a5d6913bfcef01fa97465a1aaa69872d0
(cherry picked from commit 40d6589)

Vmware: Warn about failed drs override removal

An error implies manual intervention is needed, and an exception implies
a developer has to debug something. This, however, is a known behaviour
which can potentially lead to problems, hence a warning.

Change-Id: I9479fb6405485e763a6344e7f44a60f75891adcb
(cherry picked from commit f88a96c)
When VMs with lots of CPUs are running for a longer period of time, a
task to reconfigure the VM might end up hanging in the vCenter.

According to VMware support, this problem happens when those VMs have
been running for a longer period of time and, with the large number of
CPUs, have accumulated enough differences between those CPUs that
getting them all into a state where a reconfigure can be executed takes
more time than the default 2 s (iirc). The advanced setting to increase
this time is "migration.dataTimeout".

For simplicity reasons and because it shouldn't hurt (according to
VMware), we set it on all big VMs. That way, we do not have to figure
out if the VM consumes enough CPUs of the hypervisor to need this
setting.

Change-Id: Id8bda847c9e48997b385d9e1079ee9e99af9b8e8
(cherry picked from commit 2f7393c)
Until now, we only kept image-template-VMs that had tasks showing
their usage - but VMs cloned from another image-template-VM don't have
any tasks. Thus we immediately removed VMs we cloned to another BB. This
could even happen while the copying of the disk into the cache directory
was still in progress.

To counter this, we now take the "createDate" of the VM into account and
only delete image-cache-VMs that were created more than
CONF.remove_unused_original_minimum_age_seconds ago. Additionally, we
take the same lock we also take when deploying image-cache-VMs and copying
their files. This should protect against deleting the VM while a
copy operation is still in progress.

Deleting the VM while copying is still in progress does not stop the
copying. Therefore, this race-condition might be responsible for a lot
of orphan vmdks of image-cache-VMs on our ephemeral datastores.

Change-Id: Ic0a694a8c4df203c8c100abf5b8d2e9ee73866f7
(cherry picked from commit d8f3ddf)
This filter enables selecting the same host-aggregate/shard/VC for an
instance resize, because it could take more time to migrate the volumes
across other shards.

(cherry picked from commit f648b9b)
Resizing to a different flavour may also imply
a different hw-version, so we need to set it,
otherwise it will stay on the previous one,
which may be incompatible with the desired configuration.

Only upgrading is possible, though.

Change-Id: I7976a377c3e8944483a10fdada391e8c51640e30
(cherry picked from commit 28fb1a4)

Vmware: Only change hw_version by flavor

Be more strict in the upgrade policy and only upgrade on resize
if the flavor demands it, not if the default has changed.

Change-Id: I25a6eb352316f986b179204199b098a418991860
When switching to filtering the AZ via placement, we need the bigvm
resource provider to be in the AZ aggregate in addition to being in the
aggregate of the host's resource provider. Therefore, we find the host
aggregate by seeing which aggregate is also a hypervisor uuid.

Change-Id: I250f203b3bb24e084ec1b499a923f7f66e638102
(cherry picked from commit 29ce312)

bigvm: Do not remove parent provider's previous aggregates

When we filter AZs in placement, we don't want nova-bigvm to remove the
aggregates already present on our resource providers, as they represent
the AZ. Therefore, we query the aggregates of the "parent" provider and
make sure to include these aggregates if we have to set the resource
provider's UUID as an aggregate, too.

Change-Id: If3986df022273f20e109816f2752ce0254db4f10
(cherry picked from commit 2e98cd4)

bigvm: Ignore deleted ComputeNode instances

Querying via ComputeNodeList also returns deleted ComputeNode instances.
Therefore, we might create bigvm-deployment resource providers for a
deleted instance instead of the right instance and thus for a wrong
resource provider. By ignoring deleted ComputeNode instances, this
should not happen anymore.

Change-Id: I5a4c6c5a1894d1f6f5cff6e3475670c27bb97f28
(cherry picked from commit f7f5f0c)
There can be Ironic hosts that only have nodes assigned when those
nodes are getting repaired or being built up. Those Ironic hosts would
come up empty when searching for ComputeNodes in the sync_aggregates
command and would be reported as a problem, which makes the command
fail with exit-code 5. Since it's no problem if an Ironic host doesn't
have a ComputeNode, because each node is its own resource provider in
placement anyway, we now ignore Ironic hosts without nodes in the
error-reporting.

Change-Id: I163f3e46f2e375531b870a363b84bba67816954d
(cherry picked from commit 67779eb)
The DRS rules can be read from the "rule" attribute, not from the
"rules" attribute. We found this, because Nova wasn't deleting
DRS rules for no-longer-existing server-groups.

Change-Id: I86f7ca85d9b0edc1406a54a6f392bfff8f0af00d
(cherry picked from commit 562b084)
When syncing a server-group's DRS rule(s) we now also enable a found
rule in case it is disabled. We don't know how this happens, but
sometimes rules get disabled and we need them to be enabled to guarantee
the customer the appropriate (anti-)affinity settings.

Change-Id: Ibc8eb6800640855513716412266fcbb9fbc4db42
(cherry picked from commit d712c23)
When we don't find any datastores for whatever reason, we don't have the
"dc_info" variable set and thus cannot call
self._age_cached_image_templates() with it, as doing so results in an
UnboundLocalError.

Change-Id: I2dca6d2d6ab7ca5cbc4ef7d2c316faaf6edfee7d
(cherry picked from commit d2cf44f)
The properties may not be set, if the host is disconnected.

Change-Id: I1c53477e891b5b95859ca267fcad8cd1bff260ef
(cherry picked from commit 0cb8b61)
Most code related to VMs is in vmops, not in the driver,
so we move this code there too.

Change-Id: I1b801c8f12b377dd74a31ef646216c564631fe7f
(cherry picked from commit ade6f4c)
This requires a change to oslo.vmware to accept a string
instead of only a cookiejar.

Depends-On: Ia9f16758c388afe0fe05034162f516844ebc6b2b
Change-Id: I34a0c275ed48489954e50eb15f8ea11c4f6b1aa6
(cherry picked from commit 726d7a2)
While we cannot live-migrate CD-Roms directly between vcenters,
we can copy the data and detach/reattach the device manually.

Change-Id: I88b4903f745e1bcfe957ddc07c6e9c040820ed6b
(cherry picked from commit 14f9a5f)
Since the mission is to delete the attachment, Cinder returning a 404 on
the attachment deletion call can be ignored. We've seen this happen
where Cinder took some time to delete the attachment, so Nova retried as
it got a 500 back. On this retry, Nova got a 404 and left the BDM entry
behind while aborting a deletion that had already happened in the driver.

Change-Id: I15dd7b59a2b3c528ecad3b337b92885b4d7bd68f
(cherry picked from commit 82992a5)
Apparently, the volume-id is not consistently
stored as volume_id in connection_info.
Use the block_device.get_volume_id function to handle
the fallback.

Change-Id: If5a8527578db8e4690595524e0785ee8b4de1d79
(cherry picked from commit 607fd0d)
Since we don't explicitly set a disk as boot disk and instead rely on
the order the disks have on the VirtualMachine, we need to make sure we
attach the root disk first.

Change-Id: I3ae6b5f053a3b171ed0a80215fc4204a2bf32481
(cherry picked from commit 7e6dc54)
We've recently changed it so that not all large VMs need DRS disabled -
only the ones over 512 GiB of memory. But we still need memory
reservations for VMs of 230 GiB - 512 GiB, which was previously handled
by them being large VMs. While we could do this via the flavor, we
failed to do so. Additionally, this would limit the amount of large VMs
we can spawn on a cluster.

To keep the same behavior we previously had for large VMs, we now split
memory reservations from big/large VM detection with the following
result:
1) a big VM will get DRS disabled - big VMs are VMs bigger than 1024 GiB
2) a large VM will get DRS disabled - large VMs are VMs bigger than 512
GiB
3) all VMs defining CUSTOM_MEMORY_RESERVABLE_MB resources in their
flavor get that amount of memory reserved
4) all VMs above the full_reservation_memory_mb config setting get all
their memory reserved

Therefore, is_big_vm() and is_large_vm() now only handle DRS settings
and special spawning behavior.

A side effect is that nova-bigvm, or rather the special spawning code,
now doesn't consider 230 GiB - 512 GiB VMs as non-movable anymore and
thus finds more free hosts.

Change-Id: I2088afecf367efc380f9a0a88e5d18251a19e3a5
(cherry picked from commit dca6fe6)
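
A hedged summary of the resulting thresholds in code form (the thresholds, option and resource names come from the message above; the exact comparison operators are assumptions):

```python
GiB = 1024  # MiB per GiB

def is_big_vm(memory_mb):       # DRS disabled + special spawning handling
    return memory_mb > 1024 * GiB

def is_large_vm(memory_mb):     # DRS disabled
    return memory_mb > 512 * GiB

def reserved_memory_mb(memory_mb, flavor_resources, full_reservation_memory_mb):
    """Memory reservation is now decided independently of big/large detection."""
    if memory_mb >= full_reservation_memory_mb:
        return memory_mb                                   # rule 4: reserve all
    return min(memory_mb,
               flavor_resources.get("CUSTOM_MEMORY_RESERVABLE_MB", 0))  # rule 3
```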
joker-at-work merged commit 6574a3b into stable/xena-m3 on Aug 8, 2022