WIP: Hypervisor per ESXi #233

Closed
wants to merge 7,522 commits into from
This pull request is big! We’re only showing the most recent 250 commits.

Commits on Aug 8, 2022

  1. vmware: image template cache expiration

    While nova already cleaned up its original image-cache, our patches
    also added images as templates into the mix. These were not cleaned up
    prior to this patch and were thus filling up the datastores.
    
    Change-Id: I2fc631d6ce0a9339be9237d63ae7e86d94779dcc
    (cherry picked from commit 97780bb)
    (cherry picked from commit ce5f026)
    imitevm authored and joker-at-work committed Aug 8, 2022
    Commit 814cf1f
  2. vmware: Optimize image-template aging by sorting VC task list

    If we sort the tasks by creation-time, we can stop looping over them
    once we find an event that's too old. All other VMs we didn't find an
    event for yet must be expired anyway.
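
    A minimal sketch of that early-exit idea (names are illustrative, not
    the driver's actual code):

        # Scan newest-first and bail out at the first task older than the
        # cutoff; every VM without a newer event is expired anyway.
        def vms_with_recent_events(tasks, cutoff):
            recent = set()
            for task in sorted(tasks, key=lambda t: t.created_at,
                               reverse=True):
                if task.created_at < cutoff:
                    break
                recent.add(task.vm_name)
            return recent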
    
    Change-Id: I196bb9ab48867f314d9a5f3b7566384fa72778df
    (cherry picked from commit 642bcc7)
    (cherry picked from commit 2190382)
    imitevm authored and joker-at-work committed Aug 8, 2022
    Commit eedfd0f
  3. vmware: Manage image cache expiration for all projects

    Since we want to keep images in the owner's folder again instead of
    uploading them to all projects using them, we also need to make sure we
    clean up everywhere - even in projects that have no instances.
    
    Change-Id: I9344a9bb8c0436cb5c87f3328fa3b9d31dedbdbc
    (cherry picked from commit 943babe)
    (cherry picked from commit b1af943)
    imitevm authored and joker-at-work committed Aug 8, 2022
    Commit 2dedd2a
  4. vmware: Make HistoryCollector python3-ready

    We need to support Python3 in the future and this is a low-hanging
    fruit.
    
    Change-Id: I86804af29f4c6aff14fa5495e5344377488ba8fe
    (cherry picked from commit 95c177f)
    (cherry picked from commit bce80c6)
    imitevm authored and joker-at-work committed Aug 8, 2022
    Commit 9522589
  5. vmware: Validate VM name against convention during image-cache cleanup

    We want to make sure we only clean up image-cache templates and no other
    VMs that might happen to lie in the "Images" folder.
    
    Change-Id: I55b7fa7ebbe14b13f579ebc39dcae2549ddedc9a
    (cherry picked from commit 67fda8f)
    (cherry picked from commit d18fb11)
    imitevm authored and joker-at-work committed Aug 8, 2022
    Commit 3ed2aaa
  6. vmware: Only clean image-templates from the local DS

    Since every nova-compute node will be running the image-cache cleanup,
    we have to make sure this works in parallel. Therefore, we limit the
    cleanup of image-cache templates to the datastores the nova-compute node
    is responsible for - the ones configured in the datastore_regex config
    option.
    
    Change-Id: I5fae822a08bcc06565c64f959553cf7082bb2423
    (cherry picked from commit 0d8d760)
    (cherry picked from commit 5f21c7d)
    joker-at-work committed Aug 8, 2022
    Commit a547a7d
  7. vmware: Unconditionally perform image-template cleanup

    Even with image-as-template disabled, we have to run the image-template
    cleanup to ensure removal after the setting is switched off, and
    because we switched all our image uploads to using those
    image-templates.
    
    Change-Id: I924fbc42f014ecf4c0342246d48f27b6ba1d5c77
    (cherry picked from commit 44b8620)
    (cherry picked from commit 7331446)
    imitevm authored and joker-at-work committed Aug 8, 2022
    Commit 580f210
  8. vmware: Keep image-templates in owner project folder

    Otherwise, we might copy the same image to the datastore multiple times,
    just because VMs from multiple projects are deployed from that (shared)
    image.
    
    Change-Id: I02721da655ec505d7c3c5d3c9faca1be77dce813
    (cherry picked from commit 95b6176)
    (cherry picked from commit aea1268)
    Mitev authored and joker-at-work committed Aug 8, 2022
    Commit d58216f
  9. vmware: Rewrite _fetch_image_if_missing

    Every image we upload automatically ends up as a template VM. We can
    leverage this in normal deployments and copy the VMDK from the
    image-template, if it still exists. If it doesn't exist anymore, we can
    try to copy an image-template from another datastore and use its VMDK
    instead - that should still be faster than copying from glance again.

    If all else fails, we fall back to the original approach of loading the
    image from glance.
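
    A sketch of that fallback chain; the source callables are hypothetical
    stand-ins for the real copy/download helpers:

        def fetch_image_if_missing(image_id, sources):
            # `sources` is an ordered list of callables, e.g.
            # [copy_from_local_template, copy_from_other_datastore,
            #  download_from_glance]; each returns a VMDK path or None.
            for source in sources:
                vmdk = source(image_id)
                if vmdk is not None:
                    return vmdk
            raise LookupError('no source could provide image %s' % image_id)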
    
    Change-Id: I659d22e494a86fe4e567306062784313432d11ee
    (cherry picked from commit cfa3eb8)
    (cherry picked from commit a0d9db9)
    joker-at-work committed Aug 8, 2022
    Commit 91cfe13
  10. vmware: Log message when destroying image template

    Until now, the logs during a clean-up showed "Destroyed VM" messages,
    but one could not tell which VM was destroyed. Therefore, we add an
    additional log message stating which image-template VM we're going to
    destroy.
    
    Change-Id: I7429fca0175ec4593689466be6dcc0cb2482cb9f
    (cherry picked from commit 557e34e)
    (cherry picked from commit 2d6f122)
    joker-at-work committed Aug 8, 2022
    Commit 17e51f9
  11. vmware: Let destroy_vm() re-raise InvalidArgument fault

    There are cases where we don't want to blindly try to delete a VM, but
    instead want to know the outcome for certain types of errors at least.
    To support this case, vm_util.destroy_vm() has to re-raise certain
    types of exceptions. In the case we're looking for, that's the
    "InvalidArgument" fault. For this to work like before, code calling
    destroy_vm() needs to catch these exceptions, too.
    
    Change-Id: I28d2f1b94b8439cfea88146746ae6e59d61f087c
    (cherry picked from commit e112c5e)
    (cherry picked from commit 73a9a27)
    joker-at-work committed Aug 8, 2022
    Commit 0f73003
  12. vmware: Handle image-templates without data on datastore

    While we don't know why yet, we've seen image-cache templates missing
    their directory on the datastore.

    For the image-template cleanup this means that Nova cannot destroy
    those templates, as this raises an InvalidArgument fault on VMware's
    side, telling us that ConfigSpec.files.vmPathName is invalid. Since we
    need to clean those templates up anyway to create a new, usable one, we
    catch those errors and call UnregisterVM as a fallback.

    For getting templates from another datastore or re-using the already
    existing template to copy its disk, we have to catch
    FileNotFoundException. If this situation occurs, we have to clean that
    broken template up and also let nova continue the search for a source
    of the disk for the new VM.
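
    A minimal sketch of the destroy-with-fallback, assuming a session
    wrapper with destroy/unregister calls; the exception type here is a
    placeholder for the vCenter fault:

        class InvalidArgumentFault(Exception):
            """Placeholder for vCenter's InvalidArgument fault."""

        def destroy_template(session, vm_ref):
            try:
                session.destroy_vm(vm_ref)   # deletes the VM and its files
            except InvalidArgumentFault:
                # No files on the datastore; just drop the inventory entry.
                session.unregister_vm(vm_ref)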
    
    Change-Id: Id6d1cc1cd7a50958c77a1417e3f2aed7b9672e15
    (cherry picked from commit 1b342cd)
    (cherry picked from commit 13940c5)
    joker-at-work committed Aug 8, 2022
    Commit 949f16c
  13. vmware: Check default DRS policy in special_spawning

    If the DRS policy is not set to "fullyAutomated", i.e. if DRS will not
    take action on existing, running VMs, we have no chance to free up a
    host and thus don't have to try to find one. Instead, we error out to
    tell nova-bigvm to search for another cluster.
    
    Change-Id: Idbdfe82b4057844401e710fb9d87141478bb3353
    (cherry picked from commit e98d6ae)
    (cherry picked from commit 0e8f17e)
    joker-at-work committed Aug 8, 2022
    Commit b71e96c
  14. vmware: Switch default VIF model

    Support for the E1000 driver is being phased out in guest operating
    systems, so we switch to E1000e, which is also faster and has more
    hardware-offloading capabilities.
    
    Change-Id: I08ac32f914a57d3eb7328351a07a20a2ef212cb8
    (cherry picked from commit 5e7556d)
    
    fix unit tests for vmware: Switch default VIF model
    
    (cherry picked from commit 425070f)
    (cherry picked from commit a3b173a)
    joker-at-work committed Aug 8, 2022
    Commit 65e3c1a
  15. vmware: Handle incompatible hardware-version when cloning image-template

    When we try to fetch an image-template from another datastore, it might
    happen that the template has an incompatible hardware version and the
    vCenter raises VirtualHardwareVersionNotSupported if we try to clone it
    to our cluster.
    
    We handle this case now by logging a debug message and continuing with
    the next image-template we find, as this one is unusable for us.
    
    Change-Id: If9dc9b2a13171252e5f0f0b3a99a51be2f28c6eb
    (cherry picked from commit d0ed453)
    (cherry picked from commit b5f59b7)
    joker-at-work committed Aug 8, 2022
    Commit ba877df
  16. bigvm: Move checking/cleaning of bigvm providers into method

    We have to check a lot, and when adding more, we would get a flake8
    warning that our function is too long. So we move the checking/cleaning
    of existing providers into its own function for a better overview.
    
    Change-Id: I5ceb4d9338f6d94f49cc2deff25eefb19df2030f
    (cherry picked from commit 1700192)
    (cherry picked from commit 848d78c)
    
    bigvm: Don't add a candidate if we don't have providers
    
    When we filter out providers by used cluster percentage, we used to add
    an empty list of providers to the candidates. It probably just meant not
    running any of the loops we do over candidates later on, but it's
    still unnecessary.
    
    Change-Id: Ie033332436674f4fe792f4aa3f83f33b12a6d9ed
    (cherry picked from commit 47ceb57)
    (cherry picked from commit 027d419)
    joker-at-work committed Aug 8, 2022
    Commit 5aa2b8e
  17. bigvm: Add trait to disable provider for bigVMs

    We might want to disable a cluster for bigVMs for different reasons -
    one being a currently running upgrade of all ESXi hosts. When the
    trait "CUSTOM_BIGVM_DISABLED" is set on a resource-provider, nova-bigvm
    will not use this cluster for finding a host to spawn bigVMs.
    
    Change-Id: I36813ed3d95fd8572c6b75544ebb2fc1936f6bdb
    (cherry picked from commit 5fc69b9)
    (cherry picked from commit 2a996e4)
    joker-at-work committed Aug 8, 2022
    Commit 6046521
  18. [SAP] fix powervm's test_driver_capabilities unit test

    TestPowerVMDriver.test_driver_capabilities was failing because of a
    new capability introduced by SAP in the ComputeDriver.
    This fixes the assertion done by powervm, so that it assumes
    the PowerVMDriver capabilities are included in the ComputeDriver
    capabilities and not equal to it.
    
    (cherry picked from commit 69ffaa1)
    mariusleu authored and joker-at-work committed Aug 8, 2022
    Commit 7d59236
  19. vmware: Set necessary flags for > 128 vCPUs

    With vSphere 6.7, VMware started supporting more than 128 vCPUs per
    VM. For this to work, the VM needs to boot via UEFI, the CPUs have to
    be split up into multiple cores with an image property (e.g.
    hw:cpu_cores='2'), the hardware version of the VM needs to be at least
    "vmx-15" (image property vmware:hw_version='vmx-15') and we need to set
    the vvtdEnabled flag when creating the VM. This patch does the last
    part, because that's only possible to do in Nova.

    We also set virtualMmuUsage to 'automatic', as VMware complained in our
    tests that the MMU needs to be enabled, and the manually-created VM
    used for testing had this in its config.
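
    A sketch of the two settings, assuming a suds-style client_factory as
    used elsewhere in the driver:

        def apply_large_vcpu_flags(client_factory, config_spec):
            flags = client_factory.create('ns0:VirtualMachineFlagInfo')
            flags.vvtdEnabled = True             # required for > 128 vCPUs
            flags.virtualMmuUsage = 'automatic'  # vCenter wants the MMU on
            config_spec.flags = flags
            return config_spec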
    
    Change-Id: I4aabad2c5254f9fa5b83d47bc34675d90c431535
    (cherry picked from commit c49cb94)
    joker-at-work committed Aug 8, 2022
    Commit 99b410f
  20. console_auth_token: allow pre-existing 'path' query parameter for novnc

    For noVNC 1.1.0 or newer, there must be only a 'path' query parameter
    instead of 'token'.
    This patch allows a 'path' query parameter to be present in the
    novncproxy_base_url configuration. In that case, we just append
    the token to it.
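
    A sketch of the URL handling (the function name is illustrative):

        from urllib.parse import parse_qs, quote, urlsplit

        def build_console_url(base_url, token):
            if 'path' in parse_qs(urlsplit(base_url).query):
                # Append the URL-encoded token to the 'path' value.
                return base_url + quote('?token=%s' % token)
            return '%s?token=%s' % (base_url, token)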
    
    (cherry picked from commit 68d48f6)
    mariusleu authored and joker-at-work committed Aug 8, 2022
    Commit 8d12381
  21. Fix project limits calculation.

    Cast Decimal to int in _get_counts, similar to _get_counts_in_db.

    SQLAlchemy's func.sum() returns Decimal on MySQL. When the usage limits
    are calculated, unless explicitly converted to int, we end up with
    Decimal objects that cannot be JSONified, causing
    "ValueError: Circular reference detected".
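
    The cast amounts to something like this (a sketch, not the exact
    patch):

        from decimal import Decimal

        def jsonable_count(value):
            # MySQL's SUM() comes back as Decimal; the JSON encoder
            # chokes on it, so normalize to int.
            return int(value) if isinstance(value, Decimal) else value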
    
    (cherry picked from commit 163a80f)
    galkindmitrii authored and joker-at-work committed Aug 8, 2022
    Commit fdbeddd
  22. shellinabox: add support for db tokens

    Rocky introduces database tokens and deprecates the nova-consoleauth
    service. Thus, shellinabox needs to access the tokens from the
    database as well.
    This still supports the old nova-consoleauth, which can be enabled via
    the [workarounds]/enable_consoleauth option.
    
    (cherry picked from commit 421cd8e)
    leust authored and joker-at-work committed Aug 8, 2022
    Commit a9630a4
  23. vmware: Add option for setting default hw_version

    This is necessary with multi-version clusters where migrating e.g. on
    resize needs to be possible.
    
    Since the vmware driver was explicitly overriding the hw_version
    attribute on ExtraSpecs and didn't use __init__(), we still ended up
    with no hw_version if the flavor didn't set one.
    
    Change-Id: Idc287c4dfa2b5d6a6a837a5014063417c8e13768
    (cherry picked from commit 09b2547)
    joker-at-work committed Aug 8, 2022
    Commit a8d23fa
  24. tools: Allow check-cherry-picks.sh to be disabled by an env var

    The checks performed by this script aren't always useful to downstream
    consumers of the repo so allow them to disable the script without having
    to make changes to tox.ini.
    
    Change-Id: I4f551dc4b57905cab8aa005c5680223ad1b57639
    (cherry picked from commit 610396f)
    (cherry picked from commit d2ee27d)
    lyarwood authored and joker-at-work committed Aug 8, 2022
    Commit 3ab670c
  25. Handle sharding-enabled in scheduler shard filter

    If the project tags from keystone contain the tag "sharding_enabled"
    then the hosts in _all_ shards will pass the shard filter for this
    project.
    
    This was done to facilitate both enabling sharding (only one simple
    tag to set), and mainly for frontend code to detect sharding status
    (mostly) without parsing tag strings. (If sharding is not enabled,
    then vc-* tags will have to be parsed to find out which shard(s) the
    project is on.)
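
    A sketch of the resulting filter logic (names are illustrative):

        def host_passes(host_shard, project_tags):
            if 'sharding_enabled' in project_tags:
                return True  # hosts in all shards pass for this project
            shards = {t for t in project_tags if t.startswith('vc-')}
            return host_shard in shards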
    
    (cherry picked from commit 01014bc)
    grandchild authored and joker-at-work committed Aug 8, 2022
    Commit 729f5e2
  26. Don't call detach without attachment_id and other attachments

    When messages time out between nova-api and nova-compute, it can
    happen that there's a block-device-mapping that never saw an
    attachment and wasn't rolled back either. If such a VM is deleted and
    the volume got attached to another VM in the meantime, a detach call
    to Cinder without an attachment_id - which cannot exist, because the
    attachment never got that far - would delete the attachment for the
    other VM.

    We now search for a volume-attachment in Cinder if no attachment_id
    was given. If we don't find one for our instance, but there are
    attachments on the volume - which should then belong to other
    instances - we assume we ran into the above case and don't call
    detach.
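
    A sketch of the guard, with a hypothetical Cinder client interface:

        def safe_detach(cinder, volume_id, instance_uuid,
                        attachment_id=None):
            if attachment_id is None:
                attachments = cinder.list_attachments(volume_id)
                ours = [a for a in attachments
                        if a['instance_uuid'] == instance_uuid]
                if not ours and attachments:
                    # Only other instances are attached; a blind detach
                    # would remove one of their attachments, so skip it.
                    return
            cinder.detach(volume_id, attachment_id)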
    
    Change-Id: I6c6fad88e93fd788e3df1c942fed763c0ad0414f
    (cherry picked from commit 8a3b9d7)
    joker-at-work committed Aug 8, 2022
    Commit 172dcd4
  27. vmware: Optionally attach disconnected serial ports

    We want to be able to configure new VMs with our currently-disabled vSPC
    services, because it doesn't seem to be possible to reconfigure a serial
    port on a running VM - other than setting it to connected and
    startConnected. Therefore, we add a config option.
    
    Change-Id: I0b9d7a7d1445c2017756146068e287628a39bec6
    (cherry picked from commit ed2ac5d)
    joker-at-work committed Aug 8, 2022
    Commit f663f04
  28. bigvm: SchedulerReportClient._ensure_resource_class got renamed

    We need to call _ensure_resource_classes instead.
    
    Change-Id: I2329db43adc00e04b07d16d312361ae5e669d298
    (cherry picked from commit 4e5413a)
    joker-at-work committed Aug 8, 2022
    Commit d963430
  29. vmware-metrics: Port to newer Monitorbase

    The newer Monitorbase requires the child classes to implement
    "populate_metrics()" instead of "get_metric()". It should now create
    MonitorMetric objects, appending them to the supplied list.
    
    Change-Id: I534d63f4b4888da2b59b2a10211482e1449f2901
    (cherry picked from commit 400cdd2)
    joker-at-work committed Aug 8, 2022
    Commit e6452f6
  30. VMware: Fix empty vim.get_properties_for_..._objects

    If the obj_list parameter to the
    vim_util.get_properties_for_a_collection_of_objects()
    function is empty, an empty list was returned, while the non-error
    path returns a vim.RetrieveResult object with an iterable
    "objects" attribute.

    Introduce a dummy class "EmptyRetrieveResult" which behaves
    like vim.RetrieveResult for the empty case, and return an
    instance if the list of objects to be queried is empty.
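
    The dummy class is essentially this shape (a sketch, not the verbatim
    patch):

        class EmptyRetrieveResult(object):
            """Behaves like vim.RetrieveResult for an empty query."""
            def __init__(self):
                self.objects = []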
    
    (cherry picked from commit f509fae)
    grandchild authored and joker-at-work committed Aug 8, 2022
    Commit 2e2b2fe
  31. manage: Support Ironic in sync_aggregates

    "nova-manage placement sync_aggregates" doesn't support Ironic hosts,
    because they have multiple compute nodes. In general, Nova doesn't
    support aggregates for Ironic hosts. But we still use them for assigning
    an availability zone to a rack (a building block) of nodes grouped into
    a Nova host.
    
    Therefore, we patch the above-mentioned command to handle a list of
    (in most cases one) compute node UUIDs, only erroring out if the host
    we find multiple compute nodes for doesn't look like an Ironic host
    (based on having "ironic" in the name).
    
    Change-Id: I4f7e5fd82c51ce5d6f42089beb5a70e469ec54df
    (cherry picked from commit 481d398)
    joker-at-work committed Aug 8, 2022
    Commit 96e5a61
  32. vmware: Set large VMs as partiallyAutomated in DRS

    We don't want them to move anymore. This might hinder our efforts to
    spawn big VMs, but the nanny is supposed to help us here.
    
    Change-Id: I5ebdbe2f287d50c9b9a755166702c3eccea62b14
    (cherry picked from commit 2da895f)
    joker-at-work committed Aug 8, 2022
    Commit 5e5372f
  33. Alias flavor names from catalog:alias extra_spec

    When a flavor has an extra_spec key "catalog:alias", make an
    additional alias flavor available in the flavor list with the
    extra_spec value as name for that additional flavor.
    
    The aliased flavor's flavorid is prepended with a configurable prefix
    (default is 'x_deprecated_') in order to have a unique id, with a
    straightforward removal process, without any further DB lookups.
    
    A nova config option 'flavorid_alias_prefix' is introduced for this
    purpose.
    
    The 'x_' part of the prefix default is chosen to sort flavor aliases
    toward the end of the flavor list, to decrease visibility.
    
    This change enables renaming and phasing out flavors. Aliased flavors
    appear when listing available flavors, but they don't actually exist
    and get automatically converted into their actual flavor counterparts
    on flavor show and when creating servers.
    
    So, when inspecting a flavor by name or deploying a server with a
    flavor by name the actual flavor is shown or used for server creation,
    respectively.
    
    Allow multiple flavor aliases for a single flavor:
    During renaming/restructuring of flavors, multiple old flavors get
    mapped to a single new flavor. But 'catalog:alias' on the new flavor
    previously only accepted a single alias pointing to the deprecated
    flavor.
    
    The interpretation of 'catalog:aliases' extra_spec is changed so that
    the value can be a single alias name or a comma-separated list of
    multiple names.
    
    Aliased flavorids gain an index suffix (even if only a single alias
    was given) so that flavorids are still unique.
    Example:
        "30" -> "x_deprecated_30_0"
    If multiple alias names are given, the next aliased flavorid will be
    "x_deprecated_30_1" and so forth.
    
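    The id scheme amounts to this (a sketch using the default prefix):

        def alias_flavor_ids(flavorid, aliases, prefix='x_deprecated_'):
            # "30" with two aliases ->
            # ['x_deprecated_30_0', 'x_deprecated_30_1']
            return ['%s%s_%d' % (prefix, flavorid, i)
                    for i in range(len(aliases))]
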
    (cherry picked from commit a6e4b32)
    grandchild authored and joker-at-work committed Aug 8, 2022
    Commit 31ad08a
  34. bigvm: Create RPs without parent

    In Nova's rocky version, placement learned to handle nested providers
    in a better way and supports them from version 1.29 on - which isn't
    in rocky. Upstream thus added code to filter out nested providers from
    the response. The problem with that is that we modeled our
    CUSTOM_BIGVM-providing resource providers as children of the
    nova-compute-bb* provider. Therefore, our big VM spawning code cannot
    find an allocation candidate anymore.

    We now switch to creating the "child" RPs as individual RPs instead.
    For identifying their "parent", we now use the aggregate assigned to
    them.
    
    Change-Id: Ic9d707c59a4ea405f3a982dbe269cdfea0d03aa5
    (cherry picked from commit 65d29ed)
    joker-at-work committed Aug 8, 2022
    Commit 5143fd8
  35. vmware: Add Mirror-instance-logs-to-syslog config

    After the instance is deleted in vSphere, its "vmware.log" files are
    deleted along with it, which hinders post-mortem debugging.

    Add a config flag "mirror_instance_logs_to_syslog" to forward all
    instance logs to the syslog monitoring service, to be able to inspect
    them after the instance is gone.

    Refer to [0] for how this is done in vSphere.
    
    [0] https://blogs.vmware.com/management/2020/10/configure-a-vms-vmware-log-file-to-send-messages-to-vrealize-log-insight.html
    
    (cherry picked from commit 82a31fd)
    grandchild authored and joker-at-work committed Aug 8, 2022
    Commit ca95cc2
  36. [memreserv] Reserve memory for certain flavors

    VMs with reserved memory have better memory-allocation
    performance and -- it's suspected -- fewer softlock issues
    too. Coincidentally, many larger VMs also have high
    performance requirements that put strict demands on the
    quick availability of their memory.
    
    Implement memory reservation and a configurable maximum
    number of cluster hosts that could theoretically fail and
    still let all VMs with memory reservation boot up.
    
    Nova provides the flavor extra_spec
        "quota:memory_reservation"
    which reserves the given amount of memory. This change does
    not make use of that feature, but instead introduces a
    parallel custom resource
        "CUSTOM_MEMORY_RESERVABLE_MB".
    The reason is that "quota:memory_reservation" does not allow
    for limiting the total amount of reservable memory in the
    same way the resource provider mechanics do. This is a
    requirement for tolerating the above-mentioned partial host
    failures, which can occur because a VMware host is a cluster
    of hypervisors.
    
    Add a "vm_reservable_memory_ratio" value to the cluster
    stats in the VMware driver, and use that to calculate and
    return the MEMORY_RESERVABLE_MB resource from the VMware
    driver's get_inventory().
    
    (cherry picked from commit 93e22ef)
    grandchild authored and joker-at-work committed Aug 8, 2022
    Commit d38faca
  37. [bigvm] Don't free host when too much reserved RAM

    The logic in BigVmManager to decide whether a cluster is
    able to free another BigVM spawn-host is changed from only
    using the total percentage of used memory to using the
    amount of reserved memory as well.
    
    (cherry picked from commit 5b05204)
    grandchild authored and joker-at-work committed Aug 8, 2022
    Commit 37145f7
  38. [bigvm] Split memory overuse trigger log messages

    This makes it possible to see which condition caused the deallocation
    of a bigvm host.
    
    (cherry picked from commit 67e5c23)
    grandchild authored and joker-at-work committed Aug 8, 2022
    Commit cfe0dfa
  39. [bigvm] Fix reserved memory usage lookup key

    (cherry picked from commit 88443e5)
    grandchild authored and joker-at-work committed Aug 8, 2022
    Commit a434e56
  40. [memreserv] Only use flavor-reserved-memory if not all reserved

    Before, the resources:CUSTOM_MEMORY_RESERVABLE_MB flavor setting
    (which can be only partial) superseded the
    CONF.vmware.reserve_all_memory setting and BigVM memory reservation.
    
    Invert this to use CUSTOM_MEMORY_RESERVABLE_MB only if none of the
    above apply.
    
    (cherry picked from commit 9a3d778)
    grandchild authored and joker-at-work committed Aug 8, 2022
    Commit 9a2c7ab
  41. [memreserv] Prohibit reserving more memory than the flavor has

    Clamp to a maximum of the flavor memory.
    
    (cherry picked from commit bd2d012)
    grandchild authored and joker-at-work committed Aug 8, 2022
    Commit f9c8f01
  42. bigvm/hostsizefilter: Pass filter if NUMA trait required

    If one of the new CUSTOM_NUMASIZE_* host traits is required on a
    (BigVM) flavor, then the host was already matched in placement, and
    the filter can succeed early.
    
    This is a temporary measure until we phase out host-fraction filtering
    altogether when the old BigVM flavors are disabled.
    
    (cherry picked from commit fdb127d)
    grandchild authored and joker-at-work committed Aug 8, 2022
    Commit 7cae532
  43. vmware: Add constants for DRS-created vCLS VMs

    Using these values, one can identify a vCLS VM by looking at the
    "managedBy" attribute of the VM's config in vSphere.
    
    These VMs are special as they're created by VMware's DRS service as an
    agent VM and any actions done by the user will be reverted by the
    cluster automatically. See [1] for more details on the why.
    
    [1] https://blogs.vmware.com/vsphere/2020/09/vsphere-7-update-1-vsphere-clustering-service-vcls.html
    
    Change-Id: I31d1ece3fa514ca42a3ccc1b348da3763b1b1388
    (cherry picked from commit 7e0649c)
    joker-at-work committed Aug 8, 2022
    Commit 6850bbb
  44. vmware: Ignore vCLS VMs in special spawning

    Those VMs came in with vSphere 7 and DRS doesn't move them, even if
    they violate a DRS rule. Therefore, we would never see a host getting
    freed up if it contained a vCLS VM. They take up 100 MiB of reserved
    RAM and thus should fit next to a big VM, so we ignore them.
    
    Change-Id: I737f312db0e156fa971a189d47efd227c666b178
    (cherry picked from commit d403461)
    
    vmware: Fix special spawning vCLS detection
    
    Not all VMs have a "config.managedBy" attribute ...
    
    Change-Id: I0fc0b4d0c8027dd6b2c45060597cbadb60f0d649
    (cherry picked from commit ae038d5)
    joker-at-work committed Aug 8, 2022
    Commit b17ae57
  45. vmware: Query out managedBy info in special spawning

    This fixes the vCLS detection again, as we just wrote code without ever
    querying out the appropriate attribute. Therefore, it couldn't detect
    vCLS VMs at all.
    
    Change-Id: Ic425e5a0d2178afb6764cb0c84f372c2bb67908a
    (cherry picked from commit 07891e7)
    joker-at-work committed Aug 8, 2022
    Commit 8debad3
  46. bigvm: Skip getting RPs without MEMORY_MB resources

    In-buildup clusters show up without complete resource inventory, i.e.
    without memory resources. Log and skip these resource providers.
    
    Also add debug-logging when skipping due to missing reservable memory
    resource. This was the only "continue" in the else section without
    logging. This complicated finding out why exactly resource providers
    are skipped.
    
    (cherry picked from commit c356138)
    grandchild authored and joker-at-work committed Aug 8, 2022
    Commit 57feebb
  47. Optimize quota:separate query

    Probably with the switch to mariadb, the query got too slow in bigger
    regions, because mariadb joins the instance_extra and the instances
    tables, which have more than 100,000 rows in a bigger region. Since we
    only need one instance_extra entry per missing flavor, we force
    mariadb to use a temporary table to join against instance_extra, which
    is much faster as it only contains as many rows as we have missing
    flavors.

    Attention: This code probably doesn't run on PostgreSQL anymore,
    because "GROUP BY" doesn't behave the same way there.
    
    Change-Id: If0e95cd1d62c00490dc86ca6273e07f8d2fd98ac
    (cherry picked from commit 963b107)
    joker-at-work committed Aug 8, 2022
    Commit fc3ec69
  48. shard_filter: debug log for ironic nodes without VC aggregate

    We don't expect ironic nodes to be in a vc-* host aggregate,
    therefore we log it to debug instead of error for such nodes.
    
    (cherry picked from commit fe7893e)
    leust authored and joker-at-work committed Aug 8, 2022
    Commit 4bbfed8
  49. Add filter for reserving RAM for resize

    A new filter prevents a new deployment in the cluster if the used RAM
    would go over a certain threshold, which is configurable via the new
    option `resize_reserved_ram_percent`.
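
    The check boils down to something like this (a sketch; the real filter
    plugs into nova's host-filter machinery):

        def host_passes(free_ram_mb, total_ram_mb, requested_ram_mb,
                        resize_reserved_ram_percent):
            reserved = total_ram_mb * resize_reserved_ram_percent / 100.0
            return free_ram_mb - requested_ram_mb >= reserved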
    
    (cherry picked from commit 6360dfa)
    Mike Durnosvystov authored and joker-at-work committed Aug 8, 2022
    Commit a573ee3
  50. vmware: Do not reraise exception on DRS override removal

    We're currently unable to remove a DRS override as our SOAP library is
    unable to create an appropriate request accepted by the vCenter.
    Therefore, we still allow the resize to work for now and have to
    manually remove the override later on.
    
    We keep the code in, so we get a reminder in the logs and sentry, that
    this needs fixing. We just don't fail the resize for it.
    
    Change-Id: I4d344347860c7d97d6f4b2e68d9bbac069d71b74
    (cherry picked from commit d7b157f)
    joker-at-work committed Aug 8, 2022
    Commit a21c4a0
  51. Vmware: Remove print statements

    (cherry picked from commit 73b3789)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Commit c072462
  52. VmWare: DRY getting the client factory

    Follow the same pattern as in the other functions:
    set the local variable client_factory to the client factory
    of the current session and use that in the function.
    
    (cherry picked from commit f7a984c)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Commit 0053f1c
  53. VmWare: Logging cache miss with PropertyCollector enabled

    The logic was inverted: only if use_property_collector is active do we
    actually expect the cache to contain sensible values for the vm-state.
    So only then does it make sense to log a cache miss.
    
    (cherry picked from commit f7bdd71)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Commit dd9e67c
  54. Image owner attribute may not be set

    The image-meta-data passed to a live-migration doesn't have the
    attribute set. Accessing it unconditionally will cause an exception.
    
    (cherry picked from commit 2360db9)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Commit 780e169
  55. VMware: Try to recover from outdated vm-refs

    All VM-related API calls work with ManagedObject references (morefs),
    while openstack works with instance uuids.
    In order to avoid having to call vSphere to map the instance uuid to
    such a moref, the driver keeps a cache. The implementation assumes
    that those morefs are stable. However, the operator can unregister and
    re-register a VM, which would cause the moref to change.

    Any operation on the previous moref would cause a
    ManagedObjectNotFoundException.

    We retry functions annotated with the decorator
    vm_ref_cache_heal_from_instance: removing a stale entry from the cache
    will now either result in retrieving a new moref or raising
    InstanceNotFound, which is properly handled.
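
    The decorator's shape is roughly this (a sketch; the cache helper and
    the exception class are placeholders for the driver's names):

        import functools

        class ManagedObjectNotFound(Exception):
            """Placeholder for the fault raised on stale morefs."""

        def vm_ref_cache_heal_from_instance(func):
            @functools.wraps(func)
            def wrapper(session, instance, *args, **kwargs):
                try:
                    return func(session, instance, *args, **kwargs)
                except ManagedObjectNotFound:
                    # Drop the stale moref; the retried lookup either
                    # finds the new moref or raises InstanceNotFound.
                    session.vm_ref_cache_delete(instance.uuid)
                    return func(session, instance, *args, **kwargs)
            return wrapper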
    
    (cherry picked from commit a953c29)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Commit 84d21eb
  56. VmWare: Don't repeat yourself with hardware devices

    The code was getting the hardware devices in multiple places directly,
    and then (often, but not always) normalising the array.
    
    Functions getting the device array then also had to normalise the array again.
    
    By consolidating the retrieval and the normalisation in a function,
    the code becomes less repetitive
    
    (cherry picked from commit a8f3d01)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Commit 34f4adc
  57. Vmware: Make livemigration hv-version check optional

    vSphere can migrate just fine from newer to older vCenters,
    but the live-migration pre-check tests for that and rejects it.

    Making this an option is more of a work-around;
    a proper solution would delegate the check to the driver.
    
    (cherry picked from commit 3653f52)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Commit 339cbc1
  58. VmWare: Create ServiceLocatorSpec

    For cross-vCenter migrations (live or not), we need to create a
    ServiceLocatorSpec. This change provides the required functions.
    
    (cherry picked from commit 92bf534)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Commit ba45d8a
  59. Vmware: Use WithRetrieval to get all results

    In various places, custom versions of iterating over the results were
    implemented, some of them faulty.
    The following functions were only getting up to vmware.maximum_objects
    objects (100 by default):
    vm_util.get_all_cluster_mors, vm_util.get_stats_from_cluster,
    _SpecialVmSpawningServer._get_vms_on_host,
    _SpecialVmSpawningServer.free_host

    Only in _SpecialVmSpawningServer._get_vms_on_host might we get over
    100 items and thus actually have missed some.

    Previously, the results were fetched in batches of up to
    vmware.maximum_objects items. Using WithRetrieval yields an iterator
    over the results, which transparently pages to the next request.
    Receivers of the results were changed to consume an iterator where
    easily possible.

    Replace the quadratic algorithm in
    `ds_util._filter_datastores_matching_storage_policy`
    with one of O(n log(n)) runtime.
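
    The paging pattern looks like this; oslo.vmware's WithRetrieval
    context manager issues the continue-retrieval calls transparently:

        from oslo_vmware import vim_util

        def iter_results(vim, retrieve_result):
            with vim_util.WithRetrieval(vim, retrieve_result) as objects:
                for obj in objects:
                    yield obj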
    
    (cherry picked from commit fcabb04)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Commit b3451ee
  60. Vmwareapi: Split out getting hosts from get_stats_from_cluster

    This extracts getting the hosts and reservations from a cluster in
    get_stats_from_cluster into its own function get_hosts_and_reservations_for_cluster.
    
    For cross-vcenter vmotion, we need to specify hosts, and we want
    the same ones as we use elsewhere.
    
    (cherry picked from commit 05cd96b)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Commit ccbf9c0
  61. Vmware: Refactor server groups

    Previously, vmops was calling the private function
    `vm_util._get_server_groups`, and that function would access
    nova.objects. All other code works more the way that nova.objects
    retrieval is the responsibility of a VmOps (or VolumeOps) instance;
    `vm_util` only works with vmware objects (plus passed instances).

    Additionally, `update_cluster_placement` was called in
    `VMwareVMOps.build_virtual_machine`, which is also called when
    creating a virtual machine for a vm-template out of an image.
    In that case, the server-groups of the instance requiring the image
    would be wrongly applied to the vm-template of the image.
    
    (cherry picked from commit 475fdb1)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Commit 93de60b
  62. Use tox to run flake8

    pre-commit uses the system flake8, which might be newer and cause
    warnings that are partly in conflict with the old version's.
    So run the project's pinned version instead.
    
    Change-Id: I8a854268d3c7ea8d885915105917041430871010
    (cherry picked from commit 3d81c49)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Commit f0101a2
  63. api: Pre-query not deleted members in server groups

    When retrieving multiple - or all - server groups, the code tries to
    find non-deleted members for each server group in every cell
    individually. This is highly inefficient, which is especially
    noticeable as the number of server groups rises.

    We change this to query all members of all server-groups we will reply
    with (i.e. from the already limited list) in advance and pass this set
    of existing uuids into the function formatting the server group. This
    is more efficient, because we only do one large query instead of up to
    1000 times the number of cells.
    
    Change-Id: I3459ce7a8bec9a9e6f3a3b496a3e441078b86af0
    (cherry picked from commit e676dd1)
    joker-at-work committed Aug 8, 2022
    Commit afb751e
  64. VmWare: Remove unused legacy_nodename regex

    The regex isn't used anywhere, and uses an unescaped format
    
    Change-Id: I76aaf133af517eb70fcaf3783953625c63141083
    (cherry picked from commit cebfe18)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Commit a6afd82
  65. Vmwareapi: Fix some linting issues

    Order of imports
    Duplicate imports
    Spelling mistake
    Indentation
    
    Change-Id: I4ff5594b0a628fee9579761248627099b3f251b8
    (cherry picked from commit 0f68df4)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Commit acae56c
  66. vmware: Restructure vm- and host-group functions in cluster_util

    We still have the old `_create_vm_group_spec()` to get the same
    behavior, but nobody outside of `cluster_util` uses it. The new
    functions `create_vm_group()`, `create_host_group()` and
    `create_group_spec()` expose a clearer interface and make it possible
    to overwrite hosts/vms instead of just appending to existing ones.
    This will be useful once customers can update server-groups via the
    API.
    
    Change-Id: I5444318994ac7929a24d357fabb8133410d5bd9d
    (cherry picked from commit 27a9012)
    joker-at-work committed Aug 8, 2022
    Commit 915b4cf
  67. vmware: Refactor VM rule creation in cluster_util

    This splits up functionality for creating rules between VMs into
    `create_vm_rule()` and `create_rule_spec()` and thus enables us to use
    it more controlled also from the outside, e.g. in the upcoming sync loop
    for server-groups.
    
    Since DRS ignores the "mandatory" attribute for VM-VM rules, we remove
    setting it here, so it doesn't look like changing the value would make a
    difference.
    
    Change-Id: Ib515e02226e674d0f7cbdc3c354ade5cd77a0b8c
    (cherry picked from commit 9386f11)
    joker-at-work committed Aug 8, 2022
    Commit 70eb9b0
  68. vmware: _list_instances_in_cluster() returns morefs

    Sometimes it's helpful to get the moref of a VM together with a list of
    properties. Since we support getting properties in
    `_list_instances_in_cluster()`, we now also support returning the moref
    with it.
    
    Change-Id: Ie10b95c53595db62131789a8ada1c81b7a662780
    (cherry picked from commit bcc7f60)
    joker-at-work committed Aug 8, 2022
    Commit 36d3970
  69. vmware: Warmup moref cache on startup

    When starting up, we start with an empty cache and thus need to query
    the vCenter for every instance we handle. To optimize this, we query the
    instances in bulk and update the cache once at startup.
    
    Change-Id: I56d746f79f1303bcf2f9ec3f66ec8b770b0e6e1c
    (cherry picked from commit 7c7bd31)
    joker-at-work committed Aug 8, 2022
    Commit e84a0eb
  70. vmware: Move is_vim_instance() to vim_util

    It's more suited there - or even better in oslo.vmware - and we want to
    use it in cluster_util, which would result in a cyclic import if we keep
    it in vm_util.
    
    Change-Id: I6ed4006d568b0cd3614965b18af5a2927bf12728
    (cherry picked from commit fc17a44)
    joker-at-work committed Aug 8, 2022
    Commit 518bec1
  71. vmware: Add function to retrieve all DRS groups of a cluster

    We will need this in the sync-loop after customers are enabled to
    change server-group members via the API.
    
    Change-Id: I5e7f27251b6f2c09002445d6c374f887864ea19f
    (cherry picked from commit 3f27feb)
    joker-at-work committed Aug 8, 2022
    Commit 68cf471
  72. vmware: Add function to retrieve all DRS rules of a cluster

    We will need this in the sync-loop after customers are enabled to change
    server-group members via API.
    
    Change-Id: Id2fbaa67b799e370331472ecccc79d99dd07e01f
    (cherry picked from commit eafa794)
    joker-at-work committed Aug 8, 2022
    Commit fc3b930
  73. Vmware: Query Drs state directly in driver

    No need for an additional indirection: as the driver is the only
    user of the information, we query the value directly.
    
    Change-Id: I6d44910c420de4c76b6112904ccfebe3ec923098
    (cherry picked from commit ffd3160)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Commit 42a8f09
  74. vmware: Save the ComputeNode host name for later

    When init_host() is called, we keep the provided host as an attribute so
    we can later refer to it when e.g. getting all instances for the host
    the driver is running for.
    
    Change-Id: I261006a727125a87c204564f95ec8797060cd557
    (cherry picked from commit 598016f)
    joker-at-work committed Aug 8, 2022
    Commit e32f206
  75. scheduler: Add HypervisorSizeMixin

    This moves the method for getting and caching hypervisor sizes out
    into its own class that's supposed to be used as a mixin by anybody
    that needs it.

    We keep the cache as a single instance shared between all users of
    the mixin, so we keep the requests to placement minimal. There
    shouldn't be much of a problem with concurrent access, other than two
    threads updating the same value if they happen to run at the same
    time. Since the value we're caching here is basically static, we
    don't need to change the retention time on a per-mixin-user basis.
    
    Change-Id: If7a3e49fad0061fcab4fe73cc792ca3b66a94003
    (cherry picked from commit 81cee5c)
    joker-at-work committed Aug 8, 2022
    Commit 38069c6
  76. scheduler: Add VmSizeThresholdFilter

    This filter allows us to define a threshold for small VMs, so that
    only they are allowed to spawn on hypervisors up to a certain size -
    also defined by a threshold.

    This filter is necessary to guard against a few bigger VMs clogging
    the small hypervisors and thus forcing smaller VMs onto big
    hypervisors. Having too many VMs on a single hypervisor can be
    problematic, and many more small VMs fit onto a big hypervisor.
    
    Change-Id: Idecbe624384ca3bf323ed53d98978791d04c25cb
    (cherry picked from commit 8977dc6)
    joker-at-work committed Aug 8, 2022
    Commit 7015a2d
  77. scheduler: Add HvRamClassWeigher

    This weigher takes a configurable list of static weights that are
    assigned to RAM classes. We define a RAM class by its upper bound,
    i.e. 1024 means all HVs with up to 1024 GiB of memory not fitting in
    any class below.

    We need this weigher to prefer scheduling to certain HV sizes, while
    still making it possible to schedule to others if there aren't enough
    preferred HVs available.
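
    The class lookup boils down to this (a sketch; the option format is
    illustrative):

        def weight_for_hv(ram_gib, class_weights):
            # class_weights maps upper bounds to weights,
            # e.g. {1024: 1.0, 3072: 0.5, 6144: 0.1}
            for upper_bound in sorted(class_weights):
                if ram_gib <= upper_bound:
                    return class_weights[upper_bound]
            return 0.0  # larger than any configured class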
    
    Co-authored-by: Fabian Wiesel <[email protected]>
    
    Change-Id: Ia042e466544b73b8dd15ee7231d3baf1da6069a1
    (cherry picked from commit 6d50363)
    joker-at-work committed Aug 8, 2022
    Commit bd221f1
  78. scheduler: HypervisorSizeMixin uses oslo.cache

    Instead of using our own cache implementation, which only cleared the
    cache 10 minutes after the last write to it, we switch to oslo.cache's
    DictCacheBackend. We gain code-reuse and a retention time per entry.
    
    Change-Id: I302ebea93dfe30eb72c9a0dfe42f5e8c956f228a
    (cherry picked from commit b61765f)
    joker-at-work committed Aug 8, 2022
    Commit 1bf9fd0
  79. vmware: Make volume attach and extraConfig update atomic

    We used to do 2 reconfigure calls for a vmdk attach, which could lead
    to inconsistencies if the second - the extraConfig update - did not
    succeed. To the VMware driver, the volume then did not look attached
    to the server and thus wouldn't get detached later on.

    We now change this procedure to attach the vmdk and add the
    extraConfig entry at the same time, so either both succeed or neither
    does - same for the detach case.

    Since we only need this for vmdks, attach_disk_to_vm() and
    detach_disk_from_vm() only do it if the right parameters were
    supplied.

    The underlying functions then only update a given config_spec instead
    of creating a new one and reconfiguring the cluster. We could remove
    that part as the changed code-path was the only consumer.

    Besides producing fewer inconsistencies, this should also make the
    volume attachment/detachment process a little faster, because it
    requires only a single reconfigure task instead of two.
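
    A sketch of the combined spec; the field names follow the vSphere API,
    while the extraConfig key is illustrative:

        def build_attach_spec(client_factory, disk_device_change,
                              volume_id):
            spec = client_factory.create('ns0:VirtualMachineConfigSpec')
            spec.deviceChange = [disk_device_change]
            opt = client_factory.create('ns0:OptionValue')
            opt.key = 'volume-%s' % volume_id
            opt.value = 'attached'
            spec.extraConfig = [opt]
            return spec  # one ReconfigVM_Task applies both changes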
    
    Change-Id: Icce0c39aaed523ce4e5df97d130a7a14cfabb9c5
    (cherry picked from commit 655a044)
    joker-at-work committed Aug 8, 2022
    Commit 97407bb
  80. vmware: Use a prefix in DRS groups and rules

    We want to create a sync-loop for applying/removing server-groups
    changed by the user via API. For this we need to be able to distinguish
    between driver-created DRS groups/rules and admin-created ones. To do
    this, we introduce a prefix for the DRS group/rule name, which will work
    as an identifier later on.
    
    Since we're now using not only a UUID, but a UUID with a prefix, we
    change GroupInfo to have a "name" attribute instead of a "uuid"
    attribute.
    
    As we're changing how DRS groups/rules look, we need a migration to run
    before deploying this to production.
    
    Change-Id: I07ecd1953a85d0f53082fa9b0c49b80c2c9bf9d3
    (cherry picked from commit 7984cae)
    joker-at-work committed Aug 8, 2022
    Commit cf63411
  81. vmware: Remove no longer used "fetch_cluster_properties()"

    It was previously used when deleting empty DRS groups.
    
    Change-Id: I82612decf3938285c43f5e97abb48da349ad3fba
    (cherry picked from commit 8e0836d)
    joker-at-work committed Aug 8, 2022
    Commit 77a26bd
  82. vmware: Don't add VmGroups for server-groups

    The DRS rules created by Nova do not use those VmGroups and they don't
    seem to serve a purpose otherwise. Therefore, we only update/create a
    VmGroup that's not based on a server-group, i.e. an admin-defined
    group. We currently only have this kind of group for the
    special_spawning case to support a free host for big VMs.
    
    Change-Id: Ide011e157ad46037304cfdb52b1db397dde38cc8
    (cherry picked from commit 8c3640b)
    joker-at-work committed Aug 8, 2022
    Commit ee0b59e
  83. vmware: Remove cleaning of empty DRS groups

    Since we don't create new ones, we also don't have to keep track of
    empty ones. Instead, we will just delete all the VmGroups once via an
    external script.
    
    Change-Id: I88baeadcea3a7bfe946596b169bc3abe6798d9d6
    (cherry picked from commit 89dcef0)
    joker-at-work committed Aug 8, 2022
    7c65bd9
  84. Add a way to trigger a server-group sync on the driver

    We're going to change the API to allow updates of server-group members
    and need to trigger a sync in the backend, whenever such a change
    occurs. Therefore, we need a bunch of methods going through the whole
    stack to call the driver.
    
    We use a cast and not a call here, because we cannot let the API wait
    for all of this to happen. If the cast gets lost somehow, a sync-loop
    implemented in the driver will pick the change up eventually.
    
    The API will have to supply a list of hosts to call, so we only sync the
    group on the necessary hosts.
    
    Change-Id: I00e012ed52ba9fd36b094ecf2dc86b023f2f5a21
    (cherry picked from commit f3d2ad7)
    joker-at-work committed Aug 8, 2022
    7613013
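
    A rough sketch of such a pass-through, with all names illustrative (the
    actual RPC method names and versions may differ):

        # Sketch: the API casts to each affected host; the compute manager
        # hands the server-group UUID to the driver. A lost cast is
        # repaired eventually by the driver's sync-loop.
        class ComputeRpcAPIFragment(object):
            def sync_server_group(self, context, host, group_uuid):
                cctxt = self.router.client(context).prepare(server=host)
                cctxt.cast(context, 'sync_server_group',
                           group_uuid=group_uuid)

        class ComputeManagerFragment(object):
            def sync_server_group(self, context, group_uuid):
                self.driver.sync_server_group(context, group_uuid)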
  85. vmware: Add DRS rule lifecycle helpers

    These functions are simple helper functions to create, update and delete
    a DRS rule, so we have the code necessary for that in a single place.
    
    Change-Id: I44aeed6f99b9803adca0062b2d7b12cc2e295f03
    (cherry picked from commit 8c1fbcf)
    joker-at-work committed Aug 8, 2022
    c29bd90
  86. vmware: Implement sync_server_group()

    The VMware driver supports syncing server-groups as DRS rules into the
    cluster managed by the nova-compute node. The method will be called when
    a user updates a server-group via API and might get reused when spawning
    a VM, too.
    
    We might be able to optimize it a little more by keeping a local list
    of DRS rules instead of querying the cluster in real-time. Tests have
    shown, though, that it takes < 500 ms to query the cluster.
    
    Change-Id: I534c035a1e2d962cf5d187d56d104e743f7ade15
    (cherry picked from commit 52f42fc)
    joker-at-work committed Aug 8, 2022
    ff25aea
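
    A condensed sketch of the sync semantics, assuming the lifecycle
    helpers from the previous commit live in cluster_util (all helper
    names are assumptions):

        def sync_server_group(self, context, sg_uuid):
            group = objects.InstanceGroup.get_by_uuid(context, sg_uuid)
            rule_name = DRS_PREFIX + sg_uuid
            rule = cluster_util.get_rule_by_name(self._session,
                                                 self._cluster, rule_name)
            # VMs of group members that live on this host's cluster
            members = self._get_member_vm_refs(context, group)

            if len(members) < 2:
                # a DRS rule needs at least two VMs to make sense
                if rule is not None:
                    cluster_util.delete_rule(self._session,
                                             self._cluster, rule)
            elif rule is None:
                cluster_util.create_rule(self._session, self._cluster,
                                         rule_name, group.policy, members)
            else:
                cluster_util.update_rule(self._session, self._cluster,
                                         rule, members)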
  87. vmware: Add a DRS rule sync-loop

    In case we missed an update to a server-group, we want to be able to
    recover at some point in time. Therefore, we implement a sync-loop to
    call the driver's sync_server_group() for every server-group UUID we
    find as belonging to our host and also for every DRS rule we find.
    
    Change-Id: I9a633dc87ad1aab7d5f00e5143fac97dd3b87176
    (cherry picked from commit 583b1fd)
    joker-at-work committed Aug 8, 2022
    c82b239
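
    The loop could look roughly like this (the helper names are
    placeholders, not the actual patch):

        # Sketch: re-sync every server-group that has a member on this
        # host, plus every group we can recover from a prefixed rule name.
        def _sync_all_server_groups(self, context):
            sg_uuids = set(self._server_group_uuids_for_host(context))
            for rule_name in self._vmops.list_drs_rule_names():
                if rule_name.startswith(DRS_PREFIX):
                    sg_uuids.add(rule_name[len(DRS_PREFIX):])
            for sg_uuid in sg_uuids:
                self._vmops.sync_server_group(context, sg_uuid)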
  88. vmware: Use sync_server_group() in VM lifecycle

    We can use sync_server_group() in update_cluster_placement() to use the
    same mechanism that's used when a user edits a server-group in the DB.
    This makes sure that a DRS rule is only created if it has more than one
    member, and also syncs in the other member in case the newly-spawned
    instance is the second member.
    
    Change-Id: I62c1ae3d897f5ccda8788a4d4b23e553be8bc5cf
    (cherry picked from commit 2dd55ea)
    joker-at-work committed Aug 8, 2022
    3687a8e
  89. vmware: Set _compute_host on VMwareVMOps

    We need to know the compute host to query the instances from the API for
    sync_server_group(), so we save it onto the VMwareVMOps instance when
    the VMwareVCDriver instance receives it in the init_host() call.
    
    Change-Id: I473539000dde4629e2a251cd9145c8047ce60a41
    (cherry picked from commit 22ab4cb)
    joker-at-work committed Aug 8, 2022
    45a7a60
  90. vmware: Exclude instances in certain states in sync-server-group

    When instances are going away during the sync, this can lead to an
    error. We want to avoid that.
    
    Additionally, VMs contained in a DRS rule cannot be vMotioned, so we
    want to exclude them, too. This makes it easier for the live-migration
    code to update the cluster appropriately.
    
    Change-Id: I8bc94aebebcd878f60c33fe009f048afeb9a42c0
    (cherry picked from commit 17f48bf)
    joker-at-work committed Aug 8, 2022
    56ba32f
  91. vmware: server-group sync-loop handles exceptions

    Since we only start the sync-loop for server-groups once on startup, we
    have to make it resilient against any exceptions - e.g. if the DB or
    the vCenter is temporarily unreachable.
    
    Change-Id: I74036b6bfc449b407f687afac1ba4365bcbdc2ee
    (cherry picked from commit 2120075)
    joker-at-work committed Aug 8, 2022
    b904cdc
  92. vmware: Do not sync "soft-affinity" server-groups

    The replaced code in cluster_util already ignored server-groups having
    a policy of "soft-affinity", so we have to do it in the new
    sync_server_group(), too.
    
    The main reason is that we wanted to give customers the possibility to
    end up on the same cluster without the need to end up on the same host,
    too. We also cannot implement the "soft" part currently, and spawning a
    VM will fail if the host is too full, as there are no non-mandatory
    VM-to-VM rules.
    
    Change-Id: I745ac6616eefd193ce8c7a9a5cba3c68fc59ac75
    (cherry picked from commit 65558c7)
    joker-at-work committed Aug 8, 2022
    fcfd8f3
  93. Add public method to remove members from InstanceGroup

    This function wasn't previously available, because - as the removed
    comment says - there was no user-facing API that would allow removal of
    instance group members. Since we plan on changing the API, we need to
    add a public method.
    
    When adding this method, we also need to send notifications like we do
    for "add_members()", and thus we added the appropriate functionality.
    
    Change-Id: I4270212b57782e5ffeaf69dc3bd57c7c60a7ffe5
    (cherry picked from commit df3bd90)
    joker-at-work committed Aug 8, 2022
    8b89110
  94. api: server_group's _get_not_deleted returns hosts

    We extend the "server_groups._get_not_deleted()" function to return a
    dictionary of instance UUID to host mapping. This is a preliminary step
    to introducing an update of the server-groups, which will need both the
    instance UUIDs and the hosts to filter out to-be-removed instances and
    check the validity of the policy when adding instances.
    
    We cannot use "InstanceGroup.get_hosts()" for this, as it's not
    cell-aware, and thus extend this function instead.
    
    TODO:
    	* tests
    
    Change-Id: I253ef54560c2422baec187b350f05b1b2affc34e
    (cherry picked from commit a94f6f7)
    joker-at-work committed Aug 8, 2022
    69d4241
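
    A sketch of the cell-aware lookup using nova's standard objects (the
    exact query filters are assumptions):

        import collections
        from nova import context as nova_context
        from nova import objects

        def _get_not_deleted(context, uuids):
            # returns {instance_uuid: host} for all non-deleted instances
            mappings = objects.InstanceMappingList.get_by_instance_uuids(
                context, uuids)
            by_cell = collections.defaultdict(list)
            for m in mappings:
                if m.cell_mapping is not None:
                    by_cell[m.cell_mapping.uuid].append(m)

            found = {}
            for cell_mappings in by_cell.values():
                with nova_context.target_cell(
                        context, cell_mappings[0].cell_mapping) as cctxt:
                    instances = objects.InstanceList.get_by_filters(
                        cctxt,
                        {'uuid': [m.instance_uuid for m in cell_mappings],
                         'deleted': False})
                    for inst in instances:
                        found[inst.uuid] = inst.host
            return found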
  95. api: Add an update method for server-groups

    This new API endpoint allows adding and/or removing members of a
    server-group. We found this to be necessary because instances might get
    created without a server-group, but later need an HA-partner, and
    re-installing would mean downtime or too much effort.
    
    The endpoint checks that the policy is still valid after all changes
    are applied. It strives for idempotency in that it allows
    removal/addition of already removed/added instance uuids, to
    accommodate requests built in parallel.
    
    TODO:
    	* api-request docs
    
    Change-Id: I30d5d1dc3a41553b4336aad3877018989159495c
    (cherry picked from commit 7220be3)
    joker-at-work committed Aug 8, 2022
    850ef7b
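
    A sketch of the idempotent core of such a handler (validation and
    view-building details omitted; the method and helper names are
    assumptions):

        def update(self, req, id, body):
            context = req.environ['nova.context']
            group = objects.InstanceGroup.get_by_uuid(context, id)
            add = set(body.get('add_members', []))
            remove = set(body.get('remove_members', []))

            # idempotency: adding an existing member or removing an
            # already-removed one becomes a no-op instead of an error
            add -= set(group.members)
            remove &= set(group.members)

            # validate that the policy still holds for the final set,
            # e.g. anti-affinity members must not share a host
            self._validate_policy(context, group, add, remove)

            if remove:
                objects.InstanceGroup.remove_members(
                    context, group.uuid, list(remove))
            if add:
                objects.InstanceGroup.add_members(
                    context, group.uuid, list(add))
            return self._view_builder.show(req, group)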
  96. objects: Add get_by_instance_uuids() to InstanceGroupList

    Searching for the instance groups belonging to a list of instances
    can be helpful for checking if a server-group update is valid, and
    maybe also when a nova-compute wants to sync down server-groups, like
    the VMware HV is able to.
    
    Change-Id: Iec93becf0299ec0617e99ce16c06e37c84cb33ee
    (cherry picked from commit 838142b)
    joker-at-work committed Aug 8, 2022
    3dfcdfb
  97. api: Servers in a server-group cannot join another one

    This changes the server-group update API method to not allow a server to
    join a second server-group if it already joined one. The whole change to
    the server-group errors out if any of the servers is already in another
    group; this simplifies validation and means we either apply the whole
    change requested by the user or nothing.
    
    Change-Id: I6a06e4b4a8c22737c20ab51e85ceb2bf98082b26
    (cherry picked from commit 02a4c82)
    joker-at-work committed Aug 8, 2022
    c4a87c7
  98. vmware: Fix getting rules/groups in cluster_util

    The attribute group/rule gets removed by the underlying SOAP library if
    it's empty. Therefore, we have to check for it to exist instead of
    trying to iterate over it unconditionally.
    
    Change-Id: I443338bf97dcf83478e0a9971179480ecb01c009
    (cherry picked from commit 6d984f6)
    joker-at-work committed Aug 8, 2022
    9c1e4a1
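
    The fix boils down to a guard like this:

        def _get_all_cluster_rules(cluster_config):
            # suds omits empty list attributes entirely, so 'rule' (and
            # likewise 'group') may be missing from the config object
            return getattr(cluster_config, 'rule', None) or []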
  99. Vmware: Disable DRS after migration

    If a VM moves across clusters, the DRS override for
    the cluster will be gone, so we need to set it again.
    
    Change-Id: Ic5d010de95f194ea660f10805c46ad43762bd83e
    (cherry picked from commit 69e181a)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    9d4176f
  100. special_spawning: Use get_moref_value()

    We're going to switch the backing SOAP library of oslo.vmware at some
    point in time and should already use the compatibility layer when
    accessing ManagedObjectReference attributes.
    
    Change-Id: I1b18a7b7db0452a10f0adc499be2df26d923f936
    (cherry picked from commit c45fa56)
    
    special_spawning: Also consider empty hosts
    
    When clusters get built up, they don't contain VMs on all hosts yet. We
    failed to consider these hosts as possible targets, because our list of
    candidates was created from the list of VMs. To fix that, we now
    retrieve all hosts of the cluster first and pre-populate the dict with
    hosts before retrieving the VMs.
    
    Change-Id: Ibd576211ac33f38ef0e1b9016381a955916d7c1c
    (cherry picked from commit 2bf8738)
    
    special_spawning: Remove safety check for no returned VMs
    
    With building up new capacity and dedicating whole compute nodes to big
    VMs, we can end up in a situation where a cluster contains no VMs yet,
    and the check then disallows spawning new ones. Therefore,
    we remove the check.
    
    Change-Id: I2883c5c713ee657006c80d61ebc59a086ec22411
    (cherry picked from commit 3c8bd73)
    joker-at-work committed Aug 8, 2022
    0c174c7
  101. Vmware: Refactor update_cluster_placement

    After the recent changes of syncing server-groups,
    the syncing of rules in cluster_util.update_placement has become unused.
    
    The code can be simplified in preparation for live-migration,
    which will add complexity.
    
    VMops.update_cluster_placement now calls
    - sync_instance_server_group, which fetches _the_ server-group
      for the instance and syncs it with the existing VMOps.sync_server_group
    - update_admin_vm_group_membership, which handles the membership in
      special_spawning_vm_group. Since that is the only remaining use-case
      of cluster_util.update_placement, the function has been renamed to
      update_vm_group_membership and reduced to that use-case.
    
    Change-Id: I94c311790610fc4658f9b06ac052ad46660f1cea
    (cherry picked from commit a05999b)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    499e90a
  102. ResourceTracker: Fetch Bdms up front

    The resource tracker iterates over all instances in
    _update_usage_from_instances, where it fetches the
    block-device-mappings for each instance in an RPC call.
    
    By fetching the block-device-mappings up-front,
    we get a single RPC/DB call instead of several small ones,
    whose latencies would otherwise add up.
    
    If a block-device-mapping list cannot be retrieved for an instance,
    it will fall back to the old behaviour and fetch it individually.
    
    Change-Id: Ice9ab0a9c1b783c059687e1d992eea1f97cb3193
    (cherry picked from commit 6caf72a)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    4ab1c82
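
    A sketch of the bulk fetch with per-instance fallback (the surrounding
    usage-accounting call is simplified):

        bdms_by_uuid = {}
        bdm_list = objects.BlockDeviceMappingList.get_by_instance_uuids(
            context, [inst.uuid for inst in instances])
        for bdm in bdm_list:
            bdms_by_uuid.setdefault(bdm.instance_uuid, []).append(bdm)

        for instance in instances:
            bdms = bdms_by_uuid.get(instance.uuid)
            if bdms is None:
                # fall back to the old per-instance behaviour
                bdms = objects.BlockDeviceMappingList.get_by_instance_uuid(
                    context, instance.uuid)
            # ... the existing per-instance usage accounting uses bdms here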
  103. Vmware: Sync server-group during migration

    During a migration, we have to consider the following situations:
    
    - On the source-host, we have to remove the group constraints
      as soon as the migration has started, to avoid the constraints
      disallowing the movement.
    - On the destination-host, we have to add the rules as soon
      as the vm is in the vcenter, before the instance.host has been
      updated. Otherwise, we might remove rules added by the migration
      itself.
    
    The parameters to specify the cluster and the host are in
    preparation for the migration-task, which needs to call
    the sync for the source-host on the destination-host
    
    Change-Id: I2b3c626ecf4a33c3baa20489b66bb7e6b69459b6
    (cherry picked from commit 81ba1e5)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    d9aa19b
  104. Vmware: Handle missing propSet in list_vms

    If a vm has none of the requested properties,
    propSet will not be set, so we need to skip over that instance.
    
    Change-Id: Ia633fecb021bffe1557820e36c33ef53cf90db83
    (cherry picked from commit 6e8f9c7)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    97ca614
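
    The guard is essentially this (following the oslo.vmware
    RetrieveResult idiom):

        for obj in retrieve_result.objects:
            if not hasattr(obj, 'propSet'):
                # the VM had none of the requested properties
                continue
            props = {prop.name: prop.val for prop in obj.propSet}
            # ... build the instance entry from props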
  105. Vmware: Add option to remove vm from vm-group

    In order to migrate a vm (within a vcenter) we need
    to be able to remove a vm from a vm-group constraining
    it to a set of hosts.
    
    Change-Id: I8ef5ebdc54b3c3de0310e461132828aa251ee657
    (cherry picked from commit e2a08bd)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    513071e
  106. Vmware: Handle deleted image gracefully

    When migrating or resizing an instance, the driver looks
    for vSphere location properties stored with the image.
    We can't do that, though, if the image has been deleted,
    so we fall back gracefully.
    
    Change-Id: I55c4d1f49e3c6fc0bb89794ac1b44da99ce009ca
    (cherry picked from commit 39b4f57)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    b761821
  107. Vmwareapi: Live-migration changes on conductor level.

    Split out the changes for live-migration that affect the conductor.
    We base it on the upstream VMWareLiveMigrateData 1.0 version in the hope
    that we have an easier path merging the code with upstream eventually.
    
    Change-Id: I5d8417c836d735122e033eeb72f7671b2558afc4
    (cherry picked from commit a2c335f)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    6789c3d
  108. vmware: Fix special_spawning using _cluster_ref

    This fixes an AttributeError in _SpecialVmSpawningServer that was
    introduced in Ibd576211ac33f38ef0e1b9016381a955916d7c1c, where we access
    self._cluster_ref instead of self._cluster.
    
    Change-Id: I43d3d060f7c51841a8561a29d3189e37e4f87fb1
    (cherry picked from commit 6de3153)
    joker-at-work committed Aug 8, 2022
    7b8380c
  109. bigvm: HANA hosts need no thresholds

    When we mark resource-providers with the CUSTOM_HANA_EXCLUSIVE_HOST
    trait, only hana_* flavors can spawn on them. For these nova-compute
    nodes, we want to allow spawning of large/big VMs until the hosts are
    full. Since memory is reserved, we make sure there's enough failover
    capacity available by having >= 2 failover hosts in the cluster.
    
    Change-Id: Iaa1d18eb0a3e78bf1e361c8e8d1040aa07344448
    (cherry picked from commit de76a14)
    joker-at-work committed Aug 8, 2022
    6514f03
  110. bigvm: Refactor provider skipping in _check_and_clean_providers

    We used to keep multiple dictionaries around for the different reasons
    of marking providers for deletion, but we never used those individually
    and they cluttered the code with multiple ifs to skip over already
    marked providers.
    
    To fix this, we keep a single dictionary for providers to be deleted.
    
    Change-Id: Ie96a41bd9151e869fabd18d69777f38db85d0ca6
    (cherry picked from commit d9010c1)
    joker-at-work committed Aug 8, 2022
    d28f066
  111. bigvm: Delete providers that don't have a valid parent

    If we change settings or marked a resource-provider for bigvms manually,
    it can happen that we end up with a bigvm provider without a vmware
    provider. This led to a KeyError and nova-bigvm not continuing to free up
    a bigvm host.
    
    We fix this by marking all bigvm providers for deletion that don't have
    a matching vmware provider available.
    
    Change-Id: I1eec50a5b1b6206e5b3eab8d5f9fa891fecb4b25
    (cherry picked from commit a46e311)
    joker-at-work committed Aug 8, 2022
    5dd6c4a
  112. vmware: Use moref value for logging cluster_ref

    When we log the cluster moref while getting the list of instances, we
    log the whole moref and thus create multiple newlines in the log-line.
    To have all data in a single line, we now only log the moref value.
    
    Change-Id: I89d3908abdffe60570cec947e9a74b544488b535
    (cherry picked from commit 920d080)
    joker-at-work committed Aug 8, 2022
    3ede497
  113. vmware: vm_ref cache healing handles empty cache

    If an instance cannot be found on VMware-side during an operation, we
    try to fetch the new moref and retry the operation. This can fail if we
    passed in a moref explicitly - and thus didn't look into the cache - and
    the cache contains no value for the instance; this raised an AttributeError.
    
    We fix this by checking that the cache actually contained a value and
    returning without retry if it didn't, as we cannot tell if the vm_ref
    passed belongs to the instance or not.
    
    	AttributeError: 'NoneType' object has no attribute 'value'
    	  ...
    	  File "nova/compute/manager.py", line 2975, in start_instance
    		self._power_on(context, instance)
    	  File "nova/compute/manager.py", line 2945, in _power_on
    		block_device_info)
    	  File "nova/virt/vmwareapi/driver.py", line 633, in power_on
    		self._vmops.power_on(instance)
    	  File "nova/virt/vmwareapi/vmops.py", line 1936, in power_on
    		vm_util.power_on_instance(self._session, instance)
    	  File "nova/virt/vmwareapi/vm_util.py", line 326, in wrapper
    		if obj != vm_ref.value:
    
    Change-Id: I093a0c6260cb19478c7c25c630e453dd77d39f40
    (cherry picked from commit 48416da)
    joker-at-work committed Aug 8, 2022
    a47d349
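
    A sketch of the healing decorator with the new guard (the cache
    helpers exist in nova's vm_util; the decorator shape itself is
    simplified):

        import functools
        from oslo_vmware import exceptions as vexc

        def retry_with_fresh_vm_ref(func):
            @functools.wraps(func)
            def wrapper(session, instance, vm_ref=None, **kwargs):
                try:
                    return func(session, instance, vm_ref, **kwargs)
                except vexc.ManagedObjectNotFoundException:
                    cached = vm_util.vm_ref_cache_get(instance.uuid)
                    if cached is None:
                        # vm_ref was passed in explicitly and nothing is
                        # cached; we cannot tell whether it belongs to
                        # this instance, so don't retry
                        raise
                    vm_util.vm_ref_cache_delete(instance.uuid)
                    fresh = vm_util.get_vm_ref(session, instance)
                    return func(session, instance, fresh, **kwargs)
            return wrapper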
  114. Vmware: Live-Migration of VMs

    Tested with attached volumes for the old cinder API and the new (3.44+)
    block-device-mapping.
    Requires patches to cinder to "migrate" the volumes between vCenters by
    creating an empty shadow-vm. Nova takes care of the rest:
    - deleting the outdated shadow-vms
    - attaching the volumes to the new shadow-vms
    
    The PlaceVM API is called to choose the host, but the initial placement
    doesn't take the server-groups into account; they are applied after the migration.
    
    BigVMs are still missing the probably needed special treatment (emptying of the host).
    
    The code is written in a way that, if anything goes wrong during the
    migration, such as
    - the conductor,
    - one of the compute nodes, or
    - one of the vCenter APIs
    being offline/restarted, simply a second migration can be started.
    
    The migration may happen in the background of the vCenter, and will be picked
    up when finished. The necessary book-keeping will be updated after the
    migration.
    
    The migration data will only be kept in memory, and will not be recovered on
    restart.
    
    Change-Id: If8c7265c53b64f00292e6689d1f6860ff29c671e
    (cherry picked from commit ee0d89c)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    9daf84c
  115. vmware: Refactor get_info() for healing vmref cache

    When a VM gets reregistered in the vCenter, it changes its moref id. To
    catch this case, we previously introduced a decorator that reacts
    to ManagedObjectNotFoundException and retries the function without the
    cached moref.
    
    We want the same to happen in VMwareVMOps.get_info(), as this function
    is called regularly to check the VM's state. To achieve this, we
    refactor get_info() a little to contain a local function matching the
    decorator's expected parameters.
    
    Change-Id: If798a3af4430a82dce9ef03a5ef097a215271b40
    (cherry picked from commit bb754e8)
    joker-at-work committed Aug 8, 2022
    bf2b2c7
  116. Add a NetworkInfoAsyncWrapper serializer to raven

    When using python-raven to send exceptions to Sentry, the serialization
    might run into a deadlock if the exception happens during server build
    and the NetworkInfoWrapper object is not done.
    
    We mitigate this by registering our own serializer in raven, which does
    not go into the content, but just prints the greenthread.
    
    Change-Id: Ie170e951e4d8d007a48d5878ec957e2e95155628
    (cherry picked from commit f896d89)
    
    Fix NetworkInfoAsyncWrapper registration order
    As it turns out, the serializer for NetworkInfoAsyncWrapper we
    introduced in Ie170e951e4d8d007a48d5878ec957e2e95155628, and whose
    registration we already fixed in
    Ib6f436ded2481d99dc1b32c54974c37b94281b81, was never called, because the
    more generic IteratorSerializer is registered before it via the base.py
    of raven.utils.serializer. Since SerializationManager.register() always
    appends to the list, there's no way for us to let our serializer come
    earlier in the list without getting dirty. While we could monkeypatch our
    own implementation of a SerializationManager into the module, too, we
    rather just access private parts of the already existing
    SerializationManager and prepend our serializer to the list of
    serializers, as it's more specific than the others.
    joker-at-work committed Aug 8, 2022
    31e96dc
  117. vmware: Add helper to fetch DRS overrides

    This helper function retrieves all DRS VM overrides of a cluster and
    provides them as a dict. We plan to use this in the special_spawning
    code.
    
    Change-Id: I091878d88b8545cb094b0f534f4fa57221c33719
    (cherry picked from commit 06cdb9e)
    joker-at-work committed Aug 8, 2022
    b57fb06
  118. special_spawning: Ignore partiallyAutomated VMs

    With large VMs being set to partiallyAutomated, compute nodes stayed in
    "waiting" state more and more often and never came out of it. Since we
    still need to deploy big VMs to those compute nodes and they are usually
    just stopped from being free by a large VM that cannot move, we often
    mark those compute nodes free by hand. Since this doesn't scale, we want
    to automate this behavior by letting the nova-compute take the same
    decision.
    
    Therefore, if there are only partiallyAutomated VMs left on a host, we
    still report that host as freed up. This is done by ignoring all
    partiallyAutomated VMs during the check of running VMs on the host.
    
    Change-Id: I47feb6a34c0e210f0ebb0edd4479550750e605d7
    (cherry picked from commit 9d66fe3)
    joker-at-work committed Aug 8, 2022
    e6349a0
  119. vmware: Specify volume profiles while snapshotting

    When at least one of the disks attached to the VM during snapshotting is
    associated with a profile containing storage IO control, the clone
    operation done during snapshotting fails with `VmConfigFault` and the
    more detailed "IO Filters are configured for the Source Disk
    vm-174314:2002, but no storage-policy selected for the destination.
    Select an appropriate storage-policy for destination disk.".
    
    We can mitigate this by specifying the profile in the RelocateSpec part
    of the CloneSpec, i.e. add a VirtualMachineRelocateSpecDiskLocator for
    each disk to be detached into the "location.disk" attribute. Setting
    the "profile" attribute on the VirtualDeviceConfigSpec we create for
    specifying the removal does not help.
    
    This will not add a profile for volumes attached before Cinder's queens
    release, as the "profile_id" was not part of connection_info before
    that.
    
    Change-Id: I2b71ef4a0b2ce79c287946dd15b7dc6af22439e9
    (cherry picked from commit 4e06287)
    joker-at-work committed Aug 8, 2022
    059030a
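
    A sketch of building such a disk locator (the property names follow
    the vSphere API; the helper itself is illustrative):

        def make_disk_locator(client_factory, disk_device, datastore_ref,
                              profile_id=None):
            locator = client_factory.create(
                'ns0:VirtualMachineRelocateSpecDiskLocator')
            locator.diskId = disk_device.key
            locator.datastore = datastore_ref
            if profile_id:
                # absent for volumes attached before Cinder's queens
                # release, where connection_info had no profile_id
                profile = client_factory.create(
                    'ns0:VirtualMachineDefinedProfileSpec')
                profile.profileId = profile_id
                locator.profile = [profile]
            return locator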
  120. special_spawning: Fix DRS rule not getting created

    The condition checking if the DRS rule should be created could never
    trigger, because we overwrite the checked "group" variable right before
    checking if it's None.
    
    We remove the condition, because there could be cases where we had a
    leftover hostgroup around, but someone manually deleted the rule. In
    those cases, we should still make a request to create/edit the rule even
    if the hostgroup was not created, but just edited.
    
    Change-Id: Iac3b8c183f633bf5ddc59acf340b477bd1eb88cc
    (cherry picked from commit 36d3614)
    joker-at-work committed Aug 8, 2022
    3bd40aa
  121. special_spawning: Remove matching DRS rule when removing hostgroup

    DRS does not update its internal state for a rule if the rule has
    become an invalid configuration. This effectively means that the rule
    stays in place if we remove the hostgroup but not the rule using it.
    
    This commit changes the behavior to also remove all rules using the
    hostgroup, when removing the hostgroup used for special spawning.
    
    Change-Id: Ic57a71bc4e69c57833396690fc3fb5453aa122b3
    (cherry picked from commit 2cdc9d4)
    joker-at-work committed Aug 8, 2022
    002e51a
  122. api: Update RequestSpec when updating server-group membership

    When we add/remove a server to/from a server-group, we have to update
    the server's RequestSpec's instance_group attribute, because this is
    used during scheduling when resizing a server to record a list of
    appropriate hosts for the instance.
    
    	AttributeError: 'NoneType' object has no attribute 'hosts'
    	  File "oslo_messaging/rpc/server.py", line 166, in _process_incoming
    		res = self.dispatcher.dispatch(message)
    	  File "oslo_messaging/rpc/dispatcher.py", line 265, in dispatch
    		return self._do_dispatch(endpoint, method, ctxt, args)
    	  File "oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch
    		result = func(ctxt, **new_args)
    	  File "oslo_messaging/rpc/server.py", line 229, in inner
    		return func(*args, **kwargs)
    	  File "nova/conductor/manager.py", line 94, in wrapper
    		return fn(self, context, *args, **kwargs)
    	  File "nova/compute/utils.py", line 1246, in decorated_function
    		return function(self, context, *args, **kwargs)
    	  File "nova/conductor/manager.py", line 298, in migrate_server
    		host_list)
    	  File "nova/conductor/manager.py", line 370, in _cold_migrate
    		updates, ex, request_spec)
    	  File "oslo_utils/excutils.py", line 220, in __exit__
    		self.force_reraise()
    	  File "oslo_utils/excutils.py", line 196, in force_reraise
    		six.reraise(self.type_, self.value, self.tb)
    	  File "nova/conductor/manager.py", line 339, in _cold_migrate
    		task.execute()
    	  File "nova/conductor/tasks/base.py", line 27, in wrap
    		self.rollback()
    	  File "oslo_utils/excutils.py", line 220, in __exit__
    		self.force_reraise()
    	  File "oslo_utils/excutils.py", line 196, in force_reraise
    		six.reraise(self.type_, self.value, self.tb)
    	  File "nova/conductor/tasks/base.py", line 24, in wrap
    		return original(self)
    	  File "nova/conductor/tasks/base.py", line 42, in execute
    		return self._execute()
    	  File "nova/conductor/tasks/migrate.py", line 174, in _execute
    		scheduler_utils.setup_instance_group(self.context, self.request_spec)
    	  File "nova/scheduler/utils.py", line 893, in setup_instance_group
    		request_spec.instance_group.hosts = list(group_info.hosts)
    
    Change-Id: Ic193dd3c59bc717ba5329f63054297f44127d76d
    (cherry picked from commit e01ef81)
    joker-at-work committed Aug 8, 2022
    b155b85
  123. Skip recomputing baremetal quota usage

    We're using a single function to compute the usage of multiple quota
    resources. There's already a check in place that prohibits recomputing
    that data if the resource's name is already in the data. When computing
    usage data, we use the instances belonging to a user/project. If that
    user/project does not have any baremetal instances of a flavor, we don't
    add the flavor's resource to the usage data. This means the check is
    unable to skip recomputing the data.
    
    If "instances" is already in the usage data, our shared function must
    have been called already. Therefore, if we encounter a resource having a
    name starting with "instances_" - which should be only true for
    resources we created for baremetal flavors - we can skip the
    recomputation even if the resource's name is not in the usage data.
    
    (cherry picked from commit 3a820fd)
    joker-at-work committed Aug 8, 2022
    ce4f0de
  124. Optimize quota:separate query again

    Mariadb still doesn't do the best job in executing that query (takes
    ~3s currently). The subquery alone takes roughly 0.3s and if we use the
    result of that query (in my tests 5 UUIDs) in a new query's WHERE, it
    takes only 0.06s. This should make up for the additional round-trip to
    the DB.
    
    Change-Id: I73aa89b0b76a0620265fb20caf4a18eb1f5f8311
    (cherry picked from commit 6a60115)
    joker-at-work committed Aug 8, 2022
    0ef6d07
  125. Vmware: No update_admin_vm_group_membership in post_live_migration

    Since the VM is gone after a migration,
    it won't be possible to update the vm-group membership.
    
    Change-Id: I04c62f400a522bf9fe5828199c0dff80a1004f42
    (cherry picked from commit 2b0c7f4)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    3541751
  126. Vmware: Catch Exception in post-migration

    We have to catch all exceptions in the post-migration
    steps; otherwise a roll-back will be initiated, which
    we cannot do properly as the VM has already been migrated
    with the vSphere API.
    
    Instead, the VM will be set to error state, as it will require
    manual inspection and intervention by an operator.
    
    Change-Id: I75faecfdd48c9f40d243aecdd2b90b89e5158335
    (cherry picked from commit 93f6cc9)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    e73ccda
  127. Vmwareapi: Handle missing details in ManagedObjectNotFoundException

    The attribute 'details' isn't always set,
    so we have to raise the exception without
    recovering the missing vm_ref.
    
    Change-Id: Ib1fbd90e03a0f36fb5833c71ae6cdde454e74958
    (cherry picked from commit 31672d2)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    8d38331
  128. VMware: Split out VMwareAPISession

    The object is not only used by the driver,
    but in practically all modules of vmwareapi.
    
    It reduces the scope of the driver module
    itself a bit.
    
    Change-Id: I76e446945c312e5b4fea54d04335d7d20ef3829d
    (cherry picked from commit 23eccae)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    8dc9358
  129. Let Nova migrate volumes between shards

    Until now, we handled shard-migration transparently in Cinder. This did
    not work well if the migration took longer than Cinder's RPC-timeout -
    which is a given for volumes over a certain size.
    
    To address this, Nova will now initialize the migration and wait for it
    to finish. To help with that, Cinder gained a new endpoint to migrate
    volumes by connector - because we pass the vCenter UUID inside its
    connection_capabilities - and now returns a new error-message on
    attachment_update with HTTP error code 416, so Nova knows that the
    update failed because the volume is in another shard - or rather is
    assigned to another backend.
    
    Since we now have to migrate a volume, callers of attachment_update now
    have to provide a volume_id in addition to the previous parameters.
    
    Change-Id: I9f89f2887be6f5e2f2184cd771542007393af0dd
    (cherry picked from commit b691628)
    joker-at-work committed Aug 8, 2022
    136891b
  130. [cinder] Base migration timeout on volume size

    Instead of having a static timeout of 1 day for all volume sizes, we now
    compute a timeout based on the size of the volume and an assumed minimum
    speed that's configurable.
    
    Change-Id: I3896fdd2e368d60f75e48292af2ec201194316b3
    (cherry picked from commit d4b0802)
    joker-at-work committed Aug 8, 2022
    34d46c1
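
    In essence (the option name and its default are assumptions):

        def _migration_timeout(volume_size_gb, min_speed_mb_s=10):
            # assume the migration moves at least min_speed_mb_s;
            # e.g. a 1 TiB volume at 10 MB/s yields ~29 hours
            size_mb = volume_size_gb * 1024
            return size_mb / float(min_speed_mb_s)  # seconds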
  131. [SAP] Shard as volume creation scheduling-hint

    From the instance we can derive the shard and pass the value
    as a scheduling hint to cinder.
    
    Change-Id: I81faa098634916b64af147d20427796036dd2cbb
    (cherry picked from commit 5ece029)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    594318d
  132. cinder: Set global_request_id in migrate_by_connector

    Without setting this, it's hard to find the requests made to Cinder
    during the migration and thus harder to find out why the migration
    failed.
    
    Change-Id: I7b2cdb7fc750682f681a3457d2b5783423f896bf
    (cherry picked from commit ab28c80)
    joker-at-work committed Aug 8, 2022
    19659c5
  133. cinder: Reduce time between checks in migrate_by_connector

    5min is quite a long time to keep a volume "hanging" in reserved state
    without a reason. We'll reduce it to a max of 1 min for now.
    
    Change-Id: I092f72db5257690e76b9e003552e7cbce3f991a7
    (cherry picked from commit afe5b85)
    joker-at-work committed Aug 8, 2022
    a66abd6
  134. VmWare: Refactored resource and inventory collection

    Joined the collection of the cpu-info with the host-stats in order to
    avoid calling the property collector twice in different places to
    get information about all the hosts.
    
    Split the code into getting the data and aggregating it over the cluster.
    Partly to split the logic into more easily consumable parts,
    but it is also a preparation for exposing the individual esxi-hosts
    as hypervisors.
    
    Change-Id: I383854bb0e956519e3bdc42121b59d43ca54743d
    (cherry picked from commit 8bf970c)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    fd4f43b
  135. Vmware: Aggregate cpu_info on item level

    Clients such as the CLI expect cpu_info to be a JSON dict,
    so returning a string breaks them.
    By aggregating the individual attributes, we also work around
    different cpu models with the same flags in a cluster
    not yielding any information. Now we can see that the models
    mismatch, but that the cpu feature flags match.
    
    Change-Id: Id552121b642ec90f6e06be09825ab7339531f9d6
    (cherry picked from commit a84a785)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    451b6db
  136. Vmware: Handle empty quickStats in host stats

    Hosts in certain states (such as disconnected ones) do not report
    quickStats about memory usage, so we have to assume values
    here, which is better than failing to update the stats for all hosts.
    
    Since they are not available, both the usage as well as the free
    memory will not be added to the total for the cluster.
    
    Change-Id: I05b3a158a058034d9a50a6948cdf302769f2d0eb
    (cherry picked from commit c934ffc)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    06d7fec
  137. ShardFilter: Allow cross-shard migrations

    The filter used to block any operation with a source-host,
    which also includes migrations. Live-migrations already
    work across shards, and cold migrations will hopefully follow soon.
    
    Change-Id: I9f9b2c4f3eae642d78fb349f5c752711bcd94af2
    (cherry picked from commit d92c2a4)
    
    Scheduler: Pass also the source-node
    
    Each compute host may have multiple hypervisors by design.
    The 'HostState' passed to the filter is per hypervisor,
    and currently just happens to be one per host in all cases
    except ironic.
    
    Change-Id: Ifcf7d7ea390562c963f6cc3a9ae5bc7efe5a5e8f
    (cherry picked from commit 3423e23)
    
    CpuInfoMigrationFilter: Ensure CPU-compatibility
    
    We can only do a live-migration to a 'host' with a super-set
    of the cpu-feature flags of the source. The filter ensures this.
    
    Change-Id: Icc4a6d1989fe055348a120abda24ba57048d3921
    (cherry picked from commit 93ac818)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    3371277
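
    The core check of the CpuInfoMigrationFilter reduces to a superset
    test, roughly:

        def cpu_compatible(source_cpu_info, target_cpu_info):
            # cpu_info dicts as produced by the driver's aggregation;
            # the target must offer every feature flag of the source
            source_flags = set(source_cpu_info.get('features', []))
            target_flags = set(target_cpu_info.get('features', []))
            return source_flags <= target_flags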
  138. VMwareapi: Use refs as index for fakedb

    The code was previously using the object-id to
    look up an object, meaning that you couldn't pass
    a newly created Managed-object-reference like you
    could over the vmware-api.
    Now the lookup happens over the ref-id string,
    and in turn some functions were refactored
    to take that into account.
    
    Change-Id: I70b87ed5f4fe08076745f9bc389b0f42930395cf
    (cherry picked from commit 8e7609b)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    23e6058
  139. Do not block on reservation while attaching

    The attachment operation may potentially be long-running,
    so a batch of attachments/reservations may cause an RPC timeout.
    The compute node still blocks on the lock and creates a bdm regardless.
    
    This replaces the instance lock with a bdm specific lock on
    bdm allocation.
    
    That means reserving a device name can run concurrently with
    detaching a volume.
    
    This will likely create a different behaviour, but since the
    value has a non-zero probability of being wrong anyway,
    we risk it being wrong more often.
    
    Change-Id: I99921bb0f22b02c51377ae276429319639e534df
    (cherry picked from commit d64c094)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    2c8dc51
  140. vmware: Reserve memory for large VMs

    Instead of only reserving memory for big VMs, we now also reserve memory
    for large VMs, because we want to help DRS with scheduling those VMs
    correctly (i.e. by configured memory) and improve performance for the
    VMs.
    
    Change-Id: I8b3a1a63ea4c1d459ed5c731d5ef94343d9e6046
    (cherry picked from commit 572ca67)
    joker-at-work committed Aug 8, 2022
    da3c340
  141. vmware: Use vmdk size from .ova files

    Since .ova files contain more than just the .vmdk, the size reported by
    glance is not the size of the actual upload. Since oslo.vmware now has a
    guard against not-finished uploads integrated, we cannot just use the
    size from glance. Instead, we switch to using the size of the .vmdk as
    reported by the tar file.
    
    Change-Id: I05cdc3fc47974ceb34a72704d79b3f7c54c05d41
    (cherry picked from commit 5d73d02)
    joker-at-work committed Aug 8, 2022
    e915df0
  142. Transport context to all threads

    The nova.utils.spawn and spawn_n methods transport
    the context (and profiling information) to the
    newly created threads. But the same isn't done
    when submitting work to thread-pools in the
    ComputeManager.
    
    The code doing that is extracted to a new
    function and called to submit the work to the
    thread-pools.
    
    Change-Id: I9085deaa8cf0b167d87db68e4afc4a463c00569c
    (cherry picked from commit 57e1efc)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    be295e4
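
    A sketch of the extracted helper, assuming it mirrors what
    nova.utils.spawn does with oslo.context (the helper name and the pool
    in the usage note are illustrative):

        import functools
        from oslo_context import context as common_context

        def submit_with_context(submit, func, *args, **kwargs):
            # capture the caller's RequestContext and re-install it in
            # the worker thread before running func
            ctxt = common_context.get_current()

            @functools.wraps(func)
            def context_wrapper(*f_args, **f_kwargs):
                if ctxt is not None:
                    ctxt.update_store()
                return func(*f_args, **f_kwargs)

            return submit(context_wrapper, *args, **kwargs)

        # usage: submit_with_context(self._build_pool.submit,
        #                            do_build_and_run_instance, ...)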
  143. Try migrate_by_connector on attachment_create

    As we cannot conform to the original design of
    volumes being accessible to all hosts, we raise an
    exception in case that is not possible, and
    migrate the volume to a cinder-host where it is possible.
    
    That is already done for attachment_update, which is called
    for the normal attachment of a volume to a server.
    
    In case of a live-migration, the pre_live_migration step
    creates a new attachment for the target hypervisor node
    and passes the connection_info with it.
    
    It can fail for the same reason, as in attachment_update.
    As before, we need to migrate the volume to a cinder-host
    where the volume is accessible to the target hypervisor.
    
    Change-Id: I6232f34f47ae2bcb78d83f587d8edaf701c2341b
    (cherry picked from commit dd9c94a)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    4ee520e
  144. [manage] Update purge_deleted_instances to include all cells

    In case all_cells is not supported, the user needs to use a separate
    nova.conf just to switch cells. Thus we provide support to purge all cells.
    
    (cherry picked from commit 07e2551)
    kpawar-sap authored and joker-at-work committed Aug 8, 2022
    860260b
  145. Vmwareapi: Replace nova is_vim_instance with oslo.vmware one

    The function is already implemented in oslo.vmware
    
    Change-Id: I48ab65502d1cd825fede6f73e764a5926d949beb
    (cherry picked from commit 7966aa1)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    4058232
  146. Vmwareapi: Use oslo_vmware get_object_property instead of nova one

    get_network_with_the_name is the last place where
    the nova function of the same name is used instead
    of the one implemented in oslo_vmware.
    
    Change-Id: I4293f3b2a7551793dd53dd427383583469ed0868
    (cherry picked from commit dcaf9c6)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    c11e541
  147. Vmware: Reduce vm search scope for images

    When importing an image with a duplicate name,
    we were going through all the VMs in the vCenter to find the matching one.
    That can potentially be 100k VMs, while it is rather likely that the VM
    is in the same location we are trying to import to.
    We now only search in the folder.
    As the name is specific to the datastore, this should only fail
    if someone uses the same naming convention and places it somewhere else.
    
    Change-Id: I9e1c55d560d36768037f4036b546b80eaa21ed32
    (cherry picked from commit 2ca317a)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    d497886
  148. Pass _nova_check_type through scheduler_hints & set live_migrate

    The confusing names apparently led to a parameter being passed through
    scheduler_hint, which contains a filter_property,
    which has scheduler_hints - which is where the parameter should
    have ended up.
    
    Additionally, set the live-migrate flag instead of inferring
    the state from other flags.
    
    The flags are persisted, so we need to overwrite them.
    
    Change-Id: Id80bfd2f3e4771856cc0d85bc1b85a7d14f3b136
    (cherry picked from commit 2c26b67)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    c7806ad
  149. Handle cells in CpuInfoMigrationFilter

    We have to look up the cell of the compute node to query
    the database for the correct information.
    
    Change-Id: I90951af80091fe871de217bb17f98c67b1284722
    (cherry picked from commit 789bb9a)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    0bb5022
  150. Vmware: Use a more generic serialisation of specs to json

    The previous version was doing it hand-coded, while
    this one uses the type-system, which allows us to
    serialise only the necessary fields (non-null), and
    also serialise polymorphic objects.
    
    Change-Id: I0dcafb74b3494185b3b58d78cb501069675aea33
    (cherry picked from commit d77cdcb)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    9ccf91a
  151. Vmware: Use Rpc instead of own session

    The advantage of this approach is that we now properly
    encapsulate each compute node instead of 'messing'
    around remotely in another vCenter.
    So caching etc. should work as expected.
    
    The downside is additional hops:
    Before: compute -> other-vcenter
    Now:    compute -> messaging -> other-compute -> other-vc
    
    So more ways of going wrong.
    
    Change-Id: I37be358ff7c3bd9f786e6ce086e91ff2b2fc3861
    (cherry picked from commit da49100)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    01a1095
  152. Vmware: Remove uuid parameter from get_vmdk_info call

    We changed the code to ignore the file-name,
    as a vMotion will result in renaming of the files,
    breaking the heuristic to detect the root disk.
    Instead, we were taking the first disk
    when the uuid parameter was set.
    
    The uuid parameter is not set when working with shadow-vms
    and vms for image import. So no special handling is
    needed; we always want the first disk in those cases too,
    and so we can scrap the uuid argument.
    
    Change-Id: Ib3088cfce4f7a0b24f05d45e7830b011c4a39f42
    (cherry picked from commit bd7925e)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    52a877f
  153. Vmware: Move session test to down module

    VMwareAPISession has been moved to its own module,
    and this change should reflect that in the test case.
    
    Change-Id: Ie0878986db41887f9f0de0bc820135d5284df403
    (cherry picked from commit 9854168)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    4b33ecd
  154. Vmwareapi: StableMoRefProxy for moref recovery

    The vmwareapi driver uses Managed-Object references throughout
    the code with the assumption that they are stable. A moref is, however,
    a database id, which may change during the runtime of the compute node.
    If an instance is unregistered and re-registered in the vCenter,
    the moref will change.
    
    By wrapping a moref in a proxy object, with an additional method
    to resolve the openstack object to a moref, we can hide those changes
    from a caller.
    
    For that the initial search/resolution needs to wrap the resulting
    moref in such a proxy.
    
    Change-Id: I40568d365e98359dbe90663c400e87be024df2eb
    (cherry picked from commit 89b5c6e)
    
    Vmware: MoRef implementation with closure
    
    This should ease the transition to stable mo-refs.
    One simply has to pass the search function as a closure
    to the MoRef instance, and the very same method will
    be called when an exception is raised for the stored
    reference.
    
    Change-Id: I98b59603a8ef3b91114f378d82cd7418d26a1c52
    (cherry picked from commit c854d41)
    
    Vmware: Implement StableMoRefProxy for VM references
    
    By encapsulating all the parameters for searching for
    the vm-ref again, we can move the retry logic to the
    session object, where we can try to recover the vm-ref
    should it result in a ManagedObjectNotFound exception
    
    Change-Id: Id382cadd685a635cc7a4a83f69b58075521c8771
    (cherry picked from commit bc23e94)
    
    Vmwareapi: Move equality test to tests
    
    The equality test is only used by the tests
    so it is better implemented there.
    
    Change-Id: I51ee54265c4cc2b4f40c0b83f785a49f8a8ebce4
    (cherry picked from commit 84f3e06)
    
    Vmwareapi: Stable Volume refs
    
    The connection_info['data'] contains the managed-object
    reference (moref) as well as the uuid of the volume.
    
    Should the moref become invalid for some reason,
    we can recover it by searching for the volume-uuid
    as the `config.instanceUuid` attribute of the shadow-vm.
    
    Change-Id: I0ae008fa15a7894e485370e7b585821eeb389a93
    (cherry picked from commit a71ddf0)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    afec39e
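
    A sketch of the closure variant (method names beyond StableMoRefProxy
    are assumptions, as is the recovery hook called by the session's
    retry logic):

        class MoRef(StableMoRefProxy):
            def __init__(self, search_fn):
                self._search_fn = search_fn
                super(MoRef, self).__init__(ref=search_fn())

            def fetch_moref(self, session):
                # called after a ManagedObjectNotFound to re-resolve
                # the stored reference
                self.moref = self._search_fn()

        # e.g. for a shadow-vm, searched by the volume uuid stored as
        # config.instanceUuid:
        # ref = MoRef(lambda: vm_util.search_vm_ref_by_identifier(
        #     session, volume_id))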
  155. Vmware: Remove nvp.vm-uuid on clone

    The clone created in a snapshot would also contain
    the nvp.vm-uuid field in the extra-config.
    If we then delete the original vm, the fallback mechanism
    of searching for the VM by extra-config would trigger,
    and find the snapshot and delete that instead.
    
    Change-Id: I6a66fa07dfe864ad4deedc1cafe537959cd969f4
    (cherry picked from commit 90a9f4e)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    49c6c2d
  156. VMware: root disk anti affinity

    Remove datastore_regex from VMWareLiveMigrateData
    This was a leftover of some part of the development process and never
    used. Thus, we remove it again.
    
    Change-Id: I37ce67b4773375e31f18ac809a6029aa41702a3b
    (cherry picked from commit 17928f7)
    
    vmware: ds_util.get_datastore() supports hagroups
    
    We're going to implement hagroups of datastores and for that we need
    to be able to select a datastore from a specified hagroup. This is
    currently planned via matching the name of the datastore against a
    regex that can extract the hagroup from the name.
    
    This commit adds retrieving the hagroup and checking it against the
    requested one to ds_util.get_datastore().
    
    Change-Id: Ie3432a8e0b020ca9bf41abc098c0fac059af0df9
    (cherry picked from commit f8e452a)
    
    vmware: Add setting datastore_hagroup_regex
    
    This setting will be used to enable distribution of ephemeral root disks
    between hagroups of datastores. The hagroups are found by applying this
    regex onto the found datastore names and should be named "a" or "b".
    
    Change-Id: I45da5dd5c46a4ba64ea521a0e0975f133b5801f1
    (cherry picked from commit c10d4e8)
    
    vmware: Distribute VM root disks via hagroups
    
    We want to distribute the ephemeral root-disk of VMs belonging to the
    same server-group between groups of datastores (hagroups). This commit
    adds the mentioned functionality for spawning new VMs, offline and
    online migration.
    
    Change-Id: I889514432f491bac7f7b6dccc4683f414baac167
    (cherry picked from commit 6feb47d)
    
    vmware: Add method to svMotion config/root-disk
    
    For distributing ephemeral root disks of VMs belonging to the same
    server-group between 2 hagroups, we need to be able to move the
    disk/config of a VM to another ephemeral datastore.
    
    This method will do an svMotion by specifying a datastore for all
    disks and for the VM's config files. The ephemeral disks - found by
    using the datastore_regex - receive the target datastore, while all
    other disks, which should be volumes, receive their current datastore
    as target.
    
    Change-Id: Iac9f2a2e35571bef3a58a22f6d96608f2b0bf343
    (cherry picked from commit 01b9876)
    
    vmware: Ignore bfv instances for hagroups
    
    Boot-from-volume instances do not matter for our ephemeral-root-disk
    anti-affinity: Cinder manages anti-affinity for volumes, and
    config-files going down with a datastore do not bring the instance
    down, but only make it inaccessible / unmanageable. The swap file
    could become a problem if it lives on the same datastore as the
    config-files, but newer compute-nodes store the swap files on
    node-local NVMe swap datastores in our environment, so we ignore this
    for now. We could solve this by passing in a config option that
    determines whether we should ignore bfv instances, depending on
    whether we detect node-local swap datastores or not.
    
    We move the generation of hagroup-relevant members of a server-group
    into its own function.
    
    Change-Id: Id7a7186909e236b7c81b4b8c8489e84f1067f2d4
    (cherry picked from commit 2c7e2cc)
    
    vmware: Add hagroup disk placement remediation
    
    Every time a server-group is updated through the API, we call this
    method to verify and remedy the disk-placement of VMs in the
    server-group according to their hagroups.
    
    Change-Id: I7ba6b14f5c969fb77dc5ce0fed63a6d9251f556e
    (cherry picked from commit cc50e0d)
    
    vmware: Validate hagroup disk placement in server-group sync-loop
    
    This replaces adding an additional nanny to catch when Nova missed an
    update to a server-group e.g. because of a restart.
    
    Change-Id: I9aa516bfe6be127a011539d9d22a78d1f38aba13
    (cherry picked from commit 09a32e2)
    
    vmware: Use instance lock for ephemeral svMotion
    
    When moving the ephemeral root-disk and the VM's config files, we take
    the instance-lock to serialize changes onto the VM. This makes sure
    that we don't squeeze our task between other tasks in the vCenter,
    which would make us read an inconsistent state of the VM.
    
    Change-Id: I04fc39bd48896bfd8010f17baa934f6f828edcef
    (cherry picked from commit 4f5eda3)
    
    vmware: Place VMs to hagroups more randomly
    
    The previous implementation of placing a VM onto an hagroup based on the
    index it has in the server-group has a big disadvantage for the common
    use-case of replacing instances during upgrades one by one. In doing so,
    every VM added to the end would end up on the same hagroup.
    
    To work against this, we put VMs onto hagroups randomly by taking
    their UUID's first character and using it modulo 2 as the deciding
    factor. These UUIDs being already generated randomly, we don't need
    to hash them or anything.
    
    Change-Id: Ib0d9f24ae7d5e0d4e2dceeb77a1513a8657976d2
    (cherry picked from commit 52b5d4b)
    
    vmware: datastore_hagroup_regex ignores case
    
    When finding hagroups in datastores with the regex from
    datastore_hagroup_regex, we use re.I to ignore the case so that an error
    made by an operator in naming the datastore does not break the feature.
    
    Change-Id: I4de760d99513abc9977f698aaba85b6456709ca6
    joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    dccdab2 View commit details
    Browse the repository at this point in the history
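    A minimal sketch of the two mechanisms above (helper names and the
    example regex are hypothetical): extracting the hagroup from a
    datastore name with the case-insensitive regex, and picking the
    hagroup for a VM from its UUID's first hex digit modulo 2.

        import re

        def hagroup_of_datastore(ds_name, hagroup_regex):
            # datastore_hagroup_regex is expected to capture the group,
            # e.g. r'eph-(?P<hagroup>[ab])\d+'; re.I tolerates operator
            # naming mistakes like "eph-A01".
            match = re.search(hagroup_regex, ds_name, re.I)
            return match.group('hagroup').lower() if match else None

        def hagroup_of_instance(instance_uuid):
            # UUIDs are already random, so no hashing is needed.
            return 'a' if int(instance_uuid[0], 16) % 2 == 0 else 'b'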
  157. [vmware] cross-vcenter resize/migration according to nova principles

    Prior to this, the driver was performing migration/resize in a
    way that could lead a VM into an inconsistent state and was not
    following the way nova does the allocations during a migration.
    Nova expects the driver to do the following steps:
    * migrate_disk_and_power_off() - copies the disk to the dest compute
    * finish_resize() - powers up the VM on the dest compute
    This change removes the RelocateVM_Task and introduces a new
    CloneVM_Task instead, in migrate_disk_and_power_off().
    
    The CloneVM_Task now also allows cross-vCenter migrations.
    
    Co-Authored-by: Marius Leustean <[email protected]>
    Change-Id: I9d6f715faecc6782f93a3cd7f83f85f5ece02e60
    (cherry picked from commit 95f9036)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    2aeeae3 View commit details
    Browse the repository at this point in the history
  158. vmware: Set profile on volume attachment

    If we attach a volume to a VM, we have to set the storage-profile.
    Otherwise, the VM will not be compliant with the profile and -
    especially on VMFS datastores - cannot be storage-vMotioned around if
    the storage-profile includes storage-IO control. By setting the
    profile for each disk-attachment, the VM also shows as compliant with
    these profiles in the UI.
    
    Change-Id: Idad6293dc7dfdf46fed584b9c116c03f928d44fe
    (cherry picked from commit dabcbca)
    joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    975577d View commit details
    Browse the repository at this point in the history
  159. VMwareapi: Raise proper exception for missing shadow-vms

    If a shadow-vm is missing, we raise an AttributeError,
    which does not clearly identify the reason for the failure.
    We better re-raise the original ManagedObjectNotFound exception,
    so it is more clearly identifiable.
    
    Change-Id: I954c57e97961833208743bc88e3ce75ad23cfe8c
    (cherry picked from commit a5a9dd9)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    e478b7f View commit details
    Browse the repository at this point in the history
  160. Vmwareapi: Fix attachment of multiple nics

    If multiple nics are attached, they need different device-keys;
    otherwise, the vmwareapi will reject the request.
    
    Change-Id: I0aa58ad11c499e9423c7ecc7998325b05dd9147e
    (cherry picked from commit 8ba8b32)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    d2bc05d View commit details
    Browse the repository at this point in the history
  161. vmware: Set appropriate settings for resize to > 128 cores

    When spawning a VM with more than 128 cores, we set numCoresPerSocket
    and some flags, e.g. vvtdEnabled. We missed adding the same flags when
    resizing a VM to more than 128 cores. This patch remedies that.
    
    Change-Id: I381a413ecf80af14dd4bf1dfde2d070976b6477a
    (cherry picked from commit cfd906b)
    joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    1aa03bf View commit details
    Browse the repository at this point in the history
  162. vmware: Make sure instance memory is multiple of 4.

    (cherry picked from commit a0dc4cb)
    kpawar-sap authored and joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    8ef2955 View commit details
    Browse the repository at this point in the history
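    A minimal sketch of what this entails (the rounding direction is an
    assumption here; the driver's actual helper may differ): vSphere only
    accepts a memoryMB value that is a multiple of 4.

        def round_memory_mb(memory_mb):
            # Round up to the next multiple of 4 MB so the instance
            # never gets less memory than the flavor asked for.
            return (int(memory_mb) + 3) // 4 * 4

        assert round_memory_mb(1022) == 1024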
  163. Vmwareapi: Migrate disk with minimal vm

    When simply cloning the original VM, the size might not
    fit on the target hypervisor.
    Resizing it to the target size might not fit on the
    source hypervisor.
    So we simply scale it to minimal size, as we are going
    to reconfigure it to the proper size on the target
    hypervisor anyway.
    
    Change-Id: Ia05e5b3a5d6913bfcef01fa97465a1aaa69872d0
    (cherry picked from commit 40d6589)
    
    Vmware: Warn about failed drs override removal
    
    An error calls for manual intervention, and an exception for
    debugging by a developer. This, however, is a known behaviour which
    can potentially lead to problems, hence a warning.
    
    Change-Id: I9479fb6405485e763a6344e7f44a60f75891adcb
    (cherry picked from commit f88a96c)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    bc8e263 View commit details
    Browse the repository at this point in the history
  164. vmware: Set migration.dataTimeout for big VMs

    When VMs with lots of CPUs are running for a longer period of time, a
    task to reconfigure the VM might end up hanging in the vCenter.
    
    According to VMware support, this problem happens if those VMs have
    been running for a longer period of time and, with their large number
    of CPUs, have accumulated enough differences between those CPUs that
    getting them all into a state where a reconfigure can be executed
    takes more time than the default 2s (iirc). The advanced setting to
    increase this time is "migration.dataTimeout".
    
    For simplicity reasons and because it shouldn't hurt (according to
    VMware), we set it on all big VMs. That way, we do not have to figure
    out if the VM consumes enough CPUs of the hypervisor to need this
    setting.
    
    Change-Id: Id8bda847c9e48997b385d9e1079ee9e99af9b8e8
    (cherry picked from commit 2f7393c)
    joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    e1a2079 View commit details
    Browse the repository at this point in the history
  165. vmware: Update expired image-template-VM handling

    Until now, we only kept image-template-VMs that had tasks that showed
    their usage - but VMs cloned from another image-template-VM don't have
    any tasks. Thus, we immediately removed VMs we cloned to another BB.
    This could even happen while the copying of the disk into the cache
    directory was still in progress.
    
    To counter this, we now take the "createDate" of the VM into account
    and only delete image-cache-VMs that were created more than
    CONF.remove_unused_original_minimum_age_seconds ago. Additionally, we
    take the same lock we also take when deploying image-cache-VMs and
    copying their files. This should protect from deleting the VM while a
    copy-operation is still in progress.
    
    Deleting the VM while copying is still in progress does not stop the
    copying. Therefore, this race-condition might be responsible for a lot
    of orphan vmdks of image-cache-VMs on our ephemeral datastores.
    
    Change-Id: Ic0a694a8c4df203c8c100abf5b8d2e9ee73866f7
    (cherry picked from commit d8f3ddf)
    joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    9de18bd View commit details
    Browse the repository at this point in the history
  166. Add PreferSameShardOnResizeWeigher.

    This weigher makes the scheduler prefer the same
    host-aggregate/shard/VC for an instance resize, because migrating the
    volumes to other shards could take more time.
    
    (cherry picked from commit f648b9b)
    kpawar-sap authored and joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    1b99a7d View commit details
    Browse the repository at this point in the history
  167. Vmwareapi: Set hw version on resize

    Resizing to a different flavour may also imply
    a different hw-version, so we need to set it;
    otherwise, it will stay on the previous one,
    which may be incompatible with the desired configuration.
    
    Only an upgrade is possible, though.
    
    Change-Id: I7976a377c3e8944483a10fdada391e8c51640e30
    (cherry picked from commit 28fb1a4)
    
    Vmware: Only change hw_version by flavor
    
    Be more strict in the upgrade policy,
    and only upgrade on resize if the flavor demands it,
    not if the default has changed.
    
    Change-Id: I25a6eb352316f986b179204199b098a418991860
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    213dba2 View commit details
    Browse the repository at this point in the history
  168. bigvm: Handle multiple aggregates on resource provider

    When switching to filtering the AZ via placement, we need the bigvm
    resource provider to be in the AZ aggregate in addition to being in the
    aggregate of the host's resource provider. Therefore, we find the host
    aggregate by seeing which aggregate is also a hypervisor uuid.
    
    Change-Id: I250f203b3bb24e084ec1b499a923f7f66e638102
    (cherry picked from commit 29ce312)
    
    bigvm: Do not remove parent provider's previous aggregates
    
    When we filter AZs in placement, we don't want nova-bigvm to remove
    the aggregates of our resource providers, as they represent the AZ.
    Therefore, we query the aggregates of the "parent" provider and make
    sure to include these aggregates, if we have to set the resource
    provider's UUID as an aggregate, too.
    
    Change-Id: If3986df022273f20e109816f2752ce0254db4f10
    (cherry picked from commit 2e98cd4)
    
    bigvm: Ignore deleted ComputeNode instances
    
    Querying via ComputeNodeList also returns deleted ComputeNode instances.
    Therefore, we might create bigvm-deployment resource providers for a
    deleted instance instead of the right instance and thus for a wrong
    resource provider. By ignoring deleted ComputeNode instances, this
    should not happen anymore.
    
    Change-Id: I5a4c6c5a1894d1f6f5cff6e3475670c27bb97f28
    (cherry picked from commit f7f5f0c)
    joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    cf8ca9f View commit details
    Browse the repository at this point in the history
  169. nova-manage: Don't fail on no Ironic nodes found

    There can be Ironic hosts that only have nodes assigned when those
    nodes are being repaired or built up. Those Ironic hosts would
    come up empty when searching for ComputeNodes in the sync_aggregates
    command and would be reported as a problem, which makes the command
    fail with exit-code 5. Since it's no problem if an Ironic host doesn't
    have a ComputeNode - each node is its own resource provider in
    placement anyway - we now ignore Ironic hosts without nodes in the
    error-reporting.
    
    Change-Id: I163f3e46f2e375531b870a363b84bba67816954d
    (cherry picked from commit 67779eb)
    joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    fe889de View commit details
    Browse the repository at this point in the history
  170. vmware: fix get_rules_by_prefix() wrong attribute

    The DRS rules can be read from the "rule" attribute, not from the
    "rules" attribute. We found this, because Nova wasn't deleting
    DRS rules for no-longer-existing server-groups.
    
    Change-Id: I86f7ca85d9b0edc1406a54a6f392bfff8f0af00d
    (cherry picked from commit 562b084)
    joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    c891ce6 View commit details
    Browse the repository at this point in the history
  171. vmware: Enable disabled DRS rules in sync

    When syncing a server-group's DRS rule(s) we now also enable a found
    rule in case it is disabled. We don't know how this happens, but
    sometimes rules get disabled and we need them to be enabled to guarantee
    the customer the appropriate (anti-)affinity settings.
    
    Change-Id: Ibc8eb6800640855513716412266fcbb9fbc4db42
    (cherry picked from commit d712c23)
    joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    e7856b4 View commit details
    Browse the repository at this point in the history
  172. vmware: Fix UnboundLocal error in manage_image_cache

    When we don't find any datastores for whatever reason, we don't have
    the "dc_info" variable set and thus cannot call
    self._age_cached_image_templates() with it, as that results in an
    UnboundLocalError.
    
    Change-Id: I2dca6d2d6ab7ca5cbc4ef7d2c316faaf6edfee7d
    (cherry picked from commit d2cf44f)
    joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    7f41692 View commit details
    Browse the repository at this point in the history
  173. VMWareapi: Handle missing Host properties

    The properties may not be set, if the host is disconnected.
    
    Change-Id: I1c53477e891b5b95859ca267fcad8cd1bff260ef
    (cherry picked from commit 0cb8b61)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    6fd07b7 View commit details
    Browse the repository at this point in the history
  174. Vmwareapi: Move pre_live_migration to vmops

    Most code related to VMs is in vmops, not in the driver,
    so we move this code there too.
    
    Change-Id: I1b801c8f12b377dd74a31ef646216c564631fe7f
    (cherry picked from commit ade6f4c)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    be7e305 View commit details
    Browse the repository at this point in the history
  175. Vmwareapi: Pass cookie-header as string

    This requires a change to oslo.vmware to accept a string
    instead of only a cookiejar.
    
    Depends-On: Ia9f16758c388afe0fe05034162f516844ebc6b2b
    Change-Id: I34a0c275ed48489954e50eb15f8ea11c4f6b1aa6
    (cherry picked from commit 726d7a2)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    666f433 View commit details
    Browse the repository at this point in the history
  176. Vmwareapi: Workaround for Config-Drives with Live-Migrations

    While we cannot live-migrate CD-Roms directly between vcenters,
    we can copy the data and detach/reattach the device manually.
    
    Change-Id: I88b4903f745e1bcfe957ddc07c6e9c040820ed6b
    (cherry picked from commit 14f9a5f)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    51444f1 View commit details
    Browse the repository at this point in the history
  177. volume: Treat 404 on attachment_delete as OK

    Since the mission is to delete the attachment, Cinder returning a 404
    on the attachment-deletion call can be ignored. We've seen this happen
    when Cinder took some time to delete the attachment, so Nova retried
    as it got a 500 back. On this retry, Nova got a 404, aborted the
    deletion, and left the BDM entry behind, even though the driver had
    already done the detach.
    
    Change-Id: I15dd7b59a2b3c528ecad3b337b92885b4d7bd68f
    (cherry picked from commit 82992a5)
    joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    8dd6b0c View commit details
    Browse the repository at this point in the history
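    A rough sketch of the tolerated-404 behaviour (the client wiring is
    illustrative, not Nova's actual code path):

        from cinderclient import exceptions as cinder_exc

        def attachment_delete(cinder, attachment_id):
            try:
                cinder.attachments.delete(attachment_id)
            except cinder_exc.NotFound:
                # The goal is "attachment gone" - if Cinder already
                # deleted it (e.g. after a timed-out first try), we
                # must not abort and leave the BDM entry behind.
                pass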
  178. vmwareapi: Handle missing volume_id in connection_info

    Apparently, the volume-id is not consistently
    stored as volume_id in connection_info.
    Use the block_device.get_volume_id function to handle
    the fallback.
    
    Change-Id: If5a8527578db8e4690595524e0785ee8b4de1d79
    (cherry picked from commit 607fd0d)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    bec171b View commit details
    Browse the repository at this point in the history
  179. vmware: Attach root disk first

    Since we don't explicitly set a disk as boot disk and instead rely on
    the order the disks have on the VirtualMachine, we need to make sure we
    attach the root disk first.
    
    Change-Id: I3ae6b5f053a3b171ed0a80215fc4204a2bf32481
    (cherry picked from commit 7e6dc54)
    joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    fcd91da View commit details
    Browse the repository at this point in the history
  180. Support split of large VMs and memory reservation handling

    We've recently changed that not all large VMs need DRS disabled - only
    the ones over 512 GiB memory. But we still need memory reservations
    for VMs of 230 GiB - 512 GiB, which was previously handled by them
    being large VMs. While we could do this via the flavor, we failed to
    do so. Additionally, this would limit the number of large VMs we can
    spawn on a cluster.
    
    To keep the same behavior we previously had for large VMs, we now split
    memory reservations from big/large VM detection with the following
    result:
    1) a big VM will get DRS disabled - big VMs are VMs bigger than 1024 GiB
    2) a large VM will get DRS disabled - large VMs are VMs bigger than 512
    GiB
    3) all VMs defining CUSTOM_MEMORY_RESERVABLE_MB resources in their
    flavor get that amount of memory reserved
    4) all VMs above full_reservation_memory_mb config setting get all their
    memory reserved
    
    Therefore, is_big_vm() and is_large_vm() now only handle DRS settings
    and special spawning behavior.
    
    A side effect is that nova-bigvm - or rather the special spawning
    code - now doesn't consider 230 GiB - 512 GiB VMs as non-movable
    anymore and thus finds more free hosts.
    
    Change-Id: I2088afecf367efc380f9a0a88e5d18251a19e3a5
    (cherry picked from commit dca6fe6)
    joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    6574a3b View commit details
    Browse the repository at this point in the history

Commits on Aug 9, 2022

  1. Revert "db: Compact Train database migrations"

    This reverts commit 5c9c81e.
    
    We need the Stein migrations non-compacted to be able to migrate from
    Rocky. Our version is 393 instead of upstream's 391 because we added
    a migration 392 in our Rocky release and are thus already on 392
    without having applied 391, which comes later in Stein. Starting at
    393, we have the possibility to add upstream's 391 as our 393.
    
    We did not add the nova/tests/* changes of the original commit back,
    because I don't think we need to test them if they were tested upstream
    before.
    
    Additionally, we had to change the version in the newer
    nova/db/migration.py instead of the previously changed
    nova/db/sqlalchemy/migration.py
    
    Change-Id: I57a163b1b603f0ac4a52ae7f6d58785cdd835530
    joker-at-work committed Aug 9, 2022
    Configuration menu
    Copy the full SHA
    b3ac2aa View commit details
    Browse the repository at this point in the history
  2. Revert "db: Compact Stein database migrations"

    This reverts commit f0175a3.
    
    We need the Stein migrations uncompacted to be able to migrate from
    Rocky. Since we already had a 392 migration in our downstream Rocky
    changes, we move upstream's 391 to 393 and add a placeholder for 391,
    since our DB is already at 392. Our downstream 392 will be added in a
    later commit.
    
    Change-Id: Ic8bebe7fb0770e60dd9856df9d529247e474e2c3
    joker-at-work committed Aug 9, 2022
    Configuration menu
    Copy the full SHA
    8e31619 View commit details
    Browse the repository at this point in the history
  3. Add migration for 'internal_access_path' column to allow more than 255 chars.
    
    The 'internal_access_path' column of the 'console_auth_tokens' table
    has the type String with a max length of 255 characters, which might
    not be enough when used with VMware and will then cause a DBDataError.
    
    This migration changes the internal_access_path type to be Text with a
    max length of 65535 characters. It also adds a placeholder for
    migration version number 391.
    
    Change-Id: I4463f01ae727edd5e76b4a50860b116cbdea6124
    Closes-Bug: #1900371
    galkindmitrii authored and joker-at-work committed Aug 9, 2022
    Configuration menu
    Copy the full SHA
    1c3d475 View commit details
    Browse the repository at this point in the history
  4. Revert "apidb: Compact Train database migrations"

    This reverts commit df89596.
    
    We need the migrations non-compacted, because we are upgrading from
    Rocky.
    
    Change-Id: I68bcbc90d543b526b6abed61f9326109d9727c4f
    joker-at-work committed Aug 9, 2022
    Configuration menu
    Copy the full SHA
    8b0ee3b View commit details
    Browse the repository at this point in the history
  5. Revert "apidb: Compact Stein database migrations"

    This reverts commit dae3c89.
    
    We need these migrations non-compacted, because we're upgrading from
    Rocky.
    
    Change-Id: I41b5f8d46639810f3be139e18ac2f399e4f637f8
    joker-at-work committed Aug 9, 2022
    Configuration menu
    Copy the full SHA
    a5f3d90 View commit details
    Browse the repository at this point in the history

Commits on Aug 11, 2022

  1. Add oslo.vmware stable/xena-m3 as custom requirement

    We cannot use the upper-constraints.txt with URLs anymore, because
    newer pip doesn't like that. Additionally, we want our repositories
    with full git history around for easier debugging - the other services
    are doing it like that, too.
    
    We also add our git version of oslo.vmware to the test-requirements
    and switch the tox default for the constraints file to our version,
    which does not contain an oslo.vmware pinning, so we can install from
    git.
    
    Change-Id: Ia5d8fe096e15b9b244573d662ea73613b3c68744
    joker-at-work committed Aug 11, 2022
    Configuration menu
    Copy the full SHA
    407047c View commit details
    Browse the repository at this point in the history
  2. Fix tests with newer os-brick versions

    As described in [0], the newer versions of os-brick require the
    "lock_path" to be set - which is not done in the
    ComputeManagerUnitTestCase and TestDriverBlockDevice. Since detaching a
    volume uses os-brick to do the locking, the unit tests would fail. This
    commit fixes this by setting REQUIRES_LOCKING as suggested in [0] - I
    don't know why that wasn't committed upstream, though.
    
    [0] https://bugs.launchpad.net/os-brick/+bug/1969794
    
    Change-Id: Ifd01a0b38143839719ab2de4ee53e6aa7752146b
    joker-at-work committed Aug 11, 2022
    Configuration menu
    Copy the full SHA
    01d67ab View commit details
    Browse the repository at this point in the history
  3. Fix uncompacting DB migrations

    Some imports have changed, as the compaction of the DB migrations
    happened a while ago ...
    
    Change-Id: I3ad9a1dd3144ff09a73382d4233cf5f995dee2d8
    joker-at-work committed Aug 11, 2022
    Configuration menu
    Copy the full SHA
    bee9535 View commit details
    Browse the repository at this point in the history
  4. Configuration menu
    Copy the full SHA
    5c28cb6 View commit details
    Browse the repository at this point in the history
  5. Add concourse_unit_test_task for CI

    This task is run by our internal CI during image-build to check that the
    unit-tests pass.
    
    Change-Id: I89a03514093682b9bd2a1c48a13c6f7206b2e9e4
    joker-at-work committed Aug 11, 2022
    Configuration menu
    Copy the full SHA
    67b0046 View commit details
    Browse the repository at this point in the history

Commits on Aug 15, 2022

  1. Configuration menu
    Copy the full SHA
    7a12fb7 View commit details
    Browse the repository at this point in the history

Commits on Aug 19, 2022

  1. Add mitmproxy to custom requirements

    We previously built a separate image for the
    nova-console-shellinaboxproxy, but are now using the same images as the
    rest of Nova, because Nova's dependencies can now support mitmproxy.
    Therefore, we add mitmproxy as custom requirement to be installed in our
    image.
    
    Change-Id: Ib3b60f86434938b0805778650d6d9694cfd922bd
    joker-at-work committed Aug 19, 2022
    Configuration menu
    Copy the full SHA
    d4e56ed View commit details
    Browse the repository at this point in the history
  2. Update nova-shellinaboxproxy command for newer mitmproxy

    Newer versions of mitmproxy don't support "-R", "--port" and
    "--bind-address" anymore. Instead, the reverse-proxy mode is now set
    with "--mode reverse:URL", the port is defined with "--listen-port" and
    the binding address with "--listen-host".
    
    We switch to calling mitmproxy without a shell here, because we don't
    need any features of the shell. This makes the command definition more
    readable, too.
    
    Change-Id: Iaa75d1771f0b998484012debe408349ba139e6b5
    joker-at-work committed Aug 19, 2022
    Configuration menu
    Copy the full SHA
    8e6f7c9 View commit details
    Browse the repository at this point in the history
  3. Update shellinabox console for Xena

    The NovaProxyRequestHandlerBase class we based our NovaShellInaBoxProxy
    upon when implementing db token support in
    	shellinabox: add support for db tokens
    was merged into NovaProxyRequestHandler in
    	 trivial: Merge unnecessary 'NovaProxyRequestHandlerBase' separation
    Therefore, we now switch our base to NovaProxyRequestHandler, but do not
    call the parent's init as we do not need any functionality of
    websockify.ProxyRequestHandler in our class. The compatibility of the
    __init__() needs to be checked on upgrade.
    
    Change-Id: I4fda6d5251d671af161441c8cb8bbe091bb970b4
    joker-at-work committed Aug 19, 2022
    Configuration menu
    Copy the full SHA
    c1a8fcc View commit details
    Browse the repository at this point in the history
  4. Let nova-shellinaboxproxy URL be non-positional

    We previously had proxyclient_url as optional positional argument to
    nova-shellinaboxproxy, but with the update to Xena, this changed to a
    required argument. We usually configure nova-shellinaboxproxy via
    nova.conf and thus don't need the required argument. Therefore, we make
    it non-positional now and thus available via
    --shellinabox-proxyclient_url instead.
    
    Change-Id: Iefeeaa8169835c2bbbe74fe59ab1b9588b8ee636
    joker-at-work committed Aug 19, 2022
    Configuration menu
    Copy the full SHA
    b0efa22 View commit details
    Browse the repository at this point in the history

Commits on Aug 23, 2022

  1. cinder: Define minimum migration timeout

    When Nova computes the wait time for a volume-migration, it uses the
    volume size. For really small volumes (e.g. 1 GiB), the computed time
    is lower than the overhead added by the API and RPC calls inside
    Cinder. Therefore, Nova times out waiting for the migration, even
    though the migration happens as expected.
    
    To fix this, we add an additional static overhead for the migration
    timeout, defaulting to 10min.
    
    Change-Id: I1532054524653bc9dfaf5010f3250ea6bff03701
    joker-at-work committed Aug 23, 2022
    Configuration menu
    Copy the full SHA
    7acf6d5 View commit details
    Browse the repository at this point in the history
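    A minimal sketch of the resulting timeout computation (the per-GiB
    factor and names are illustrative; the new static overhead defaults
    to 10 minutes per the message above):

        def volume_migration_timeout(volume_size_gb, seconds_per_gb,
                                     static_overhead=600):
            # Even a 1 GiB volume now gets at least ~10 minutes, which
            # covers Cinder's API/RPC overhead.
            return volume_size_gb * seconds_per_gb + static_overhead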

Commits on Sep 1, 2022

  1. Make migration 403 optional

    In our Rocky code-base we already had migration 403 backported as
    migration 320 and thus unconditionally adding the UniqueConstraint does
    not work for 403. Instead, we now check if there's already a
    UniqueConstraint on the table containing the columns we would want to
    add and return without applying actions if that's the case.
    
    Change-Id: Ie0cba9500945cd08d6c418cc9719aea7ede80e90
    joker-at-work committed Sep 1, 2022
    Configuration menu
    Copy the full SHA
    8896cc9 View commit details
    Browse the repository at this point in the history
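    A sketch of such a guard (table and column names are illustrative,
    the inspection API is plain SQLAlchemy):

        from sqlalchemy import inspect

        def has_unique_constraint(engine, table_name, columns):
            # True if an equivalent UniqueConstraint already exists,
            # e.g. from our downstream Rocky migration 320.
            inspector = inspect(engine)
            return any(set(uc['column_names']) == set(columns)
                       for uc in inspector.get_unique_constraints(table_name))

        # migration 403 returns early when this is True for the
        # constraint it would otherwise add.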

Commits on Oct 5, 2022

  1. [scheduler] Refactored PreferSameHostOnResizeWeigher

    We can make use of the changes to the scheduler to
    simplify the internal logic.
    
    Whether it is a resize is now stored in the request_spec,
    as well as the source host, so we do not have
    to reconstruct it from instance data anymore.
    
    Change-Id: I4b016448a5a905a5d9833aa821daed186d7f1f8a
    fwiesel committed Oct 5, 2022
    Configuration menu
    Copy the full SHA
    7260d83 View commit details
    Browse the repository at this point in the history

Commits on Oct 25, 2022

  1. [vmwareapi] Limit free space of datastore to capacity 2

    Apparently, the VMware API can report datastores with more free space
    than capacity, which causes an exception in the oslo_vmware API.
    Therefore, we limit the free space to the capacity.
    
    Change-Id: If6013022a8a32029d43f9074eaaeea5b55855104
    joker-at-work committed Oct 25, 2022
    Configuration menu
    Copy the full SHA
    ef1973f View commit details
    Browse the repository at this point in the history
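    Conceptually the fix is a clamp (a sketch, not the driver's exact
    code):

        def safe_freespace(capacity, freespace):
            # The vCenter can report freespace > capacity; oslo.vmware
            # raises on such values, so never report more than capacity.
            return min(freespace, capacity)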

Commits on Nov 9, 2022

  1. [vmware] allow pulling images from swift URL

    oslo.vmware contains a new function `image_pull_from_url`
    to pass the image URL directly to VMware for downloading the image.
    
    This can be feature toggled on/off via:
    [vmware]/allow_pulling_images_from_url
    leust authored and joker-at-work committed Nov 9, 2022
    Configuration menu
    Copy the full SHA
    73ef235 View commit details
    Browse the repository at this point in the history
  2. No Flavor notification in Test Fixture setup

    The notification causes a load of all projects
    for each flavor created, once per test setup.
    Patching it out reduces the total runtime of the unit
    tests by 20%.
    
    Change-Id: Ib3b1f2bc401be67d043b723ecff59a0c45d9f81d
    fwiesel authored and joker-at-work committed Nov 9, 2022
    Configuration menu
    Copy the full SHA
    fb5bab0 View commit details
    Browse the repository at this point in the history
  3. [vmware] Fallback to uploading the image via HttpNfcLease URL

    A few vCenters have shown intermittent errors while verifying the
    SSL connection to the Swift endpoint, thus occasionally throwing the
    vim.fault.SSLVerifyFault exception.
    In such unexpected scenarios we can still fall back to transferring
    the image by uploading it to the HttpNfcLease URL.
    leust authored and joker-at-work committed Nov 9, 2022
    Configuration menu
    Copy the full SHA
    c4220d8 View commit details
    Browse the repository at this point in the history
  4. Vmwareapi: Moved serialization to oslo.vmware

    Depends-On: Ie76b1e6940b5022563ce91d5692df589573704d0
    Change-Id: I6fe097a9d2a83115f73c51016914ea18b708292b
    fwiesel authored and joker-at-work committed Nov 9, 2022
    Configuration menu
    Copy the full SHA
    4de4efb View commit details
    Browse the repository at this point in the history
  5. Remove tenant filter for Security group search in VM creation.

    This commit removes the tenant (project) filter for the security-group
    list in two places, `nova-api` and `nova-compute`. Nova will not be
    able to get the wrong security groups (those that are not allowed),
    because the user's context is used in these places.
    
    Security
    By adding additional debug logs, I printed the list of security groups
    that Nova got from the Neutron API and compared it with the available
    security-group list. Both lists match 100%.
    
    Non-unique name problems
    If the user will have non-unique names for security groups: one in the
    project and the second one shared Nova API will return the error `More
    than one SecurityGroup exists with the name '...'`. Example:
    
       $ openstack server create --network=test-net --flavor=24 \
         --image=test-image --security-group=test-sg-rbac test
       More than one SecurityGroup exists with the name 'test-sg-rbac'.
    
    Performance
    As a result of these changes, Nova gets all security groups without
    filtering from Neutron API and it will be slower. Performance comparison:
    
       API call with project_id filter:
         time_namelookup:  0.004326s
            time_connect:  0.041808s
         time_appconnect:  0.137256s
        time_pretransfer:  0.137411s
           time_redirect:  0.000000s
      time_starttransfer:  0.611862s
                         ----------
              time_total:  0.778839s
    
       API call without filtering:
         time_namelookup:  0.006605s
            time_connect:  0.046886s
         time_appconnect:  0.145570s
        time_pretransfer:  0.145793s
           time_redirect:  0.000000s
      time_starttransfer:  0.773550s
                         ----------
              time_total:  0.938802s
    
    Change-Id: Ic859328ddc907311537a680b3aa18b1983474c14
    velp authored and joker-at-work committed Nov 9, 2022
    Configuration menu
    Copy the full SHA
    7efed66 View commit details
    Browse the repository at this point in the history
  6. Idempotent binding creation

    In case we already created a binding for a live-migration
    but crashed during the ongoing process, Neutron will already
    hold a port binding for the host.
    Instead of failing, we can simply take the existing port-binding
    and continue.
    
    Change-Id: If84c74e258084d4ab648a6a413896eda087317d7
    See: https://specs.openstack.org/openstack/neutron-specs/specs/backlog/pike/portbinding_information_for_nova.html
    fwiesel authored and joker-at-work committed Nov 9, 2022
    Configuration menu
    Copy the full SHA
    d955aa3 View commit details
    Browse the repository at this point in the history
  7. special_spawning: Check DRS behavior later

    Instead of immediately checking the default DRS behavior setting, we now
    try to find a free host first and only error out if we would need to
    rely on DRS to free up a host. This makes it possible to support BBs
    where an operator already freed up a host manually.
    
    Change-Id: I22dbdcc9f135bbfc9ef05e13c801e88a78e64236
    joker-at-work committed Nov 9, 2022
    Configuration menu
    Copy the full SHA
    971dd82 View commit details
    Browse the repository at this point in the history

Commits on Nov 10, 2022

  1. Add python-ironicclient into custom-requirements.txt

    Change-Id: I05dffb99a2f4ae5d629871c96642983435ac79b4
    joker-at-work committed Nov 10, 2022
    Configuration menu
    Copy the full SHA
    b3335c3 View commit details
    Browse the repository at this point in the history
  2. vmwareapi: Fix jsonutils.load() needing bytes in Python3

    This is a fixup for commit 072b15f [vmware] Add configurable reservations per hostgroup
    
    Change-Id: I191aedb0e3bba7698825771089cf134f320368ec
    joker-at-work committed Nov 10, 2022
    Configuration menu
    Copy the full SHA
    5d56537 View commit details
    Browse the repository at this point in the history

Commits on Nov 11, 2022

  1. vmware: Report DISK_GB as integers

    With upgrading to Xena and thus Nova using newer microversions to talk
    to Placement, we cannot report floating point numbers for the resources
    anymore. This came up in DISK_GB, which is computed from MB and thus not
    always a round number.
    
    We fix this by converting the numbers to int and thus cutting off the
    values after the comma. With that, we under-report resources, which
    seems better than over-reporting them, because we could not fit
    everything in when over-reporting.
    
    Change-Id: I0d364f347afa235ed2b7e8ae90f5851275b7738e
    joker-at-work committed Nov 11, 2022
    Configuration menu
    Copy the full SHA
    48cefa6 View commit details
    Browse the repository at this point in the history
  2. bigvm: Replace rc_fields and scheduler.client imports

    rc_fields got removed with the placement-api removal from Nova and is
    replaced by its own package os_resource_classes.
    
    SchedulerClient got resolved, because it was only proxying functions
    through to SchedulerReportClient at some point in time, so we use
    SchedulerReportClient directly now.
    
    Change-Id: Iabeae6e01f9615be7c122d1e3fd719a1e53762d9
    joker-at-work committed Nov 11, 2022
    Configuration menu
    Copy the full SHA
    ffa931d View commit details
    Browse the repository at this point in the history
  3. Add target_has_no_project_id policy check

    Since we don't want to enable scoped tokens yet, we still have
    policies relying on checking the owner like so:
    "project_id:%(project_id)s". With all the patches to "Pass the actual
    target" landing since Rocky, there are multiple APIs that no longer
    work with the old owner check, because those patches move away from
    the old default behavior of passing the token's project_id. Instead,
    they pass an empty dict.
    
    Since we want to change to scoped tokens at some point in time, we add
    our own "target_has_no_project_id" so we can support both resources
    assigned to a project and generic requests e.g. for listing availability
    zones more easily by specifying "rule:owner or
    target_has_no_project_id:True".
    
    Change-Id: Ia41f120cdc5f9eaea9b119e15115033964113085
    joker-at-work committed Nov 11, 2022
    Configuration menu
    Copy the full SHA
    a4469a1 View commit details
    Browse the repository at this point in the history
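    Semantically, the new check boils down to the following (a sketch;
    the oslo.policy registration plumbing is omitted):

        def target_has_no_project_id(target):
            # True for generic requests whose target carries no
            # project_id, e.g. listing availability zones.
            return 'project_id' not in (target or {})

    which allows rules of the form
    "rule:owner or target_has_no_project_id:True".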

Commits on Nov 17, 2022

  1. Revert "vmware: Report DISK_GB as integers"

    This reverts commit 48cefa6.
    
    The description given in the commit is not correct. We switched from
    Python 2 to Python 3 and thus got some changes in how "/" is
    interpreted. Previously, if there were only integers involved, the
    outcome would be an integer. With Python 3 the outcome is of type
    float, and one needs to use // for integer division. We will
    accommodate that in the next commit.
    joker-at-work committed Nov 17, 2022
    Configuration menu
    Copy the full SHA
    eda534b View commit details
    Browse the repository at this point in the history
  2. vmware: Integer division Python 2 -> 3 fix

    In Python 2, division with the / operator results in an integer-type
    result if divisor and dividend are integer-typed - if one of them is of
    type float, then the result will be a float.
    
    Python 3 made this more explicit by introducing the // operator which
    always results in an integer result while changing the / operator to
    always return floats.
    
    Some of our code relied on the old behavior and we need to update it to
    use // so we don't put floats to placement or the vCenter where they
    don't understand them.
    
    There's even some code that relied on the new behavior and never
    worked before - we don't update it here.
    
    Change-Id: Ib81728cc8dcde852a035bfbbd380435ed06c56ba
    joker-at-work committed Nov 17, 2022
    Configuration menu
    Copy the full SHA
    73a36fa View commit details
    Browse the repository at this point in the history
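    The root cause in a nutshell:

        disk_mb = 10250
        print(disk_mb / 1024)    # Python 2: 10 (int); Python 3: 10.009765625 (float)
        print(disk_mb // 1024)   # 10 on both - what placement and the vCenter expect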
  3. [scheduler] Only fetch instances when weighers/filters request any

    Only in a subset of situations are the filters or weighers
    interested in the placement of very specific instances
    for the scheduling decision.
    
    But the host-manager fetches/holds all the instances for all
    the hosts, which at a sufficient scale occupies the scheduler
    fully with book-keeping of the instances.
    
    As a first step, return the instance-ids each filter/weigher
    is interested in, and skip updating that information
    if none is required.
    
    At a later step, the update can be limited to those instances.
    
    Change-Id: I3ea05f98e300bbf0e4b0b42ad334e86d34b21ab6
    fwiesel authored and joker-at-work committed Nov 17, 2022
    Configuration menu
    Copy the full SHA
    12b46de View commit details
    Browse the repository at this point in the history
  4. [scheduler] _get_host_states instance_uuids may be None

    The function is called from other code-paths as well,
    and we need to preserve the old semantics for use-cases
    besides the FilteringScheduler (such as the CachingScheduler)
    
    Change-Id: Id99e08fa5e833b197324ccf525a5fbcdfcce318a
    fwiesel authored and joker-at-work committed Nov 17, 2022
    Configuration menu
    Copy the full SHA
    1421c15 View commit details
    Browse the repository at this point in the history
  5. [scheduler] Fix host_info_requiring_instance_ids in AffinityFilter

    The refactoring was incomplete, still having the old function
    names instead of host_info_requiring_instance_ids.
    
    Change-Id: Ibb69e15654ec6818a1bc920b1c8197f6a3c52080
    fwiesel authored and joker-at-work committed Nov 17, 2022
    Configuration menu
    Copy the full SHA
    2002293 View commit details
    Browse the repository at this point in the history
  6. [scheduler] Consolidate host_info/passes steps in filter & weigher

    Both host_info_requiring_instance_ids and host_passes/_weigh_object
    had duplicated code for extracting the needed instance-ids.
    By consolidating them, we reduce the code duplication.
    
    Change-Id: Icfc1d3e554ff0834dec35d52772996284dc0a5da
    fwiesel authored and joker-at-work committed Nov 17, 2022
    Configuration menu
    Copy the full SHA
    2a6be98 View commit details
    Browse the repository at this point in the history

Commits on Nov 23, 2022

  1. Fix pep8 check for "[scheduler] Only fetch instances when weighers/fi…

    …lters request any"
    
    Change-Id: I9b82a2f6367f8b77c9e1ca3296eced66b094c628
    joker-at-work committed Nov 23, 2022
    Configuration menu
    Copy the full SHA
    d74ad27 View commit details
    Browse the repository at this point in the history

Commits on Nov 24, 2022

  1. Log disabled/forced_down changes on services

    We want to know when a service was first activated and last deactivated.
    For this, we log a line every time a service is enabled/disabled. The
    log line can then be saved in long-term storage and looked up again.
    
    Change-Id: Ia904ac8108dd384d1675eba5250a38b77a5a8184
    joker-at-work committed Nov 24, 2022
    Configuration menu
    Copy the full SHA
    4f96b35 View commit details
    Browse the repository at this point in the history
  2. vmware: Pass VCState into VMwareVMOps

    We need the VCState inside the VMwareVMOps instance to access
    information about different hosts. We plan to use this for splitting
    server-groups based on the amount of available hosts in the cluster, but
    it can also be used for scheduling big VMs.
    
    Change-Id: I47edac9a81ef9a02cf07ab05e63edb9ed02d17b7
    joker-at-work committed Nov 24, 2022
    Configuration menu
    Copy the full SHA
    dddadec View commit details
    Browse the repository at this point in the history
  3. vmware: soft-anti-affinity can use multiple DRS rules

    Since VMware doesn't support the "mandatory" setting on VM-VM DRS rules,
    all rules we create are mandatory. This leads to VMs being unable to
    spawn in a cluster, if there are already as many VMs in the same
    soft-anti-affinity server-group as there are hosts in the cluster.
    Customers expect soft-anti-affinity to work also in this case.
    
    To accommodate that, we now split the members of soft-anti-affinity
    server-groups into multiple chunks. Each chunk's size is the number
    of available hosts in the cluster. For each chunk, we create an
    anti-affinity DRS rule.
    
    We try to make the members of each chunk stable by sorting the members
    in the server-group, but this commit probably still leads to more
    updates of DRS rules in those bigger server-groups.
    
    Since there can be rules in a no-longer-used chunk and since there are
    currently already rules with a different naming scheme (i.e. without the
    trailing number), we also fetch all rules of the same prefix and
    "update" them. Updating a rule without members leads to deletion of this
    rule.
    
    Change-Id: Id28fcc71193b491a1ac57e5c4f28c3b4862eeee5
    joker-at-work committed Nov 24, 2022
    Configuration menu
    Copy the full SHA
    0634209 View commit details
    Browse the repository at this point in the history
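    A sketch of the chunking (the helper name is hypothetical): members
    are sorted for stability and split into chunks of at most the number
    of available hosts, one DRS rule per chunk.

        def chunk_members(member_uuids, n_hosts):
            members = sorted(member_uuids)
            return [members[i:i + n_hosts]
                    for i in range(0, len(members), n_hosts)]

        # e.g. 5 members on a 2-host cluster yield 3 anti-affinity
        # rules; a previously used chunk/rule that ends up with no
        # members is "updated" empty and thereby deleted.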

Commits on Nov 28, 2022

  1. vmware: Add get_datastore_ref_by_name helper

    This function iterates over all datastores in the vCenter trying to find
    a datastore with the given name.
    
    Might be a good candidate for downporting into oslo.vmware.
    
    Change-Id: I3fc7f171592c2cd21b765e0eb0218bf87d45a37c
    joker-at-work committed Nov 28, 2022
    Configuration menu
    Copy the full SHA
    4681253 View commit details
    Browse the repository at this point in the history
  2. vmware: Add get_vmx_path helper

    This function returns the path to a VM's .vmx file parsed into a
    DatastorePath object.
    
    While this method is small enough for inlining it into any code, that
    code is easier to unit-test with this function.
    
    Change-Id: If37768910803a9b456c0328a6904c2d53b96cccf
    joker-at-work committed Nov 28, 2022
    Configuration menu
    Copy the full SHA
    8c078b6 View commit details
    Browse the repository at this point in the history
  3. vmware: Support rescuing bfv instances

    This sets the supports_bfv_rescue capability for the vmwareapi driver
    and updates the rescue function to put the rescue disk next to the vmx
    file of the VM.
    
    While the previous code would have worked - at least on VMFS
    datastores, not sure about vVol - it would have put the rescue disk
    onto the volume's datastore into the volume's directory. We don't want
    this as it skews Cinder's resource counting.
    
    Therefore, we now change the code to look up the path of the vmx file
    instead of taking the first vmdk's path and use its datastore and folder
    to place the rescue disk.
    
    Change-Id: Id707de9f273f618711dab8aa2e2a88dd8d942a6e
    joker-at-work committed Nov 28, 2022
    Configuration menu
    Copy the full SHA
    69f17dd View commit details
    Browse the repository at this point in the history

Commits on Nov 30, 2022

  1. Add more password generation options

    Some password policies require more than one occurrence of symbols of
    one kind, or make restrictions about their occurrence, effectively
    requiring them to occur more often.
    
    By providing the configuration value 'password_all_group_samples', the
    administrator can increase the rounds to sample from all groups to
    adhere to such policies.
    
    Often, password policies require not only ascii letters
    (upper/lower-case) and numbers, but also other printable characters in
    the password as a fourth symbol group.
    
    By making the symbol-classes the multi-string list
    'password_symbol_groups', the administrator can add those and
    also override the other classes, if desired.
    
    Change-Id: I5b995883a41f65296de86f3effa0102ecb12c1fa
    fwiesel authored and joker-at-work committed Nov 30, 2022
    Configuration menu
    Copy the full SHA
    0d3591c View commit details
    Browse the repository at this point in the history
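    A sketch of the sampling scheme the two options imply (option names
    from the message above; the logic here is illustrative only):

        import secrets

        def generate_password(length, symbol_groups, all_group_samples=1):
            # Draw `all_group_samples` characters from every group first,
            # so policies demanding repeated occurrences are satisfied...
            chars = [secrets.choice(group)
                     for _ in range(all_group_samples)
                     for group in symbol_groups]
            # ...then fill up to the requested length from all groups.
            alphabet = ''.join(symbol_groups)
            chars += [secrets.choice(alphabet)
                      for _ in range(length - len(chars))]
            secrets.SystemRandom().shuffle(chars)
            return ''.join(chars)

        # e.g. symbol_groups = ['abcdefgh', 'ABCDEFGH', '0123456789', '!?#+-']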

Commits on Jan 26, 2023

  1. Add MKS VNC proxy for VMware

    Original code can be found at: https://opendev.org/x/nova-mksproxy
    
    Integrate the code into nova as a builtin command.
    
    Subclass `base.SecurityProxy` into `MksSecurityProxy`. Because MKS'
    `connect_info.internal_access_path` contains more than just a `path`
    (namely a JSON object with "ticket", "cfgFile" & "thumbprint"), add a
    new `parse_internal_access_path()` method in
    `NovaProxyRequestHandler`. This method tries to parse
    `internal_access_path` as JSON and, if that fails, puts the contents
    in the "path" key of a new `internal_access_path_data` dict.
    
    Co-authored-by: Johannes Kulik <[email protected]>
    grandchild and joker-at-work committed Jan 26, 2023
    Configuration menu
    Copy the full SHA
    2d3da64 View commit details
    Browse the repository at this point in the history
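    The parsing fallback in a nutshell (a sketch of the described
    method, detached from its class for brevity):

        import json

        def parse_internal_access_path(internal_access_path):
            try:
                # MKS: a JSON object with "ticket", "cfgFile", "thumbprint"
                data = json.loads(internal_access_path)
            except (TypeError, ValueError):
                # other consoles: a plain path string
                data = {'path': internal_access_path}
            return data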

Commits on Jan 27, 2023

  1. Configuration menu
    Copy the full SHA
    c52003a View commit details
    Browse the repository at this point in the history

Commits on Feb 1, 2023

  1. Configuration menu
    Copy the full SHA
    24ff3f7 View commit details
    Browse the repository at this point in the history
  2. mks-proxy: Fix entrypoint name

    Has to be "nova-{NAME}proxy".
    grandchild committed Feb 1, 2023
    Configuration menu
    Copy the full SHA
    90ed620 View commit details
    Browse the repository at this point in the history

Commits on Feb 3, 2023

  1. mks-proxy: REVERTME Add compat --verbose flag

    The old script had a --verbose flag that we are using in the k8s
    deployment definition for all regions. This will include both Xena
    and older Rocky deployments. The new name of this flag is
    --mks-verbose.
    
    Instead of adding an image version check in the helm-chart, add an
    alias for the flag instead, for the duration of the update, and revert
    this after the Xena upgrade is through everywhere.
    grandchild committed Feb 3, 2023
    Configuration menu
    Copy the full SHA
    83f9f4c View commit details
    Browse the repository at this point in the history

Commits on Feb 13, 2023

  1. vmware: Set preferHT on all CUSTOM_NUMASIZE flavors

    Physical CPUs are faster than hyperthreaded CPUs. So by default VMware
    spreads a VM's CPU cores onto physical cores, spanning multiple NUMA
    nodes if needed although the vCPUs would fit onto a smaller number of
    NUMA nodes.
    
    HANA VMs prefer low-latency (and thus NUMA-locality) over raw
    performance, so they set the VMware config `preferHT` to not get
    spread out. Before, this setting was only applied if the VM's flavor
    qualified as a "Big VM" flavor (i.e. >1 TiB in size). This excluded
    smaller HANA flavors, and a single-NUMA-node VM got spread over two
    nodes.
    
    Make `preferHT` depend on whether the flavor requires one of our
    `CUSTOM_NUMASIZE_*` traits instead.
    grandchild committed Feb 13, 2023
    Configuration menu
    Copy the full SHA
    620f38a View commit details
    Browse the repository at this point in the history
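    A sketch of the new condition (the helper name is hypothetical; the
    extra-config key is vSphere's well-known advanced setting):

        def flavor_prefers_ht(extra_specs):
            # preferHT now follows the CUSTOM_NUMASIZE_* traits instead
            # of a "Big VM" memory threshold.
            return any(key.startswith('trait:CUSTOM_NUMASIZE')
                       and value == 'required'
                       for key, value in extra_specs.items())

        # if it matches, the spawn code would add something like:
        #     extra_config['numa.vcpu.preferHT'] = 'TRUE'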
  2. vmware: Restart BigVMs with "high" priority

    Another best-practice recommendation for HANA-on-VMware.
    
    The code pretty much duplicates the one for setting the DRS
    partially-automated behavior. I decided against deduplication and
    abstraction in favor of readability and at the cost of LOC.
    grandchild committed Feb 13, 2023
    Configuration menu
    Copy the full SHA
    89c9596 View commit details
    Browse the repository at this point in the history
  3. vmware: Set maxPerVirtualNode for HANA flavors

    Testing showed that single- and half-NUMA-node VMs get spawned across
    multiple NUMA nodes due to `numa.autosize.vcpu.maxPerVirtualNode`
    being too low. Set the value explicitly to the same value that's used
    for the `CoresPerSocket` cluster-vm setting.
    grandchild committed Feb 13, 2023
    Configuration menu
    Copy the full SHA
    0b77bb8 View commit details
    Browse the repository at this point in the history

Commits on Feb 17, 2023

  1. vmware: Fix update_cluster_das_vm_override

    Introduced in 'vmware: Restart BigVMs with "high" priority', the
    function "update_cluster_das_vm_override()" did not work, because it
    created a "ClusterDasVmConfigInfo" instead of a "ClusterDasVmConfigSpec"
    object. This led to us not being able to spawn BigVMs with the
    following error:
    
        Exception in ReconfigureComputeResource_Task.
        Cause: Type not found: 'operation'
    
    Change-Id: If9acf9ee07e373b7b24c14c642d0d99fe2a41db1
    joker-at-work authored and grandchild committed Feb 17, 2023
    Configuration menu
    Copy the full SHA
    1d712df View commit details
    Browse the repository at this point in the history
  2. vmware: Add resource overcommit ratio support

    Copy the relevant code from the libvirt driver.
    
    This reintroduces the code that was changed (presumably by accident) in
      fd4f43b VmWare: Refactored resource and inventory collection
    Fix up that commit with this one when forward-porting!
    grandchild committed Feb 17, 2023
    3f75d35
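    A sketch of the pattern copied from the libvirt driver, where the
    allocation ratio scales what gets reported to placement (the option
    names are real nova config; the helper is hypothetical):

        def _vcpu_inventory(total_pcpus, conf):
            return {
                'VCPU': {
                    'total': total_pcpus,
                    'allocation_ratio': conf.cpu_allocation_ratio or 16.0,
                    'reserved': conf.reserved_host_cpus,
                },
            }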

Commits on Feb 21, 2023

  1. vmware: Fix missing image_cache conf group usage

    `remove_unused_original_minimum_age_seconds`, along with several
    other config options related to image-cache management, was moved
    to its own config group `image_cache` in Ussuri (nova 21), but the
    cherry-pick in
        9de18bd "vmware: Update expired image-template-VM handling"
    failed to account for this.
    grandchild committed Feb 21, 2023
    37fb054
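    The fix boils down to reading the option from its new group; a
    minimal sketch:

        from nova import conf

        CONF = conf.CONF

        # Before Ussuri (what the cherry-pick still used):
        #   CONF.remove_unused_original_minimum_age_seconds
        # Since Ussuri the option lives in the [image_cache] group:
        max_age = CONF.image_cache.remove_unused_original_minimum_age_seconds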
  2. vmware: Remove get_vif_info() is_neutron param usage

    The parameter was removed in
        f3cc311 "vmware: Remove vestigial nova-network support"
    but not taken into account by
        aba0dbb "VMware: Image as VM template".
    grandchild committed Feb 21, 2023
    81f4adb

Commits on Mar 2, 2023

  1. limits: Fix per-flavor limit API value type

    In
        daeeafd "baremetal quota handling"
    the `_build_per_flavor_limits()` method was incorrectly rewritten
    to match the surrounding code, esp. `_build_absolute_limits()`, and
    now uses not just the limit but both limit and in_use as the value
    in the resulting limit dict.
    
    The original commit,
        16857ed "reintroduce baremetal quota handling"
    used `absolute_limits` directly as the method parameter.
    grandchild committed Mar 2, 2023
    32f88ea
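    A sketch of the value-type fix, assuming `limits` maps a resource
    name to a dict with 'limit' and 'in_use' keys:

        # Wrong: copies the whole dict (limit *and* in_use) as value.
        broken = {name: value for name, value in limits.items()}

        # Right: the API response only carries the limit itself.
        fixed = {name: value['limit'] for name, value in limits.items()}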

Commits on Mar 3, 2023

  1. [vmwareapi] Look up vmdk volume by uuid directly first

    For historic reasons, there is a mapping between the volume uuid
    and the device[].backing.uuid in VMware, stored in extraConfig. In
    most cases they are the same, but sometimes they are not. And
    apparently, the extraConfig can even hold wrong information.
    
    Since the risk of a uuid collision is quite low, and the lookup is
    just an iteration over a small list of volumes, we first try the
    direct lookup, avoiding possibly incorrect information in the
    extraConfig. Failing that, we still do the additional API call to
    get the mapping, and try that.
    
    Also work around type errors mypy raises in the changed file.
    
    Change-Id: Ifcdf96cfc6d00473299c1f2a7cb9d23d03294027
    fwiesel committed Mar 3, 2023
    5fcb1a9
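    A sketch of the two-step lookup described above (the extraConfig
    key format and the helper name are assumptions):

        def _find_vmdk_device(devices, volume_id, extra_config):
            def _by_backing_uuid(uuid):
                for dev in devices:
                    backing = getattr(dev, 'backing', None)
                    if getattr(backing, 'uuid', None) == uuid:
                        return dev

            # 1. Direct match: backing.uuid usually equals the
            #    volume uuid.
            device = _by_backing_uuid(volume_id)
            if device is not None:
                return device
            # 2. Fallback: consult the extraConfig mapping and retry.
            return _by_backing_uuid(
                extra_config.get('volume-%s' % volume_id))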

Commits on Mar 10, 2023

  1. [ironic] IPA imports VMDKs via qemu-img just fine

    The ironic-python-agent uses qemu-img to convert various image
    formats to the raw disk format. QCow2 is already correctly marked
    accordingly, but vmdk was missing.
    
    Change-Id: Ifd868e6951452b291184bd848d4244d37649f1ed
    fwiesel committed Mar 10, 2023
    41a2827
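    Roughly what the agent ends up running once vmdk is marked as
    convertible (paths are placeholders):

        import subprocess

        subprocess.check_call([
            "qemu-img", "convert",
            "-f", "vmdk",   # source format
            "-O", "raw",    # target format
            "/tmp/image.vmdk", "/dev/sda",
        ])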

Commits on Mar 15, 2023

  1. Fix linting with tox4

    We need to separate the arguments for tox
    from the ones for the linter.
    
    Change-Id: I6d7ee1d95f0ca17fa6c04a3546a7f907cc6393f1
    fwiesel committed Mar 15, 2023
    e1f3e4c

Commits on Mar 16, 2023

  1. vmware: Fix AttrError when setting restart-priority

    Nested `ClusterDasVmSettings` inside new `ClusterDasVmConfigInfo`
    objects don't get created properly. This seems to have worked on Rocky
    but fails on Xena.
    grandchild committed Mar 16, 2023
    a9435f4
  2. vmware: Read iso file in binary mode during upload

    Otherwise read() will try to decode the contents as UTF-8-encoded
    text.
    grandchild committed Mar 16, 2023
    686b516
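    A minimal sketch of the fix:

        # Text mode decodes the stream as UTF-8 and fails on arbitrary
        # ISO bytes:
        #   with open(iso_path, 'r') as f: ...
        # Binary mode returns raw bytes:
        with open(iso_path, 'rb') as f:
            chunk = f.read(65536)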

Commits on Mar 17, 2023

  1. [SAP] No custom filters/weighers for Non-Vmware

    We have added custom filters and weighers for our common use-case,
    VMware, and already excluded baremetal there.
    
    For libvirt as an additional hypervisor, we also want to skip those
    filters/weighers, so we now test whether the request is scheduled
    for VMware, either due to a flavor extra_spec or an image property.
    If nothing is set, we default to VMware for backwards
    compatibility.
    
    Change-Id: I3223aee433ba9009d2cd6387eeda8e70dbbb3cde
    fwiesel committed Mar 17, 2023
    aa99d4b
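    A sketch of the check, assuming the usual extra_spec and image
    property keys for the hypervisor type (the exact keys are not
    spelled out in the commit):

        def _is_vmware_request(spec_obj):
            hv = (spec_obj.flavor.extra_specs.get(
                      'capabilities:hypervisor_type')
                  or spec_obj.image.properties.get('img_hv_type'))
            # Default to VMware for backwards compatibility.
            return hv is None or hv.lower() == 'vmware'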

Commits on Mar 21, 2023

  1. [SAP] Workaround libvirt version mismatch

    The code assumes that the libvirt version reported by the
    hypervisor matches the version of the python library installed.
    Since we run the agent in a container, that is not correct, and the
    attribute will not be in the library, as it has been built for an
    older version.
    
    Change-Id: I156047b1f9829a49b429d51ca7f7777606b10a56
    fwiesel committed Mar 21, 2023
    a7b1957
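    The usual defensive pattern for this is a getattr() fallback; a
    hypothetical sketch (the flag name is a placeholder):

        import libvirt

        # The python binding may be older than libvirtd and lack the
        # constant entirely.
        flags = getattr(libvirt, 'VIR_SOME_NEWER_FLAG', 0)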
  2. Use pre-commit from openstack/nova master

    Change-Id: I3edf0d3c6deb8385adc1095c7e3c09526e2d4d65
    fwiesel committed Mar 21, 2023
    f4b04c6

Commits on Mar 27, 2023

  1. nova-manage: add sync_instances_flavor command

    This command detects changes in the flavors and syncs the
    instances' flavor info with the new changes.
    
    `nova-manage sap sync_instances_flavor`
    leust authored and joker-at-work committed Mar 27, 2023
    d314bd1

Commits on Mar 31, 2023

  1. vmware: Do not reconfigure vm without nics in finish_migration

    The vCenter doesn't like a reconfiguration call with an empty spec
    and raises an error. So we skip it and save ourselves some work on
    top.
    
    Change-Id: I1da3309500a2cd384c5d7cd431e71297ef5d5a3a
    fwiesel committed Mar 31, 2023
    63be3bc
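    A sketch of the guard, following the driver's usual session calls
    (`_get_nic_changes` is a hypothetical helper):

        spec = client_factory.create('ns0:VirtualMachineConfigSpec')
        spec.deviceChange = _get_nic_changes(instance)

        if spec.deviceChange:
            session._call_method(session.vim, 'ReconfigVM_Task',
                                 vm_ref, spec=spec)
        # else: vCenter rejects an empty reconfigure, so skip it.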
  2. VmWare: Expose the hypervisor

    You can now configure
    
        [vmware]
        hypervisor_mode=<cluster,esxi_to_cluster,cluster_to_esxi>
    
    to choose to expose the individual ESXi hosts instead of the
    cluster. It will only show the ESXi host as the hypervisor node,
    but will not honour the node parameter for placing the instance
    during instance creation, migration, etc.
    
    Change-Id: I264fd242c0de01ae8442c03bc726a0abfbe176ef
    fwiesel committed Mar 31, 2023
    f130dc1
  3. Host-Api: Create summary over all nodes

    The code assumes that there is a single compute-node per host,
    which is incorrect for ironic. By summarising over all
    compute-nodes, we get a correct response also for compute-hosts
    with more than one compute-node.
    
    Change-Id: Iaf6e2a72f6649e234660de47bec8b1da1ea1571e
    fwiesel committed Mar 31, 2023
    30fc702
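    A sketch of the summation, using real ComputeNode fields but a
    hypothetical helper:

        def _host_resources(compute_nodes):
            totals = {'vcpus': 0, 'memory_mb': 0, 'local_gb': 0}
            for node in compute_nodes:
                totals['vcpus'] += node.vcpus
                totals['memory_mb'] += node.memory_mb
                totals['local_gb'] += node.local_gb
            return totals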