forked from openstack/nova
WIP: Hypervisor per ESXi #233
Closed
Conversation
fwiesel force-pushed the hypervisor_per_esxi branch 2 times, most recently from 27166a8 to d5c92e7 on August 27, 2021 11:24
fwiesel force-pushed the hypervisor_per_esxi branch 5 times, most recently from 59ccaf7 to 57ca92b on September 21, 2021 07:49
fwiesel force-pushed the hypervisor_per_esxi branch from 57ca92b to 8999084 on October 21, 2021 14:20
fwiesel force-pushed the hypervisor_per_esxi branch from 8999084 to e99307a on November 3, 2021 09:30
fwiesel force-pushed the hypervisor_per_esxi branch 2 times, most recently from 7fd3978 to be8ed79 on November 10, 2021 10:02
fwiesel force-pushed the hypervisor_per_esxi branch from be8ed79 to 8864d38 on December 1, 2021 08:45
fwiesel force-pushed the hypervisor_per_esxi branch from 8864d38 to 244fb1f on December 21, 2021 10:17
fwiesel force-pushed the hypervisor_per_esxi branch from 244fb1f to 67d41f4 on December 30, 2021 10:38
fwiesel force-pushed the hypervisor_per_esxi branch from 67d41f4 to e77d660 on January 12, 2022 15:48
fwiesel force-pushed the hypervisor_per_esxi branch from e77d660 to 3d4a22d on March 8, 2022 09:33
While nova already cleaned up its original image-cache, our patches also added images as templates into the mix that were not cleaned up prior to this patch and were thus filling up the datastores. Change-Id: I2fc631d6ce0a9339be9237d63ae7e86d94779dcc (cherry picked from commit 97780bb) (cherry picked from commit ce5f026)
If we sort them by creation-time, we can stop looping over tasks once we find an event that's too old. All other VMs we didn't find an event for yet must be expired anyway. Change-Id: I196bb9ab48867f314d9a5f3b7566384fa72778df (cherry picked from commit 642bcc7) (cherry picked from commit 2190382)
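A minimal sketch of the early-exit idea; the data shapes and the function name here are illustrative, not nova's actual API:

```python
from datetime import datetime, timedelta


def find_fresh_templates(template_refs, events, max_age_seconds):
    """Return the template VMs that still had a recent event.

    `events` is an iterable of (vm_ref, created_at) tuples. Sorting it by
    creation time, newest first, lets us stop at the first event that is
    already too old -- every following event is older still. Callers treat
    templates without a recent event as expired.
    """
    cutoff = datetime.utcnow() - timedelta(seconds=max_age_seconds)
    fresh = set()
    for vm_ref, created_at in sorted(events, key=lambda e: e[1], reverse=True):
        if created_at < cutoff:
            break  # all remaining events are even older, so stop looping
        if vm_ref in template_refs:
            fresh.add(vm_ref)
    return fresh
```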
Since we want to keep images in the owner's folder again instead of uploading them to all projects using them, we also need to make sure we clean up everywhere - even in projects that have no instances. Change-Id: I9344a9bb8c0436cb5c87f3328fa3b9d31dedbdbc (cherry picked from commit 943babe) (cherry picked from commit b1af943)
Since every nova-compute node will be running the image-cache cleanup, we have to make sure this works in parallel. Therefore, we limit the cleanup of image-cache templates to the datastores the nova-compute node is responsible for - the ones configured in the datastore_regex config option. Change-Id: I5fae822a08bcc06565c64f959553cf7082bb2423 (cherry picked from commit 0d8d760) (cherry picked from commit 5f21c7d)
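A rough sketch of the filtering, assuming the datastore objects carry a `name` attribute; the helper itself is made up for illustration, only the `datastore_regex` option name comes from the driver's configuration:

```python
import re


def datastores_for_cleanup(all_datastores, datastore_regex):
    """Return only the datastores this nova-compute node is responsible for."""
    if not datastore_regex:
        return list(all_datastores)
    pattern = re.compile(datastore_regex)
    return [ds for ds in all_datastores if pattern.match(ds.name)]
```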
Even with image-as-template disabled, we have to run the image-template cleanup, both to ensure removal after the setting is switched off and because we switched all our image uploads to using those image-templates. Change-Id: I924fbc42f014ecf4c0342246d48f27b6ba1d5c77 (cherry picked from commit 44b8620) (cherry picked from commit 7331446)
Every image we upload automatically ends up as a template-VM. We can leverage this in normal deployments and copy the VMDK from the image-template, if it still exists. If it doesn't exist anymore, we can try to copy another image-template from another datastore and use its VMDK instead - that should still be faster than copying the image from glance again. If all of that fails, we fall back to the original approach of loading the image from glance. Change-Id: I659d22e494a86fe4e567306062784313432d11ee (cherry picked from commit cfa3eb8) (cherry picked from commit a0d9db9)
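The fallback order, condensed into a hypothetical helper (the real driver works on VMware objects; this only illustrates the decision chain):

```python
def choose_image_source(local_template, other_templates, image_id):
    """Pick where to copy the root disk from, in order of preference.

    `local_template` is the image-template VM on the target datastore (or
    None), `other_templates` are image-templates found on other datastores.
    Returning 'glance' means downloading the image from glance again.
    """
    if local_template is not None:
        return ('local_template', local_template)
    if other_templates:
        return ('other_template', other_templates[0])
    return ('glance', image_id)
```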
Until now, the logs during a clean-up showed "Destroyed VM" messages, but one could not find out which VM was destroyed. Therefore, we add an additional log message stating that we are going to destroy an image-template VM, and which one. Change-Id: I7429fca0175ec4593689466be6dcc0cb2482cb9f (cherry picked from commit 557e34e) (cherry picked from commit 2d6f122)
There are cases where we don't want to blindly try to delete a VM, but instead want to know the outcome for certain types of errors, at least. To support this, vm_util.destroy_vm() has to re-raise certain types of exceptions. In the case we're looking for, that's the "InvalidArgument" fault. For this to work like before, code calling destroy_vm() needs to catch these exceptions, too. Change-Id: I28d2f1b94b8439cfea88146746ae6e59d61f087c (cherry picked from commit e112c5e) (cherry picked from commit 73a9a27)
While we don't know why yet, we've seen image-cache templates missing their directory on the datastore. For the image-template cleanup this means that Nova cannot destroy those templates, as this raises an InvalidArgument fault on VMware's side, telling us that ConfigSpec.files.vmPathName is invalid. Since we need to clean those templates up anyway to create a new, usable one, we catch those errors and call UnregisterVM as a fallback. For getting templates from another datastore or re-using the already existing template to copy its disk, we have to catch FileNotFoundException. If this situation occurs, we have to clean that broken template up and also let nova continue the search for a source of the disk for the new VM. Change-Id: Id6d1cc1cd7a50958c77a1417e3f2aed7b9672e15 (cherry picked from commit 1b342cd) (cherry picked from commit 13940c5)
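A simplified sketch of the fallback; the exception class and the callables stand in for the actual vSphere fault handling and session calls:

```python
class InvalidArgumentFault(Exception):
    """Stand-in for the vSphere InvalidArgument fault surfaced to nova."""


def cleanup_template_vm(vm_ref, destroy_vm, unregister_vm):
    """Remove a (possibly broken) image-template VM.

    If the template's files are gone, DestroyVM fails with InvalidArgument
    ("ConfigSpec.files.vmPathName is invalid"); unregistering the VM is the
    best we can do in that case.
    """
    try:
        destroy_vm(vm_ref)
    except InvalidArgumentFault:
        unregister_vm(vm_ref)
```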
If the DRS policy is not set to "fullyAutomated", i.e. if DRS will not take action on existing, running VMs, we have no chance to free up a host and thus don't have to try to find one. Instead, we error out to tell nova-bigvm to search for another cluster. Change-Id: Idbdfe82b4057844401e710fb9d87141478bb3353 (cherry picked from commit e98d6ae) (cherry picked from commit 0e8f17e)
Support for the E1000 driver is being phased out in guest operating systems, so we switch to E1000e, which is also faster and has more hardware offloading capabilities. Change-Id: I08ac32f914a57d3eb7328351a07a20a2ef212cb8 (cherry picked from commit 5e7556d) fix unit tests for vmware: Switch default VIF model (cherry picked from commit 425070f) (cherry picked from commit a3b173a)
Original code can be found at: https://opendev.org/x/nova-mksproxy Integrate the code into nova as a builtin command. Subclass `base.SecurityProxy` into `MksSecurityProxy`. Because MKS' `connect_info.internal_access_path` contains more than just a `path` (namely a JSON object with "ticket", "cfgFile" & "thumbprint"), add a new `parse_internal_access_path()` method in `NovaProxyRequestHandler`. This method tries to parse `internal_access_path` as JSON and, if that fails, puts the contents in the "path" key of a new `internal_access_path_data` dict. Co-authored-by: Johannes Kulik <[email protected]>
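A simplified sketch of the described parsing behaviour, shown as a standalone function rather than inside the actual `NovaProxyRequestHandler` class:

```python
import json


def parse_internal_access_path(internal_access_path):
    """Return a dict with console connection details.

    For MKS, internal_access_path is a JSON object carrying "ticket",
    "cfgFile" and "thumbprint"; for other consoles it is a plain path,
    which ends up under the "path" key.
    """
    try:
        return json.loads(internal_access_path)
    except (TypeError, ValueError):
        return {'path': internal_access_path}
```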
Has to be "nova-{NAME}proxy".
The old script had a --verbose flag that we are using in the k8s deployment definition for all regions. This includes both Xena and older Rocky deployments. The new name of this flag is --mks-verbose. Instead of adding an image version check in the helm-chart, add an alias for the flag for the duration of the update, and revert this after the Xena upgrade is through everywhere.
Physical CPUs are faster than hyperthreaded CPUs. So, by default, VMware spreads a VM's CPU cores onto physical cores, spanning multiple NUMA nodes if needed, even though the vCPUs would fit onto a smaller number of NUMA nodes. HANA VMs prefer low latency (and thus NUMA-locality) over raw performance, so they set the VMware config `preferHT` to not get spread out. Before, this setting was only applied if the VM's flavor qualified as a "Big VM" flavor (i.e. >1GiB in size). This excluded smaller HANA flavors, and a single-NUMA-node VM got spread over two nodes. Make `preferHT` depend on whether the flavor requires one of our `CUSTOM_NUMASIZE_*` traits instead.
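A sketch of the new check; the `trait:CUSTOM_NUMASIZE_*` extra-spec format follows nova's standard trait syntax, while the helper name is made up:

```python
def wants_prefer_ht(flavor_extra_specs):
    """True if the flavor requires one of the CUSTOM_NUMASIZE_* traits."""
    return any(
        key.startswith('trait:CUSTOM_NUMASIZE_') and value == 'required'
        for key, value in flavor_extra_specs.items())
```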
Another best-practice recommendation for HANA-on-VMware. The code pretty much duplicates the one for setting the DRS partially-automated behavior. I decided against deduplication and abstraction in favor of readability and at the cost of LOC.
Testing showed that single- and half-NUMA-node VMs get spawned across multiple NUMA nodes due to `numa.autosize.vcpu.maxPerVirtualNode` being too low. Set the value explicitly to the same value that's used for the `CoresPerSocket` cluster-vm setting.
Introduced in 'vmware: Restart BigVMs with "high" priority', the function "update_cluster_das_vm_override()" did not work, because it created a "ClusterDasVmConfigInfo" instead of a "ClusterDasVmConfigSpec" object. This led to us not being able to spawn BigVMs with the following error: Exception in ReconfigureComputeResource_Task. Cause: Type not found: 'operation' Change-Id: If9acf9ee07e373b7b24c14c642d0d99fe2a41db1
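Roughly, the fix amounts to wrapping the info object in a spec that carries the 'operation' field; `factory` stands in for the vSphere SOAP client factory used by the driver, and the function is an illustration, not the actual code:

```python
def build_das_vm_override_spec(factory, vm_ref, restart_priority,
                               operation='add'):
    """Build the ClusterDasVmConfigSpec expected by ReconfigureComputeResource."""
    info = factory.create('ns0:ClusterDasVmConfigInfo')
    info.key = vm_ref
    info.dasSettings = factory.create('ns0:ClusterDasVmSettings')
    info.dasSettings.restartPriority = restart_priority

    spec = factory.create('ns0:ClusterDasVmConfigSpec')
    spec.operation = operation  # missing when only the bare info object is passed
    spec.info = info
    return spec
```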
Copy the relevant code from the libvirt driver. This reintroduces the code that was changed (presumably by accident) in fd4f43b VmWare: Refactored resource and inventory collection Fixup that commit with this one when forward porting!
`remove_unused_original_minimum_age_seconds`, along with several other config options related to image cache management, was moved to the `image_cache` config group in Ussuri (nova 21), but the cherry-pick in 9de18bd "vmware: Update expired image-template-VM handling" failed to account for this.
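In code, the change boils down to reading the option from its new group; a minimal illustration using nova's config object:

```python
import nova.conf

CONF = nova.conf.CONF

# Pre-Ussuri the option lived in the DEFAULT group; since the move it has to
# be looked up via the image_cache group:
max_age = CONF.image_cache.remove_unused_original_minimum_age_seconds
```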
In daeeafd "baremetal quota handling" the `_build_per_flavor_limits()` method was incorrectly rewritten to match the surrounding code, esp. `_build_absolute_limits()`, and uses not only the limit, but both limit and in_use as the value for the resulting limit dict. The original commit, 16857ed "reintroduce baremetal quota handling", used `absolute_limits` directly as the method parameter.
For historic reasons, there is a mapping between the volume-uuid and the device[].backing.uuid in vmware through extraConfig. In most cases they are the same, but sometimes they are not. And apparently, the extraConfig can even hold wrong information. Since the risk of a collision of uuids is quite low, and the lookup is an iteration over a small list of volumes, we first try it that way, avoiding possibly incorrect information in the extraConfig. Failing that, we still do the additional api call to get the mapping, and try that. Work around type errors mypy raises in the changed file. Change-Id: Ifcdf96cfc6d00473299c1f2a7cb9d23d03294027
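The lookup order, condensed into a hypothetical helper; `devices` and the mapping callable are simplified stand-ins for the VMware device list and the extra API call:

```python
def find_volume_device(devices, volume_uuid, get_extra_config_mapping):
    """Find the virtual disk backing a volume.

    First try to match the volume uuid directly against the devices' backing
    uuids; only if that fails, make the extra API call that reads the
    (possibly wrong) extraConfig mapping of volume uuid -> backing uuid.
    """
    for device in devices:
        if device.backing_uuid == volume_uuid:
            return device
    backing_uuid = get_extra_config_mapping().get(volume_uuid)
    if backing_uuid is not None:
        for device in devices:
            if device.backing_uuid == backing_uuid:
                return device
    return None
```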
The ironic-python-agent uses qemu-img to convert various image formats to the raw disk format. QCow2 is already correctly marked accordingly, but vmdk was missing. Change-Id: Ifd868e6951452b291184bd848d4244d37649f1ed
We need to separate the arguments for tox from the ones for the linter. Change-Id: I6d7ee1d95f0ca17fa6c04a3546a7f907cc6393f1
Nested `ClusterDasVmSettings` inside new `ClusterDasVmConfigInfo` objects don't get created properly. This seems to have worked on Rocky but fails on Xena.
Otherwise it will try to decode the contents as UTF-8-encoded text during read().
We have added custom filters and weighers for our common use case, VMware, and already excluded baremetal there. For libvirt as an additional hypervisor, we also want to skip those filters/weighers, so we now test whether the request is scheduled for vmware, either via a flavor extra_spec or an image property. If nothing is set, we default to vmware for backwards compatibility. Change-Id: I3223aee433ba9009d2cd6387eeda8e70dbbb3cde
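An illustrative version of the check; the exact extra-spec and image-property keys are assumptions here, only the fallback-to-vmware behaviour comes from the commit message:

```python
def is_vmware_request(flavor_extra_specs, image_properties):
    """Decide whether the custom VMware filters/weighers should apply."""
    hypervisor_type = (flavor_extra_specs.get('capabilities:hypervisor_type')
                       or image_properties.get('hypervisor_type'))
    if hypervisor_type is None:
        return True  # nothing set: default to vmware for backwards compatibility
    return hypervisor_type.lower() == 'vmware'
```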
The code assumes that the libvirt version reported by the hypervisor matches the version of the python library installed. Since we run the agent in a container, that is not correct, and the attribute will not be in the library, as it has been built for an older version. Change-Id: I156047b1f9829a49b429d51ca7f7777606b10a56
Change-Id: I3edf0d3c6deb8385adc1095c7e3c09526e2d4d65
This command detects changes in the flavors and syncs the instances' flavor-info with the new changes: `nova-manage sap sync_instances_flavor`
The vCenter doesn't like a reconfiguration call with an empty spec, and raises an error. So we skip that and save ourselves some work on top. Change-Id: I1da3309500a2cd384c5d7cd431e71297ef5d5a3a
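A sketch of the guard; the emptiness check shown here (no public attribute set on the spec object) is an assumption, not the driver's actual code. The driver would only issue the reconfigure call when this returns True.

```python
def spec_has_changes(config_spec):
    """True if the reconfigure spec contains at least one change to apply."""
    return any(
        value not in (None, [], '', {})
        for name, value in vars(config_spec).items()
        if not name.startswith('_'))
```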
You can now configure [vmware] hypervisor_mode=<cluster,esxi_to_cluster,cluster_to_esxi> to choose to expose the individual esxi hosts instead of the cluster. It will only show the esxi-host as hypervisor node, but will not honour the node parameter for placing the instance during instance-creation, migration, etc. Change-Id: I264fd242c0de01ae8442c03bc726a0abfbe176ef
The code assumes that there is a single compute-node per host, which is incorrect for ironic. By summing over all compute-nodes, we also get a correct response for compute-hosts with more than one compute-node. Change-Id: Iaf6e2a72f6649e234660de47bec8b1da1ea1571e
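A sketch of the aggregation; the attribute names follow nova's ComputeNode object, but the helper itself is illustrative:

```python
def summarize_host_resources(compute_nodes):
    """Sum resources over all compute-nodes that belong to one host."""
    totals = {'vcpus': 0, 'memory_mb': 0, 'local_gb': 0}
    for node in compute_nodes:
        totals['vcpus'] += node.vcpus
        totals['memory_mb'] += node.memory_mb
        totals['local_gb'] += node.local_gb
    return totals
```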
fwiesel force-pushed the hypervisor_per_esxi branch from 3d4a22d to 30fc702 on March 31, 2023 14:02
That's quite a few commits. Is this meant for xena?
The PR was meant for rocky, but I've updated the branch for xena.
fyi, it should be possible to change the target branch within the existing PR -- but all good, let's continue in #423.
Currently, the code only collects stats for the different nodes... Now I have to look at how to represent it to placement.
After that, the question is how to migrate the VMs between the hypervisor nodes.