WIP: Hypervisor per ESXi #233

Closed
wants to merge 7,522 commits into from
This pull request is big! We’re only showing the most recent 250 commits.

Commits on Aug 8, 2022

  1. vmware: image template cache expiration

    While nova already cleaned up its original image-cache, our patches
    also added images as templates into the mix. These were not cleaned up
    prior to this patch and were thus filling up the datastores.
    
    Change-Id: I2fc631d6ce0a9339be9237d63ae7e86d94779dcc
    (cherry picked from commit 97780bb)
    (cherry picked from commit ce5f026)
    imitevm authored and joker-at-work committed Aug 8, 2022
    Commit 814cf1f
  2. vmware: Optimize image-template aging by sorting VC task list

    If we sort the tasks by creation-time, we can stop looping over them
    once we find an event that's too old. All other VMs we didn't find an
    event for yet must be expired anyway.
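
    A minimal sketch of that early-exit idea (names are illustrative, not
    the driver's actual code):

        # Scan newest-first and bail out at the first task older than the
        # cutoff; every VM without a newer event is expired anyway.
        def vms_with_recent_events(tasks, cutoff):
            recent = set()
            for task in sorted(tasks, key=lambda t: t.created_at,
                               reverse=True):
                if task.created_at < cutoff:
                    break
                recent.add(task.vm_name)
            return recent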
    
    Change-Id: I196bb9ab48867f314d9a5f3b7566384fa72778df
    (cherry picked from commit 642bcc7)
    (cherry picked from commit 2190382)
    imitevm authored and joker-at-work committed Aug 8, 2022
    Commit eedfd0f
  3. vmware: Manage image cache expiration for all projects

    Since we want to keep images in the owner's folder again instead of
    uploading them to all projects using them, we also need to make sure we
    clean up everywhere - even in projects that have no instances.
    
    Change-Id: I9344a9bb8c0436cb5c87f3328fa3b9d31dedbdbc
    (cherry picked from commit 943babe)
    (cherry picked from commit b1af943)
    imitevm authored and joker-at-work committed Aug 8, 2022
    Commit 2dedd2a
  4. vmware: Make HistoryCollector python3-ready

    We need to support Python3 in the future and this is a low-hanging
    fruit.
    
    Change-Id: I86804af29f4c6aff14fa5495e5344377488ba8fe
    (cherry picked from commit 95c177f)
    (cherry picked from commit bce80c6)
    imitevm authored and joker-at-work committed Aug 8, 2022
    Commit 9522589
  5. vmware: Validate VM name against convention during image-cache cleanup

    We want to make sure we only clean up image-cache templates and no other
    VMs that might happen to lie in the "Images" folder.
    
    Change-Id: I55b7fa7ebbe14b13f579ebc39dcae2549ddedc9a
    (cherry picked from commit 67fda8f)
    (cherry picked from commit d18fb11)
    imitevm authored and joker-at-work committed Aug 8, 2022
    Commit 3ed2aaa
  6. vmware: Only clean image-templates from the local DS

    Since every nova-compute node will be running the image-cache cleanup,
    we have to make sure this works in parallel. Therefore, we limit the
    cleanup of image-cache templates to the datastores the nova-compute node
    is responsible for - the ones configured in the datastore_regex config
    option.
    
    Change-Id: I5fae822a08bcc06565c64f959553cf7082bb2423
    (cherry picked from commit 0d8d760)
    (cherry picked from commit 5f21c7d)
    joker-at-work committed Aug 8, 2022
    Commit a547a7d
  7. vmware: Unconditionally perform image-template cleanup

    Even with image-as-template disabled, we have to run the image-template
    cleanup to ensure removal after the setting is switched off, and
    because we switched all our image uploads to using those
    image-templates.
    
    Change-Id: I924fbc42f014ecf4c0342246d48f27b6ba1d5c77
    (cherry picked from commit 44b8620)
    (cherry picked from commit 7331446)
    imitevm authored and joker-at-work committed Aug 8, 2022
    Commit 580f210
  8. vmware: Keep image-templates in owner project folder

    Otherwise, we might copy the same image to the datastore multiple times,
    just because VMs from multiple projects are deployed from that (shared)
    image.
    
    Change-Id: I02721da655ec505d7c3c5d3c9faca1be77dce813
    (cherry picked from commit 95b6176)
    (cherry picked from commit aea1268)
    Mitev authored and joker-at-work committed Aug 8, 2022
    Commit d58216f
  9. vmware: Rewrite _fetch_image_if_missing

    Every image we upload automatically ends up as a template VM. We can
    leverage this in normal deployments and copy the VMDK from the
    image-template, if it still exists. If it doesn't exist anymore, we can
    try to copy an image-template from another datastore and use its VMDK
    instead - that should still be faster than copying from glance again.

    If all else fails, we fall back to the original approach of loading the
    image from glance.
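
    A sketch of that fallback chain; the source callables are hypothetical
    stand-ins for the real copy/download helpers:

        def fetch_image_if_missing(image_id, sources):
            # `sources` is an ordered list of callables, e.g.
            # [copy_from_local_template, copy_from_other_datastore,
            #  download_from_glance]; each returns a VMDK path or None.
            for source in sources:
                vmdk = source(image_id)
                if vmdk is not None:
                    return vmdk
            raise LookupError('no source could provide image %s' % image_id)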
    
    Change-Id: I659d22e494a86fe4e567306062784313432d11ee
    (cherry picked from commit cfa3eb8)
    (cherry picked from commit a0d9db9)
    joker-at-work committed Aug 8, 2022
    Commit 91cfe13
  10. vmware: Log message when destroying image template

    Until now, the logs during a clean-up showed "Destroyed VM" messages,
    but one could not tell which VM was destroyed. Therefore, we add an
    additional log message stating which image-template VM we're going to
    destroy.
    
    Change-Id: I7429fca0175ec4593689466be6dcc0cb2482cb9f
    (cherry picked from commit 557e34e)
    (cherry picked from commit 2d6f122)
    joker-at-work committed Aug 8, 2022
    Commit 17e51f9
  11. vmware: Let destroy_vm() re-raise InvalidArgument fault

    There are cases where we don't want to blindly try to delete a VM, but
    instead want to know the outcome for certain types of errors at least.
    To support this case, vm_util.destroy_vm() has to re-raise certain
    types of exceptions. In the case we're looking for, that's the
    "InvalidArgument" fault. For this to work like before, code calling
    destroy_vm() needs to catch these exceptions, too.
    
    Change-Id: I28d2f1b94b8439cfea88146746ae6e59d61f087c
    (cherry picked from commit e112c5e)
    (cherry picked from commit 73a9a27)
    joker-at-work committed Aug 8, 2022
    Commit 0f73003
  12. vmware: Handle image-templates without data on datastore

    While we don't know why yet, we've seen image-cache templates missing
    their directory on the datastore.

    For the image-template cleanup this means that Nova cannot destroy
    those templates, as this raises an InvalidArgument fault on VMware's
    side, telling us that ConfigSpec.files.vmPathName is invalid. Since we
    need to clean those templates up anyway to create a new, usable one, we
    catch those errors and call UnregisterVM as a fallback.

    For getting templates from another datastore or re-using the already
    existing template to copy its disk, we have to catch
    FileNotFoundException. If this situation occurs, we have to clean that
    broken template up and also let nova continue the search for a source
    of the disk for the new VM.
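
    A minimal sketch of the destroy-with-fallback, assuming a session
    wrapper with destroy/unregister calls; the exception type here is a
    placeholder for the vCenter fault:

        class InvalidArgumentFault(Exception):
            """Placeholder for vCenter's InvalidArgument fault."""

        def destroy_template(session, vm_ref):
            try:
                session.destroy_vm(vm_ref)   # deletes the VM and its files
            except InvalidArgumentFault:
                # No files on the datastore; just drop the inventory entry.
                session.unregister_vm(vm_ref)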
    
    Change-Id: Id6d1cc1cd7a50958c77a1417e3f2aed7b9672e15
    (cherry picked from commit 1b342cd)
    (cherry picked from commit 13940c5)
    joker-at-work committed Aug 8, 2022
    Commit 949f16c
  13. vmware: Check default DRS policy in special_spawning

    If the DRS policy is not set to "fullyAutomated", i.e. if DRS will not
    take action on existing, running VMs, we have no chance to free up a
    host and thus don't have to try to find one. Instead, we error out to
    tell nova-bigvm to search for another cluster.
    
    Change-Id: Idbdfe82b4057844401e710fb9d87141478bb3353
    (cherry picked from commit e98d6ae)
    (cherry picked from commit 0e8f17e)
    joker-at-work committed Aug 8, 2022
    Commit b71e96c
  14. vmware: Switch default VIF model

    Support for the E1000 driver is being phased out in guest operating
    systems, so we switch to E1000e, which is also faster and has more
    hardware-offloading capabilities.
    
    Change-Id: I08ac32f914a57d3eb7328351a07a20a2ef212cb8
    (cherry picked from commit 5e7556d)
    
    fix unit tests for vmware: Switch default VIF model
    
    (cherry picked from commit 425070f)
    (cherry picked from commit a3b173a)
    joker-at-work committed Aug 8, 2022
    Commit 65e3c1a
  15. vmware: Handle incompatible hardware-version when cloning image-template

    When we try to fetch an image-template from another datastore, it might
    happen that the template has an incompatible hardware version and the
    vCenter raises VirtualHardwareVersionNotSupported if we try to clone it
    to our cluster.
    
    We handle this case now by logging a debug message and continuing with
    the next image-template we find, as this one is unusable for us.
    
    Change-Id: If9dc9b2a13171252e5f0f0b3a99a51be2f28c6eb
    (cherry picked from commit d0ed453)
    (cherry picked from commit b5f59b7)
    joker-at-work committed Aug 8, 2022
    Commit ba877df
  16. bigvm: Move checking/cleaning of bigvm providers into method

    We have to check a lot, and when adding more, we would get a flake8
    warning that our function is too long. So we move the checking/cleaning
    of existing providers into its own function for a better overview.
    
    Change-Id: I5ceb4d9338f6d94f49cc2deff25eefb19df2030f
    (cherry picked from commit 1700192)
    (cherry picked from commit 848d78c)
    
    bigvm: Don't add a candidate if we don't have providers
    
    When we filter out providers by used cluster percentage, we used to add
    an empty list of providers to the candidates. It probably just meant not
    running any of the loops we do over candidates later on, but it's
    still unnecessary.
    
    Change-Id: Ie033332436674f4fe792f4aa3f83f33b12a6d9ed
    (cherry picked from commit 47ceb57)
    (cherry picked from commit 027d419)
    joker-at-work committed Aug 8, 2022
    Commit 5aa2b8e
  17. bigvm: Add trait to disable provider for bigVMs

    We might want to disable a cluster for bigVMs for different reasons -
    one being a currently running upgrade of all ESXi hosts. When the
    trait "CUSTOM_BIGVM_DISABLED" is set on a resource-provider, nova-bigvm
    will not use this cluster for finding a host to spawn bigVMs.
    
    Change-Id: I36813ed3d95fd8572c6b75544ebb2fc1936f6bdb
    (cherry picked from commit 5fc69b9)
    (cherry picked from commit 2a996e4)
    joker-at-work committed Aug 8, 2022
    Commit 6046521
  18. [SAP] fix powervm's test_driver_capabilities unit test

    TestPowerVMDriver.test_driver_capabilities was failing because of a
    new capability introduced by SAP in the ComputeDriver.
    This fixes the assertion done by powervm, so that it assumes
    the PowerVMDriver capabilities are included in the ComputeDriver
    capabilities and not equal to it.
    
    (cherry picked from commit 69ffaa1)
    mariusleu authored and joker-at-work committed Aug 8, 2022
    Commit 7d59236
  19. vmware: Set necessary flags for > 128 vCPUs

    With vSphere 6.7, VMware started supporting more than 128 vCPUs per
    VM. For this to work, the VM needs to boot via UEFI, the CPUs have to
    be split up into multiple cores with an image property (e.g.
    hw:cpu_cores='2'), the hardware version of the VM needs to be at least
    "vmx-15" (image property vmware:hw_version='vmx-15') and we need to set
    the vvtdEnabled flag when creating the VM. This patch does the last
    part, because that's only possible to do in Nova.

    We also set virtualMmuUsage to 'automatic', as VMware complained in our
    tests that the MMU needs to be enabled, and the manually-created VM
    used for testing had this in its config.
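
    A sketch of the two settings, assuming a suds-style client_factory as
    used elsewhere in the driver:

        def apply_large_vcpu_flags(client_factory, config_spec):
            flags = client_factory.create('ns0:VirtualMachineFlagInfo')
            flags.vvtdEnabled = True             # required for > 128 vCPUs
            flags.virtualMmuUsage = 'automatic'  # vCenter wants the MMU on
            config_spec.flags = flags
            return config_spec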
    
    Change-Id: I4aabad2c5254f9fa5b83d47bc34675d90c431535
    (cherry picked from commit c49cb94)
    joker-at-work committed Aug 8, 2022
    Commit 99b410f
  20. console_auth_token: allow pre-existing 'path' query parameter for novnc

    For noVNC 1.1.0 or newer, there must be only a 'path' query parameter
    instead of 'token'.
    This patch allows a 'path' query parameter to be present in the
    novncproxy_base_url configuration. In that case, we just append
    the token to it.
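
    A sketch of the URL handling (the function name is illustrative):

        from urllib.parse import parse_qs, quote, urlsplit

        def build_console_url(base_url, token):
            if 'path' in parse_qs(urlsplit(base_url).query):
                # Append the URL-encoded token to the 'path' value.
                return base_url + quote('?token=%s' % token)
            return '%s?token=%s' % (base_url, token)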
    
    (cherry picked from commit 68d48f6)
    mariusleu authored and joker-at-work committed Aug 8, 2022
    Commit 8d12381
  21. Fix project limits calculation.

    Cast Decimal to int in _get_counts, similar to _get_counts_in_db.

    SQLAlchemy's func.sum() returns Decimal on MySQL. When the usage limits
    are calculated, unless explicitly converted to int, we end up with
    Decimal objects that cannot be JSONified, causing
    "ValueError: Circular reference detected".
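
    The cast amounts to something like this (a sketch, not the exact
    patch):

        from decimal import Decimal

        def jsonable_count(value):
            # MySQL's SUM() comes back as Decimal; the JSON encoder
            # chokes on it, so normalize to int.
            return int(value) if isinstance(value, Decimal) else value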
    
    (cherry picked from commit 163a80f)
    galkindmitrii authored and joker-at-work committed Aug 8, 2022
    Commit fdbeddd
  22. shellinabox: add support for db tokens

    Rocky introduces database tokens and deprecates the nova-consoleauth
    service. Thus, shellinabox needs to access the tokens from the
    database as well.
    This still supports the old nova-consoleauth, which can be enabled via
    the [workarounds]/enable_consoleauth option.
    
    (cherry picked from commit 421cd8e)
    leust authored and joker-at-work committed Aug 8, 2022
    Commit a9630a4
  23. vmware: Add option for setting default hw_version

    This is necessary with multi-version clusters where migrating e.g. on
    resize needs to be possible.
    
    Since the vmware driver was explicitly overriding the hw_version
    attribute on ExtraSpecs and didn't use __init__(), we still ended up
    with no hw_version if the flavor didn't set one.
    
    Change-Id: Idc287c4dfa2b5d6a6a837a5014063417c8e13768
    (cherry picked from commit 09b2547)
    joker-at-work committed Aug 8, 2022
    Commit a8d23fa
  24. tools: Allow check-cherry-picks.sh to be disabled by an env var

    The checks performed by this script aren't always useful to downstream
    consumers of the repo so allow them to disable the script without having
    to make changes to tox.ini.
    
    Change-Id: I4f551dc4b57905cab8aa005c5680223ad1b57639
    (cherry picked from commit 610396f)
    (cherry picked from commit d2ee27d)
    lyarwood authored and joker-at-work committed Aug 8, 2022
    Commit 3ab670c
  25. Handle sharding-enabled in scheduler shard filter

    If the project tags from keystone contain the tag "sharding_enabled"
    then the hosts in _all_ shards will pass the shard filter for this
    project.
    
    This was done to facilitate both enabling sharding (only one simple
    tag to set), and mainly for frontend code to detect sharding status
    (mostly) without parsing tag strings. (If sharding is not enabled,
    then vc-* tags will have to be parsed to find out which shard(s) the
    project is on.)
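
    A sketch of the resulting filter logic (names are illustrative):

        def host_passes(host_shard, project_tags):
            if 'sharding_enabled' in project_tags:
                return True  # hosts in all shards pass for this project
            shards = {t for t in project_tags if t.startswith('vc-')}
            return host_shard in shards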
    
    (cherry picked from commit 01014bc)
    grandchild authored and joker-at-work committed Aug 8, 2022
    Commit 729f5e2
  26. Don't call detach without attachment_id and other attachments

    When messages time out between nova-api and nova-compute, it can
    happen that there's a block-device-mapping that never saw an
    attachment and wasn't rolled back either. If such a VM is deleted and
    the volume got attached to another VM in the meantime, a detach call
    to Cinder without an attachment_id - which cannot exist, because the
    attachment never got that far - would delete the attachment for the
    other VM.

    We now search for a volume-attachment in Cinder if no attachment_id
    was given. If we don't find one for our instance, but there are
    attachments on the volume - which should then belong to other
    instances - we assume we ran into the above case and don't call
    detach.
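
    A sketch of the guard, with a hypothetical Cinder client interface:

        def safe_detach(cinder, volume_id, instance_uuid,
                        attachment_id=None):
            if attachment_id is None:
                attachments = cinder.list_attachments(volume_id)
                ours = [a for a in attachments
                        if a['instance_uuid'] == instance_uuid]
                if not ours and attachments:
                    # Only other instances are attached; a blind detach
                    # would remove one of their attachments, so skip it.
                    return
            cinder.detach(volume_id, attachment_id)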
    
    Change-Id: I6c6fad88e93fd788e3df1c942fed763c0ad0414f
    (cherry picked from commit 8a3b9d7)
    joker-at-work committed Aug 8, 2022
    Commit 172dcd4
  27. vmware: Optionally attach disconnected serial ports

    We want to be able to configure new VMs with our currently-disabled vSPC
    services, because it doesn't seem to be possible to reconfigure a serial
    port on a running VM - other than setting it to connected and
    startConnected. Therefore, we add a config option.
    
    Change-Id: I0b9d7a7d1445c2017756146068e287628a39bec6
    (cherry picked from commit ed2ac5d)
    joker-at-work committed Aug 8, 2022
    Commit f663f04
  28. bigvm: SchedulerReportClient._ensure_resource_class got renamed

    We need to call _ensure_resource_classes instead.
    
    Change-Id: I2329db43adc00e04b07d16d312361ae5e669d298
    (cherry picked from commit 4e5413a)
    joker-at-work committed Aug 8, 2022
    Commit d963430
  29. vmware-metrics: Port to newer Monitorbase

    The newer Monitorbase requires the child classes to implement
    "populate_metrics()" instead of "get_metric()". It should now create
    MonitorMetric objects, appending them to the supplied list.
    
    Change-Id: I534d63f4b4888da2b59b2a10211482e1449f2901
    (cherry picked from commit 400cdd2)
    joker-at-work committed Aug 8, 2022
    Commit e6452f6
  30. VMware: Fix empty vim.get_properties_for_..._objects

    If the obj_list parameter to the
    vim_util.get_properties_for_a_collection_of_objects()
    function is empty, an empty list was returned, while the non-error
    path returns a vim.RetrieveResult object with an iterable
    "objects" attribute.

    Introduce a dummy class "EmptyRetrieveResult" which behaves
    like vim.RetrieveResult for the empty case, and return an
    instance if the list of objects to be queried is empty.
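
    The dummy class is essentially this shape (a sketch, not the verbatim
    patch):

        class EmptyRetrieveResult(object):
            """Behaves like vim.RetrieveResult for an empty query."""
            def __init__(self):
                self.objects = []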
    
    (cherry picked from commit f509fae)
    grandchild authored and joker-at-work committed Aug 8, 2022
    Commit 2e2b2fe
  31. manage: Support Ironic in sync_aggregates

    "nova-manage placement sync_aggregates" doesn't support Ironic hosts,
    because they have multiple compute nodes. In general, Nova doesn't
    support aggregates for Ironic hosts. But we still use them for assigning
    an availability zone to a rack (a building block) of nodes grouped into
    a Nova host.
    
    Therefore, we patch the above-mentioned command to handle a list of
    (in most cases one) compute node UUIDs, only erroring out if the host
    we find multiple compute nodes for doesn't look like an Ironic host
    (based on having "ironic" in the name).
    
    Change-Id: I4f7e5fd82c51ce5d6f42089beb5a70e469ec54df
    (cherry picked from commit 481d398)
    joker-at-work committed Aug 8, 2022
    Commit 96e5a61
  32. vmware: Set large VMs as partiallyAutomated in DRS

    We don't want them to move anymore. This might hinder our efforts to
    spawn big VMs, but the nanny is supposed to help us here.
    
    Change-Id: I5ebdbe2f287d50c9b9a755166702c3eccea62b14
    (cherry picked from commit 2da895f)
    joker-at-work committed Aug 8, 2022
    Commit 5e5372f
  33. Alias flavor names from catalog:alias extra_spec

    When a flavor has an extra_spec key "catalog:alias", make an
    additional alias flavor available in the flavor list with the
    extra_spec value as name for that additional flavor.
    
    The aliased flavor's flavorid is prepended with a configurable prefix
    (default is 'x_deprecated_') in order to have a unique id, with a
    straightforward removal process, without any further DB lookups.
    
    A nova config option 'flavorid_alias_prefix' is introduced for this
    purpose.
    
    The 'x_' part of the prefix default is chosen to sort flavor aliases
    toward the end of the flavor list, to decrease visibility.
    
    This change enables renaming and phasing out flavors. Aliased flavors
    appear when listing available flavors, but they don't actually exist
    and get automatically converted into their actual flavor counterparts
    on flavor show and when creating servers.
    
    So, when inspecting a flavor by name or deploying a server with a
    flavor by name the actual flavor is shown or used for server creation,
    respectively.
    
    Allow multiple flavor aliases for a single flavor:
    During renaming/restructuring of flavors, multiple old flavors get
    mapped to a single new flavor. But 'catalog:alias' on the new flavor
    previously only accepted a single alias pointing to the deprecated
    flavor.
    
    The interpretation of 'catalog:aliases' extra_spec is changed so that
    the value can be a single alias name or a comma-separated list of
    multiple names.
    
    Aliased flavorids gain an index suffix (even if only a single alias
    was given) so that flavorids are still unique.
    Example:
        "30" -> "x_deprecated_30_0"
    If multiple alias names are given, the next aliased flavorid will be
    "x_deprecated_30_1" and so forth.
    
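    The id scheme amounts to this (a sketch using the default prefix):

        def alias_flavor_ids(flavorid, aliases, prefix='x_deprecated_'):
            # "30" with two aliases ->
            # ['x_deprecated_30_0', 'x_deprecated_30_1']
            return ['%s%s_%d' % (prefix, flavorid, i)
                    for i in range(len(aliases))]
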
    (cherry picked from commit a6e4b32)
    grandchild authored and joker-at-work committed Aug 8, 2022
    Commit 31ad08a
  34. bigvm: Create RPs without parent

    In Nova's rocky version, placement learned to handle nested providers
    in a better way and supports them from version 1.29 on - which isn't
    in rocky. Upstream thus added code to filter out nested providers from
    the response. The problem with that is that we modeled our
    CUSTOM_BIGVM-providing resource providers as children of the
    nova-compute-bb* provider. Therefore, our big VM spawning code cannot
    find an allocation candidate anymore.

    We now switch to creating the "child" RPs as individual RPs instead.
    For identifying their "parent", we now use the aggregate assigned to
    them.
    
    Change-Id: Ic9d707c59a4ea405f3a982dbe269cdfea0d03aa5
    (cherry picked from commit 65d29ed)
    joker-at-work committed Aug 8, 2022
    Commit 5143fd8
  35. vmware: Add Mirror-instance-logs-to-syslog config

    After the instance is deleted in vSphere, its "vmware.log" files are
    deleted along with it, which hinders post-mortem debugging.

    Add a config flag "mirror_instance_logs_to_syslog" to forward all
    instance logs to the syslog monitoring service, to be able to inspect
    them after the instance is gone.

    Refer to [0] for how this is done in vSphere.
    
    [0] https://blogs.vmware.com/management/2020/10/configure-a-vms-vmware-log-file-to-send-messages-to-vrealize-log-insight.html
    
    (cherry picked from commit 82a31fd)
    grandchild authored and joker-at-work committed Aug 8, 2022
    Commit ca95cc2
  36. [memreserv] Reserve memory for certain flavors

    VMs with reserved memory have better memory-allocation
    performance and -- it's suspected -- fewer softlock issues
    too. Coincidentally, many larger VMs also have high
    performance requirements that put strict demands on the
    quick availability of their memory.
    
    Implement memory reservation and a configurable maximum
    number of cluster hosts that could theoretically fail and
    still let all VMs with memory reservation boot up.
    
    Nova provides the flavor extra_spec
        "quota:memory_reservation"
    which reserves the given amount of memory. This change does
    not make use of that feature, but instead introduces a
    parallel custom resource
        "CUSTOM_MEMORY_RESERVABLE_MB".
    The reason is that "quota:memory_reservation" does not allow
    for limiting the total amount of reservable memory in the
    same way the resource provider mechanics do. This is a
    requirement for tolerating the above-mentioned partial host
    failures, which can occur because a VMware host is a cluster
    of hypervisors.
    
    Add a "vm_reservable_memory_ratio" value to the cluster
    stats in the VMware driver, and use that to calculate and
    return the MEMORY_RESERVABLE_MB resource from the VMware
    driver's get_inventory().
    
    (cherry picked from commit 93e22ef)
    grandchild authored and joker-at-work committed Aug 8, 2022
    Commit d38faca
  37. [bigvm] Don't free host when too much reserved RAM

    The logic in BigVmManager to decide whether a cluster is
    able to free another BigVM spawn-host is changed from only
    using the total percentage of used memory to using the
    amount of reserved memory as well.
    
    (cherry picked from commit 5b05204)
    grandchild authored and joker-at-work committed Aug 8, 2022
    Commit 37145f7
  38. [bigvm] Split memory overuse trigger log messages

    This makes it possible to see which condition caused the deallocation
    of a bigvm host.
    
    (cherry picked from commit 67e5c23)
    grandchild authored and joker-at-work committed Aug 8, 2022
    Commit cfe0dfa
  39. [bigvm] Fix reserved memory usage lookup key

    (cherry picked from commit 88443e5)
    grandchild authored and joker-at-work committed Aug 8, 2022
    Commit a434e56
  40. [memreserv] Only use flavor-reserved-memory if not all reserved

    Before, the resources:CUSTOM_MEMORY_RESERVABLE_MB flavor setting
    (which can be only partial) superseded the
    CONF.vmware.reserve_all_memory setting and BigVM memory reservation.
    
    Invert this to use CUSTOM_MEMORY_RESERVABLE_MB only if none of the
    above apply.
    
    (cherry picked from commit 9a3d778)
    grandchild authored and joker-at-work committed Aug 8, 2022
    Commit 9a2c7ab
  41. [memreserv] Prohibit reserving more memory than the flavor has

    Clamp to a maximum of the flavor memory.
    
    (cherry picked from commit bd2d012)
    grandchild authored and joker-at-work committed Aug 8, 2022
    Commit f9c8f01
  42. bigvm/hostsizefilter: Pass filter if NUMA trait required

    If one of the new CUSTOM_NUMASIZE_* host traits is required on a
    (BigVM) flavor, then the host was already matched in placement, and
    the filter can succeed early.
    
    This is a temporary measure until we phase out host-fraction filtering
    altogether when the old BigVM flavors are disabled.
    
    (cherry picked from commit fdb127d)
    grandchild authored and joker-at-work committed Aug 8, 2022
    Commit 7cae532
  43. vmware: Add constants for DRS-created vCLS VMs

    Using these values, one can identify a vCLS VM by looking at the
    "managedBy" attribute of the VM's config in vSphere.
    
    These VMs are special as they're created by VMware's DRS service as an
    agent VM and any actions done by the user will be reverted by the
    cluster automatically. See [1] for more details on the why.
    
    [1] https://blogs.vmware.com/vsphere/2020/09/vsphere-7-update-1-vsphere-clustering-service-vcls.html
    
    Change-Id: I31d1ece3fa514ca42a3ccc1b348da3763b1b1388
    (cherry picked from commit 7e0649c)
    joker-at-work committed Aug 8, 2022
    Commit 6850bbb
  44. vmware: Ignore vCLS VMs in special spawning

    Those VMs came in with vSphere 7 and DRS doesn't move them, even if
    they violate a DRS rule. Therefore, we would never see a host getting
    freed up if it contained a vCLS VM. They take up 100 MiB of reserved
    RAM and thus should fit next to a big VM, so we ignore them.
    
    Change-Id: I737f312db0e156fa971a189d47efd227c666b178
    (cherry picked from commit d403461)
    
    vmware: Fix special spawning vCLS detection
    
    Not all VMs have a "config.managedBy" attribute ...
    
    Change-Id: I0fc0b4d0c8027dd6b2c45060597cbadb60f0d649
    (cherry picked from commit ae038d5)
    joker-at-work committed Aug 8, 2022
    Commit b17ae57
  45. vmware: Query out managedBy info in special spawning

    This fixes the vCLS detection again, as we just wrote code without ever
    querying out the appropriate attribute. Therefore, it couldn't detect
    vCLS VMs at all.
    
    Change-Id: Ic425e5a0d2178afb6764cb0c84f372c2bb67908a
    (cherry picked from commit 07891e7)
    joker-at-work committed Aug 8, 2022
    Commit 8debad3
  46. bigvm: Skip getting RPs without MEMORY_MB resources

    In-buildup clusters show up without complete resource inventory, i.e.
    without memory resources. Log and skip these resource providers.
    
    Also add debug-logging when skipping due to missing reservable memory
    resource. This was the only "continue" in the else section without
    logging. This complicated finding out why exactly resource providers
    are skipped.
    
    (cherry picked from commit c356138)
    grandchild authored and joker-at-work committed Aug 8, 2022
    Commit 57feebb
  47. Optimize quota:separate query

    Probably with the switch to mariadb, the query got too slow in bigger
    regions, because mariadb joins the instance_extra and the instances
    tables, which have more than 100,000 rows in a bigger region. Since we
    only need one instance_extra entry per missing flavor, we force
    mariadb to use a temporary table to join against instance_extra, which
    is much faster as it only contains as many rows as we have missing
    flavors.

    Attention: This code probably doesn't run on PostgreSQL anymore,
    because "GROUP BY" doesn't behave the same way there.
    
    Change-Id: If0e95cd1d62c00490dc86ca6273e07f8d2fd98ac
    (cherry picked from commit 963b107)
    joker-at-work committed Aug 8, 2022
    Commit fc3ec69
  48. shard_filter: debug log for ironic nodes without VC aggregate

    We don't expect ironic nodes to be in a vc-* host aggregate,
    therefore we log it to debug instead of error for such nodes.
    
    (cherry picked from commit fe7893e)
    leust authored and joker-at-work committed Aug 8, 2022
    Commit 4bbfed8
  49. Add filter for reserving RAM for resize

    A new filter prevents a new deployment in the cluster if the used RAM
    would go over a certain threshold, which is configurable via the new
    option `resize_reserved_ram_percent`.
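
    The check boils down to something like this (a sketch; the real filter
    plugs into nova's host-filter machinery):

        def host_passes(free_ram_mb, total_ram_mb, requested_ram_mb,
                        resize_reserved_ram_percent):
            reserved = total_ram_mb * resize_reserved_ram_percent / 100.0
            return free_ram_mb - requested_ram_mb >= reserved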
    
    (cherry picked from commit 6360dfa)
    Mike Durnosvystov authored and joker-at-work committed Aug 8, 2022
    Commit a573ee3
  50. vmware: Do not reraise exception on DRS override removal

    We're currently unable to remove a DRS override as our SOAP library is
    unable to create an appropriate request accepted by the vCenter.
    Therefore, we still allow the resize to work for now and have to
    manually remove the override later on.
    
    We keep the code in, so we get a reminder in the logs and sentry, that
    this needs fixing. We just don't fail the resize for it.
    
    Change-Id: I4d344347860c7d97d6f4b2e68d9bbac069d71b74
    (cherry picked from commit d7b157f)
    joker-at-work committed Aug 8, 2022
    Commit a21c4a0
  51. Vmware: Remove print statements

    (cherry picked from commit 73b3789)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Commit c072462
  52. VmWare: DRY getting the client factory

    Follow the same pattern as in the other functions:
    set the local variable client_factory to the client factory
    of the current session and use that in the function.
    
    (cherry picked from commit f7a984c)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Commit 0053f1c
  53. VmWare: Logging cache miss with PropertyCollector enabled

    The logic was inverted: only if use_property_collector is active do we
    actually expect the cache to contain sensible values for the vm-state.
    So only then does it make sense to log a cache miss.
    
    (cherry picked from commit f7bdd71)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Commit dd9e67c
  54. Image owner attribute may not be set

    The image-meta-data passed to a live-migration doesn't have the
    attribute set. Accessing it unconditionally will cause an exception.
    
    (cherry picked from commit 2360db9)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Commit 780e169
  55. VMware: Try to recover from outdated vm-refs

    All VM-related API calls work with ManagedObject references (morefs),
    while openstack works with instance uuids.
    In order to avoid having to call vSphere to map the instance uuid to
    such a moref, the driver keeps a cache. The implementation assumes
    that those morefs are stable. However, the operator can unregister and
    re-register a VM, which would cause the moref to change.

    Any operation on the previous moref would cause a
    ManagedObjectNotFoundException.

    We retry functions annotated with the decorator
    vm_ref_cache_heal_from_instance: removing a stale entry from the cache
    will now either result in retrieving a new moref or raising
    InstanceNotFound, which is properly handled.
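
    The decorator's shape is roughly this (a sketch; the cache helper and
    the exception class are placeholders for the driver's names):

        import functools

        class ManagedObjectNotFound(Exception):
            """Placeholder for the fault raised on stale morefs."""

        def vm_ref_cache_heal_from_instance(func):
            @functools.wraps(func)
            def wrapper(session, instance, *args, **kwargs):
                try:
                    return func(session, instance, *args, **kwargs)
                except ManagedObjectNotFound:
                    # Drop the stale moref; the retried lookup either
                    # finds the new moref or raises InstanceNotFound.
                    session.vm_ref_cache_delete(instance.uuid)
                    return func(session, instance, *args, **kwargs)
            return wrapper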
    
    (cherry picked from commit a953c29)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Commit 84d21eb
  56. VmWare: Don't repeat yourself with hardware devices

    The code was getting the hardware devices in multiple places directly,
    and then (often, but not always) normalising the array.
    
    Functions getting the device array then also had to normalise the array again.
    
    By consolidating the retrieval and the normalisation in a function,
    the code becomes less repetitive
    
    (cherry picked from commit a8f3d01)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Commit 34f4adc
  57. Vmware: Make livemigration hv-version check optional

    vSphere can migrate just fine from newer to older vCenters,
    but the live-migration pre-check tests for that and rejects it.

    Making this an option is more of a work-around;
    a proper solution would delegate the check to the driver.
    
    (cherry picked from commit 3653f52)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Commit 339cbc1
  58. VmWare: Create ServiceLocatorSpec

    For cross-vCenter migrations (live or not), we need to create a
    ServiceLocatorSpec. This change provides the required functions.
    
    (cherry picked from commit 92bf534)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Commit ba45d8a
  59. Vmware: Use WithRetrieval to get all results

    In various places, custom versions of iterating over the results were
    implemented, some of them faulty.
    The following functions were only getting up to vmware.maximum_objects
    objects (100 by default):
    vm_util.get_all_cluster_mors, vm_util.get_stats_from_cluster,
    _SpecialVmSpawningServer._get_vms_on_host,
    _SpecialVmSpawningServer.free_host

    Only in _SpecialVmSpawningServer._get_vms_on_host might we get over
    100 items and thus actually have missed some.

    Previously, the results were fetched in batches of up to
    vmware.maximum_objects items. Using WithRetrieval yields an iterator
    over the results, which transparently pages to the next request.
    Receivers of the results were changed to consume an iterator where
    easily possible.

    Replace the quadratic algorithm in
    `ds_util._filter_datastores_matching_storage_policy`
    with one of O(n log(n)) runtime.
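
    The paging pattern looks like this; oslo.vmware's WithRetrieval
    context manager issues the continue-retrieval calls transparently:

        from oslo_vmware import vim_util

        def iter_results(vim, retrieve_result):
            with vim_util.WithRetrieval(vim, retrieve_result) as objects:
                for obj in objects:
                    yield obj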
    
    (cherry picked from commit fcabb04)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Commit b3451ee
  60. Vmwareapi: Split out getting hosts from get_stats_from_cluster

    This extracts getting the hosts and reservations from a cluster in
    get_stats_from_cluster into its own function get_hosts_and_reservations_for_cluster.
    
    For cross-vcenter vmotion, we need to specify hosts, and we want
    the same ones as we use elsewhere.
    
    (cherry picked from commit 05cd96b)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Commit ccbf9c0
  61. Vmware: Refactor server groups

    Previously, vmops was calling the private function
    `vm_util._get_server_groups`, and that function would access
    nova.objects. All other code works more the way that nova.objects
    retrieval is the responsibility of a VmOps (or VolumeOps) instance;
    `vm_util` only works with vmware objects (plus passed instances).

    Additionally, `update_cluster_placement` was called in
    `VMwareVMOps.build_virtual_machine`, which is also called when
    creating a virtual machine for a vm-template out of an image.
    In that case, the server-groups of the instance requiring the image
    would be wrongly applied to the vm-template of the image.
    
    (cherry picked from commit 475fdb1)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Commit 93de60b
  62. Use tox to run flake8

    pre-commit uses the system flake8, which might be newer and cause
    warnings that are partly in conflict with the old version's.
    So run the project's pinned version instead.
    
    Change-Id: I8a854268d3c7ea8d885915105917041430871010
    (cherry picked from commit 3d81c49)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Commit f0101a2
  63. api: Pre-query not deleted members in server groups

    When retrieving multiple - or all - server groups, the code tries to
    find non-deleted members for each server group in every cell
    individually. This is highly inefficient, which is especially
    noticeable as the number of server groups rises.

    We change this to query all members of all server-groups we will reply
    with (i.e. from the already limited list) in advance and pass this set
    of existing uuids into the function formatting the server group. This
    is more efficient, because we only do one large query instead of up to
    1000 times the number of cells.
    
    Change-Id: I3459ce7a8bec9a9e6f3a3b496a3e441078b86af0
    (cherry picked from commit e676dd1)
    joker-at-work committed Aug 8, 2022
    Commit afb751e
  64. VmWare: Remove unused legacy_nodename regex

    The regex isn't used anywhere, and uses an unescaped format
    
    Change-Id: I76aaf133af517eb70fcaf3783953625c63141083
    (cherry picked from commit cebfe18)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Commit a6afd82
  65. Vmwareapi: Fix some linting issues

    Order of imports
    Duplicate imports
    Spelling mistake
    Indentation
    
    Change-Id: I4ff5594b0a628fee9579761248627099b3f251b8
    (cherry picked from commit 0f68df4)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Commit acae56c
  66. vmware: Restructure vm- and host-group functions in cluster_util

    We still have the old `_create_vm_group_spec()` to get the same
    behavior, but nobody outside of `cluster_util` uses it. The new
    functions `create_vm_group()`, `create_host_group()` and
    `create_group_spec()` expose a clearer interface and make it possible
    to overwrite hosts/vms instead of just appending to existing ones.
    This will be useful once customers can update server-groups via the
    API.
    
    Change-Id: I5444318994ac7929a24d357fabb8133410d5bd9d
    (cherry picked from commit 27a9012)
    joker-at-work committed Aug 8, 2022
    Commit 915b4cf
  67. vmware: Refactor VM rule creation in cluster_util

    This splits up functionality for creating rules between VMs into
    `create_vm_rule()` and `create_rule_spec()` and thus enables us to use
    it more controlled also from the outside, e.g. in the upcoming sync loop
    for server-groups.
    
    Since DRS ignores the "mandatory" attribute for VM-VM rules, we remove
    setting it here, so it doesn't look like changing the value would make a
    difference.
    
    Change-Id: Ib515e02226e674d0f7cbdc3c354ade5cd77a0b8c
    (cherry picked from commit 9386f11)
    joker-at-work committed Aug 8, 2022
    Commit 70eb9b0
  68. vmware: _list_instances_in_cluster() returns morefs

    Sometimes it's helpful to get the moref of a VM together with a list of
    properties. Since we support getting properties in
    `_list_instances_in_cluster()`, we now also support returning the moref
    with it.
    
    Change-Id: Ie10b95c53595db62131789a8ada1c81b7a662780
    (cherry picked from commit bcc7f60)
    joker-at-work committed Aug 8, 2022
    Commit 36d3970
  69. vmware: Warmup moref cache on startup

    When starting up, we start with an empty cache and thus need to query
    the vCenter for every instance we handle. To optimize this, we query the
    instances in bulk and update the cache once at startup.
    
    Change-Id: I56d746f79f1303bcf2f9ec3f66ec8b770b0e6e1c
    (cherry picked from commit 7c7bd31)
    joker-at-work committed Aug 8, 2022
    Commit e84a0eb
  70. vmware: Move is_vim_instance() to vim_util

    It's more suited there - or even better in oslo.vmware - and we want to
    use it in cluster_util, which would result in a cyclic import if we keep
    it in vm_util.
    
    Change-Id: I6ed4006d568b0cd3614965b18af5a2927bf12728
    (cherry picked from commit fc17a44)
    joker-at-work committed Aug 8, 2022
    Commit 518bec1
  71. vmware: Add function to retrieve all DRS groups of a cluster

    We will need this in the sync-loop after customers are enabled to
    change server-group members via the API.
    
    Change-Id: I5e7f27251b6f2c09002445d6c374f887864ea19f
    (cherry picked from commit 3f27feb)
    joker-at-work committed Aug 8, 2022
    Commit 68cf471
  72. vmware: Add function to retrieve all DRS rules of a cluster

    We will need this in the sync-loop after customers are enabled to change
    server-group members via API.
    
    Change-Id: Id2fbaa67b799e370331472ecccc79d99dd07e01f
    (cherry picked from commit eafa794)
    joker-at-work committed Aug 8, 2022
    Commit fc3b930
  73. Vmware: Query Drs state directly in driver

    No need for an additional indirection: as the driver is the only
    user of the information, we query the value directly.
    
    Change-Id: I6d44910c420de4c76b6112904ccfebe3ec923098
    (cherry picked from commit ffd3160)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Commit 42a8f09
  74. vmware: Save the ComputeNode host name for later

    When init_host() is called, we keep the provided host as an attribute so
    we can later refer to it when e.g. getting all instances for the host
    the driver is running for.
    
    Change-Id: I261006a727125a87c204564f95ec8797060cd557
    (cherry picked from commit 598016f)
    joker-at-work committed Aug 8, 2022
    Commit e32f206
  75. scheduler: Add HypervisorSizeMixin

    This moves the method for getting and caching hypervisor sizes out
    into its own class that's supposed to be used as a mixin by anybody
    that needs it.

    We keep the cache as a single instance shared between all users of
    the mixin, so we keep the requests to placement minimal. There
    shouldn't be much of a problem with concurrent access, other than two
    threads updating the same value if they happen to run at the same
    time. Since the value we're caching here is basically static, we
    don't need to change the retention time on a per-mixin-user basis.
    
    Change-Id: If7a3e49fad0061fcab4fe73cc792ca3b66a94003
    (cherry picked from commit 81cee5c)
    joker-at-work committed Aug 8, 2022
    Commit 38069c6
  76. scheduler: Add VmSizeThresholdFilter

    This filter allows us to define a threshold for small VMs, so that
    only they are allowed to spawn on hypervisors up to a certain size -
    also defined by a threshold.

    This filter is necessary to guard against a few bigger VMs clogging
    the small hypervisors and thus forcing smaller VMs onto big
    hypervisors. Having too many VMs on a single hypervisor can be
    problematic, and many more small VMs fit onto a big hypervisor.
    
    Change-Id: Idecbe624384ca3bf323ed53d98978791d04c25cb
    (cherry picked from commit 8977dc6)
    joker-at-work committed Aug 8, 2022
    Commit 7015a2d
  77. scheduler: Add HvRamClassWeigher

    This weigher takes a configurable list of static weights that are
    assigned to RAM classes. We define a RAM class by its upper bound,
    i.e. 1024 means all HVs with up to 1024 GiB of memory not fitting in
    any class below.

    We need this weigher to prefer scheduling to certain HV sizes, while
    still making it possible to schedule to others if there aren't enough
    preferred HVs available.
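
    The class lookup boils down to this (a sketch; the option format is
    illustrative):

        def weight_for_hv(ram_gib, class_weights):
            # class_weights maps upper bounds to weights,
            # e.g. {1024: 1.0, 3072: 0.5, 6144: 0.1}
            for upper_bound in sorted(class_weights):
                if ram_gib <= upper_bound:
                    return class_weights[upper_bound]
            return 0.0  # larger than any configured class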
    
    Co-authored-by: Fabian Wiesel <[email protected]>
    
    Change-Id: Ia042e466544b73b8dd15ee7231d3baf1da6069a1
    (cherry picked from commit 6d50363)
    joker-at-work committed Aug 8, 2022
    Commit bd221f1
  78. scheduler: HypervisorSizeMixin uses oslo.cache

    Instead of using our own cache implementation, which only cleared the
    cache 10 minutes after the last write to it, we switch to oslo.cache's
    DictCacheBackend. We gain code-reuse and a retention time per entry.
    
    Change-Id: I302ebea93dfe30eb72c9a0dfe42f5e8c956f228a
    (cherry picked from commit b61765f)
    joker-at-work committed Aug 8, 2022
    Commit 1bf9fd0
  79. vmware: Make volume attach and extraConfig update atomic

    We used to do 2 reconfigure calls for a vmdk attach, which could lead
    to inconsistencies if the second - the extraConfig update - did not
    succeed. To the VMware driver, the volume then did not look attached
    to the server and thus wouldn't get detached later on.

    We now change this procedure to attach the vmdk and add the
    extraConfig entry at the same time, so either both succeed or neither
    does - same for the detach case.

    Since we only need this for vmdks, attach_disk_to_vm() and
    detach_disk_from_vm() only do it if the right parameters were
    supplied.

    The underlying functions then only update a given config_spec instead
    of creating a new one and reconfiguring the cluster. We could remove
    that part as the changed code-path was the only consumer.

    Besides producing fewer inconsistencies, this should also make the
    volume attachment/detachment process a little faster, because it
    requires only a single reconfigure task instead of two.
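
    A sketch of the combined spec; the field names follow the vSphere API,
    while the extraConfig key is illustrative:

        def build_attach_spec(client_factory, disk_device_change,
                              volume_id):
            spec = client_factory.create('ns0:VirtualMachineConfigSpec')
            spec.deviceChange = [disk_device_change]
            opt = client_factory.create('ns0:OptionValue')
            opt.key = 'volume-%s' % volume_id
            opt.value = 'attached'
            spec.extraConfig = [opt]
            return spec  # one ReconfigVM_Task applies both changes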
    
    Change-Id: Icce0c39aaed523ce4e5df97d130a7a14cfabb9c5
    (cherry picked from commit 655a044)
    joker-at-work committed Aug 8, 2022
    Commit 97407bb
  80. vmware: Use a prefix in DRS groups and rules

    We want to create a sync-loop for applying/removing server-groups
    changed by the user via API. For this we need to be able to distinguish
    between driver-created DRS groups/rules and admin-created ones. To do
    this, we introduce a prefix for the DRS group/rule name, which will work
    as an identifier later on.
    
    Since we're now using not only a UUID, but a UUID with a prefix, we
    change GroupInfo to have a "name" attribute instead of a "uuid"
    attribute.
    
    As we're changing how DRS groups/rules look, we need a migration to run
    before deploying this to production.
    
    Change-Id: I07ecd1953a85d0f53082fa9b0c49b80c2c9bf9d3
    (cherry picked from commit 7984cae)
    joker-at-work committed Aug 8, 2022
    Commit cf63411
  81. vmware: Remove no longer used "fetch_cluster_properties()"

    It was previously used when deleting empty DRS groups.
    
    Change-Id: I82612decf3938285c43f5e97abb48da349ad3fba
    (cherry picked from commit 8e0836d)
    joker-at-work committed Aug 8, 2022
    Commit 77a26bd
  82. vmware: Don't add VmGroups for server-groups

    The DRS rules created by Nova do not use those VmGroups and they don't
    seem to serve a purpose otherwise. Therefore, we only update/create a
    VmGroup that's not based on a server-group, i.e. an admin-defined
    group. We currently only have this kind of group for the
    special_spawning case to support a free host for big VMs.
    
    Change-Id: Ide011e157ad46037304cfdb52b1db397dde38cc8
    (cherry picked from commit 8c3640b)
    joker-at-work committed Aug 8, 2022
    Commit ee0b59e
  83. vmware: Remove cleaning of empty DRS groups

    Since we don't create new ones, we also don't have to keep track of
    empty ones. Instead, we will just delete all the VmGroups once via an
    external script.
    
    Change-Id: I88baeadcea3a7bfe946596b169bc3abe6798d9d6
    (cherry picked from commit 89dcef0)
    joker-at-work committed Aug 8, 2022
    7c65bd9
  84. Add a way to trigger a server-group sync on the driver

    We're going to change the API to allow updates of server-group members
    and need to trigger a sync in the backend, whenever such a change
    occurs. Therefore, we need a bunch of methods going through the whole
    stack to call the driver.
    
    We use a cast and not a call here, because we cannot let the API wait
    for all of this to happen. If the cast gets lost somehow, a sync-loop
    implemented in the driver will pick the change up eventually.
    
    The API will have to supply a list of hosts to call, so we only sync the
    group on the necessary hosts.
    
    Change-Id: I00e012ed52ba9fd36b094ecf2dc86b023f2f5a21
    (cherry picked from commit f3d2ad7)
    joker-at-work committed Aug 8, 2022
    7613013
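
    A rough sketch of such a pass-through, with all names illustrative (the
    actual RPC method names and versions may differ):

        # Sketch: the API casts to each affected host; the compute manager
        # hands the server-group UUID to the driver. A lost cast is
        # repaired eventually by the driver's sync-loop.
        class ComputeRpcAPIFragment(object):
            def sync_server_group(self, context, host, group_uuid):
                cctxt = self.router.client(context).prepare(server=host)
                cctxt.cast(context, 'sync_server_group',
                           group_uuid=group_uuid)

        class ComputeManagerFragment(object):
            def sync_server_group(self, context, group_uuid):
                self.driver.sync_server_group(context, group_uuid)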
  85. vmware: Add DRS rule lifecycle helpers

    These functions are simple helper functions to create, update and delete
    a DRS rule, so we have the code necessary for that in a single place.
    
    Change-Id: I44aeed6f99b9803adca0062b2d7b12cc2e295f03
    (cherry picked from commit 8c1fbcf)
    joker-at-work committed Aug 8, 2022
    c29bd90
  86. vmware: Implement sync_server_group()

    The VMware driver supports syncing server-groups as DRS rules into the
    cluster managed by the nova-compute node. The method will be called when
    a user updates a server-group via API and might get reused when spawning
    a VM, too.
    
    We might be able to optimize it a little more by keeping a local list
    of DRS rules instead of querying the cluster in real-time. Tests have
    shown, though, that it takes < 500 ms to query the cluster.
    
    Change-Id: I534c035a1e2d962cf5d187d56d104e743f7ade15
    (cherry picked from commit 52f42fc)
    joker-at-work committed Aug 8, 2022
    ff25aea
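
    A condensed sketch of the sync semantics, assuming the lifecycle
    helpers from the previous commit live in cluster_util (all helper
    names are assumptions):

        def sync_server_group(self, context, sg_uuid):
            group = objects.InstanceGroup.get_by_uuid(context, sg_uuid)
            rule_name = DRS_PREFIX + sg_uuid
            rule = cluster_util.get_rule_by_name(self._session,
                                                 self._cluster, rule_name)
            # VMs of group members that live on this host's cluster
            members = self._get_member_vm_refs(context, group)

            if len(members) < 2:
                # a DRS rule needs at least two VMs to make sense
                if rule is not None:
                    cluster_util.delete_rule(self._session,
                                             self._cluster, rule)
            elif rule is None:
                cluster_util.create_rule(self._session, self._cluster,
                                         rule_name, group.policy, members)
            else:
                cluster_util.update_rule(self._session, self._cluster,
                                         rule, members)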
  87. vmware: Add a DRS rule sync-loop

    In case we missed an update to a server-group, we want to be able to
    recover at some point in time. Therefore, we implement a sync-loop to
    call the driver's sync_server_group() for every server-group UUID we
    find as belonging to our host and also for every DRS rule we find.
    
    Change-Id: I9a633dc87ad1aab7d5f00e5143fac97dd3b87176
    (cherry picked from commit 583b1fd)
    joker-at-work committed Aug 8, 2022
    c82b239
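
    The loop could look roughly like this (the helper names are
    placeholders, not the actual patch):

        # Sketch: re-sync every server-group that has a member on this
        # host, plus every group we can recover from a prefixed rule name.
        def _sync_all_server_groups(self, context):
            sg_uuids = set(self._server_group_uuids_for_host(context))
            for rule_name in self._vmops.list_drs_rule_names():
                if rule_name.startswith(DRS_PREFIX):
                    sg_uuids.add(rule_name[len(DRS_PREFIX):])
            for sg_uuid in sg_uuids:
                self._vmops.sync_server_group(context, sg_uuid)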
  88. vmware: Use sync_server_group() in VM lifecycle

    We can use sync_server_group() in update_cluster_placement() to use the
    same mechanism that's used when a user edits a server-group in the DB.
    This makes sure that a DRS rule is only created if it has more than one
    member, and also syncs in the other member in case the newly-spawned
    instance is the second member.
    
    Change-Id: I62c1ae3d897f5ccda8788a4d4b23e553be8bc5cf
    (cherry picked from commit 2dd55ea)
    joker-at-work committed Aug 8, 2022
    3687a8e
  89. vmware: Set _compute_host on VMwareVMOps

    We need to know the compute host to query the instances from the API for
    sync_server_group(), so we save it onto the VMwareVMOps instance when
    the VMwareVCDriver instance receives it in the init_host() call.
    
    Change-Id: I473539000dde4629e2a251cd9145c8047ce60a41
    (cherry picked from commit 22ab4cb)
    joker-at-work committed Aug 8, 2022
    45a7a60
  90. vmware: Exclude instances in certain states in sync-server-group

    When instances are going away during the sync, this can lead to an
    error. We want to avoid that.
    
    Additionally, VMs contained in a DRS rule cannot be vMotioned, so we
    want to exclude them, too. This makes it easier for the live-migration
    code to update the cluster appropriately.
    
    Change-Id: I8bc94aebebcd878f60c33fe009f048afeb9a42c0
    (cherry picked from commit 17f48bf)
    joker-at-work committed Aug 8, 2022
    56ba32f
  91. vmware: server-group sync-loop handles exceptions

    Since we only start the sync-loop for server-groups once on startup, we
    have to make it resilient against any exceptions - e.g. if the DB or
    the vCenter is temporarily unreachable.
    
    Change-Id: I74036b6bfc449b407f687afac1ba4365bcbdc2ee
    (cherry picked from commit 2120075)
    joker-at-work committed Aug 8, 2022
    b904cdc
  92. vmware: Do not sync "soft-affinity" server-groups

    The replaced code in cluster_util already ignored server-groups having
    a policy of "soft-affinity", so we have to do it in the new
    sync_server_group(), too.
    
    The main reason is that we wanted to give customers the possibility to
    end up on the same cluster without the need to end up on the same host,
    too. We also cannot implement the "soft" part currently, and spawning a
    VM will fail if the host is too full, as there are no non-mandatory
    VM-to-VM rules.
    
    Change-Id: I745ac6616eefd193ce8c7a9a5cba3c68fc59ac75
    (cherry picked from commit 65558c7)
    joker-at-work committed Aug 8, 2022
    fcfd8f3
  93. Add public method to remove members from InstanceGroup

    This function wasn't previously available, because - as the removed
    comment says - there was no user-facing API that would allow removal of
    instance group members. Since we plan on changing the API, we need to
    add a public method.
    
    When adding this method, we also need to send notifications like we do
    for "add_members()", and thus we added the appropriate functionality.
    
    Change-Id: I4270212b57782e5ffeaf69dc3bd57c7c60a7ffe5
    (cherry picked from commit df3bd90)
    joker-at-work committed Aug 8, 2022
    8b89110
  94. api: server_group's _get_not_deleted returns hosts

    We extend the "server_groups._get_not_deleted()" function to return a
    dictionary of instance UUID to host mapping. This is a preliminary step
    to introducing an update of the server-groups, which will need both the
    instance UUIDs and the hosts to filter out to-be-removed instances and
    check the validity of the policy when adding instances.
    
    We cannot use "InstanceGroup.get_hosts()" for this, as it's not
    cell-aware, and thus extend this function instead.
    
    TODO:
    	* tests
    
    Change-Id: I253ef54560c2422baec187b350f05b1b2affc34e
    (cherry picked from commit a94f6f7)
    joker-at-work committed Aug 8, 2022
    69d4241
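
    A sketch of the cell-aware lookup using nova's standard objects (the
    exact query filters are assumptions):

        import collections
        from nova import context as nova_context
        from nova import objects

        def _get_not_deleted(context, uuids):
            # returns {instance_uuid: host} for all non-deleted instances
            mappings = objects.InstanceMappingList.get_by_instance_uuids(
                context, uuids)
            by_cell = collections.defaultdict(list)
            for m in mappings:
                if m.cell_mapping is not None:
                    by_cell[m.cell_mapping.uuid].append(m)

            found = {}
            for cell_mappings in by_cell.values():
                with nova_context.target_cell(
                        context, cell_mappings[0].cell_mapping) as cctxt:
                    instances = objects.InstanceList.get_by_filters(
                        cctxt,
                        {'uuid': [m.instance_uuid for m in cell_mappings],
                         'deleted': False})
                    for inst in instances:
                        found[inst.uuid] = inst.host
            return found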
  95. api: Add an update method for server-groups

    This new API endpoint allows adding and/or removing members of a
    server-group. We found this to be necessary because instances might get
    created without a server-group, but later need an HA-partner, and
    re-installing would mean downtime or too much effort.
    
    The endpoint checks that the policy is still valid after all changes
    are applied. It strives for idempotency in that it allows
    removal/addition of already removed/added instance uuids, to
    accommodate requests built in parallel.
    
    TODO:
    	* api-request docs
    
    Change-Id: I30d5d1dc3a41553b4336aad3877018989159495c
    (cherry picked from commit 7220be3)
    joker-at-work committed Aug 8, 2022
    850ef7b
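
    A sketch of the idempotent core of such a handler (validation and
    view-building details omitted; the method and helper names are
    assumptions):

        def update(self, req, id, body):
            context = req.environ['nova.context']
            group = objects.InstanceGroup.get_by_uuid(context, id)
            add = set(body.get('add_members', []))
            remove = set(body.get('remove_members', []))

            # idempotency: adding an existing member or removing an
            # already-removed one becomes a no-op instead of an error
            add -= set(group.members)
            remove &= set(group.members)

            # validate that the policy still holds for the final set,
            # e.g. anti-affinity members must not share a host
            self._validate_policy(context, group, add, remove)

            if remove:
                objects.InstanceGroup.remove_members(
                    context, group.uuid, list(remove))
            if add:
                objects.InstanceGroup.add_members(
                    context, group.uuid, list(add))
            return self._view_builder.show(req, group)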
  96. objects: Add get_by_instance_uuids() to InstanceGroupList

    Searching for the instance groups belonging to a list of instances
    can be helpful for checking if a server-group update is valid, and
    maybe also when a nova-compute wants to sync down server-groups, like
    the VMware HV is able to.
    
    Change-Id: Iec93becf0299ec0617e99ce16c06e37c84cb33ee
    (cherry picked from commit 838142b)
    joker-at-work committed Aug 8, 2022
    3dfcdfb
  97. api: Servers in a server-group cannot join another one

    This changes the server-group update API method to not allow a server to
    join a second server-group if it already joined one. The whole change to
    the server-group errors out if any of the servers is already in another
    group; this simplifies validation and means we either apply the whole
    change requested by the user or nothing.
    
    Change-Id: I6a06e4b4a8c22737c20ab51e85ceb2bf98082b26
    (cherry picked from commit 02a4c82)
    joker-at-work committed Aug 8, 2022
    c4a87c7
  98. vmware: Fix getting rules/groups in cluster_util

    The attribute group/rule gets removed by the underlying SOAP library if
    it's empty. Therefore, we have to check for it to exist instead of
    trying to iterate over it unconditionally.
    
    Change-Id: I443338bf97dcf83478e0a9971179480ecb01c009
    (cherry picked from commit 6d984f6)
    joker-at-work committed Aug 8, 2022
    9c1e4a1
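
    The fix boils down to a guard like this:

        def _get_all_cluster_rules(cluster_config):
            # suds omits empty list attributes entirely, so 'rule' (and
            # likewise 'group') may be missing from the config object
            return getattr(cluster_config, 'rule', None) or []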
  99. Vmware: Disable DRS after migration

    If a VM moves across clusters, the DRS override for
    the cluster will be gone, so we need to set it again.
    
    Change-Id: Ic5d010de95f194ea660f10805c46ad43762bd83e
    (cherry picked from commit 69e181a)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    9d4176f
  100. special_spawning: Use get_moref_value()

    We're going to switch the backing SOAP library of oslo.vmware at some
    point in time and should already use the compatibility layer when
    accessing ManagedObjectReference attributes.
    
    Change-Id: I1b18a7b7db0452a10f0adc499be2df26d923f936
    (cherry picked from commit c45fa56)
    
    special_spawning: Also consider empty hosts
    
    When clusters get built up, they don't contain VMs on all hosts yet. We
    failed to consider these hosts as possible targets, because our list of
    candidates was created from the list of VMs. To fix that, we now
    retrieve all hosts of the cluster first and pre-populate the dict with
    hosts before retrieving the VMs.
    
    Change-Id: Ibd576211ac33f38ef0e1b9016381a955916d7c1c
    (cherry picked from commit 2bf8738)
    
    special_spawning: Remove safety check for no returned VMs
    
    With building up new capacity and dedicating whole compute nodes to big
    VMs, we can end up in a situation where a cluster contains no VMs yet,
    and the check then disallows spawning new ones. Therefore,
    we remove the check.
    
    Change-Id: I2883c5c713ee657006c80d61ebc59a086ec22411
    (cherry picked from commit 3c8bd73)
    joker-at-work committed Aug 8, 2022
    0c174c7
  101. Vmware: Refactor update_cluster_placement

    After the recent changes of syncing server-groups,
    the syncing of rules in cluster_util.update_placement has become unused.
    
    The code can be simplified in preparation for live-migration,
    which will add complexity.
    
    VMops.update_cluster_placement now calls
    - sync_instance_server_group, which fetches _the_ server-group
      for the instance and syncs it with the existing VMOps.sync_server_group
    - update_admin_vm_group_membership, which handles the membership in
      special_spawning_vm_group. Since that is the only remaining use-case
      of cluster_util.update_placement, the function has been renamed to
      update_vm_group_membership and reduced to that use-case.
    
    Change-Id: I94c311790610fc4658f9b06ac052ad46660f1cea
    (cherry picked from commit a05999b)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    499e90a
  102. ResourceTracker: Fetch Bdms up front

    The resource tracker iterates over all instances in
    _update_usage_from_instances, where it fetches the
    block-device-mappings for each instance in an RPC call.
    
    By fetching the block-device-mappings up-front,
    we get a single RPC/DB call instead of several small ones,
    whose latencies would otherwise add up.
    
    If a block-device-mapping list cannot be retrieved for an instance,
    it will fall back to the old behaviour and fetch it individually.
    
    Change-Id: Ice9ab0a9c1b783c059687e1d992eea1f97cb3193
    (cherry picked from commit 6caf72a)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    4ab1c82
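
    A sketch of the bulk fetch with per-instance fallback (the surrounding
    usage-accounting call is simplified):

        bdms_by_uuid = {}
        bdm_list = objects.BlockDeviceMappingList.get_by_instance_uuids(
            context, [inst.uuid for inst in instances])
        for bdm in bdm_list:
            bdms_by_uuid.setdefault(bdm.instance_uuid, []).append(bdm)

        for instance in instances:
            bdms = bdms_by_uuid.get(instance.uuid)
            if bdms is None:
                # fall back to the old per-instance behaviour
                bdms = objects.BlockDeviceMappingList.get_by_instance_uuid(
                    context, instance.uuid)
            # ... the existing per-instance usage accounting uses bdms here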
  103. Vmware: Sync server-group during migration

    During a migration, we have to consider the following situations:
    
    - On the source-host, we have to remove the group constraints
      as soon as the migration has started, to avoid the constraints
      disallowing the movement.
    - On the destination-host, we have to add the rules as soon
      as the vm is in the vcenter, before the instance.host has been
      updated. Otherwise, we might remove rules added by the migration
      itself.
    
    The parameters to specify the cluster and the host are in
    preparation for the migration-task, which needs to call
    the sync for the source-host on the destination-host
    
    Change-Id: I2b3c626ecf4a33c3baa20489b66bb7e6b69459b6
    (cherry picked from commit 81ba1e5)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    d9aa19b
  104. Vmware: Handle missing propSet in list_vms

    If a vm has none of the requested properties,
    propSet will not be set, so we need to skip over that instance.
    
    Change-Id: Ia633fecb021bffe1557820e36c33ef53cf90db83
    (cherry picked from commit 6e8f9c7)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    97ca614
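
    The guard is essentially this (following the oslo.vmware
    RetrieveResult idiom):

        for obj in retrieve_result.objects:
            if not hasattr(obj, 'propSet'):
                # the VM had none of the requested properties
                continue
            props = {prop.name: prop.val for prop in obj.propSet}
            # ... build the instance entry from props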
  105. Vmware: Add option to remove vm from vm-group

    In order to migrate a vm (within a vcenter) we need
    to be able to remove a vm from a vm-group constraining
    it to a set of hosts.
    
    Change-Id: I8ef5ebdc54b3c3de0310e461132828aa251ee657
    (cherry picked from commit e2a08bd)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    513071e
  106. Vmware: Handle deleted image gracefully

    When migrating or resizing an instance, the driver looks
    for vSphere location properties stored with the image.
    We can't do that, though, if the image has been deleted,
    so we fall back gracefully.
    
    Change-Id: I55c4d1f49e3c6fc0bb89794ac1b44da99ce009ca
    (cherry picked from commit 39b4f57)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    b761821
  107. Vmwareapi: Live-migration changes on conductor level.

    Split out the changes for live-migration that affect the conductor.
    We base it on the upstream VMWareLiveMigrateData 1.0 version in the hope
    that we have an easier path merging the code with upstream eventually.
    
    Change-Id: I5d8417c836d735122e033eeb72f7671b2558afc4
    (cherry picked from commit a2c335f)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    6789c3d
  108. vmware: Fix special_spawning using _cluster_ref

    This fixes an AttributeError in _SpecialVmSpawningServer that was
    introduced in Ibd576211ac33f38ef0e1b9016381a955916d7c1c, where we access
    self._cluster_ref instead of self._cluster.
    
    Change-Id: I43d3d060f7c51841a8561a29d3189e37e4f87fb1
    (cherry picked from commit 6de3153)
    joker-at-work committed Aug 8, 2022
    7b8380c
  109. bigvm: HANA hosts need no thresholds

    When we mark resource-providers with the CUSTOM_HANA_EXCLUSIVE_HOST
    trait, only hana_* flavors can spawn on them. For these nova-compute
    nodes, we want to allow spawning of large/big VMs until the hosts are
    full. Since memory is reserved, we make sure there's enough failover
    capacity available by having >= 2 failover hosts in the cluster.
    
    Change-Id: Iaa1d18eb0a3e78bf1e361c8e8d1040aa07344448
    (cherry picked from commit de76a14)
    joker-at-work committed Aug 8, 2022
    6514f03
  110. bigvm: Refactor provider skipping in _check_and_clean_providers

    We used to keep multiple dictionaries around for the different reasons
    of marking providers for deletion, but we never used those individually
    and they cluttered the code with multiple ifs to skip over already
    marked providers.
    
    To fix this, we keep a single dictionary for providers to be deleted.
    
    Change-Id: Ie96a41bd9151e869fabd18d69777f38db85d0ca6
    (cherry picked from commit d9010c1)
    joker-at-work committed Aug 8, 2022
    d28f066
  111. bigvm: Delete providers that don't have a valid parent

    If we change settings or marked a resource-provider for bigvms manually,
    it can happen that we end up with a bigvm provider without a vmware
    provider. This led to a KeyError and nova-bigvm not continuing to free up
    a bigvm host.
    
    We fix this by marking all bigvm providers for deletion that don't have
    a matching vmware provider available.
    
    Change-Id: I1eec50a5b1b6206e5b3eab8d5f9fa891fecb4b25
    (cherry picked from commit a46e311)
    joker-at-work committed Aug 8, 2022
    5dd6c4a
  112. vmware: Use moref value for logging cluster_ref

    When we log the cluster moref while getting the list of instances, we
    log the whole moref and thus create multiple newlines in the log-line.
    To have all data in a single line, we now only log the moref value.
    
    Change-Id: I89d3908abdffe60570cec947e9a74b544488b535
    (cherry picked from commit 920d080)
    joker-at-work committed Aug 8, 2022
    3ede497
  113. vmware: vm_ref cache healing handles empty cache

    If an instance cannot be found on VMware-side during an operation, we
    try to fetch the new moref and retry the operation. This can fail if we
    passed in a moref explicitly - and thus didn't look into the cache - and
    the cache contains no value for the instance; this raised an AttributeError.
    
    We fix this by checking that the cache actually contained a value and
    returning without retry if it didn't, as we cannot tell if the vm_ref
    passed belongs to the instance or not.
    
    	AttributeError: 'NoneType' object has no attribute 'value'
    	  ...
    	  File "nova/compute/manager.py", line 2975, in start_instance
    		self._power_on(context, instance)
    	  File "nova/compute/manager.py", line 2945, in _power_on
    		block_device_info)
    	  File "nova/virt/vmwareapi/driver.py", line 633, in power_on
    		self._vmops.power_on(instance)
    	  File "nova/virt/vmwareapi/vmops.py", line 1936, in power_on
    		vm_util.power_on_instance(self._session, instance)
    	  File "nova/virt/vmwareapi/vm_util.py", line 326, in wrapper
    		if obj != vm_ref.value:
    
    Change-Id: I093a0c6260cb19478c7c25c630e453dd77d39f40
    (cherry picked from commit 48416da)
    joker-at-work committed Aug 8, 2022
    a47d349
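
    A sketch of the healing decorator with the new guard (the cache
    helpers exist in nova's vm_util; the decorator shape itself is
    simplified):

        import functools
        from oslo_vmware import exceptions as vexc

        def retry_with_fresh_vm_ref(func):
            @functools.wraps(func)
            def wrapper(session, instance, vm_ref=None, **kwargs):
                try:
                    return func(session, instance, vm_ref, **kwargs)
                except vexc.ManagedObjectNotFoundException:
                    cached = vm_util.vm_ref_cache_get(instance.uuid)
                    if cached is None:
                        # vm_ref was passed in explicitly and nothing is
                        # cached; we cannot tell whether it belongs to
                        # this instance, so don't retry
                        raise
                    vm_util.vm_ref_cache_delete(instance.uuid)
                    fresh = vm_util.get_vm_ref(session, instance)
                    return func(session, instance, fresh, **kwargs)
            return wrapper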
  114. Vmware: Live-Migration of VMs

    Tested with attached volumes for the old cinder API and the new (3.44+)
    block-device-mapping.
    Requires patches to cinder to "migrate" the volumes between vCenters by
    creating an empty shadow-vm. Nova takes care of the rest:
    - deleting the outdated shadow-vms
    - attaching the volumes to the new shadow-vms
    
    The PlaceVM API is called to choose the host, but the initial placement
    doesn't take the server-groups into account; they are applied after the migration.
    
    BigVMs are still missing the probably needed special treatment (emptying of the host).
    
    The code is written in a way that, if anything goes wrong during the
    migration, such as
    - the conductor,
    - one of the compute nodes, or
    - one of the vCenter APIs
    being offline/restarted, simply a second migration can be started.
    
    The migration may happen in the background of the vCenter, and will be picked
    up when finished. The necessary book-keeping will be updated after the
    migration.
    
    The migration data will only be kept in memory, and will not be recovered on
    restart.
    
    Change-Id: If8c7265c53b64f00292e6689d1f6860ff29c671e
    (cherry picked from commit ee0d89c)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    9daf84c
  115. vmware: Refactor get_info() for healing vmref cache

    When a VM gets reregistered in the vCenter, it changes its moref id. To
    catch this case, we previously introduced a decorator that reacts
    to ManagedObjectNotFoundException and retries the function without the
    cached moref.
    
    We want the same to happen in VMwareVMOps.get_info(), as this function
    is called regularly to check the VM's state. To achieve this, we
    refactor get_info() a little to contain a local function matching the
    decorator's expected parameters.
    
    Change-Id: If798a3af4430a82dce9ef03a5ef097a215271b40
    (cherry picked from commit bb754e8)
    joker-at-work committed Aug 8, 2022
    bf2b2c7
  116. Add a NetworkInfoAsyncWrapper serializer to raven

    When using python-raven to send exceptions to Sentry, the serialization
    might run into a deadlock if the exception happens during server build
    and the NetworkInfoWrapper object is not done.
    
    We mitigate this by registering our own serializer in raven, which does
    not go into the content, but just prints the greenthread.
    
    Change-Id: Ie170e951e4d8d007a48d5878ec957e2e95155628
    (cherry picked from commit f896d89)
    
    Fix NetworkInfoAsyncWrapper registration order
    As it turns out, the serializer for NetworkInfoAsyncWrapper we
    introduced in Ie170e951e4d8d007a48d5878ec957e2e95155628, and whose
    registration we already fixed in
    Ib6f436ded2481d99dc1b32c54974c37b94281b81, was never called, because the
    more generic IteratorSerializer is registered before it via the base.py
    of raven.utils.serializer. Since SerializationManager.register() always
    appends to the list, there's no way for us to let our serializer come
    earlier in the list without getting dirty. While we could monkeypatch our
    own implementation of a SerializationManager into the module, too, we
    rather just access private parts of the already existing
    SerializationManager and prepend our serializer to the list of
    serializers, as it's more specific than the others.
    joker-at-work committed Aug 8, 2022
    31e96dc
  117. vmware: Add helper to fetch DRS overrides

    This helper function retrieves all DRS VM overrides of a cluster and
    provides them as a dict. We plan to use this in the special_spawning
    code.
    
    Change-Id: I091878d88b8545cb094b0f534f4fa57221c33719
    (cherry picked from commit 06cdb9e)
    joker-at-work committed Aug 8, 2022
    b57fb06
  118. special_spawning: Ignore partiallyAutomated VMs

    With large VMs being set to partiallyAutomated, compute nodes stayed in
    "waiting" state more and more often and never came out of it. Since we
    still need to deploy big VMs to those compute nodes and they are usually
    just stopped from being free by a large VM that cannot move, we often
    mark those compute nodes free by hand. Since this doesn't scale, we want
    to automate this behavior by letting the nova-compute take the same
    decision.
    
    Therefore, if there are only partiallyAutomated VMs left on a host, we
    still report that host as freed up. This is done by ignoring all
    partiallyAutomated VMs during the check of running VMs on the host.
    
    Change-Id: I47feb6a34c0e210f0ebb0edd4479550750e605d7
    (cherry picked from commit 9d66fe3)
    joker-at-work committed Aug 8, 2022
    e6349a0
  119. vmware: Specify volume profiles while snapshotting

    When at least one of the disks attached to the VM during snapshotting is
    associated with a profile containing storage IO control, the clone
    operation done during snapshotting fails with `VmConfigFault` and the
    more detailed "IO Filters are configured for the Source Disk
    vm-174314:2002, but no storage-policy selected for the destination.
    Select an appropriate storage-policy for destination disk.".
    
    We can mitigate this by specifying the profile in the RelocateSpec part
    of the CloneSpec, i.e. add a VirtualMachineRelocateSpecDiskLocator for
    each disk to be detached into the "location.disk" attribute. Setting
    the "profile" attribute on the VirtualDeviceConfigSpec we create for
    specifying the removal does not help.
    
    This will not add a profile for volumes attached before Cinder's queens
    release, as the "profile_id" was not part of connection_info before
    that.
    
    Change-Id: I2b71ef4a0b2ce79c287946dd15b7dc6af22439e9
    (cherry picked from commit 4e06287)
    joker-at-work committed Aug 8, 2022
    059030a
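
    A sketch of building such a disk locator (the property names follow
    the vSphere API; the helper itself is illustrative):

        def make_disk_locator(client_factory, disk_device, datastore_ref,
                              profile_id=None):
            locator = client_factory.create(
                'ns0:VirtualMachineRelocateSpecDiskLocator')
            locator.diskId = disk_device.key
            locator.datastore = datastore_ref
            if profile_id:
                # absent for volumes attached before Cinder's queens
                # release, where connection_info had no profile_id
                profile = client_factory.create(
                    'ns0:VirtualMachineDefinedProfileSpec')
                profile.profileId = profile_id
                locator.profile = [profile]
            return locator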
  120. special_spawning: Fix DRS rule not getting created

    The condition checking if the DRS rule should be created could never
    trigger, because we overwrite the checked "group" variable right before
    checking if it's None.
    
    We remove the condition, because there could be cases where we had a
    leftover hostgroup around, but someone manually deleted the rule. In
    those cases, we should still make a request to create/edit the rule even
    if the hostgroup was not created, but just edited.
    
    Change-Id: Iac3b8c183f633bf5ddc59acf340b477bd1eb88cc
    (cherry picked from commit 36d3614)
    joker-at-work committed Aug 8, 2022
    3bd40aa
  121. special_spawning: Remove matching DRS rule when removing hostgroup

    DRS does not update its internal state for a rule if the rule has
    become an invalid configuration. This effectively means that the rule
    stays in place if we remove the hostgroup but not the rule using it.
    
    This commit changes the behavior to also remove all rules using the
    hostgroup, when removing the hostgroup used for special spawning.
    
    Change-Id: Ic57a71bc4e69c57833396690fc3fb5453aa122b3
    (cherry picked from commit 2cdc9d4)
    joker-at-work committed Aug 8, 2022
    002e51a
  122. api: Update RequestSpec when updating server-group membership

    When we add/remove a server to/from a server-group, we have to update
    the server's RequestSpec's instance_group attribute, because this is
    used during scheduling when resizing a server to record a list of
    appropriate hosts for the instance.
    
    	AttributeError: 'NoneType' object has no attribute 'hosts'
    	  File "oslo_messaging/rpc/server.py", line 166, in _process_incoming
    		res = self.dispatcher.dispatch(message)
    	  File "oslo_messaging/rpc/dispatcher.py", line 265, in dispatch
    		return self._do_dispatch(endpoint, method, ctxt, args)
    	  File "oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch
    		result = func(ctxt, **new_args)
    	  File "oslo_messaging/rpc/server.py", line 229, in inner
    		return func(*args, **kwargs)
    	  File "nova/conductor/manager.py", line 94, in wrapper
    		return fn(self, context, *args, **kwargs)
    	  File "nova/compute/utils.py", line 1246, in decorated_function
    		return function(self, context, *args, **kwargs)
    	  File "nova/conductor/manager.py", line 298, in migrate_server
    		host_list)
    	  File "nova/conductor/manager.py", line 370, in _cold_migrate
    		updates, ex, request_spec)
    	  File "oslo_utils/excutils.py", line 220, in __exit__
    		self.force_reraise()
    	  File "oslo_utils/excutils.py", line 196, in force_reraise
    		six.reraise(self.type_, self.value, self.tb)
    	  File "nova/conductor/manager.py", line 339, in _cold_migrate
    		task.execute()
    	  File "nova/conductor/tasks/base.py", line 27, in wrap
    		self.rollback()
    	  File "oslo_utils/excutils.py", line 220, in __exit__
    		self.force_reraise()
    	  File "oslo_utils/excutils.py", line 196, in force_reraise
    		six.reraise(self.type_, self.value, self.tb)
    	  File "nova/conductor/tasks/base.py", line 24, in wrap
    		return original(self)
    	  File "nova/conductor/tasks/base.py", line 42, in execute
    		return self._execute()
    	  File "nova/conductor/tasks/migrate.py", line 174, in _execute
    		scheduler_utils.setup_instance_group(self.context, self.request_spec)
    	  File "nova/scheduler/utils.py", line 893, in setup_instance_group
    		request_spec.instance_group.hosts = list(group_info.hosts)
    
    Change-Id: Ic193dd3c59bc717ba5329f63054297f44127d76d
    (cherry picked from commit e01ef81)
    joker-at-work committed Aug 8, 2022
    b155b85
  123. Skip recomputing baremetal quota usage

    We're using a single function to compute the usage of multiple quota
    resources. There's already a check in place that prohibits recomputing
    that data if the resource's name is already in the data. When computing
    usage data, we use the instances belonging to a user/project. If that
    user/project does not have any baremetal instances of a flavor, we don't
    add the flavor's resource to the usage data. This means the check is
    unable to skip recomputing the data.
    
    If "instances" is already in the usage data, our shared function must
    have been called already. Therefore, if we encounter a resource having a
    name starting with "instances_" - which should be only true for
    resources we created for baremetal flavors - we can skip the
    recomputation even if the resource's name is not in the usage data.
    
    (cherry picked from commit 3a820fd)
    joker-at-work committed Aug 8, 2022
    ce4f0de
  124. Optimize quota:separate query again

    Mariadb still doesn't do the best job in executing that query (takes
    ~3s currently). The subquery alone takes roughly 0.3s and if we use the
    result of that query (in my tests 5 UUIDs) in a new query's WHERE, it
    takes only 0.06s. This should make up for the additional round-trip to
    the DB.
    
    Change-Id: I73aa89b0b76a0620265fb20caf4a18eb1f5f8311
    (cherry picked from commit 6a60115)
    joker-at-work committed Aug 8, 2022
    0ef6d07
  125. Vmware: No update_admin_vm_group_membership in post_live_migration

    Since the VM is gone after a migration,
    it won't be possible to update the vm-group membership.
    
    Change-Id: I04c62f400a522bf9fe5828199c0dff80a1004f42
    (cherry picked from commit 2b0c7f4)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    3541751
  126. Vmware: Catch Exception in post-migration

    We have to catch all exceptions in the post-migration
    steps; otherwise a roll-back will be initiated, which
    we cannot do properly as the VM has already been migrated
    with the vSphere API.
    
    Instead, the VM will be set to error state, as it will require
    manual inspection and intervention by an operator.
    
    Change-Id: I75faecfdd48c9f40d243aecdd2b90b89e5158335
    (cherry picked from commit 93f6cc9)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    e73ccda
  127. Vmwareapi: Handle missing details in ManagedObjectNotFoundException

    The attribute 'details' isn't always set,
    so we have to raise the exception without
    recovering the missing vm_ref.
    
    Change-Id: Ib1fbd90e03a0f36fb5833c71ae6cdde454e74958
    (cherry picked from commit 31672d2)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    8d38331
  128. VMware: Split out VMwareAPISession

    The object is not only used by the driver,
    but in practically all modules of vmwareapi.
    
    It reduces the scope of the driver module
    itself a bit.
    
    Change-Id: I76e446945c312e5b4fea54d04335d7d20ef3829d
    (cherry picked from commit 23eccae)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    8dc9358
  129. Let Nova migrate volumes between shards

    Until now, we handled shard-migration transparently in Cinder. This did
    not work well if the migration took longer than Cinder's RPC-timeout -
    which is a given for volumes over a certain size.
    
    To address this, Nova will now initialize the migration and wait for it
    to finish. To help with that, Cinder gained a new endpoint to migrate
    volumes by connector - because we pass the vCenter UUID inside its
    connection_capabilities - and now returns a new error-message on
    attachment_update with HTTP error code 416, so Nova knows that the
    update failed because the volume is in another shard - or rather is
    assigned to another backend.
    
    Since we now have to migrate a volume, callers of attachment_update now
    have to provide a volume_id in addition to the previous parameters.
    
    Change-Id: I9f89f2887be6f5e2f2184cd771542007393af0dd
    (cherry picked from commit b691628)
    joker-at-work committed Aug 8, 2022
    136891b
  130. [cinder] Base migration timeout on volume size

    Instead of having a static timeout of 1 day for all volume sizes, we now
    compute a timeout based on the size of the volume and an assumed minimum
    speed that's configurable.
    
    Change-Id: I3896fdd2e368d60f75e48292af2ec201194316b3
    (cherry picked from commit d4b0802)
    joker-at-work committed Aug 8, 2022
    34d46c1
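
    In essence (the option name and its default are assumptions):

        def _migration_timeout(volume_size_gb, min_speed_mb_s=10):
            # assume the migration moves at least min_speed_mb_s;
            # e.g. a 1 TiB volume at 10 MB/s yields ~29 hours
            size_mb = volume_size_gb * 1024
            return size_mb / float(min_speed_mb_s)  # seconds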
  131. [SAP] Shard as volume creation scheduling-hint

    From the instance we can derive the shard and pass the value
    as a scheduling hint to cinder.
    
    Change-Id: I81faa098634916b64af147d20427796036dd2cbb
    (cherry picked from commit 5ece029)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    594318d
  132. cinder: Set global_request_id in migrate_by_connector

    Without setting this, it's hard to find the requests made to Cinder
    during the migration and thus harder to find out why the migration
    failed.
    
    Change-Id: I7b2cdb7fc750682f681a3457d2b5783423f896bf
    (cherry picked from commit ab28c80)
    joker-at-work committed Aug 8, 2022
    19659c5
  133. cinder: Reduce time between checks in migrate_by_connector

    5min is quite a long time to keep a volume "hanging" in reserved state
    without a reason. We'll reduce it to a max of 1 min for now.
    
    Change-Id: I092f72db5257690e76b9e003552e7cbce3f991a7
    (cherry picked from commit afe5b85)
    joker-at-work committed Aug 8, 2022
    a66abd6
  134. VmWare: Refactored resource and inventory collection

    Joined the collection of the cpu-info with the host-stats in order to
    avoid calling the property collector twice in different places to
    get information about all the hosts.
    
    Split the code into getting the data and aggregating it over the cluster.
    Partly to split the logic into more easily consumable parts,
    but it is also a preparation for exposing the individual esxi-hosts
    as hypervisors.
    
    Change-Id: I383854bb0e956519e3bdc42121b59d43ca54743d
    (cherry picked from commit 8bf970c)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    fd4f43b
  135. Vmware: Aggregate cpu_info on item level

    Clients such as the CLI expect cpu_info to be a JSON dict,
    so returning a string breaks them.
    By aggregating the individual attributes, we also work around
    different cpu models with the same flags in a cluster
    not yielding any information. Now we can see that the models
    mismatch, but that the cpu feature flags match.
    
    Change-Id: Id552121b642ec90f6e06be09825ab7339531f9d6
    (cherry picked from commit a84a785)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    451b6db
  136. Vmware: Handle empty quickStats in host stats

    Hosts in certain states (such as disconnected ones) do not report
    quickStats about memory usage, so we have to assume values
    here, which is better than failing to update the stats for all hosts.
    
    Since they are not available, both the usage as well as the free
    memory will not be added to the total for the cluster.
    
    Change-Id: I05b3a158a058034d9a50a6948cdf302769f2d0eb
    (cherry picked from commit c934ffc)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    06d7fec
  137. ShardFilter: Allow cross-shard migrations

    The filter used to block any operation with a source-host,
    which also includes migrations. Live-migrations already
    work across shards, and cold migrations will hopefully follow soon.
    
    Change-Id: I9f9b2c4f3eae642d78fb349f5c752711bcd94af2
    (cherry picked from commit d92c2a4)
    
    Scheduler: Pass also the source-node
    
    Each compute host may have multiple hypervisors by design.
    The 'HostState' passed to the filter is per hypervisor,
    and currently just happens to be one per host in all cases
    except ironic.
    
    Change-Id: Ifcf7d7ea390562c963f6cc3a9ae5bc7efe5a5e8f
    (cherry picked from commit 3423e23)
    
    CpuInfoMigrationFilter: Ensure CPU-compatibility
    
    We can only do a live-migration to a 'host' with a super-set
    of the cpu-feature flags of the source. The filter ensures this.
    
    Change-Id: Icc4a6d1989fe055348a120abda24ba57048d3921
    (cherry picked from commit 93ac818)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    3371277
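
    The core check of the CpuInfoMigrationFilter reduces to a superset
    test, roughly:

        def cpu_compatible(source_cpu_info, target_cpu_info):
            # cpu_info dicts as produced by the driver's aggregation;
            # the target must offer every feature flag of the source
            source_flags = set(source_cpu_info.get('features', []))
            target_flags = set(target_cpu_info.get('features', []))
            return source_flags <= target_flags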
  138. VMwareapi: Use refs as index for fakedb

    The code was previously using the object-id to
    look up an object, meaning that you couldn't pass
    a newly created Managed-object-reference like you
    could over the vmware-api.
    Now the lookup happens over the ref-id string,
    and in turn some functions were refactored
    to take that into account.
    
    Change-Id: I70b87ed5f4fe08076745f9bc389b0f42930395cf
    (cherry picked from commit 8e7609b)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    23e6058
  139. Do not block on reservation while attaching

    The attachment operation may potentially be long-running,
    so a batch of attachments/reservations may cause an RPC timeout.
    The compute node still blocks on the lock and creates a bdm regardless.
    
    This replaces the instance lock with a bdm specific lock on
    bdm allocation.
    
    That means reserving a device name can run concurrently with
    detaching a volume.
    
    This will likely create a different behaviour, but since the
    value has a non-zero probability of being wrong anyway,
    we risk it being wrong more often.
    
    Change-Id: I99921bb0f22b02c51377ae276429319639e534df
    (cherry picked from commit d64c094)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    2c8dc51
  140. vmware: Reserve memory for large VMs

    Instead of only reserving memory for big VMs, we now also reserve memory
    for large VMs, because we want to help DRS with scheduling those VMs
    correctly (i.e. by configured memory) and improve performance for the
    VMs.
    
    Change-Id: I8b3a1a63ea4c1d459ed5c731d5ef94343d9e6046
    (cherry picked from commit 572ca67)
    joker-at-work committed Aug 8, 2022
    da3c340
  141. vmware: Use vmdk size from .ova files

    Since .ova files contain more than just the .vmdk, the size reported by
    glance is not the size of the actual upload. Since oslo.vmware now has a
    guard against not-finished uploads integrated, we cannot just use the
    size from glance. Instead, we switch to using the size of the .vmdk as
    reported by the tar file.
    
    Change-Id: I05cdc3fc47974ceb34a72704d79b3f7c54c05d41
    (cherry picked from commit 5d73d02)
    joker-at-work committed Aug 8, 2022
    e915df0
  142. Transport context to all threads

    The nova.utils.spawn and spawn_n methods transport
    the context (and profiling information) to the
    newly created threads. But the same isn't done
    when submitting work to thread-pools in the
    ComputeManager.
    
    The code doing that is extracted to a new
    function and called to submit the work to the
    thread-pools.
    
    Change-Id: I9085deaa8cf0b167d87db68e4afc4a463c00569c
    (cherry picked from commit 57e1efc)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    be295e4
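
    A sketch of the extracted helper, assuming it mirrors what
    nova.utils.spawn does with oslo.context (the helper name and the pool
    in the usage note are illustrative):

        import functools
        from oslo_context import context as common_context

        def submit_with_context(submit, func, *args, **kwargs):
            # capture the caller's RequestContext and re-install it in
            # the worker thread before running func
            ctxt = common_context.get_current()

            @functools.wraps(func)
            def context_wrapper(*f_args, **f_kwargs):
                if ctxt is not None:
                    ctxt.update_store()
                return func(*f_args, **f_kwargs)

            return submit(context_wrapper, *args, **kwargs)

        # usage: submit_with_context(self._build_pool.submit,
        #                            do_build_and_run_instance, ...)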
  143. Try migrate_by_connector on attachment_create

    As we cannot conform to the original design of
    volumes being accessible to all hosts, we raise an
    exception in case that is not possible, and
    migrate the volume to a cinder-host where it is possible.
    
    That is already done for attachment_update, which is called
    for the normal attachment of a volume to a server.
    
    In case of a live-migration, the pre_live_migration step
    creates a new attachment for the target hypervisor node
    and passes the connection_info with it.
    
    It can fail for the same reason, as in attachment_update.
    As before, we need to migrate the volume to a cinder-host
    where the volume is accessible to the target hypervisor.
    
    Change-Id: I6232f34f47ae2bcb78d83f587d8edaf701c2341b
    (cherry picked from commit dd9c94a)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    4ee520e
  144. [manage] Update purge_deleted_instances to include all cells

    In case all_cells is not supported, the user needs to use a separate
    nova.conf just to switch cells. Thus we provide support to purge all cells.
    
    (cherry picked from commit 07e2551)
    kpawar-sap authored and joker-at-work committed Aug 8, 2022
    860260b
  145. Vmwareapi: Replace nova is_vim_instance with oslo.vmware one

    The function is already implemented in oslo.vmware
    
    Change-Id: I48ab65502d1cd825fede6f73e764a5926d949beb
    (cherry picked from commit 7966aa1)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    4058232
  146. Vmwareapi: Use oslo_vmware get_object_property instead of nova one

    get_network_with_the_name is the last place where
    the nova function of the same name is used instead
    of the one implemented in oslo_vmware.
    
    Change-Id: I4293f3b2a7551793dd53dd427383583469ed0868
    (cherry picked from commit dcaf9c6)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    c11e541
  147. Vmware: Reduce vm search scope for images

    When importing an image with a duplicate name,
    we were going through all the VMs in the vCenter to find the matching one.
    That can potentially be 100k VMs, while it is rather likely that the VM
    is in the same location we are trying to import to.
    We now only search in the folder.
    As the name is specific to the datastore, this should only fail
    if someone uses the same naming convention and places it somewhere else.
    
    Change-Id: I9e1c55d560d36768037f4036b546b80eaa21ed32
    (cherry picked from commit 2ca317a)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    d497886
  148. Pass _nova_check_type through scheduler_hints & set live_migrate

    The confusing names apparently led to a parameter being passed through
    scheduler_hint, which contains a filter_property,
    which has scheduler_hints - which is where the parameter should
    have ended up.
    
    Additionally, set the live-migrate flag instead of inferring
    the state from other flags.
    
    The flags are persisted, so we need to overwrite them.
    
    Change-Id: Id80bfd2f3e4771856cc0d85bc1b85a7d14f3b136
    (cherry picked from commit 2c26b67)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    c7806ad
  149. Handle cells in CpuInfoMigrationFilter

    We have to look up the cell of the compute node to query
    the database for the correct information.
    
    Change-Id: I90951af80091fe871de217bb17f98c67b1284722
    (cherry picked from commit 789bb9a)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    0bb5022
  150. Vmware: Use a more generic serialisation of specs to json

    The previous version was doing it hand-coded, while
    this one uses the type-system, which allows us to
    serialise only the necessary fields (non-null), and
    also serialise polymorphic objects.
    
    Change-Id: I0dcafb74b3494185b3b58d78cb501069675aea33
    (cherry picked from commit d77cdcb)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    9ccf91a
  151. Vmware: Use Rpc instead of own session

    The advantage of this approach is that we now properly
    encapsulate each compute node instead of 'messing'
    around remotely in another vCenter.
    So caching etc. should work as expected.
    
    The downside is additional hops:
    Before: compute -> other-vcenter
    Now:    compute -> messaging -> other-compute -> other-vc
    
    So more ways of going wrong.
    
    Change-Id: I37be358ff7c3bd9f786e6ce086e91ff2b2fc3861
    (cherry picked from commit da49100)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    01a1095
  152. Vmware: Remove uuid parameter from get_vmdk_info call

    We changed the code to ignore the file-name,
    as a vMotion will result in renaming of the files,
    breaking the heuristic to detect the root disk.
    Instead, we were taking the first disk
    when the uuid parameter was set.
    
    The uuid parameter is not set when working with shadow-vms
    and vms for image import. So no special handling is
    needed; we always want the first disk in those cases too,
    and so we can scrap the uuid argument.
    
    Change-Id: Ib3088cfce4f7a0b24f05d45e7830b011c4a39f42
    (cherry picked from commit bd7925e)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    52a877f
  153. Vmware: Move session test to down module

    VMwareAPISession has been moved to its own module,
    and this change should reflect that in the test case.
    
    Change-Id: Ie0878986db41887f9f0de0bc820135d5284df403
    (cherry picked from commit 9854168)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    4b33ecd
  154. Vmwareapi: StableMoRefProxy for moref recovery

    The vmwareapi driver uses Managed-Object references throughout
    the code with the assumption that they are stable. A moref is, however,
    a database id, which may change during the runtime of the compute node.
    If an instance is unregistered and re-registered in the vCenter,
    the moref will change.
    
    By wrapping a moref in a proxy object, with an additional method
    to resolve the openstack object to a moref, we can hide those changes
    from a caller.
    
    For that the initial search/resolution needs to wrap the resulting
    moref in such a proxy.
    
    Change-Id: I40568d365e98359dbe90663c400e87be024df2eb
    (cherry picked from commit 89b5c6e)
    
    Vmware: MoRef implementation with closure
    
    This should ease the transition to stable mo-refs.
    One simply has to pass the search function as a closure
    to the MoRef instance, and the very same method will
    be called when an exception is raised for the stored
    reference.
    
    Change-Id: I98b59603a8ef3b91114f378d82cd7418d26a1c52
    (cherry picked from commit c854d41)
    
    Vmware: Implement StableMoRefProxy for VM references
    
    By encapsulating all the parameters for searching for
    the vm-ref again, we can move the retry logic to the
    session object, where we can try to recover the vm-ref
    should it result in a ManagedObjectNotFound exception
    
    Change-Id: Id382cadd685a635cc7a4a83f69b58075521c8771
    (cherry picked from commit bc23e94)
    
    Vmwareapi: Move equality test to tests
    
    The equality test is only used by the tests
    so it is better implemented there.
    
    Change-Id: I51ee54265c4cc2b4f40c0b83f785a49f8a8ebce4
    (cherry picked from commit 84f3e06)
    
    Vmwareapi: Stable Volume refs
    
    The connection_info['data'] contains the managed-object
    reference (moref) as well as the uuid of the volume.
    
    Should the moref become invalid for some reason,
    we can recover it by searching for the volume-uuid
    as the `config.instanceUuid` attribute of the shadow-vm.
    
    Change-Id: I0ae008fa15a7894e485370e7b585821eeb389a93
    (cherry picked from commit a71ddf0)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    afec39e
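
    A sketch of the closure variant (method names beyond StableMoRefProxy
    are assumptions, as is the recovery hook called by the session's
    retry logic):

        class MoRef(StableMoRefProxy):
            def __init__(self, search_fn):
                self._search_fn = search_fn
                super(MoRef, self).__init__(ref=search_fn())

            def fetch_moref(self, session):
                # called after a ManagedObjectNotFound to re-resolve
                # the stored reference
                self.moref = self._search_fn()

        # e.g. for a shadow-vm, searched by the volume uuid stored as
        # config.instanceUuid:
        # ref = MoRef(lambda: vm_util.search_vm_ref_by_identifier(
        #     session, volume_id))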
  155. Vmware: Remove nvp.vm-uuid on clone

    The clone created in a snapshot would also contain
    the nvp.vm-uuid field in the extra-config.
    If we then delete the original vm, the fallback mechanism
    of searching for the VM by extra-config would trigger,
    and find the snapshot and delete that instead.
    
    Change-Id: I6a66fa07dfe864ad4deedc1cafe537959cd969f4
    (cherry picked from commit 90a9f4e)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    49c6c2d
  156. VMware: root disk anti affinity

    Remove datastore_regex from VMWareLiveMigrateData
    This was a leftover of some part of the development process and never
    used. Thus, we remove it again.
    
    Change-Id: I37ce67b4773375e31f18ac809a6029aa41702a3b
    (cherry picked from commit 17928f7)
    
    vmware: ds_util.get_datastore() supports hagroups
    
    We're going to implement hagroups of datastores and for that we need
    to be able to select a datastore from a specified hagroup. This is
    currently planned via matching the name of the datastore against a
    regex that can extract the hagroup from the name.
    
    This commit adds retrieving the hagroup and checking it against the
    requested one to ds_util.get_datastore().
    
    Change-Id: Ie3432a8e0b020ca9bf41abc098c0fac059af0df9
    (cherry picked from commit f8e452a)
    
    vmware: Add setting datastore_hagroup_regex
    
    This setting will be used to enable distribution of ephemeral root disks
    between hagroups of datastores. The hagroups are found by applying this
    regex onto the found datastore names and should be named "a" or "b".
    
    Change-Id: I45da5dd5c46a4ba64ea521a0e0975f133b5801f1
    (cherry picked from commit c10d4e8)
    
    vmware: Distribute VM root disks via hagroups
    
    We want to distribute the ephemeral root-disk of VMs belonging to the
    same server-group between groups of datastores (hagroups). This commit
    adds the mentioned functionality for spawning new VMs, offline and
    online migration.
    
    Change-Id: I889514432f491bac7f7b6dccc4683f414baac167
    (cherry picked from commit 6feb47d)
    
    vmware: Add method to svMotion config/root-disk
    
    For distributing ephemeral root disks of VMs belonging to the same
    server-group between 2 hagroups, we need to be able to move the
    disk/config of a VM to another ephemeral datastore.
    
    This method will do an svMotion by specifying a datastore for all
    disks and for the VM's config files. The ephemeral disks - found by
    using the datastore_regex - receive the target datastore, while all
    other disks, which should be volumes, receive their current datastore
    as target.
    
    Change-Id: Iac9f2a2e35571bef3a58a22f6d96608f2b0bf343
    (cherry picked from commit 01b9876)
    
    vmware: Ignore bfv instances for hagroups
    
    Boot-from-volume instances do not matter for our ephemeral-root-disk
    anti-affinity: Cinder manages anti-affinity for volumes, and
    config-files going down with a datastore do not bring the instance
    down, but only make it inaccessible / unmanageable. The swap file
    could become a problem if it lives on the same datastore as the
    config-files, but newer compute-nodes store the swap files on
    node-local NVMe swap datastores in our environment, so we ignore this
    for now. We could solve this by passing in a config option that
    determines whether we should ignore bfv instances, depending on
    whether we detect node-local swap datastores or not.
    
    We move the generation of hagroup-relevant members of a server-group
    into its own function.
    
    Change-Id: Id7a7186909e236b7c81b4b8c8489e84f1067f2d4
    (cherry picked from commit 2c7e2cc)
    
    vmware: Add hagroup disk placement remediation
    
    Every time a server-group is updated through the API, we call this
    method to verify and remedy the disk-placement of VMs in the
    server-group according to their hagroups.
    
    Change-Id: I7ba6b14f5c969fb77dc5ce0fed63a6d9251f556e
    (cherry picked from commit cc50e0d)
    
    vmware: Validate hagroup disk placement in server-group sync-loop
    
    This replaces adding an additional nanny to catch when Nova missed an
    update to a server-group e.g. because of a restart.
    
    Change-Id: I9aa516bfe6be127a011539d9d22a78d1f38aba13
    (cherry picked from commit 09a32e2)
    
    vmware: Use instance lock for ephemeral svMotion
    
    When moving the ephemeral root-disk and the VM's config files, we take
    the instance-lock to serialize changes onto the VM. This makes sure
    that we don't squeeze our task between other tasks in the vCenter,
    which would make us read an inconsistent state of the VM.
    
    Change-Id: I04fc39bd48896bfd8010f17baa934f6f828edcef
    (cherry picked from commit 4f5eda3)
    
    vmware: Place VMs to hagroups more randomly
    
    The previous implementation of placing a VM onto an hagroup based on the
    index it has in the server-group has a big disadvantage for the common
    use-case of replacing instances during upgrades one by one. In doing so,
    every VM added to the end would end up on the same hagroup.
    
    To work against this, we put VMs onto hagroups randomly by taking
    their UUID's first character and using it modulo 2 as the deciding
    factor. These UUIDs being already generated randomly, we don't need
    to hash them or anything.
    
    Change-Id: Ib0d9f24ae7d5e0d4e2dceeb77a1513a8657976d2
    (cherry picked from commit 52b5d4b)
    
    vmware: datastore_hagroup_regex ignores case
    
    When finding hagroups in datastores with the regex from
    datastore_hagroup_regex, we use re.I to ignore the case so that an error
    made by an operator in naming the datastore does not break the feature.
    
    Change-Id: I4de760d99513abc9977f698aaba85b6456709ca6
    joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    dccdab2 View commit details
    Browse the repository at this point in the history
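    A minimal sketch of the two mechanisms above (helper names and the
    example regex are hypothetical): extracting the hagroup from a
    datastore name with the case-insensitive regex, and picking the
    hagroup for a VM from its UUID's first hex digit modulo 2.

        import re

        def hagroup_of_datastore(ds_name, hagroup_regex):
            # datastore_hagroup_regex is expected to capture the group,
            # e.g. r'eph-(?P<hagroup>[ab])\d+'; re.I tolerates operator
            # naming mistakes like "eph-A01".
            match = re.search(hagroup_regex, ds_name, re.I)
            return match.group('hagroup').lower() if match else None

        def hagroup_of_instance(instance_uuid):
            # UUIDs are already random, so no hashing is needed.
            return 'a' if int(instance_uuid[0], 16) % 2 == 0 else 'b'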
  157. [vmware] cross-vcenter resize/migration according to nova principles

    Prior to this, the driver was performing migration/resize in a
    way that could lead a VM into an inconsistent state and was not
    following the way nova does the allocations during a migration.
    Nova expects the driver to do the following steps:
    * migrate_disk_and_power_off() - copies the disk to the dest compute
    * finish_resize() - powers up the VM on the dest compute
    This change removes the RelocateVM_Task and introduces a new
    CloneVM_Task instead, in migrate_disk_and_power_off().
    
    The CloneVM_Task now also allows cross-vCenter migrations.
    
    Co-Authored-by: Marius Leustean <[email protected]>
    Change-Id: I9d6f715faecc6782f93a3cd7f83f85f5ece02e60
    (cherry picked from commit 95f9036)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    2aeeae3 View commit details
    Browse the repository at this point in the history
  158. vmware: Set profile on volume attachment

    If we attach a volume to a VM, we have to set the storage-profile.
    Otherwise, the VM will not be compliant with the profile and -
    especially on VMFS datastores - cannot be storage-vMotioned around if
    the storage-profile includes storage-IO control. By setting the
    profile for each disk-attachment, the VM also shows as compliant with
    these profiles in the UI.
    
    Change-Id: Idad6293dc7dfdf46fed584b9c116c03f928d44fe
    (cherry picked from commit dabcbca)
    joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    975577d View commit details
    Browse the repository at this point in the history
  159. VMwareapi: Raise proper exception for missing shadow-vms

    If a shadow-vm is missing, we raise an AttributeError,
    which does not clearly identify the reason for the failure.
    We better re-raise the original ManagedObjectNotFound exception,
    so it is more clearly identifiable.
    
    Change-Id: I954c57e97961833208743bc88e3ce75ad23cfe8c
    (cherry picked from commit a5a9dd9)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    e478b7f View commit details
    Browse the repository at this point in the history
  160. Vmwareapi: Fix attachment of multiple nics

    If multiple nics are attached, they need different device-keys;
    otherwise, the vmwareapi will reject the request.
    
    Change-Id: I0aa58ad11c499e9423c7ecc7998325b05dd9147e
    (cherry picked from commit 8ba8b32)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    d2bc05d View commit details
    Browse the repository at this point in the history
  161. vmware: Set appropriate settings for resize to > 128 cores

    When spawning a VM with more than 128 cores, we set numCoresPerSocket
    and some flags, e.g. vvtdEnabled. We missed adding the same flags when
    resizing a VM to more than 128 cores. This patch remedies that.
    
    Change-Id: I381a413ecf80af14dd4bf1dfde2d070976b6477a
    (cherry picked from commit cfd906b)
    joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    1aa03bf View commit details
    Browse the repository at this point in the history
  162. vmware: Make sure instance memory is multiple of 4.

    (cherry picked from commit a0dc4cb)
    kpawar-sap authored and joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    8ef2955 View commit details
    Browse the repository at this point in the history
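    A minimal sketch of what this entails (the rounding direction is an
    assumption here; the driver's actual helper may differ): vSphere only
    accepts a memoryMB value that is a multiple of 4.

        def round_memory_mb(memory_mb):
            # Round up to the next multiple of 4 MB so the instance
            # never gets less memory than the flavor asked for.
            return (int(memory_mb) + 3) // 4 * 4

        assert round_memory_mb(1022) == 1024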
  163. Vmwareapi: Migrate disk with minimal vm

    When simply cloning the original VM, the size might not
    fit on the target hypervisor.
    Resizing it to the target size might not fit on the
    source hypervisor.
    So we simply scale it to minimal size, as we are going
    to reconfigure it to the proper size on the target
    hypervisor anyway.
    
    Change-Id: Ia05e5b3a5d6913bfcef01fa97465a1aaa69872d0
    (cherry picked from commit 40d6589)
    
    Vmware: Warn about failed drs override removal
    
    An error calls for manual intervention, and an exception for
    debugging by a developer. This, however, is a known behaviour which
    can potentially lead to problems, hence a warning.
    
    Change-Id: I9479fb6405485e763a6344e7f44a60f75891adcb
    (cherry picked from commit f88a96c)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    bc8e263 View commit details
    Browse the repository at this point in the history
  164. vmware: Set migration.dataTimeout for big VMs

    When VMs with lots of CPUs are running for a longer period of time, a
    task to reconfigure the VM might end up hanging in the vCenter.
    
    According to VMware support, this problem happens if those VMs have
    been running for a longer period of time and, with their large number
    of CPUs, have accumulated enough differences between those CPUs that
    getting them all into a state where a reconfigure can be executed
    takes more time than the default 2s (iirc). The advanced setting to
    increase this time is "migration.dataTimeout".
    
    For simplicity reasons and because it shouldn't hurt (according to
    VMware), we set it on all big VMs. That way, we do not have to figure
    out if the VM consumes enough CPUs of the hypervisor to need this
    setting.
    
    Change-Id: Id8bda847c9e48997b385d9e1079ee9e99af9b8e8
    (cherry picked from commit 2f7393c)
    joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    e1a2079 View commit details
    Browse the repository at this point in the history
  165. vmware: Update expired image-template-VM handling

    Until now, we only kept image-template-VMs that had tasks that showed
    their usage - but VMs cloned from another image-template-VM don't have
    any tasks. Thus, we immediately removed VMs we cloned to another BB.
    This could even happen while the copying of the disk into the cache
    directory was still in progress.
    
    To counter this, we now take the "createDate" of the VM into account
    and only delete image-cache-VMs that were created more than
    CONF.remove_unused_original_minimum_age_seconds ago. Additionally, we
    take the same lock we also take when deploying image-cache-VMs and
    copying their files. This should protect from deleting the VM while a
    copy-operation is still in progress.
    
    Deleting the VM while copying is still in progress does not stop the
    copying. Therefore, this race-condition might be responsible for a lot
    of orphan vmdks of image-cache-VMs on our ephemeral datastores.
    
    Change-Id: Ic0a694a8c4df203c8c100abf5b8d2e9ee73866f7
    (cherry picked from commit d8f3ddf)
    joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    9de18bd View commit details
    Browse the repository at this point in the history
  166. Add PreferSameShardOnResizeWeigher.

    This weigher makes the scheduler prefer the same
    host-aggregate/shard/VC for an instance resize, because migrating the
    volumes to other shards could take more time.
    
    (cherry picked from commit f648b9b)
    kpawar-sap authored and joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    1b99a7d View commit details
    Browse the repository at this point in the history
  167. Vmwareapi: Set hw version on resize

    Resizing to a different flavour may also imply
    a different hw-version, so we need to set it;
    otherwise, it will stay on the previous one,
    which may be incompatible with the desired configuration.
    
    Only an upgrade is possible, though.
    
    Change-Id: I7976a377c3e8944483a10fdada391e8c51640e30
    (cherry picked from commit 28fb1a4)
    
    Vmware: Only change hw_version by flavor
    
    Be more strict in the upgrade policy,
    and only upgrade on resize if the flavor demands it,
    not if the default has changed.
    
    Change-Id: I25a6eb352316f986b179204199b098a418991860
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    213dba2 View commit details
    Browse the repository at this point in the history
  168. bigvm: Handle multiple aggregates on resource provider

    When switching to filtering the AZ via placement, we need the bigvm
    resource provider to be in the AZ aggregate in addition to being in the
    aggregate of the host's resource provider. Therefore, we find the host
    aggregate by seeing which aggregate is also a hypervisor uuid.
    
    Change-Id: I250f203b3bb24e084ec1b499a923f7f66e638102
    (cherry picked from commit 29ce312)
    
    bigvm: Do not remove parent provider's previous aggregates
    
    When we filter AZs in placement, we don't want nova-bigvm to remove
    the aggregates of our resource providers, as they represent the AZ.
    Therefore, we query the aggregates of the "parent" provider and make
    sure to include these aggregates, if we have to set the resource
    provider's UUID as an aggregate, too.
    
    Change-Id: If3986df022273f20e109816f2752ce0254db4f10
    (cherry picked from commit 2e98cd4)
    
    bigvm: Ignore deleted ComputeNode instances
    
    Querying via ComputeNodeList also returns deleted ComputeNode instances.
    Therefore, we might create bigvm-deployment resource providers for a
    deleted instance instead of the right instance and thus for a wrong
    resource provider. By ignoring deleted ComputeNode instances, this
    should not happen anymore.
    
    Change-Id: I5a4c6c5a1894d1f6f5cff6e3475670c27bb97f28
    (cherry picked from commit f7f5f0c)
    joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    cf8ca9f View commit details
    Browse the repository at this point in the history
  169. nova-manage: Don't fail on no Ironic nodes found

    There can be Ironic hosts that only have nodes assigned when those
    nodes are being repaired or built up. Those Ironic hosts would
    come up empty when searching for ComputeNodes in the sync_aggregates
    command and would be reported as a problem, which makes the command
    fail with exit-code 5. Since it's no problem if an Ironic host doesn't
    have a ComputeNode - each node is its own resource provider in
    placement anyway - we now ignore Ironic hosts without nodes in the
    error-reporting.
    
    Change-Id: I163f3e46f2e375531b870a363b84bba67816954d
    (cherry picked from commit 67779eb)
    joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    fe889de View commit details
    Browse the repository at this point in the history
  170. vmware: fix get_rules_by_prefix() wrong attribute

    The DRS rules can be read from the "rule" attribute, not from the
    "rules" attribute. We found this, because Nova wasn't deleting
    DRS rules for no-longer-existing server-groups.
    
    Change-Id: I86f7ca85d9b0edc1406a54a6f392bfff8f0af00d
    (cherry picked from commit 562b084)
    joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    c891ce6 View commit details
    Browse the repository at this point in the history
  171. vmware: Enable disabled DRS rules in sync

    When syncing a server-group's DRS rule(s) we now also enable a found
    rule in case it is disabled. We don't know how this happens, but
    sometimes rules get disabled and we need them to be enabled to guarantee
    the customer the appropriate (anti-)affinity settings.
    
    Change-Id: Ibc8eb6800640855513716412266fcbb9fbc4db42
    (cherry picked from commit d712c23)
    joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    e7856b4 View commit details
    Browse the repository at this point in the history
  172. vmware: Fix UnboundLocal error in manage_image_cache

    When we don't find any datastores for whatever reason, we don't have
    the "dc_info" variable set and thus cannot call
    self._age_cached_image_templates() with it, as that results in an
    UnboundLocalError.
    
    Change-Id: I2dca6d2d6ab7ca5cbc4ef7d2c316faaf6edfee7d
    (cherry picked from commit d2cf44f)
    joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    7f41692 View commit details
    Browse the repository at this point in the history
  173. VMWareapi: Handle missing Host properties

    The properties may not be set, if the host is disconnected.
    
    Change-Id: I1c53477e891b5b95859ca267fcad8cd1bff260ef
    (cherry picked from commit 0cb8b61)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    6fd07b7 View commit details
    Browse the repository at this point in the history
  174. Vmwareapi: Move pre_live_migration to vmops

    Most code related to VMs is in vmops, not in the driver,
    so we move this code there too.
    
    Change-Id: I1b801c8f12b377dd74a31ef646216c564631fe7f
    (cherry picked from commit ade6f4c)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    be7e305 View commit details
    Browse the repository at this point in the history
  175. Vmwareapi: Pass cookie-header as string

    This requires a change to oslo.vmware to accept a string
    instead of only a cookiejar.
    
    Depends-On: Ia9f16758c388afe0fe05034162f516844ebc6b2b
    Change-Id: I34a0c275ed48489954e50eb15f8ea11c4f6b1aa6
    (cherry picked from commit 726d7a2)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    666f433 View commit details
    Browse the repository at this point in the history
  176. Vmwareapi: Workaround for Config-Drives with Live-Migrations

    While we cannot live-migrate CD-Roms directly between vcenters,
    we can copy the data and detach/reattach the device manually.
    
    Change-Id: I88b4903f745e1bcfe957ddc07c6e9c040820ed6b
    (cherry picked from commit 14f9a5f)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    51444f1 View commit details
    Browse the repository at this point in the history
  177. volume: Treat 404 on attachment_delete as OK

    Since the mission is to delete the attachment, Cinder returning a 404
    on the attachment-deletion call can be ignored. We've seen this happen
    when Cinder took some time to delete the attachment, so Nova retried
    as it got a 500 back. On this retry, Nova got a 404, aborted the
    deletion, and left the BDM entry behind, even though the driver had
    already done the detach.
    
    Change-Id: I15dd7b59a2b3c528ecad3b337b92885b4d7bd68f
    (cherry picked from commit 82992a5)
    joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    8dd6b0c View commit details
    Browse the repository at this point in the history
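    A rough sketch of the tolerated-404 behaviour (the client wiring is
    illustrative, not Nova's actual code path):

        from cinderclient import exceptions as cinder_exc

        def attachment_delete(cinder, attachment_id):
            try:
                cinder.attachments.delete(attachment_id)
            except cinder_exc.NotFound:
                # The goal is "attachment gone" - if Cinder already
                # deleted it (e.g. after a timed-out first try), we
                # must not abort and leave the BDM entry behind.
                pass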
  178. vmwareapi: Handle missing volume_id in connection_info

    Apparently, the volume-id is not consistently
    stored as volume_id in connection_info.
    Use the block_device.get_volume_id function to handle
    the fallback.
    
    Change-Id: If5a8527578db8e4690595524e0785ee8b4de1d79
    (cherry picked from commit 607fd0d)
    fwiesel authored and joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    bec171b View commit details
    Browse the repository at this point in the history
  179. vmware: Attach root disk first

    Since we don't explicitly set a disk as boot disk and instead rely on
    the order the disks have on the VirtualMachine, we need to make sure we
    attach the root disk first.
    
    Change-Id: I3ae6b5f053a3b171ed0a80215fc4204a2bf32481
    (cherry picked from commit 7e6dc54)
    joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    fcd91da View commit details
    Browse the repository at this point in the history
  180. Support split of large VMs and memory reservation handling

    We've recently changed that not all large VMs need DRS disabled - only
    the ones over 512 GiB memory. But we still need memory reservations
    for VMs of 230 GiB - 512 GiB, which was previously handled by them
    being large VMs. While we could do this via the flavor, we failed to
    do so. Additionally, this would limit the number of large VMs we can
    spawn on a cluster.
    
    To keep the same behavior we previously had for large VMs, we now split
    memory reservations from big/large VM detection with the following
    result:
    1) a big VM will get DRS disabled - big VMs are VMs bigger than 1024 GiB
    2) a large VM will get DRS disabled - large VMs are VMs bigger than 512
    GiB
    3) all VMs defining CUSTOM_MEMORY_RESERVABLE_MB resources in their
    flavor get that amount of memory reserved
    4) all VMs above full_reservation_memory_mb config setting get all their
    memory reserved
    
    Therefore, is_big_vm() and is_large_vm() now only handle DRS settings
    and special spawning behavior.
    
    A side effect is that nova-bigvm - or rather the special spawning
    code - now doesn't consider 230 GiB - 512 GiB VMs as non-movable
    anymore and thus finds more free hosts.
    
    Change-Id: I2088afecf367efc380f9a0a88e5d18251a19e3a5
    (cherry picked from commit dca6fe6)
    joker-at-work committed Aug 8, 2022
    Configuration menu
    Copy the full SHA
    6574a3b View commit details
    Browse the repository at this point in the history

Commits on Aug 9, 2022

  1. Revert "db: Compact Train database migrations"

    This reverts commit 5c9c81e.
    
    We need the Stein migrations non-compacted to be able to migrate from
    Rocky. Our version is 393 instead of upstream's 391 because we added
    a migration 392 in our Rocky release and are thus already on 392
    without having applied 391, which comes later in Stein. Starting at
    393, we have the possibility to add upstream's 391 as our 393.
    
    We did not add the nova/tests/* changes of the original commit back,
    because I don't think we need to test them if they were tested upstream
    before.
    
    Additionally, we had to change the version in the newer
    nova/db/migration.py instead of the previously changed
    nova/db/sqlalchemy/migration.py
    
    Change-Id: I57a163b1b603f0ac4a52ae7f6d58785cdd835530
    joker-at-work committed Aug 9, 2022
    Configuration menu
    Copy the full SHA
    b3ac2aa View commit details
    Browse the repository at this point in the history
  2. Revert "db: Compact Stein database migrations"

    This reverts commit f0175a3.
    
    We need the Stein migrations uncompacted to be able to migrate from
    Rocky. Since we already had a 392 migration in our downstream Rocky
    changes, we move upstream's 391 to 393 and add a placeholder for 391,
    since our DB is already at 392. Our downstream 392 will be added in a
    later commit.
    
    Change-Id: Ic8bebe7fb0770e60dd9856df9d529247e474e2c3
    joker-at-work committed Aug 9, 2022
    Configuration menu
    Copy the full SHA
    8e31619 View commit details
    Browse the repository at this point in the history
  3. Add migration for 'internal_access_path' column to allow more than 255 chars.
    
    The 'internal_access_path' column of the 'console_auth_tokens' table
    has the type String with a max length of 255 characters, which might
    not be enough when used with VMware and will then cause a DBDataError.
    
    This migration changes the internal_access_path type to be Text with a
    max length of 65535 characters. It also adds a placeholder for
    migration version number 391.
    
    Change-Id: I4463f01ae727edd5e76b4a50860b116cbdea6124
    Closes-Bug: #1900371
    galkindmitrii authored and joker-at-work committed Aug 9, 2022
    Configuration menu
    Copy the full SHA
    1c3d475 View commit details
    Browse the repository at this point in the history
  4. Revert "apidb: Compact Train database migrations"

    This reverts commit df89596.
    
    We need the migrations non-compacted, because we are upgrading from
    Rocky.
    
    Change-Id: I68bcbc90d543b526b6abed61f9326109d9727c4f
    joker-at-work committed Aug 9, 2022
    Configuration menu
    Copy the full SHA
    8b0ee3b View commit details
    Browse the repository at this point in the history
  5. Revert "apidb: Compact Stein database migrations"

    This reverts commit dae3c89.
    
    We need these migrations non-compacted, because we're upgrading from
    Rocky.
    
    Change-Id: I41b5f8d46639810f3be139e18ac2f399e4f637f8
    joker-at-work committed Aug 9, 2022
    Configuration menu
    Copy the full SHA
    a5f3d90 View commit details
    Browse the repository at this point in the history

Commits on Aug 11, 2022

  1. Add oslo.vmware stable/xena-m3 as custom requirement

    We cannot use the upper-constraints.txt with URLs anymore, because
    newer pip doesn't like that. Additionally, we want our repositories
    with full git history around for easier debugging - the other services
    are doing it like that, too.
    
    We also add our git version of oslo.vmware to the test-requirements
    and switch the tox default for the constraints file to our version,
    which does not contain an oslo.vmware pinning, so we can install from
    git.
    
    Change-Id: Ia5d8fe096e15b9b244573d662ea73613b3c68744
    joker-at-work committed Aug 11, 2022
    Configuration menu
    Copy the full SHA
    407047c View commit details
    Browse the repository at this point in the history
  2. Fix tests with newer os-brick versions

    As described in [0], the newer versions of os-brick require the
    "lock_path" to be set - which is not done in the
    ComputeManagerUnitTestCase and TestDriverBlockDevice. Since detaching a
    volume uses os-brick to do the locking, the unit tests would fail. This
    commit fixes this by setting REQUIRES_LOCKING as suggested in [0] - I
    don't know why that wasn't committed upstream, though.
    
    [0] https://bugs.launchpad.net/os-brick/+bug/1969794
    
    Change-Id: Ifd01a0b38143839719ab2de4ee53e6aa7752146b
    joker-at-work committed Aug 11, 2022
    Configuration menu
    Copy the full SHA
    01d67ab View commit details
    Browse the repository at this point in the history
  3. Fix uncompacting DB migrations

    Some imports have changed, as the compaction of the DB migrations
    happened a while ago ...
    
    Change-Id: I3ad9a1dd3144ff09a73382d4233cf5f995dee2d8
    joker-at-work committed Aug 11, 2022
    Configuration menu
    Copy the full SHA
    bee9535 View commit details
    Browse the repository at this point in the history
  4. Configuration menu
    Copy the full SHA
    5c28cb6 View commit details
    Browse the repository at this point in the history
  5. Add concourse_unit_test_task for CI

    This task is run by our internal CI during image-build to check that the
    unit-tests pass.
    
    Change-Id: I89a03514093682b9bd2a1c48a13c6f7206b2e9e4
    joker-at-work committed Aug 11, 2022
    Configuration menu
    Copy the full SHA
    67b0046 View commit details
    Browse the repository at this point in the history

Commits on Aug 15, 2022

  1. Configuration menu
    Copy the full SHA
    7a12fb7 View commit details
    Browse the repository at this point in the history

Commits on Aug 19, 2022

  1. Add mitmproxy to custom requirements

    We previously built a separate image for the
    nova-console-shellinaboxproxy, but are now using the same images as the
    rest of Nova, because Nova's dependencies can now support mitmproxy.
    Therefore, we add mitmproxy as custom requirement to be installed in our
    image.
    
    Change-Id: Ib3b60f86434938b0805778650d6d9694cfd922bd
    joker-at-work committed Aug 19, 2022
    Configuration menu
    Copy the full SHA
    d4e56ed View commit details
    Browse the repository at this point in the history
  2. Update nova-shellinaboxproxy command for newer mitmproxy

    Newer versions of mitmproxy don't support "-R", "--port" and
    "--bind-address" anymore. Instead, the reverse-proxy mode is now set
    with "--mode reverse:URL", the port is defined with "--listen-port" and
    the binding address with "--listen-host".
    
    We switch to calling mitmproxy without a shell here, because we don't
    need any features of the shell. This makes the command definition more
    readable, too.
    
    Change-Id: Iaa75d1771f0b998484012debe408349ba139e6b5
    joker-at-work committed Aug 19, 2022
    Configuration menu
    Copy the full SHA
    8e6f7c9 View commit details
    Browse the repository at this point in the history
  3. Update shellinabox console for Xena

    The NovaProxyRequestHandlerBase class we based our NovaShellInaBoxProxy
    upon when implementing db token support in
    	shellinabox: add support for db tokens
    was merged into NovaProxyRequestHandler in
    	 trivial: Merge unnecessary 'NovaProxyRequestHandlerBase' separation
    Therefore, we now switch our base to NovaProxyRequestHandler, but do not
    call the parent's init as we do not need any functionality of
    websockify.ProxyRequestHandler in our class. The compatibility of the
    __init__() needs to be checked on upgrade.
    
    Change-Id: I4fda6d5251d671af161441c8cb8bbe091bb970b4
    joker-at-work committed Aug 19, 2022
    Configuration menu
    Copy the full SHA
    c1a8fcc View commit details
    Browse the repository at this point in the history
  4. Let nova-shellinaboxproxy URL be non-positional

    We previously had proxyclient_url as optional positional argument to
    nova-shellinaboxproxy, but with the update to Xena, this changed to a
    required argument. We usually configure nova-shellinaboxproxy via
    nova.conf and thus don't need the required argument. Therefore, we make
    it non-positional now and thus available via
    --shellinabox-proxyclient_url instead.
    
    Change-Id: Iefeeaa8169835c2bbbe74fe59ab1b9588b8ee636
    joker-at-work committed Aug 19, 2022
    Configuration menu
    Copy the full SHA
    b0efa22 View commit details
    Browse the repository at this point in the history

Commits on Aug 23, 2022

  1. cinder: Define minimum migration timeout

    When Nova computes the wait time for a volume-migration, it uses the
    volume size. For really small volumes (e.g. 1 GiB), the computed time
    is lower than the overhead added by the API and RPC calls inside
    Cinder. Therefore, Nova times out waiting for the migration, even
    though the migration happens as expected.
    
    To fix this, we add an additional static overhead for the migration
    timeout, defaulting to 10min.
    
    Change-Id: I1532054524653bc9dfaf5010f3250ea6bff03701
    joker-at-work committed Aug 23, 2022
    Configuration menu
    Copy the full SHA
    7acf6d5 View commit details
    Browse the repository at this point in the history
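    A minimal sketch of the resulting timeout computation (the per-GiB
    factor and names are illustrative; the new static overhead defaults
    to 10 minutes per the message above):

        def volume_migration_timeout(volume_size_gb, seconds_per_gb,
                                     static_overhead=600):
            # Even a 1 GiB volume now gets at least ~10 minutes, which
            # covers Cinder's API/RPC overhead.
            return volume_size_gb * seconds_per_gb + static_overhead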

Commits on Sep 1, 2022

  1. Make migration 403 optional

    In our Rocky code-base we already had migration 403 backported as
    migration 320 and thus unconditionally adding the UniqueConstraint does
    not work for 403. Instead, we now check if there's already a
    UniqueConstraint on the table containing the columns we would want to
    add and return without applying actions if that's the case.
    
    Change-Id: Ie0cba9500945cd08d6c418cc9719aea7ede80e90
    joker-at-work committed Sep 1, 2022
    Configuration menu
    Copy the full SHA
    8896cc9 View commit details
    Browse the repository at this point in the history
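    A sketch of such a guard (table and column names are illustrative,
    the inspection API is plain SQLAlchemy):

        from sqlalchemy import inspect

        def has_unique_constraint(engine, table_name, columns):
            # True if an equivalent UniqueConstraint already exists,
            # e.g. from our downstream Rocky migration 320.
            inspector = inspect(engine)
            return any(set(uc['column_names']) == set(columns)
                       for uc in inspector.get_unique_constraints(table_name))

        # migration 403 returns early when this is True for the
        # constraint it would otherwise add.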

Commits on Oct 5, 2022

  1. [scheduler] Refactored PreferSameHostOnResizeWeigher

    We can make use of the changes to the scheduler to
    simplify the internal logic.
    
    Whether it is a resize is now stored in the request_spec,
    as well as the source host, so we do not have
    to reconstruct it from instance data anymore.
    
    Change-Id: I4b016448a5a905a5d9833aa821daed186d7f1f8a
    fwiesel committed Oct 5, 2022
    Configuration menu
    Copy the full SHA
    7260d83 View commit details
    Browse the repository at this point in the history

Commits on Oct 25, 2022

  1. [vmwareapi] Limit free space of datastore to capacity 2

    Apparently, the VMware API can report datastores with more free space
    than capacity, which causes an exception in the oslo_vmware API.
    Therefore, we limit the free space to the capacity.
    
    Change-Id: If6013022a8a32029d43f9074eaaeea5b55855104
    joker-at-work committed Oct 25, 2022
    Configuration menu
    Copy the full SHA
    ef1973f View commit details
    Browse the repository at this point in the history
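    Conceptually the fix is a clamp (a sketch, not the driver's exact
    code):

        def safe_freespace(capacity, freespace):
            # The vCenter can report freespace > capacity; oslo.vmware
            # raises on such values, so never report more than capacity.
            return min(freespace, capacity)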

Commits on Nov 9, 2022

  1. [vmware] allow pulling images from swift URL

    oslo.vmware contains a new function `image_pull_from_url`
    to pass the image URL directly to VMware for downloading the image.
    
    This can be feature toggled on/off via:
    [vmware]/allow_pulling_images_from_url
    leust authored and joker-at-work committed Nov 9, 2022
    Configuration menu
    Copy the full SHA
    73ef235 View commit details
    Browse the repository at this point in the history
  2. No Flavor notification in Test Fixture setup

    The notification causes a load of all projects
    for each flavor created, once per test setup.
    Patching it out reduces the total runtime of the unit
    tests by 20%.
    
    Change-Id: Ib3b1f2bc401be67d043b723ecff59a0c45d9f81d
    fwiesel authored and joker-at-work committed Nov 9, 2022
    Configuration menu
    Copy the full SHA
    fb5bab0 View commit details
    Browse the repository at this point in the history
  3. [vmware] Fallback to uploading the image via HttpNfcLease URL

    A few vCenters have shown intermittent errors while verifying the
    SSL connection to the Swift endpoint, thus occasionally throwing the
    vim.fault.SSLVerifyFault exception.
    In such unexpected scenarios we can still fall back to transferring
    the image by uploading it to the HttpNfcLease URL.
    leust authored and joker-at-work committed Nov 9, 2022
    Configuration menu
    Copy the full SHA
    c4220d8 View commit details
    Browse the repository at this point in the history
  4. Vmwareapi: Moved serialization to oslo.vmware

    Depends-On: Ie76b1e6940b5022563ce91d5692df589573704d0
    Change-Id: I6fe097a9d2a83115f73c51016914ea18b708292b
    fwiesel authored and joker-at-work committed Nov 9, 2022
    Configuration menu
    Copy the full SHA
    4de4efb View commit details
    Browse the repository at this point in the history
  5. Remove tenant filter for Security group search in VM creation.

    This commit removes the tenant (project) filter for the security-group
    list in two places, `nova-api` and `nova-compute`. Nova will not be
    able to get the wrong security groups (those that are not allowed),
    because the user's context is used in these places.
    
    Security
    By adding additional debug logs, I printed the list of security groups
    that Nova got from the Neutron API and compared it with the available
    security-group list. Both lists match 100%.
    
    Non-unique name problems
    If the user will have non-unique names for security groups: one in the
    project and the second one shared Nova API will return the error `More
    than one SecurityGroup exists with the name '...'`. Example:
    
       $ openstack server create --network=test-net --flavor=24 \
         --image=test-image --security-group=test-sg-rbac test
       More than one SecurityGroup exists with the name 'test-sg-rbac'.
    
    Performance
    As a result of these changes, Nova gets all security groups without
    filtering from Neutron API and it will be slower. Performance comparison:
    
       API call with project_id filter:
         time_namelookup:  0.004326s
            time_connect:  0.041808s
         time_appconnect:  0.137256s
        time_pretransfer:  0.137411s
           time_redirect:  0.000000s
      time_starttransfer:  0.611862s
                         ----------
              time_total:  0.778839s
    
       API call without filtering:
         time_namelookup:  0.006605s
            time_connect:  0.046886s
         time_appconnect:  0.145570s
        time_pretransfer:  0.145793s
           time_redirect:  0.000000s
      time_starttransfer:  0.773550s
                         ----------
              time_total:  0.938802s
    
    Change-Id: Ic859328ddc907311537a680b3aa18b1983474c14
    velp authored and joker-at-work committed Nov 9, 2022
    Configuration menu
    Copy the full SHA
    7efed66 View commit details
    Browse the repository at this point in the history
  6. Idempotent binding creation

    In case we already created a binding for a live-migration
    but crashed during the ongoing process, Neutron will already
    hold a port binding for the host.
    Instead of failing, we can simply take the existing port-binding
    and continue.
    
    Change-Id: If84c74e258084d4ab648a6a413896eda087317d7
    See: https://specs.openstack.org/openstack/neutron-specs/specs/backlog/pike/portbinding_information_for_nova.html
    fwiesel authored and joker-at-work committed Nov 9, 2022
    Configuration menu
    Copy the full SHA
    d955aa3 View commit details
    Browse the repository at this point in the history
  7. special_spawning: Check DRS behavior later

    Instead of immediately checking the default DRS behavior setting, we now
    try to find a free host first and only error out if we would need to
    rely on DRS to free up a host. This makes it possible to support BBs
    where an operator already freed up a host manually.
    
    Change-Id: I22dbdcc9f135bbfc9ef05e13c801e88a78e64236
    joker-at-work committed Nov 9, 2022
    Configuration menu
    Copy the full SHA
    971dd82 View commit details
    Browse the repository at this point in the history

Commits on Nov 10, 2022

  1. Add python-ironicclient into custom-requirements.txt

    Change-Id: I05dffb99a2f4ae5d629871c96642983435ac79b4
    joker-at-work committed Nov 10, 2022
    Configuration menu
    Copy the full SHA
    b3335c3 View commit details
    Browse the repository at this point in the history
  2. vmwareapi: Fix jsonutils.load() needing bytes in Python3

    This is a fixup for commit 072b15f [vmware] Add configurable reservations per hostgroup
    
    Change-Id: I191aedb0e3bba7698825771089cf134f320368ec
    joker-at-work committed Nov 10, 2022
    Configuration menu
    Copy the full SHA
    5d56537 View commit details
    Browse the repository at this point in the history

Commits on Nov 11, 2022

  1. vmware: Report DISK_GB as integers

    With upgrading to Xena and thus Nova using newer microversions to talk
    to Placement, we cannot report floating point numbers for the resources
    anymore. This came up in DISK_GB, which is computed from MB and thus not
    always a round number.
    
    We fix this by converting the numbers to int and thus cutting off the
    values after the comma. With that, we under-report resources, which
    seems better than over-reporting them, because we could not fit
    everything in when over-reporting.
    
    Change-Id: I0d364f347afa235ed2b7e8ae90f5851275b7738e
    joker-at-work committed Nov 11, 2022
    Configuration menu
    Copy the full SHA
    48cefa6 View commit details
    Browse the repository at this point in the history
  2. bigvm: Replace rc_fields and scheduler.client imports

    rc_fields got removed with the placement-api removal from Nova and is
    replaced by its own package os_resource_classes.
    
    SchedulerClient got resolved, because it was only proxying functions
    through to SchedulerReportClient at some point in time, so we use
    SchedulerReportClient directly now.
    
    Change-Id: Iabeae6e01f9615be7c122d1e3fd719a1e53762d9
    joker-at-work committed Nov 11, 2022
    Configuration menu
    Copy the full SHA
    ffa931d View commit details
    Browse the repository at this point in the history
  3. Add target_has_no_project_id policy check

    Since we don't want to enable scoped tokens yet, we still have
    policies relying on checking the owner like so:
    "project_id:%(project_id)s". With all the patches to "Pass the actual
    target" landing since Rocky, there are multiple APIs that no longer
    work with the old owner check, because those patches move away from
    the old default behavior of passing the token's project_id. Instead,
    they pass an empty dict.
    
    Since we want to change to scoped tokens at some point in time, we add
    our own "target_has_no_project_id" so we can support both resources
    assigned to a project and generic requests e.g. for listing availability
    zones more easily by specifying "rule:owner or
    target_has_no_project_id:True".
    
    Change-Id: Ia41f120cdc5f9eaea9b119e15115033964113085
    joker-at-work committed Nov 11, 2022
    Configuration menu
    Copy the full SHA
    a4469a1 View commit details
    Browse the repository at this point in the history
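    Semantically, the new check boils down to the following (a sketch;
    the oslo.policy registration plumbing is omitted):

        def target_has_no_project_id(target):
            # True for generic requests whose target carries no
            # project_id, e.g. listing availability zones.
            return 'project_id' not in (target or {})

    which allows rules of the form
    "rule:owner or target_has_no_project_id:True".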

Commits on Nov 17, 2022

  1. Revert "vmware: Report DISK_GB as integers"

    This reverts commit 48cefa6.
    
    The description given in the commit is not correct. We switched from
    Python 2 to Python 3 and thus got some changes in how "/" is
    interpreted. Previously, if there were only integers involved, the
    outcome would be an integer. With Python 3 the outcome is of type
    float, and one needs to use // for integer division. We will
    accommodate that in the next commit.
    joker-at-work committed Nov 17, 2022
    Configuration menu
    Copy the full SHA
    eda534b View commit details
    Browse the repository at this point in the history
  2. vmware: Integer division Python 2 -> 3 fix

    In Python 2, division with the / operator results in an integer-type
    result if divisor and dividend are integer-typed - if one of them is of
    type float, then the result will be a float.
    
    Python 3 made this more explicit by introducing the // operator which
    always results in an integer result while changing the / operator to
    always return floats.
    
    Some of our code relied on the old behavior and we need to update it to
    use // so we don't put floats to placement or the vCenter where they
    don't understand them.
    
    There's even some code that relied on the new behavior and never
    worked before - we don't update it here.
    
    Change-Id: Ib81728cc8dcde852a035bfbbd380435ed06c56ba
    joker-at-work committed Nov 17, 2022
    Configuration menu
    Copy the full SHA
    73a36fa View commit details
    Browse the repository at this point in the history
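    The root cause in a nutshell:

        disk_mb = 10250
        print(disk_mb / 1024)    # Python 2: 10 (int); Python 3: 10.009765625 (float)
        print(disk_mb // 1024)   # 10 on both - what placement and the vCenter expect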
  3. [scheduler] Only fetch instances when weighers/filters request any

    Only in a subset of situations are the filters or weighers
    interested in the placement of very specific instances
    for the scheduling decision.
    
    But the host-manager fetches/holds all the instances for all
    the hosts, which at a sufficient scale occupies the scheduler
    fully with book-keeping of the instances.
    
    As a first step, return the instance-ids each filter/weigher
    is interested in, and skip updating that information
    if none is required.
    
    At a later step, the update can be limited to those instances.
    
    Change-Id: I3ea05f98e300bbf0e4b0b42ad334e86d34b21ab6
    fwiesel authored and joker-at-work committed Nov 17, 2022
    Configuration menu
    Copy the full SHA
    12b46de View commit details
    Browse the repository at this point in the history
  4. [scheduler] _get_host_states instance_uuids may be None

    The function is called from other code-paths as well,
    and we need to preserve the old semantics for use-cases
    besides the FilteringScheduler (such as the CachingScheduler)
    
    Change-Id: Id99e08fa5e833b197324ccf525a5fbcdfcce318a
    fwiesel authored and joker-at-work committed Nov 17, 2022
    Configuration menu
    Copy the full SHA
    1421c15 View commit details
    Browse the repository at this point in the history
  5. [scheduler] Fix host_info_requiring_instance_ids in AffinityFilter

    The refactoring was incomplete, still having the old function
    names instead of host_info_requiring_instance_ids.
    
    Change-Id: Ibb69e15654ec6818a1bc920b1c8197f6a3c52080
    fwiesel authored and joker-at-work committed Nov 17, 2022
    Configuration menu
    Copy the full SHA
    2002293 View commit details
    Browse the repository at this point in the history
  6. [scheduler] Consolidate host_info/passes steps in filter & weigher

    Both host_info_requiring_instance_ids and host_passes/_weigh_object
    had duplicated code for extracting the needed instance-ids.
    By consolidating them, we reduce the code duplication.
    
    Change-Id: Icfc1d3e554ff0834dec35d52772996284dc0a5da
    fwiesel authored and joker-at-work committed Nov 17, 2022
    Configuration menu
    Copy the full SHA
    2a6be98 View commit details
    Browse the repository at this point in the history

Commits on Nov 23, 2022

  1. Fix pep8 check for "[scheduler] Only fetch instances when weighers/fi…

    …lters request any"
    
    Change-Id: I9b82a2f6367f8b77c9e1ca3296eced66b094c628
    joker-at-work committed Nov 23, 2022
    Configuration menu
    Copy the full SHA
    d74ad27 View commit details
    Browse the repository at this point in the history

Commits on Nov 24, 2022

  1. Log disabled/forced_down changes on services

    We want to know when a service was first activated and last deactivated.
    For this, we log a line every time a service is enabled/disabled. The
    log line can then be saved in long-term storage and looked up again.
    
    Change-Id: Ia904ac8108dd384d1675eba5250a38b77a5a8184
    joker-at-work committed Nov 24, 2022
    Configuration menu
    Copy the full SHA
    4f96b35 View commit details
    Browse the repository at this point in the history
  2. vmware: Pass VCState into VMwareVMOps

    We need the VCState inside the VMwareVMOps instance to access
    information about different hosts. We plan to use this for splitting
    server-groups based on the amount of available hosts in the cluster, but
    it can also be used for scheduling big VMs.
    
    Change-Id: I47edac9a81ef9a02cf07ab05e63edb9ed02d17b7
    joker-at-work committed Nov 24, 2022
    Configuration menu
    Copy the full SHA
    dddadec View commit details
    Browse the repository at this point in the history
  3. vmware: soft-anti-affinity can use multiple DRS rules

    Since VMware doesn't support the "mandatory" setting on VM-VM DRS rules,
    all rules we create are mandatory. This leads to VMs being unable to
    spawn in a cluster, if there are already as many VMs in the same
    soft-anti-affinity server-group as there are hosts in the cluster.
    Customers expect soft-anti-affinity to work also in this case.
    
    To accommodate that, we now split the members of soft-anti-affinity
    server-groups into multiple chunks. Each chunk's size is the number
    of available hosts in the cluster. For each chunk, we create an
    anti-affinity DRS rule.
    
    We try to make the members of each chunk stable by sorting the members
    in the server-group, but this commit probably still leads to more
    updates of DRS rules in those bigger server-groups.
    
    Since there can be rules in a no-longer-used chunk and since there are
    currently already rules with a different naming scheme (i.e. without the
    trailing number), we also fetch all rules of the same prefix and
    "update" them. Updating a rule without members leads to deletion of this
    rule.
    
    Change-Id: Id28fcc71193b491a1ac57e5c4f28c3b4862eeee5
    joker-at-work committed Nov 24, 2022
    Configuration menu
    Copy the full SHA
    0634209 View commit details
    Browse the repository at this point in the history
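    A sketch of the chunking (the helper name is hypothetical): members
    are sorted for stability and split into chunks of at most the number
    of available hosts, one DRS rule per chunk.

        def chunk_members(member_uuids, n_hosts):
            members = sorted(member_uuids)
            return [members[i:i + n_hosts]
                    for i in range(0, len(members), n_hosts)]

        # e.g. 5 members on a 2-host cluster yield 3 anti-affinity
        # rules; a previously used chunk/rule that ends up with no
        # members is "updated" empty and thereby deleted.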

Commits on Nov 28, 2022

  1. vmware: Add get_datastore_ref_by_name helper

    This function iterates over all datastores in the vCenter trying to find
    a datastore with the given name.
    
    Might be a good candidate for downporting into oslo.vmware.
    
    Change-Id: I3fc7f171592c2cd21b765e0eb0218bf87d45a37c
    joker-at-work committed Nov 28, 2022
    Configuration menu
    Copy the full SHA
    4681253 View commit details
    Browse the repository at this point in the history
  2. vmware: Add get_vmx_path helper

    This function returns the path to a VM's .vmx file parsed into a
    DatastorePath object.
    
    While this method is small enough for inlining it into any code, that
    code is easier to unit-test with this function.
    
    Change-Id: If37768910803a9b456c0328a6904c2d53b96cccf
    joker-at-work committed Nov 28, 2022
    Configuration menu
    Copy the full SHA
    8c078b6 View commit details
    Browse the repository at this point in the history
  3. vmware: Support rescuing bfv instances

    This sets the supports_bfv_rescue capability for the vmwareapi driver
    and updates the rescue function to put the rescue disk next to the vmx
    file of the VM.
    
    While the previous code would have worked - at least on VMFS
    datastores, not sure about vVol - it would have put the rescue disk
    onto the volume's datastore into the volume's directory. We don't want
    this as it skews Cinder's resource counting.
    
    Therefore, we now change the code to look up the path of the vmx file
    instead of taking the first vmdk's path and use its datastore and folder
    to place the rescue disk.
    
    Change-Id: Id707de9f273f618711dab8aa2e2a88dd8d942a6e
    joker-at-work committed Nov 28, 2022
    Configuration menu
    Copy the full SHA
    69f17dd View commit details
    Browse the repository at this point in the history

Commits on Nov 30, 2022

  1. Add more password generation options

    Some password policies require more than one occurrence of symbols of
    one kind, or make restrictions about their occurrence, effectively
    requiring them to occur more often.
    
    By providing the configuration value 'password_all_group_samples', the
    administrator can increase the rounds to sample from all groups to
    adhere to such policies.
    
    Often, password policies require not only ascii letters
    (upper/lower-case) and numbers, but also other printable characters in
    the password as a fourth symbol group.
    
    By making the symbol-classes the multi-string list
    'password_symbol_groups', the administrator can add those and
    also override the other classes, if desired.
    
    Change-Id: I5b995883a41f65296de86f3effa0102ecb12c1fa
    fwiesel authored and joker-at-work committed Nov 30, 2022
    Configuration menu
    Copy the full SHA
    0d3591c View commit details
    Browse the repository at this point in the history
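    A sketch of the sampling scheme the two options imply (option names
    from the message above; the logic here is illustrative only):

        import secrets

        def generate_password(length, symbol_groups, all_group_samples=1):
            # Draw `all_group_samples` characters from every group first,
            # so policies demanding repeated occurrences are satisfied...
            chars = [secrets.choice(group)
                     for _ in range(all_group_samples)
                     for group in symbol_groups]
            # ...then fill up to the requested length from all groups.
            alphabet = ''.join(symbol_groups)
            chars += [secrets.choice(alphabet)
                      for _ in range(length - len(chars))]
            secrets.SystemRandom().shuffle(chars)
            return ''.join(chars)

        # e.g. symbol_groups = ['abcdefgh', 'ABCDEFGH', '0123456789', '!?#+-']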

Commits on Jan 26, 2023

  1. Add MKS VNC proxy for VMware

    Original code can be found at: https://opendev.org/x/nova-mksproxy
    
    Integrate the code into nova as a builtin command.
    
    Subclass `base.SecurityProxy` into `MksSecurityProxy`. Because MKS'
    `connect_info.internal_access_path` contains more than just a `path`
    (namely a JSON object with "ticket", "cfgFile" & "thumbprint"), add a
    new `parse_internal_access_path()` method in
    `NovaProxyRequestHandler`. This method tries to parse
    `internal_access_path` as JSON and, if that fails, puts the contents
    in the "path" key of a new `internal_access_path_data` dict.
    
    Co-authored-by: Johannes Kulik <[email protected]>
    grandchild and joker-at-work committed Jan 26, 2023
    Configuration menu
    Copy the full SHA
    2d3da64 View commit details
    Browse the repository at this point in the history
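    The parsing fallback in a nutshell (a sketch of the described
    method, detached from its class for brevity):

        import json

        def parse_internal_access_path(internal_access_path):
            try:
                # MKS: a JSON object with "ticket", "cfgFile", "thumbprint"
                data = json.loads(internal_access_path)
            except (TypeError, ValueError):
                # other consoles: a plain path string
                data = {'path': internal_access_path}
            return data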

Commits on Jan 27, 2023

  1. Configuration menu
    Copy the full SHA
    c52003a View commit details
    Browse the repository at this point in the history

Commits on Feb 1, 2023

  1. Configuration menu
    Copy the full SHA
    24ff3f7 View commit details
    Browse the repository at this point in the history
  2. mks-proxy: Fix entrypoint name

    Has to be "nova-{NAME}proxy".
    grandchild committed Feb 1, 2023
    Configuration menu
    Copy the full SHA
    90ed620 View commit details
    Browse the repository at this point in the history

Commits on Feb 3, 2023

  1. mks-proxy: REVERTME Add compat --verbose flag

    The old script had a --verbose flag that we are using in the k8s
    deployment definition for all regions. This will include both Xena
    and older Rocky deployments. The new name of this flag is
    --mks-verbose.
    
    Instead of adding an image version check in the helm-chart, add an
    alias for the flag instead, for the duration of the update, and revert
    this after the Xena upgrade is through everywhere.
    grandchild committed Feb 3, 2023
    Configuration menu
    Copy the full SHA
    83f9f4c View commit details
    Browse the repository at this point in the history

Commits on Feb 13, 2023

  1. vmware: Set preferHT on all CUSTOM_NUMASIZE flavors

    Physical CPUs are faster than hyperthreaded CPUs. So by default VMware
    spreads a VM's CPU cores onto physical cores, spanning multiple NUMA
    nodes if needed although the vCPUs would fit onto a smaller number of
    NUMA nodes.
    
    HANA VMs prefer low-latency (and thus NUMA-locality) over raw
    performance, so they set the VMware config `preferHT` to not get
    spread out. Before, this setting was only applied if the VM's flavor
    qualified as a "Big VM" flavor (i.e. >1 TiB in size). This excluded
    smaller HANA flavors, and a single-NUMA-node VM got spread over two
    nodes.
    
    Make `preferHT` depend on whether the flavor requires one of our
    `CUSTOM_NUMASIZE_*` traits instead.
    grandchild committed Feb 13, 2023
    Configuration menu
    Copy the full SHA
    620f38a View commit details
    Browse the repository at this point in the history
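    A sketch of the new condition (the helper name is hypothetical; the
    extra-config key is vSphere's well-known advanced setting):

        def flavor_prefers_ht(extra_specs):
            # preferHT now follows the CUSTOM_NUMASIZE_* traits instead
            # of a "Big VM" memory threshold.
            return any(key.startswith('trait:CUSTOM_NUMASIZE')
                       and value == 'required'
                       for key, value in extra_specs.items())

        # if it matches, the spawn code would add something like:
        #     extra_config['numa.vcpu.preferHT'] = 'TRUE'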
  2. vmware: Restart BigVMs with "high" priority

    Another best-practice recommendation for HANA-on-VMware.
    
    The code pretty much duplicates the one for setting the DRS
    partially-automated behavior. I decided against deduplication and
    abstraction in favor of readability and at the cost of LOC.
    grandchild committed Feb 13, 2023
    Configuration menu
    Copy the full SHA
    89c9596 View commit details
    Browse the repository at this point in the history
  3. vmware: Set maxPerVirtualNode for HANA flavors

    Testing showed that single- and half-NUMA-node VMs get spawned across
    multiple NUMA nodes due to `numa.autosize.vcpu.maxPerVirtualNode`
    being too low. Set the value explicitly to the same value that's used
    for the `CoresPerSocket` cluster-vm setting.
    grandchild committed Feb 13, 2023
    Configuration menu
    Copy the full SHA
    0b77bb8 View commit details
    Browse the repository at this point in the history

Commits on Feb 17, 2023

  1. vmware: Fix update_cluster_das_vm_override

    Introduced in 'vmware: Restart BigVMs with "high" priority', the
    function "update_cluster_das_vm_override()" did not work, because it
    created a "ClusterDasVmConfigInfo" instead of a "ClusterDasVmConfigSpec"
    object. This led to us not being able to spawn BigVMs with the
    following error:
    
        Exception in ReconfigureComputeResource_Task.
        Cause: Type not found: 'operation'
    
    Change-Id: If9acf9ee07e373b7b24c14c642d0d99fe2a41db1
    joker-at-work authored and grandchild committed Feb 17, 2023
    Configuration menu
    Copy the full SHA
    1d712df View commit details
    Browse the repository at this point in the history
  2. vmware: Add resource overcommit ratio support

    Copy the relevant code from the libvirt driver.
    
    This reintroduces the code that was changed (presumably by accident) in
      fd4f43b VmWare: Refactored resource and inventory collection
    Fix up that commit with this one when forward-porting!
    grandchild committed Feb 17, 2023
    3f75d35
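    A sketch of the pattern copied from the libvirt driver, where the
    allocation ratio scales what gets reported to placement (the option
    names are real nova config; the helper is hypothetical):

        def _vcpu_inventory(total_pcpus, conf):
            return {
                'VCPU': {
                    'total': total_pcpus,
                    'allocation_ratio': conf.cpu_allocation_ratio or 16.0,
                    'reserved': conf.reserved_host_cpus,
                },
            }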

Commits on Feb 21, 2023

  1. vmware: Fix missing image_cache conf group usage

    `remove_unused_original_minimum_age_seconds`, along with several
    other config options related to image-cache management, was moved
    to its own config group `image_cache` in Ussuri (nova 21), but the
    cherry-pick in
        9de18bd "vmware: Update expired image-template-VM handling"
    failed to account for this.
    grandchild committed Feb 21, 2023
    37fb054
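    The fix boils down to reading the option from its new group; a
    minimal sketch:

        from nova import conf

        CONF = conf.CONF

        # Before Ussuri (what the cherry-pick still used):
        #   CONF.remove_unused_original_minimum_age_seconds
        # Since Ussuri the option lives in the [image_cache] group:
        max_age = CONF.image_cache.remove_unused_original_minimum_age_seconds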
  2. vmware: Remove get_vif_info() is_neutron param usage

    The parameter was removed in
        f3cc311 "vmware: Remove vestigial nova-network support"
    but not taken into account by
        aba0dbb "VMware: Image as VM template".
    grandchild committed Feb 21, 2023
    81f4adb

Commits on Mar 2, 2023

  1. limits: Fix per-flavor limit API value type

    In
        daeeafd "baremetal quota handling"
    the `_build_per_flavor_limits()` method was incorrectly rewritten
    to match the surrounding code, esp. `_build_absolute_limits()`, and
    now uses not just the limit but both limit and in_use as the value
    in the resulting limit dict.
    
    The original commit,
        16857ed "reintroduce baremetal quota handling"
    used `absolute_limits` directly as the method parameter.
    grandchild committed Mar 2, 2023
    32f88ea
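    A sketch of the value-type fix, assuming `limits` maps a resource
    name to a dict with 'limit' and 'in_use' keys:

        # Wrong: copies the whole dict (limit *and* in_use) as value.
        broken = {name: value for name, value in limits.items()}

        # Right: the API response only carries the limit itself.
        fixed = {name: value['limit'] for name, value in limits.items()}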

Commits on Mar 3, 2023

  1. [vmwareapi] Look up vmdk volume by uuid directly first

    For historic reasons, there is a mapping between the volume uuid
    and the device[].backing.uuid in VMware, stored in extraConfig. In
    most cases they are the same, but sometimes they are not. And
    apparently, the extraConfig can even hold wrong information.
    
    Since the risk of a uuid collision is quite low, and the lookup is
    just an iteration over a small list of volumes, we first try the
    direct lookup, avoiding possibly incorrect information in the
    extraConfig. Failing that, we still do the additional API call to
    get the mapping, and try that.
    
    Also work around type errors mypy raises in the changed file.
    
    Change-Id: Ifcdf96cfc6d00473299c1f2a7cb9d23d03294027
    fwiesel committed Mar 3, 2023
    5fcb1a9
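    A sketch of the two-step lookup described above (the extraConfig
    key format and the helper name are assumptions):

        def _find_vmdk_device(devices, volume_id, extra_config):
            def _by_backing_uuid(uuid):
                for dev in devices:
                    backing = getattr(dev, 'backing', None)
                    if getattr(backing, 'uuid', None) == uuid:
                        return dev

            # 1. Direct match: backing.uuid usually equals the
            #    volume uuid.
            device = _by_backing_uuid(volume_id)
            if device is not None:
                return device
            # 2. Fallback: consult the extraConfig mapping and retry.
            return _by_backing_uuid(
                extra_config.get('volume-%s' % volume_id))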

Commits on Mar 10, 2023

  1. [ironic] IPA imports VMDKs via qemu-img just fine

    The ironic-python-agent uses qemu-img to convert various image
    formats to the raw disk format. QCow2 is already correctly marked
    accordingly, but vmdk was missing.
    
    Change-Id: Ifd868e6951452b291184bd848d4244d37649f1ed
    fwiesel committed Mar 10, 2023
    41a2827
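    Roughly what the agent ends up running once vmdk is marked as
    convertible (paths are placeholders):

        import subprocess

        subprocess.check_call([
            "qemu-img", "convert",
            "-f", "vmdk",   # source format
            "-O", "raw",    # target format
            "/tmp/image.vmdk", "/dev/sda",
        ])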

Commits on Mar 15, 2023

  1. Fix linting with tox4

    We need to separate the arguments for tox
    from the ones for the linter.
    
    Change-Id: I6d7ee1d95f0ca17fa6c04a3546a7f907cc6393f1
    fwiesel committed Mar 15, 2023
    e1f3e4c

Commits on Mar 16, 2023

  1. vmware: Fix AttrError when setting restart-priority

    Nested `ClusterDasVmSettings` inside new `ClusterDasVmConfigInfo`
    objects don't get created properly. This seems to have worked on Rocky
    but fails on Xena.
    grandchild committed Mar 16, 2023
    a9435f4
  2. vmware: Read iso file in binary mode during upload

    Otherwise read() will try to decode the contents as UTF-8-encoded
    text.
    grandchild committed Mar 16, 2023
    686b516
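    A minimal sketch of the fix:

        # Text mode decodes the stream as UTF-8 and fails on arbitrary
        # ISO bytes:
        #   with open(iso_path, 'r') as f: ...
        # Binary mode returns raw bytes:
        with open(iso_path, 'rb') as f:
            chunk = f.read(65536)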

Commits on Mar 17, 2023

  1. [SAP] No custom filters/weighers for Non-Vmware

    We have added custom filters and weighers for our common use-case,
    VMware, and already excluded baremetal there.
    
    For libvirt as an additional hypervisor, we also want to skip those
    filters/weighers, so we now test whether the request is scheduled
    for VMware, either due to a flavor extra_spec or an image property.
    If nothing is set, we default to VMware for backwards
    compatibility.
    
    Change-Id: I3223aee433ba9009d2cd6387eeda8e70dbbb3cde
    fwiesel committed Mar 17, 2023
    aa99d4b
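    A sketch of the check, assuming the usual extra_spec and image
    property keys for the hypervisor type (the exact keys are not
    spelled out in the commit):

        def _is_vmware_request(spec_obj):
            hv = (spec_obj.flavor.extra_specs.get(
                      'capabilities:hypervisor_type')
                  or spec_obj.image.properties.get('img_hv_type'))
            # Default to VMware for backwards compatibility.
            return hv is None or hv.lower() == 'vmware'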

Commits on Mar 21, 2023

  1. [SAP] Workaround libvirt version mismatch

    The code assumes that the libvirt version reported by the
    hypervisor matches the version of the python library installed.
    Since we run the agent in a container, that is not correct, and the
    attribute will not be in the library, as it has been built for an
    older version.
    
    Change-Id: I156047b1f9829a49b429d51ca7f7777606b10a56
    fwiesel committed Mar 21, 2023
    a7b1957
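    The usual defensive pattern for this is a getattr() fallback; a
    hypothetical sketch (the flag name is a placeholder):

        import libvirt

        # The python binding may be older than libvirtd and lack the
        # constant entirely.
        flags = getattr(libvirt, 'VIR_SOME_NEWER_FLAG', 0)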
  2. Use pre-commit from openstack/nova master

    Change-Id: I3edf0d3c6deb8385adc1095c7e3c09526e2d4d65
    fwiesel committed Mar 21, 2023
    f4b04c6

Commits on Mar 27, 2023

  1. nova-manage: add sync_instances_flavor command

    This command detects changes in the flavors and syncs the
    instances' flavor info with the new changes.
    
    `nova-manage sap sync_instances_flavor`
    leust authored and joker-at-work committed Mar 27, 2023
    d314bd1

Commits on Mar 31, 2023

  1. vmware: Do not reconfigure vm without nics in finish_migration

    The vCenter doesn't like a reconfiguration call with an empty spec
    and raises an error. So we skip it and save ourselves some work on
    top.
    
    Change-Id: I1da3309500a2cd384c5d7cd431e71297ef5d5a3a
    fwiesel committed Mar 31, 2023
    63be3bc
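    A sketch of the guard, following the driver's usual session calls
    (`_get_nic_changes` is a hypothetical helper):

        spec = client_factory.create('ns0:VirtualMachineConfigSpec')
        spec.deviceChange = _get_nic_changes(instance)

        if spec.deviceChange:
            session._call_method(session.vim, 'ReconfigVM_Task',
                                 vm_ref, spec=spec)
        # else: vCenter rejects an empty reconfigure, so skip it.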
  2. VmWare: Expose the hypervisor

    You can now configure
    
        [vmware]
        hypervisor_mode=<cluster,esxi_to_cluster,cluster_to_esxi>
    
    to choose to expose the individual ESXi hosts instead of the
    cluster. It will only show the ESXi host as the hypervisor node,
    but will not honour the node parameter for placing the instance
    during instance creation, migration, etc.
    
    Change-Id: I264fd242c0de01ae8442c03bc726a0abfbe176ef
    fwiesel committed Mar 31, 2023
    f130dc1
  3. Host-Api: Create summary over all nodes

    The code assumes that there is a single compute-node per host,
    which is incorrect for ironic. By summarising over all
    compute-nodes, we get a correct response also for compute-hosts
    with more than one compute-node.
    
    Change-Id: Iaf6e2a72f6649e234660de47bec8b1da1ea1571e
    fwiesel committed Mar 31, 2023
    30fc702
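    A sketch of the summation, using real ComputeNode fields but a
    hypothetical helper:

        def _host_resources(compute_nodes):
            totals = {'vcpus': 0, 'memory_mb': 0, 'local_gb': 0}
            for node in compute_nodes:
                totals['vcpus'] += node.vcpus
                totals['memory_mb'] += node.memory_mb
                totals['local_gb'] += node.local_gb
            return totals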