Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow instance_reconfigure to resize a stopped instance #6774

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

iximeow
Copy link
Member

@iximeow iximeow commented Oct 4, 2024

mostly just adapted #6321 to the new instance update endpoint machinery. same general behavior as outlined there: can only resize stopped instances, updated instance still returns the updated InstanceAndActiveVmm. same size validation as we do for a new instance, tests checking all that as well.

Also fixes #3769.

@hawkw hawkw self-requested a review October 4, 2024 18:05
Comment on lines 1060 to 1068
// Set vCPUs and memory size.
self.instance_set_size_on_conn(
&conn,
&err,
authz_instance,
ncpus,
memory,
)
.await?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling instance_set_size_on_conn will unconditionally check that the instance is in a state in which it can be resized. This means that an InstanceUpdate that doesn't change the number of vCPUs or the amount of memory will always check whether the instance is in a resizeable state, and fail if it is not --- meaning that we can no longer set auto-restart policies for running instances. I think we need to check whether the instance's memory/vCPUs are already set to the requested values, and not call this if not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that the test that attempts to change the auto-restart policy for a running instance will fail due to this, BTW. So there's already a way to check if you're doing the right thing. :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yes, that's absolutely right. i've certainly broken that test in the process, even.

Comment on lines 1255 to 1260
let maybe_old_sizes = instance_dsl::instance
.filter(instance_dsl::id.eq(authz_instance.id()))
.filter(instance_dsl::time_deleted.is_null())
.select((instance_dsl::ncpus, instance_dsl::memory))
.first_async::<(InstanceCpuCount, ByteCount)>(conn)
.await;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, we are now fetching fields from the instance record in a bunch of different places in the instance_reconfigure method...perhaps it would be nicer if we just did one instance_fetch query and then passed an &Instance into both this function and instance_set_boot_disk to perform the validation? Not sure how big a deal this is, though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, it occurs to me that we could just fold this into the update query, by adding a

.filter(instance_dsl::ncpus.ne(ncpus).or(instance_dsl::memory.ne(memory))

clause to make the update conditional on there actually being a change?

We would then have to handle that case in NotUpdatedButExists as well as the "instance is in invalid state" case, but I think that's probably fine and might be a bit more efficient than doing a query, checking the results in Nexus, and then doing another query.

Copy link
Member Author

@iximeow iximeow Oct 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, i'd thought about that and written it off because i'd also been thinking about the caller checking instance state as well, but Instance has .state as well, so the state check can stay in these helpers while &Instance-to-be-updated comes from the caller. and i wasn't enthused about loading the instance again in sic_set_boot_disk, but we already have that in the saga parameters!

i'll give that a try, especially because it means we don't even have to do these prepatory queries over and over. adding .filter for the fields not matching would be nice but, yeah, that's extra queries.

using the Instance as retrieved from the saga is out of data as soon as sic_set_boot_disk runs. so, either sic_set_boot_disk updates the instance record (shadow copy of a database row.. not great) or sic_set_boot_disk_undo tries to run from an instance whose boot_disk is None (from before it was assigned), fails to detach a boot disk if one was set, and gets wedged. surmountable, i think, but really gross to do. so, more careful .filter() it is!

even though update is probably going to be relatively infrequent, i'd like to make this not awful to read :) i'll see how it goes.


.filter() trip report: you might think the boot disk filter stanza would look something like:

.filter(instance_dsl::boot_disk_id.ne(boot_disk_id))

but NULL compares false with everything, so this skips updating the boot disk if either of the prior or new boot_disk_id are None. okay. so the real precise filter would be more like:

.filter(
    instance_dsl::boot_disk_id.ne(boot_disk_id)
        .or(instance_dsl::boot_disk_id.is_null().and(boot_disk_id.is_some()))
        .or(instance_dsl::boot_disk_id.is_not_null().and(boot_disk_id.is_none()))
)

but this starts not working for "the types are not what diesel wants to see" reasons.

a third thing we can do, and the commit i'm actually going to push does this, is just not .filter for the boot_disk_id case. then we're still going to end up in NotUpdatedButExists if we have not updated the record, so it does all actually work.

unfortunately that's extra work if you're not actually intending to update boot_disk_id. it just seems... least bad...?

@karencfv
Copy link
Contributor

karencfv commented Oct 4, 2024

Heya! Can we hold off on merging this until v12 please? It may break the TF provider which I've already released for v11 😅
and I'll be out for 2 weeks

@iximeow iximeow added this to the 12 milestone Oct 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Want ability to resize the vcpu and memory of stopped instances
3 participants