allow instance_reconfigure to resize a stopped instance #6774

iximeow · 2024-10-04T17:45:12Z

mostly just adapted #6321 to the new instance update endpoint machinery. same general behavior as outlined there: can only resize stopped instances, updated instance still returns the updated InstanceAndActiveVmm. same size validation as we do for a new instance, tests checking all that as well.

Also fixes #3769.

Co-authored-by: Greg Colombo <[email protected]>

hawkw · 2024-10-04T18:12:56Z

nexus/db-queries/src/db/datastore/instance.rs

+ // Set vCPUs and memory size.
+ self.instance_set_size_on_conn(
+ &conn,
+ &err,
+ authz_instance,
+ ncpus,
+ memory,
+ )
+ .await?;


Calling instance_set_size_on_conn will unconditionally check that the instance is in a state in which it can be resized. This means that an InstanceUpdate that doesn't change the number of vCPUs or the amount of memory will always check whether the instance is in a resizeable state, and fail if it is not --- meaning that we can no longer set auto-restart policies for running instances. I think we need to check whether the instance's memory/vCPUs are already set to the requested values, and not call this if not.

I believe that the test that attempts to change the auto-restart policy for a running instance will fail due to this, BTW. So there's already a way to check if you're doing the right thing. :)

oh yes, that's absolutely right. i've certainly broken that test in the process, even.

hawkw · 2024-10-04T19:29:47Z

nexus/db-queries/src/db/datastore/instance.rs

+ let maybe_old_sizes = instance_dsl::instance
+ .filter(instance_dsl::id.eq(authz_instance.id()))
+ .filter(instance_dsl::time_deleted.is_null())
+ .select((instance_dsl::ncpus, instance_dsl::memory))
+ .first_async::<(InstanceCpuCount, ByteCount)>(conn)
+ .await;


hmm, we are now fetching fields from the instance record in a bunch of different places in the instance_reconfigure method...perhaps it would be nicer if we just did one instance_fetch query and then passed an &Instance into both this function and instance_set_boot_disk to perform the validation? Not sure how big a deal this is, though.

Alternatively, it occurs to me that we could just fold this into the update query, by adding a

.filter(instance_dsl::ncpus.ne(ncpus).or(instance_dsl::memory.ne(memory))

clause to make the update conditional on there actually being a change?

We would then have to handle that case in NotUpdatedButExists as well as the "instance is in invalid state" case, but I think that's probably fine and might be a bit more efficient than doing a query, checking the results in Nexus, and then doing another query.

ah, i'd thought about that and written it off because i'd also been thinking about the caller checking instance state as well, but Instance has .state as well, so the state check can stay in these helpers while &Instance-to-be-updated comes from the caller. and i wasn't enthused about loading the instance again in sic_set_boot_disk, but we already have that in the saga parameters!

i'll give that a try, especially because it means we don't even have to do these prepatory queries over and over. adding .filter for the fields not matching would be nice but, yeah, that's extra queries.

using the Instance as retrieved from the saga is out of data as soon as sic_set_boot_disk runs. so, either sic_set_boot_disk updates the instance record (shadow copy of a database row.. not great) or sic_set_boot_disk_undo tries to run from an instance whose boot_disk is None (from before it was assigned), fails to detach a boot disk if one was set, and gets wedged. surmountable, i think, but really gross to do. so, more careful .filter() it is!

even though update is probably going to be relatively infrequent, i'd like to make this not awful to read :) i'll see how it goes.

.filter() trip report: you might think the boot disk filter stanza would look something like:

.filter(instance_dsl::boot_disk_id.ne(boot_disk_id))

but NULL compares false with everything, so this skips updating the boot disk if either of the prior or new boot_disk_id are None. okay. so the real precise filter would be more like:

.filter( instance_dsl::boot_disk_id.ne(boot_disk_id) .or(instance_dsl::boot_disk_id.is_null().and(boot_disk_id.is_some())) .or(instance_dsl::boot_disk_id.is_not_null().and(boot_disk_id.is_none())) )

but this starts not working for "the types are not what diesel wants to see" reasons.

a third thing we can do, and the commit i'm actually going to push does this, is just not .filter for the boot_disk_id case. then we're still going to end up in NotUpdatedButExists if we have not updated the record, so it does all actually work.

unfortunately that's extra work if you're not actually intending to update boot_disk_id. it just seems... least bad...?

karencfv · 2024-10-04T19:39:21Z

Heya! Can we hold off on merging this until v12 please? It may break the TF provider which I've already released for v11 😅
and I'll be out for 2 weeks

allow instance_reconfigure to resize a stopped instance

0159e95

Co-authored-by: Greg Colombo <[email protected]>

iximeow mentioned this pull request Oct 4, 2024

add external API to resize a stopped instance #6321

Closed

i cannot add four lines without doing a cargo-fmt violation

60afb8c

hawkw self-requested a review October 4, 2024 18:05

dont forget to commit the updated openapi document

fd7e5a1

hawkw requested changes Oct 4, 2024

View reviewed changes

david-crespo mentioned this pull request Oct 4, 2024

Instance resize oxidecomputer/console#2487

Draft

if a size update would be no change, succeed early

e76814f

hawkw reviewed Oct 4, 2024

View reviewed changes

iximeow added this to the 12 milestone Oct 4, 2024

taking &Instance is annoying, use more filter

57087c4

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

allow instance_reconfigure to resize a stopped instance #6774

allow instance_reconfigure to resize a stopped instance #6774

iximeow commented Oct 4, 2024

hawkw Oct 4, 2024

hawkw Oct 4, 2024

iximeow Oct 4, 2024

hawkw Oct 4, 2024

hawkw Oct 4, 2024

iximeow Oct 4, 2024 •

edited

Loading

karencfv commented Oct 4, 2024

allow instance_reconfigure to resize a stopped instance #6774

Are you sure you want to change the base?

allow instance_reconfigure to resize a stopped instance #6774

Conversation

iximeow commented Oct 4, 2024

hawkw Oct 4, 2024

Choose a reason for hiding this comment

hawkw Oct 4, 2024

Choose a reason for hiding this comment

iximeow Oct 4, 2024

Choose a reason for hiding this comment

hawkw Oct 4, 2024

Choose a reason for hiding this comment

hawkw Oct 4, 2024

Choose a reason for hiding this comment

iximeow Oct 4, 2024 • edited Loading

Choose a reason for hiding this comment

karencfv commented Oct 4, 2024

iximeow Oct 4, 2024 •

edited

Loading