Update virtual provisioning counters on instance stop/start #4277

Merged: 3 commits, merged Oct 20, 2023

Changes from 1 commit
54 changes: 47 additions & 7 deletions nexus/src/app/instance.rs
@@ -1284,6 +1284,14 @@ impl super::Nexus {
"propolis_id" => %propolis_id,
"vmm_state" => ?new_runtime_state.vmm_state);

// Grab the current state of the instance in the DB to reason about
// whether this update is stale or not.
let (.., authz_instance, db_instance) =
LookupPath::new(&opctx, &self.db_datastore)
.instance_id(*instance_id)
.fetch()
.await?;

// Update OPTE and Dendrite if the instance's active sled assignment
// changed or a migration was retired. If these actions fail, sled agent
// is expected to retry this update.
@@ -1297,12 +1305,6 @@ impl super::Nexus {
//
// In the future, this should be replaced by a call to trigger a
// networking state update RPW.
let (.., authz_instance, db_instance) =
LookupPath::new(&opctx, &self.db_datastore)
.instance_id(*instance_id)
.fetch()
.await?;

self.ensure_updated_instance_network_config(
opctx,
&authz_instance,
@@ -1311,6 +1313,31 @@
)
.await?;

// If the supplied instance state is at least as new as what's currently
// in the database, and it indicates the instance has no active VMM, the
// instance has been stopped and should have its virtual provisioning
// charges released.
//
// As with updating networking state, this must be done before
// committing the new runtime state to the database: once the DB is
// written, a new start saga can arrive and start the instance, which
// will try to create its own virtual provisioning charges, which will
// race with this operation.
if new_runtime_state.instance_state.propolis_id.is_none()
&& new_runtime_state.instance_state.gen
>= db_instance.runtime().gen.0
{
self.db_datastore

Collaborator:

This check is based on new_runtime_state (from the sled agent) as well as db_instance.runtime() (from the DB), but both are cached -- couldn't both be out of date?

For example:

  1. External API: Create an instance and start it. Begin the call to "stop", which sends a stop request down to the sled.
  2. Sled Agent -> Nexus: Calls this function, indicating the instance should be stopped. Suppose the sled agent thinks gen == N and the DB thinks gen == N, but calling ensure_updated_instance_network_config yields for a really long time.
  3. Sled Agent -> Nexus: Calls this function again from a retry loop, and this time it succeeds. The instance is now stopped.
  4. External API: Start the instance again. This re-provisions the resources used by the instance...
  5. ... and, while the instance is running, the request from (2) comes back. It'll see a state of the world from before (3) and (4), so it'll call the virtual_provisioning_collection_delete_instance function. Now the accounting is wrong! This is a running instance, but we would have deleted the resources it's currently using.

Collaborator:

I think this is related to what I was touching on here: #4194 (comment)

If we're associating these resources with VMMs (AKA, "what is actually running") rather than Instances (the opaque thing that exists even without a propolis), should we alter the calls to virtual_provisioning_collection_{create, delete}_instance to act on VMM UUIDs instead, rather than instance UUIDs?

That way, it wouldn't be possible to do a "double delete" -- once a VMM is gone, the UUID for it should never be re-used/re-added, so repeated/delayed requests to delete should have no further effect after the first successful call.
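
A minimal sketch of that idempotency argument, using an in-memory map rather than the real datastore (the `Charges` type, the `release_vmm_charges` helper, and the `u128` stand-in for a VMM UUID are all hypothetical, not part of this PR):

```rust
// Illustrative sketch only, not code from this PR: provisioning charges keyed
// by VMM UUID, modeled with an in-memory map, to show why deletes become
// naturally idempotent once a VMM ID can never be reused.
use std::collections::HashMap;

struct Charges {
    cpus: i64,
    memory_bytes: i64,
}

// Hypothetical helper: release the charges recorded for a single VMM.
fn release_vmm_charges(
    table: &mut HashMap<u128, Charges>,
    vmm_id: u128,
) -> Option<Charges> {
    // The first call removes the row; a delayed or retried request finds
    // nothing and is a no-op, so it can never release charges that belong to
    // a newer VMM backing the same instance.
    table.remove(&vmm_id)
}

fn main() {
    let mut table = HashMap::new();
    table.insert(1_u128, Charges { cpus: 16, memory_bytes: 16_i64 << 30 }); // 16 GiB

    let released = release_vmm_charges(&mut table, 1).expect("released exactly once");
    assert_eq!((released.cpus, released.memory_bytes), (16, 16_i64 << 30));
    assert!(release_vmm_charges(&mut table, 1).is_none()); // delayed retry: no-op
}
```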

Contributor Author:

Hmm. Let me think about this one a little more--the race you've identified is definitely a problem. I'll mull it over to see if there's a good way to resolve it without dipping into transaction-land (such that we could say "only proceed with the deletion if the instance generation number is less than the one I think I'm trying to apply" or suchlike).

It's true that VMMs don't share that problem because they can't come back from a terminal state, but I want to be careful about charging counters on a per-VMM basis if those counters are going to be used for quota management purposes (which AIUI is part of the discussion in RFD 427). For example, suppose I have a quota of 32 vCPUs on my project; I start an instance with 16 vCPUs; the rack operator starts an upgrade that migrates my instance; then I try to start a second instance with 16 vCPUs; if we track per-VMM, this will fail because my quota's exhausted--but what do you mean it's exhausted? I only have the one running instance! (cc @askfongjojo to help weigh in on this part; if we're OK with charging on a per-VMM basis even though this can cause transient double-counting, then switching to VMM IDs is a much simpler option here.)
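
The quota arithmetic in that example, written out as a small sketch (illustrative only; the 32-vCPU quota and 16-vCPU instance are just the numbers from the example above, not real limits):

```rust
// Illustrative sketch only: the quota arithmetic from the example above.
// During a live migration, per-VMM accounting charges both the source and the
// target VMM, so a single 16-vCPU instance transiently looks like 32 vCPUs.
fn main() {
    let quota_vcpus = 32;
    let instance_vcpus = 16;

    // Per-instance accounting: one charge, regardless of migration.
    let per_instance_usage = instance_vcpus;
    assert!(per_instance_usage + 16 <= quota_vcpus); // a second 16-vCPU instance fits

    // Per-VMM accounting mid-migration: source and target VMMs are both charged.
    let per_vmm_usage = instance_vcpus * 2;
    assert!(per_vmm_usage + 16 > quota_vcpus); // the second instance is rejected
}
```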

Contributor Author (@gjcolombo), Oct 13, 2023:

Fixed in 16b7fe6 (or so I think...) by putting a sub-select into the instance provisioning counter deletion query that makes it apply only if the caller has furnished a sufficiently new generation number.
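
A rough in-memory model of that guard (illustrative only, not the actual query from 16b7fe6, which applies the check via a sub-select against the instance record rather than a generation stored with the charges):

```rust
// Illustrative sketch only: a generation-guarded release of provisioning
// charges, modeled in memory. The tuple stands in for "the instance
// generation currently recorded in the DB, plus the charged vCPUs"; the
// delete applies only if the caller's generation is at least that new, so a
// stale, long-delayed update can no longer race a later restart.
fn guarded_release(record: &mut Option<(u64, i64)>, caller_gen: u64) -> bool {
    match *record {
        Some((db_gen, _vcpus)) if caller_gen >= db_gen => {
            *record = None; // release the charges
            true
        }
        // Stale caller, or nothing left to delete: no-op.
        _ => false,
    }
}

fn main() {
    // Charges exist and the DB has the instance at generation 4.
    let mut row = Some((4, 16));
    assert!(!guarded_release(&mut row, 3)); // stale update from gen 3 is ignored
    assert!(guarded_release(&mut row, 4));  // a sufficiently new update releases them
    assert!(!guarded_release(&mut row, 4)); // retrying after that is a no-op
}
```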

Comment:

Transient double-counting is probably fine if it's indeed transient (i.e., seconds, not minutes). But if we foresee failure modes that can cause an in-progress migration to double-count for an extended period of time, it'll be better for the system, rather than the user, to "absorb" the migration-in-transit usage.

.virtual_provisioning_collection_delete_instance(
opctx,
*instance_id,
db_instance.project_id,
i64::from(db_instance.ncpus.0 .0),
db_instance.memory,
)
.await?;
}

// Write the new instance and VMM states back to CRDB. This needs to be
// done before trying to clean up the VMM, since the datastore will only
// allow a VMM to be marked as deleted if it is already in a terminal
@@ -1331,7 +1358,20 @@ impl super::Nexus {

// If the VMM is now in a terminal state, make sure its resources get
// cleaned up.
if let Ok((_, true)) = result {
//
// For idempotency, only check to see if the update was successfully
// processed and ignore whether the VMM record was actually updated.
// This is required to handle the case where this routine is called
// once, writes the terminal VMM state, fails before all per-VMM
// resources are released, returns a retriable error, and is retried:
// the per-VMM resources still need to be cleaned up, but the DB update
// will return Ok(_, false) because the database was already updated.
//
// Unlike the pre-update cases, it is legal to do this cleanup *after*
// committing state to the database, because a terminated VMM cannot be
// reused (restarting or migrating its former instance will use new VMM
// IDs).
if result.is_ok() {
let propolis_terminated = matches!(
new_runtime_state.vmm_state.state,
InstanceState::Destroyed | InstanceState::Failed
56 changes: 0 additions & 56 deletions nexus/src/app/sagas/instance_create.rs
@@ -13,7 +13,6 @@ use crate::external_api::params;
use nexus_db_model::NetworkInterfaceKind;
use nexus_db_queries::db::identity::Resource;
use nexus_db_queries::db::lookup::LookupPath;
use nexus_db_queries::db::model::ByteCount as DbByteCount;
use nexus_db_queries::db::queries::network_interface::InsertError as InsertNicError;
use nexus_db_queries::{authn, authz, db};
use nexus_defaults::DEFAULT_PRIMARY_NIC_NAME;
@@ -75,10 +74,6 @@ struct DiskAttachParams {

declare_saga_actions! {
instance_create;
VIRTUAL_RESOURCES_ACCOUNT -> "no_result" {
+ sic_account_virtual_resources
- sic_account_virtual_resources_undo
}
CREATE_INSTANCE_RECORD -> "instance_record" {
+ sic_create_instance_record
- sic_delete_instance_record
@@ -131,7 +126,6 @@ impl NexusSaga for SagaInstanceCreate {
})?,
));

builder.append(virtual_resources_account_action());
builder.append(create_instance_record_action());

// Helper function for appending subsagas to our parent saga.
@@ -728,56 +722,6 @@ async fn ensure_instance_disk_attach_state(
Ok(())
}

async fn sic_account_virtual_resources(
sagactx: NexusActionContext,
) -> Result<(), ActionError> {
let osagactx = sagactx.user_data();
let params = sagactx.saga_params::<Params>()?;
let instance_id = sagactx.lookup::<Uuid>("instance_id")?;

let opctx = crate::context::op_context_for_saga_action(
&sagactx,
&params.serialized_authn,
);
osagactx
.datastore()
.virtual_provisioning_collection_insert_instance(
&opctx,
instance_id,
params.project_id,
i64::from(params.create_params.ncpus.0),
DbByteCount(params.create_params.memory),
)
.await
.map_err(ActionError::action_failed)?;
Ok(())
}

async fn sic_account_virtual_resources_undo(
sagactx: NexusActionContext,
) -> Result<(), anyhow::Error> {
let osagactx = sagactx.user_data();
let params = sagactx.saga_params::<Params>()?;
let instance_id = sagactx.lookup::<Uuid>("instance_id")?;

let opctx = crate::context::op_context_for_saga_action(
&sagactx,
&params.serialized_authn,
);
osagactx
.datastore()
.virtual_provisioning_collection_delete_instance(
&opctx,
instance_id,
params.project_id,
i64::from(params.create_params.ncpus.0),
DbByteCount(params.create_params.memory),
)
.await
.map_err(ActionError::action_failed)?;
Ok(())
}

async fn sic_create_instance_record(
sagactx: NexusActionContext,
) -> Result<db::model::Instance, ActionError> {
29 changes: 0 additions & 29 deletions nexus/src/app/sagas/instance_delete.rs
@@ -9,7 +9,6 @@ use super::NexusActionContext;
use super::NexusSaga;
use crate::app::sagas::declare_saga_actions;
use nexus_db_queries::{authn, authz, db};
use nexus_types::identity::Resource;
use omicron_common::api::external::{Error, ResourceType};
use omicron_common::api::internal::shared::SwitchLocation;
use serde::Deserialize;
@@ -40,9 +39,6 @@ declare_saga_actions! {
DEALLOCATE_EXTERNAL_IP -> "no_result3" {
+ sid_deallocate_external_ip
}
VIRTUAL_RESOURCES_ACCOUNT -> "no_result4" {
+ sid_account_virtual_resources
}
}

// instance delete saga: definition
@@ -64,7 +60,6 @@ impl NexusSaga for SagaInstanceDelete {
builder.append(instance_delete_record_action());
builder.append(delete_network_interfaces_action());
builder.append(deallocate_external_ip_action());
builder.append(virtual_resources_account_action());
Ok(builder.build()?)
}
}
@@ -135,30 +130,6 @@ async fn sid_deallocate_external_ip(
Ok(())
}

async fn sid_account_virtual_resources(
sagactx: NexusActionContext,
) -> Result<(), ActionError> {
let osagactx = sagactx.user_data();
let params = sagactx.saga_params::<Params>()?;
let opctx = crate::context::op_context_for_saga_action(
&sagactx,
&params.serialized_authn,
);

osagactx
.datastore()
.virtual_provisioning_collection_delete_instance(
&opctx,
params.instance.id(),
params.instance.project_id,
i64::from(params.instance.ncpus.0 .0),
params.instance.memory,
)
.await
.map_err(ActionError::action_failed)?;
Ok(())
}

#[cfg(test)]
mod test {
use crate::{
56 changes: 56 additions & 0 deletions nexus/src/app/sagas/instance_start.rs
@@ -42,6 +42,11 @@ declare_saga_actions! {
+ sis_alloc_propolis_ip
}

ADD_VIRTUAL_RESOURCES -> "virtual_resources" {
+ sis_account_virtual_resources
- sis_account_virtual_resources_undo
}

CREATE_VMM_RECORD -> "vmm_record" {
+ sis_create_vmm_record
- sis_destroy_vmm_record
@@ -96,6 +101,7 @@ impl NexusSaga for SagaInstanceStart {

builder.append(alloc_server_action());
builder.append(alloc_propolis_ip_action());
builder.append(add_virtual_resources_action());
builder.append(create_vmm_record_action());
builder.append(mark_as_starting_action());

Contributor Author:

This ordering is wrong. The saga should only add virtual resources once it's successfully marked the instance as starting, i.e., it should get through the "only one start can proceed at a time" interlock before charging for anything. Will fix in the next push.

Contributor Author:

Fixed in 16b7fe6.
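
A small sketch of the ordering constraint being described (illustrative only; the step names are shorthand for the saga actions, and the second ordering is just one arrangement consistent with the comment, not necessarily the exact one that landed in 16b7fe6):

```rust
// Illustrative sketch only: the ordering constraint from the comment above,
// checked over a hypothetical list of saga step names. The "mark as starting"
// interlock must run before virtual resources are charged, so that only the
// one start attempt that wins the interlock ever creates charges.
fn interlock_precedes_charging(steps: &[&str]) -> bool {
    let position = |name: &str| steps.iter().position(|s| *s == name);
    match (
        position("mark_as_starting"),
        position("add_virtual_resources"),
    ) {
        (Some(interlock), Some(charge)) => interlock < charge,
        _ => false,
    }
}

fn main() {
    // The ordering in this commit (charging before the interlock) fails the check.
    assert!(!interlock_precedes_charging(&[
        "alloc_server",
        "alloc_propolis_ip",
        "add_virtual_resources",
        "create_vmm_record",
        "mark_as_starting",
    ]));

    // One ordering consistent with the comment (interlock first) passes it.
    assert!(interlock_precedes_charging(&[
        "alloc_server",
        "alloc_propolis_ip",
        "create_vmm_record",
        "mark_as_starting",
        "add_virtual_resources",
    ]));
}
```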

builder.append(dpd_ensure_action());
@@ -149,6 +155,56 @@ async fn sis_alloc_propolis_ip(
allocate_sled_ipv6(&opctx, sagactx.user_data().datastore(), sled_uuid).await
}

async fn sis_account_virtual_resources(
sagactx: NexusActionContext,
) -> Result<(), ActionError> {
let osagactx = sagactx.user_data();
let params = sagactx.saga_params::<Params>()?;
let instance_id = params.db_instance.id();

let opctx = crate::context::op_context_for_saga_action(
&sagactx,
&params.serialized_authn,
);
osagactx
.datastore()
.virtual_provisioning_collection_insert_instance(
&opctx,
instance_id,
params.db_instance.project_id,
i64::from(params.db_instance.ncpus.0 .0),
nexus_db_model::ByteCount(*params.db_instance.memory),
)
.await
.map_err(ActionError::action_failed)?;
Ok(())
}

async fn sis_account_virtual_resources_undo(
sagactx: NexusActionContext,
) -> Result<(), anyhow::Error> {
let osagactx = sagactx.user_data();
let params = sagactx.saga_params::<Params>()?;
let instance_id = params.db_instance.id();

let opctx = crate::context::op_context_for_saga_action(
&sagactx,
&params.serialized_authn,
);
osagactx
.datastore()
.virtual_provisioning_collection_delete_instance(
&opctx,
instance_id,
params.db_instance.project_id,
i64::from(params.db_instance.ncpus.0 .0),
nexus_db_model::ByteCount(*params.db_instance.memory),
)
.await
.map_err(ActionError::action_failed)?;
Ok(())
}

async fn sis_create_vmm_record(
sagactx: NexusActionContext,
) -> Result<db::model::Vmm, ActionError> {