From 9b595e985721f8ab83d13c4dc4f257cbf8ac525c Mon Sep 17 00:00:00 2001
From: Eliza Weisman
Date: Fri, 9 Aug 2024 15:19:54 -0700
Subject: [PATCH] Perform instance state transitions in `instance-update` saga (#5749)

A number of bugs relating to guest instance lifecycle management have been observed. These include:

- Instances getting "stuck" in a transient state, such as `Starting` or `Stopping`, with no way to forcibly terminate them (#4004)
- Race conditions between instances starting and receiving state updates, which cause provisioning counters to underflow (#5042)
- Instances entering and exiting the `Failed` state when nothing is actually wrong with them, potentially leaking virtual resources (#4226)

These typically require support intervention to resolve. Broadly, these issues exist because the control plane's current mechanisms for understanding and managing an instance's lifecycle state machine are "kind of a mess". In particular:

- **(Conceptual) ownership of the CRDB `instance` record is currently split between Nexus and sled-agent(s).** Although Nexus is the only entity that actually reads or writes to the database, the instance's runtime state is also modified by the sled-agents that manage its active Propolis (and, if it's migrating, its target Propolis), and written to CRDB on their behalf by Nexus. This means that there are multiple copies of the instance's state in different places at the same time, which can potentially get out of sync. When an instance is migrating, its state is updated by two different sled-agents, and they may generate state updates that conflict with each other. And, splitting the responsibility between Nexus and sled-agent makes the code more complex and harder to understand: there is no one place where all instance state machine transitions are performed.
- **Nexus doesn't ensure that instance state updates are processed reliably.** Instance state transitions triggered by user actions, such as `instance-start` and `instance-delete`, are performed by distributed sagas, ensuring that they run to completion even if the Nexus instance executing them comes to an untimely end. This is *not* the case for operations that result from instance state transitions reported by sled-agents, which are simply performed inline in the HTTP APIs for reporting instance states. If the Nexus processing such a transition crashes, gets network partitioned, or encounters a transient error, the instance is left in an incomplete state and the remainder of the operation will not be performed.

This branch rewrites much of the control plane's instance state management subsystem to resolve these issues. At a high level, it makes the following changes:

- **Nexus is now the sole owner of the `instance` record.** Sled-agents no longer have their own copies of an instance's `InstanceRuntimeState`, and do not generate changes to that state when reporting instance observations to Nexus. Instead, the sled-agent only publishes updates to the `vmm` and `migration` records (which are never modified by Nexus directly), and Nexus is the only entity responsible for determining how an instance's state should change in response to a VMM or migration state update.
- **When an instance has an active VMM, its effective external state is determined primarily by the active `vmm` record**, so that fewer state transitions *require* changes to the `instance` record. PR #5854 laid the groundwork for this change, but it's relevant here as well. (A condensed sketch of the new state-mapping rules appears at the end of this message.)
- **All updates to an `instance` record (and resources conceptually owned by that instance) are performed by a distributed saga.** I've introduced a new `instance-update` saga, which is responsible for performing all changes to the `instance` record, virtual provisioning resources, and instance network config that are performed as part of a state transition. Moving this to a saga helps us ensure that these operations are always run to completion, even in the event of a sudden Nexus death.
- **Consistency of instance state changes is ensured by distributed locking.** State changes may be published by multiple sled-agents to different Nexus replicas. If one Nexus replica is processing a state change received from a sled-agent, and then the instance's state changes again, and the sled-agent publishes that state change to a *different* Nexus...lots of bad things can happen, since the second state change may be applied from a stale view of the instance's state, when it *should* have a "happens-after" relationship with the other state transition. And, some operations may contradict each other when performed concurrently. To prevent these race conditions, this PR has the dubious honor of introducing the first _distributed lock_ in the Oxide control plane, the "instance updater lock". I introduced the locking primitives in PR #5831 --- see that branch for more discussion of locking. (A minimal sketch of the lock's compare-and-swap semantics appears at the end of this message.)
- **Background tasks are added to prevent missed updates.** To ensure we cannot miss an instance update even if a Nexus dies, hits a network partition, or just drops the state update on the floor, we add a new `instance-updater` background task, which queries the database for instances that are in states that require an update saga but have no such saga running, and starts the requisite sagas.

Currently, the instance update saga runs in the following cases:

- An instance's active VMM transitions to `Destroyed`, in which case the instance's virtual resources are cleaned up and the active VMM is unlinked.
- Either side of an instance's live migration reports that the migration has completed successfully.
- Either side of an instance's live migration reports that the migration has failed.

The inner workings of the instance-update saga itself are fairly complex, and it has some interesting idiosyncrasies relative to the existing sagas. I've written up a [lengthy comment] that provides an overview of the theory behind the design of the saga and its principles of operation, so I won't reproduce that in this commit message.
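To make the new state-mapping rules a bit more concrete, here is a condensed, self-contained sketch of how the effective external state is derived from the `instance` record and its active `vmm` record. The real logic lives in `InstanceAndActiveVmm::determine_effective_state` in `nexus/db-queries/src/db/datastore/instance.rs`; the simplified enums and free function below are illustrative stand-ins, not the actual Omicron types.

```rust
// Condensed model of the effective-state rules. Illustrative only: the real
// implementation is `InstanceAndActiveVmm::determine_effective_state`.

#[derive(Clone, Copy, Debug, PartialEq)]
#[allow(dead_code)]
enum VmmState {
    Starting,
    Running,
    Rebooting,
    Migrating,
    Stopping,
    Stopped,
    Destroyed,
    SagaUnwound,
}

#[derive(Clone, Copy, Debug, PartialEq)]
#[allow(dead_code)]
enum InstanceState {
    NoVmm, // the instance has no active VMM
    Vmm,   // the instance's state is delegated to its active VMM
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum ExternalState {
    Starting,
    Running,
    Rebooting,
    Migrating,
    Stopping,
    Stopped,
}

fn effective_state(
    instance_state: InstanceState,
    has_migration_id: bool,
    active_vmm: Option<VmmState>,
) -> ExternalState {
    match (instance_state, active_vmm) {
        // While a migration ID is set, always report "migrating", even if the
        // old active VMM was already destroyed by a successful migration out;
        // only an update saga can resolve the migration and point the
        // instance at its new active VMM.
        (InstanceState::Vmm, Some(_)) if has_migration_id => {
            ExternalState::Migrating
        }
        // A stopped or destroyed active VMM reads as "stopping" until an
        // update saga unlinks it and releases the instance's virtual
        // provisioning resources (only then may a new start saga proceed).
        (InstanceState::Vmm, Some(VmmState::Stopped | VmmState::Destroyed)) => {
            ExternalState::Stopping
        }
        // A VMM left behind by an unwound saga can be treated as "stopped":
        // a new start saga may run at any time by clearing the old VMM ID.
        (InstanceState::Vmm, Some(VmmState::SagaUnwound)) => {
            ExternalState::Stopped
        }
        // With no active VMM, the instance is stopped.
        (InstanceState::NoVmm, _) => ExternalState::Stopped,
        // Otherwise, the active VMM's state is authoritative.
        (_, Some(VmmState::Starting)) => ExternalState::Starting,
        (_, Some(VmmState::Running)) => ExternalState::Running,
        (_, Some(VmmState::Rebooting)) => ExternalState::Rebooting,
        (_, Some(VmmState::Migrating)) => ExternalState::Migrating,
        (_, Some(VmmState::Stopping)) => ExternalState::Stopping,
        (_, _) => ExternalState::Stopped,
    }
}

fn main() {
    // A destroyed active VMM with a migration ID still set reports as
    // "migrating", not "stopping".
    assert_eq!(
        effective_state(InstanceState::Vmm, true, Some(VmmState::Destroyed)),
        ExternalState::Migrating,
    );
}
```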
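Because the instance updater lock is the control plane's first distributed lock, it may be worth spelling out its compare-and-swap protocol. The sketch below is a minimal in-memory model of those semantics only; in the real datastore the lock is taken and released with a single conditional `UPDATE` on the `instance` row (see `DataStore::instance_updater_lock` and `DataStore::instance_updater_unlock`), and the struct, integer saga IDs, and method names here are hypothetical stand-ins.

```rust
// In-memory model of the updater lock's compare-and-swap semantics.
// Illustrative only; the real lock is a conditional UPDATE against CRDB.

#[derive(Debug, Default)]
struct InstanceRecord {
    /// ID of the saga currently holding the updater lock, if any.
    updater_id: Option<u64>,
    /// Generation number, advanced on every lock and unlock transition.
    updater_gen: u64,
}

#[derive(Debug, PartialEq)]
enum LockError {
    /// The instance is locked by a different saga; the caller must give up
    /// (or retry later) rather than wait.
    AlreadyLocked,
}

impl InstanceRecord {
    /// Attempt the conditional write that acquires the lock for `saga_id`.
    /// The write only lands if the lock is free *and* the generation still
    /// matches what the caller observed when it read the record, so two
    /// sagas racing for the lock cannot both win. Re-acquiring a lock we
    /// already hold succeeds, keeping the saga action idempotent.
    fn try_lock(&mut self, saga_id: u64, observed_gen: u64) -> Result<u64, LockError> {
        match self.updater_id {
            Some(holder) if holder == saga_id => Ok(self.updater_gen),
            Some(_) => Err(LockError::AlreadyLocked),
            None if self.updater_gen != observed_gen => {
                // Someone else locked and unlocked between our read and our
                // write; the caller re-reads the record and tries again.
                Err(LockError::AlreadyLocked)
            }
            None => {
                self.updater_gen += 1;
                self.updater_id = Some(saga_id);
                Ok(self.updater_gen)
            }
        }
    }

    /// Release the lock, but only if it is still held by `saga_id` at the
    /// generation it was acquired at, so a stale unwind action cannot
    /// clobber a lock that has since been handed to another saga.
    fn unlock(&mut self, saga_id: u64, locked_gen: u64) -> bool {
        if self.updater_id == Some(saga_id) && self.updater_gen == locked_gen {
            self.updater_gen += 1;
            self.updater_id = None;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut rec = InstanceRecord::default();
    let observed = rec.updater_gen;

    // Saga 1 wins the lock; saga 2, racing with the same observed
    // generation, loses and must bail out.
    let locked_gen = rec.try_lock(1, observed).expect("saga 1 acquires the lock");
    assert_eq!(rec.try_lock(2, observed), Err(LockError::AlreadyLocked));

    // Only the holder, at the right generation, can release the lock.
    assert!(!rec.unlock(2, locked_gen));
    assert!(rec.unlock(1, locked_gen));

    let observed = rec.updater_gen;
    assert!(rec.try_lock(2, observed).is_ok());
}
```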
[lengthy comment]: https://github.com/oxidecomputer/omicron/blob/357f29c8b532fef5d05ed8cbfa1e64a07e0953a5/nexus/src/app/sagas/instance_update/mod.rs#L5-L254 --- clients/nexus-client/src/lib.rs | 33 +- clients/sled-agent-client/src/lib.rs | 79 +- common/src/api/internal/nexus.rs | 59 +- dev-tools/omdb/src/bin/omdb/nexus.rs | 88 +- dev-tools/omdb/tests/env.out | 12 + dev-tools/omdb/tests/successes.out | 16 + nexus-config/src/nexus_config.rs | 26 + nexus/db-model/src/instance_state.rs | 5 + nexus/db-model/src/migration.rs | 18 + nexus/db-model/src/migration_state.rs | 12 + nexus/db-model/src/schema.rs | 2 + nexus/db-model/src/vmm_state.rs | 12 +- nexus/db-queries/src/db/datastore/instance.rs | 1490 +++++++-- .../db-queries/src/db/datastore/migration.rs | 77 +- nexus/db-queries/src/db/datastore/mod.rs | 3 +- .../virtual_provisioning_collection.rs | 34 +- nexus/db-queries/src/db/datastore/vmm.rs | 490 ++- nexus/db-queries/src/db/queries/instance.rs | 390 --- nexus/db-queries/src/db/queries/mod.rs | 1 - .../virtual_provisioning_collection_update.rs | 41 +- ...ning_collection_update_delete_instance.sql | 10 +- ...gration_update_vmm_and_both_migrations.sql | 93 + ..._migration_update_vmm_and_migration_in.sql | 61 + ...migration_update_vmm_and_migration_out.sql | 61 + .../vmm_and_migration_update_vmm_only.sql | 24 + nexus/examples/config-second.toml | 2 + nexus/examples/config.toml | 2 + nexus/src/app/background/init.rs | 26 +- .../app/background/tasks/instance_updater.rs | 270 ++ .../app/background/tasks/instance_watcher.rs | 88 +- nexus/src/app/background/tasks/mod.rs | 1 + nexus/src/app/instance.rs | 647 ++-- nexus/src/app/instance_network.rs | 209 -- nexus/src/app/saga.rs | 6 - nexus/src/app/sagas/instance_create.rs | 40 +- nexus/src/app/sagas/instance_migrate.rs | 266 +- nexus/src/app/sagas/instance_start.rs | 127 +- .../app/sagas/instance_update/destroyed.rs | 127 + nexus/src/app/sagas/instance_update/mod.rs | 2778 +++++++++++++++++ nexus/src/app/sagas/instance_update/start.rs | 308 ++ nexus/src/app/sagas/mod.rs | 4 + nexus/src/app/sagas/snapshot_create.rs | 14 + nexus/src/app/sagas/test_helpers.rs | 316 +- nexus/src/internal_api/http_entrypoints.rs | 2 +- nexus/tests/config.test.toml | 13 + nexus/tests/integration_tests/disks.rs | 18 +- nexus/tests/integration_tests/external_ips.rs | 3 + nexus/tests/integration_tests/instances.rs | 303 +- nexus/tests/integration_tests/ip_pools.rs | 3 + nexus/tests/integration_tests/pantry.rs | 3 + nexus/tests/integration_tests/vpc_subnets.rs | 3 + openapi/nexus-internal.json | 81 +- openapi/sled-agent.json | 120 +- sled-agent/src/common/instance.rs | 806 ++--- sled-agent/src/http_entrypoints.rs | 29 +- sled-agent/src/instance.rs | 144 +- sled-agent/src/instance_manager.rs | 69 +- sled-agent/src/params.rs | 17 - sled-agent/src/sim/collection.rs | 20 +- sled-agent/src/sim/http_entrypoints.rs | 61 +- sled-agent/src/sim/instance.rs | 178 +- sled-agent/src/sim/sled_agent.rs | 57 +- sled-agent/src/sled_agent.rs | 23 +- smf/nexus/multi-sled/config-partial.toml | 1 + smf/nexus/single-sled/config-partial.toml | 1 + 65 files changed, 7305 insertions(+), 3018 deletions(-) delete mode 100644 nexus/db-queries/src/db/queries/instance.rs create mode 100644 nexus/db-queries/tests/output/vmm_and_migration_update_vmm_and_both_migrations.sql create mode 100644 nexus/db-queries/tests/output/vmm_and_migration_update_vmm_and_migration_in.sql create mode 100644 nexus/db-queries/tests/output/vmm_and_migration_update_vmm_and_migration_out.sql create mode 100644 
nexus/db-queries/tests/output/vmm_and_migration_update_vmm_only.sql create mode 100644 nexus/src/app/background/tasks/instance_updater.rs create mode 100644 nexus/src/app/sagas/instance_update/destroyed.rs create mode 100644 nexus/src/app/sagas/instance_update/mod.rs create mode 100644 nexus/src/app/sagas/instance_update/start.rs diff --git a/clients/nexus-client/src/lib.rs b/clients/nexus-client/src/lib.rs index 162c3f4dbf..b7722144fe 100644 --- a/clients/nexus-client/src/lib.rs +++ b/clients/nexus-client/src/lib.rs @@ -122,22 +122,6 @@ impl From for omicron_common::api::internal::nexus::VmmState { } } -impl From - for types::InstanceRuntimeState -{ - fn from( - s: omicron_common::api::internal::nexus::InstanceRuntimeState, - ) -> Self { - Self { - dst_propolis_id: s.dst_propolis_id, - gen: s.gen, - migration_id: s.migration_id, - propolis_id: s.propolis_id, - time_updated: s.time_updated, - } - } -} - impl From for types::VmmRuntimeState { @@ -153,10 +137,10 @@ impl From s: omicron_common::api::internal::nexus::SledInstanceState, ) -> Self { Self { - instance_state: s.instance_state.into(), propolis_id: s.propolis_id, vmm_state: s.vmm_state.into(), - migration_state: s.migration_state.map(Into::into), + migration_in: s.migration_in.map(Into::into), + migration_out: s.migration_out.map(Into::into), } } } @@ -169,7 +153,6 @@ impl From ) -> Self { Self { migration_id: s.migration_id, - role: s.role.into(), state: s.state.into(), gen: s.gen, time_updated: s.time_updated, @@ -177,18 +160,6 @@ impl From } } -impl From - for types::MigrationRole -{ - fn from(s: omicron_common::api::internal::nexus::MigrationRole) -> Self { - use omicron_common::api::internal::nexus::MigrationRole as Input; - match s { - Input::Source => Self::Source, - Input::Target => Self::Target, - } - } -} - impl From for types::MigrationState { diff --git a/clients/sled-agent-client/src/lib.rs b/clients/sled-agent-client/src/lib.rs index 4e7a4a72db..4ed5aaa1cb 100644 --- a/clients/sled-agent-client/src/lib.rs +++ b/clients/sled-agent-client/src/lib.rs @@ -5,6 +5,9 @@ //! Interface for making API requests to a Sled Agent use async_trait::async_trait; +use schemars::JsonSchema; +use serde::Deserialize; +use serde::Serialize; use std::convert::TryFrom; use uuid::Uuid; @@ -162,10 +165,10 @@ impl From { fn from(s: types::SledInstanceState) -> Self { Self { - instance_state: s.instance_state.into(), propolis_id: s.propolis_id, vmm_state: s.vmm_state.into(), - migration_state: s.migration_state.map(Into::into), + migration_in: s.migration_in.map(Into::into), + migration_out: s.migration_out.map(Into::into), } } } @@ -177,25 +180,12 @@ impl From Self { migration_id: s.migration_id, state: s.state.into(), - role: s.role.into(), gen: s.gen, time_updated: s.time_updated, } } } -impl From - for omicron_common::api::internal::nexus::MigrationRole -{ - fn from(r: types::MigrationRole) -> Self { - use omicron_common::api::internal::nexus::MigrationRole as Output; - match r { - types::MigrationRole::Source => Output::Source, - types::MigrationRole::Target => Output::Target, - } - } -} - impl From for omicron_common::api::internal::nexus::MigrationState { @@ -457,12 +447,29 @@ impl From /// are bonus endpoints, not generated in the real client. 
#[async_trait] pub trait TestInterfaces { + async fn instance_single_step(&self, id: Uuid); async fn instance_finish_transition(&self, id: Uuid); + async fn instance_simulate_migration_source( + &self, + id: Uuid, + params: SimulateMigrationSource, + ); async fn disk_finish_transition(&self, id: Uuid); } #[async_trait] impl TestInterfaces for Client { + async fn instance_single_step(&self, id: Uuid) { + let baseurl = self.baseurl(); + let client = self.client(); + let url = format!("{}/instances/{}/poke-single-step", baseurl, id); + client + .post(url) + .send() + .await + .expect("instance_single_step() failed unexpectedly"); + } + async fn instance_finish_transition(&self, id: Uuid) { let baseurl = self.baseurl(); let client = self.client(); @@ -484,4 +491,46 @@ impl TestInterfaces for Client { .await .expect("disk_finish_transition() failed unexpectedly"); } + + async fn instance_simulate_migration_source( + &self, + id: Uuid, + params: SimulateMigrationSource, + ) { + let baseurl = self.baseurl(); + let client = self.client(); + let url = format!("{baseurl}/instances/{id}/sim-migration-source"); + client + .post(url) + .json(¶ms) + .send() + .await + .expect("instance_simulate_migration_source() failed unexpectedly"); + } +} + +/// Parameters to the `/instances/{id}/sim-migration-source` test API. +/// +/// This message type is not included in the OpenAPI spec, because this API +/// exists only in test builds. +#[derive(Serialize, Deserialize, JsonSchema)] +pub struct SimulateMigrationSource { + /// The ID of the migration out of the instance's current active VMM. + pub migration_id: Uuid, + /// What migration result (success or failure) to simulate. + pub result: SimulatedMigrationResult, +} + +/// The result of a simulated migration out from an instance's current active +/// VMM. +#[derive(Serialize, Deserialize, JsonSchema)] +pub enum SimulatedMigrationResult { + /// Simulate a successful migration out. + Success, + /// Simulate a failed migration out. + /// + /// # Note + /// + /// This is not currently implemented by the simulated sled-agent. + Failure, } diff --git a/common/src/api/internal/nexus.rs b/common/src/api/internal/nexus.rs index d4ed1773f6..7f4eb358a4 100644 --- a/common/src/api/internal/nexus.rs +++ b/common/src/api/internal/nexus.rs @@ -117,18 +117,38 @@ pub struct VmmRuntimeState { /// specific VMM and the instance it incarnates. #[derive(Clone, Debug, Deserialize, Serialize, JsonSchema)] pub struct SledInstanceState { - /// The sled's conception of the state of the instance. - pub instance_state: InstanceRuntimeState, - /// The ID of the VMM whose state is being reported. pub propolis_id: PropolisUuid, /// The most recent state of the sled's VMM process. pub vmm_state: VmmRuntimeState, - /// The current state of any in-progress migration for this instance, as - /// understood by this sled. - pub migration_state: Option, + /// The current state of any inbound migration to this VMM. + pub migration_in: Option, + + /// The state of any outbound migration from this VMM. 
+ pub migration_out: Option, +} + +#[derive(Copy, Clone, Debug, Default)] +pub struct Migrations<'state> { + pub migration_in: Option<&'state MigrationRuntimeState>, + pub migration_out: Option<&'state MigrationRuntimeState>, +} + +impl Migrations<'_> { + pub fn empty() -> Self { + Self { migration_in: None, migration_out: None } + } +} + +impl SledInstanceState { + pub fn migrations(&self) -> Migrations<'_> { + Migrations { + migration_in: self.migration_in.as_ref(), + migration_out: self.migration_out.as_ref(), + } + } } /// An update from a sled regarding the state of a migration, indicating the @@ -137,7 +157,6 @@ pub struct SledInstanceState { pub struct MigrationRuntimeState { pub migration_id: Uuid, pub state: MigrationState, - pub role: MigrationRole, pub gen: Generation, /// Timestamp for the migration state update. @@ -192,32 +211,6 @@ impl fmt::Display for MigrationState { } } -#[derive( - Clone, Copy, Debug, PartialEq, Eq, Deserialize, Serialize, JsonSchema, -)] -#[serde(rename_all = "snake_case")] -pub enum MigrationRole { - /// This update concerns the source VMM of a migration. - Source, - /// This update concerns the target VMM of a migration. - Target, -} - -impl MigrationRole { - pub fn label(&self) -> &'static str { - match self { - Self::Source => "source", - Self::Target => "target", - } - } -} - -impl fmt::Display for MigrationRole { - fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { - f.write_str(self.label()) - } -} - // Oximeter producer/collector objects. /// The kind of metric producer this is. diff --git a/dev-tools/omdb/src/bin/omdb/nexus.rs b/dev-tools/omdb/src/bin/omdb/nexus.rs index 8649d15aa6..ec3e519cbc 100644 --- a/dev-tools/omdb/src/bin/omdb/nexus.rs +++ b/dev-tools/omdb/src/bin/omdb/nexus.rs @@ -929,6 +929,9 @@ fn print_task_details(bgtask: &BackgroundTask, details: &serde_json::Value) { /// number of stale instance metrics that were deleted pruned_instances: usize, + /// update sagas queued due to instance updates. + update_sagas_queued: usize, + /// instance states from completed checks. 
/// /// this is a mapping of stringified instance states to the number @@ -970,6 +973,7 @@ fn print_task_details(bgtask: &BackgroundTask, details: &serde_json::Value) { ), Ok(TaskSuccess { total_instances, + update_sagas_queued, pruned_instances, instance_states, failed_checks, @@ -987,7 +991,7 @@ fn print_task_details(bgtask: &BackgroundTask, details: &serde_json::Value) { for (state, count) in &instance_states { println!(" -> {count} instances {state}") } - + println!(" update sagas queued: {update_sagas_queued}"); println!(" failed checks: {total_failures}"); for (failure, count) in &failed_checks { println!(" -> {count} {failure}") @@ -1239,11 +1243,6 @@ fn print_task_details(bgtask: &BackgroundTask, details: &serde_json::Value) { } else if name == "lookup_region_port" { match serde_json::from_value::(details.clone()) { - Err(error) => eprintln!( - "warning: failed to interpret task details: {:?}: {:?}", - error, details - ), - Ok(LookupRegionPortStatus { found_port_ok, errors }) => { println!(" total filled in ports: {}", found_port_ok.len()); for line in &found_port_ok { @@ -1255,6 +1254,83 @@ fn print_task_details(bgtask: &BackgroundTask, details: &serde_json::Value) { println!(" > {line}"); } } + + Err(error) => eprintln!( + "warning: failed to interpret task details: {:?}: {:?}", + error, details, + ), + } + } else if name == "instance_updater" { + #[derive(Deserialize)] + struct UpdaterStatus { + /// number of instances found with destroyed active VMMs + destroyed_active_vmms: usize, + + /// number of instances found with terminated active migrations + terminated_active_migrations: usize, + + /// number of update sagas started. + sagas_started: usize, + + /// number of sagas completed successfully + sagas_completed: usize, + + /// number of sagas which failed + sagas_failed: usize, + + /// number of sagas which could not be started + saga_start_failures: usize, + + /// the last error that occurred during execution. 
+ error: Option, + } + match serde_json::from_value::(details.clone()) { + Err(error) => eprintln!( + "warning: failed to interpret task details: {:?}: {:?}", + error, details + ), + Ok(UpdaterStatus { + destroyed_active_vmms, + terminated_active_migrations, + sagas_started, + sagas_completed, + sagas_failed, + saga_start_failures, + error, + }) => { + if let Some(error) = error { + println!(" task did not complete successfully!"); + println!(" most recent error: {error}"); + } + + println!( + " total instances in need of updates: {}", + destroyed_active_vmms + terminated_active_migrations + ); + println!( + " instances with destroyed active VMMs: {}", + destroyed_active_vmms, + ); + println!( + " instances with terminated active migrations: {}", + terminated_active_migrations, + ); + println!(" update sagas started: {sagas_started}"); + println!( + " update sagas completed successfully: {}", + sagas_completed, + ); + + let total_failed = sagas_failed + saga_start_failures; + if total_failed > 0 { + println!(" unsuccessful update sagas: {total_failed}"); + println!( + " sagas which could not be started: {}", + saga_start_failures + ); + println!(" sagas failed: {sagas_failed}"); + } + } }; } else { println!( diff --git a/dev-tools/omdb/tests/env.out b/dev-tools/omdb/tests/env.out index a6bf4d4667..67f113a801 100644 --- a/dev-tools/omdb/tests/env.out +++ b/dev-tools/omdb/tests/env.out @@ -86,6 +86,10 @@ task: "external_endpoints" on each one +task: "instance_updater" + detects if instances require update sagas and schedules them + + task: "instance_watcher" periodically checks instance states @@ -231,6 +235,10 @@ task: "external_endpoints" on each one +task: "instance_updater" + detects if instances require update sagas and schedules them + + task: "instance_watcher" periodically checks instance states @@ -363,6 +371,10 @@ task: "external_endpoints" on each one +task: "instance_updater" + detects if instances require update sagas and schedules them + + task: "instance_watcher" periodically checks instance states diff --git a/dev-tools/omdb/tests/successes.out b/dev-tools/omdb/tests/successes.out index cec3fa3052..d4c07899f4 100644 --- a/dev-tools/omdb/tests/successes.out +++ b/dev-tools/omdb/tests/successes.out @@ -287,6 +287,10 @@ task: "external_endpoints" on each one +task: "instance_updater" + detects if instances require update sagas and schedules them + + task: "instance_watcher" periodically checks instance states @@ -482,6 +486,17 @@ task: "external_endpoints" TLS certificates: 0 +task: "instance_updater" + configured period: every s + currently executing: no + last completed activation: , triggered by a periodic timer firing + started at (s ago) and ran for ms + total instances in need of updates: 0 + instances with destroyed active VMMs: 0 + instances with terminated active migrations: 0 + update sagas started: 0 + update sagas completed successfully: 0 + task: "instance_watcher" configured period: every s currently executing: no @@ -490,6 +505,7 @@ task: "instance_watcher" total instances checked: 0 checks completed: 0 successful checks: 0 + update sagas queued: 0 failed checks: 0 checks that could not be completed: 0 stale instance metrics pruned: 0 diff --git a/nexus-config/src/nexus_config.rs b/nexus-config/src/nexus_config.rs index 6e9d6b0cf0..9d8bf1ac9b 100644 --- a/nexus-config/src/nexus_config.rs +++ b/nexus-config/src/nexus_config.rs @@ -379,6 +379,8 @@ pub struct BackgroundTaskConfig { pub region_replacement_driver: RegionReplacementDriverConfig, /// configuration for 
instance watcher task pub instance_watcher: InstanceWatcherConfig, + /// configuration for instance updater task + pub instance_updater: InstanceUpdaterConfig, /// configuration for service VPC firewall propagation task pub service_firewall_propagation: ServiceFirewallPropagationConfig, /// configuration for v2p mapping propagation task @@ -560,6 +562,23 @@ pub struct InstanceWatcherConfig { pub period_secs: Duration, } +#[serde_as] +#[derive(Clone, Debug, Deserialize, Eq, PartialEq, Serialize)] +pub struct InstanceUpdaterConfig { + /// period (in seconds) for periodic activations of this background task + #[serde_as(as = "DurationSeconds")] + pub period_secs: Duration, + + /// disable background checks for instances in need of updates. + /// + /// This config is intended for use in testing, and should generally not be + /// enabled in real life. + /// + /// Default: Off + #[serde(default)] + pub disable: bool, +} + #[serde_as] #[derive(Clone, Debug, Deserialize, Eq, PartialEq, Serialize)] pub struct ServiceFirewallPropagationConfig { @@ -848,6 +867,8 @@ mod test { region_replacement.period_secs = 30 region_replacement_driver.period_secs = 30 instance_watcher.period_secs = 30 + instance_updater.period_secs = 30 + instance_updater.disable = false service_firewall_propagation.period_secs = 300 v2p_mapping_propagation.period_secs = 30 abandoned_vmm_reaper.period_secs = 60 @@ -995,6 +1016,10 @@ mod test { instance_watcher: InstanceWatcherConfig { period_secs: Duration::from_secs(30), }, + instance_updater: InstanceUpdaterConfig { + period_secs: Duration::from_secs(30), + disable: false, + }, service_firewall_propagation: ServiceFirewallPropagationConfig { period_secs: Duration::from_secs(300), @@ -1081,6 +1106,7 @@ mod test { region_replacement.period_secs = 30 region_replacement_driver.period_secs = 30 instance_watcher.period_secs = 30 + instance_updater.period_secs = 30 service_firewall_propagation.period_secs = 300 v2p_mapping_propagation.period_secs = 30 abandoned_vmm_reaper.period_secs = 60 diff --git a/nexus/db-model/src/instance_state.rs b/nexus/db-model/src/instance_state.rs index 673b06e2cd..5925e92ae0 100644 --- a/nexus/db-model/src/instance_state.rs +++ b/nexus/db-model/src/instance_state.rs @@ -59,3 +59,8 @@ impl From for omicron_common::api::external::InstanceState { } } } + +impl diesel::query_builder::QueryId for InstanceStateEnum { + type QueryId = (); + const HAS_STATIC_QUERY_ID: bool = false; +} diff --git a/nexus/db-model/src/migration.rs b/nexus/db-model/src/migration.rs index 4e3ca1b35d..d7c18ae5dd 100644 --- a/nexus/db-model/src/migration.rs +++ b/nexus/db-model/src/migration.rs @@ -89,4 +89,22 @@ impl Migration { time_target_updated: None, } } + + /// Returns `true` if either side reports that the migration is in a + /// terminal state. + pub fn is_terminal(&self) -> bool { + self.source_state.is_terminal() || self.target_state.is_terminal() + } + + /// Returns `true` if either side of the migration has failed. + pub fn either_side_failed(&self) -> bool { + self.source_state == MigrationState::FAILED + || self.target_state == MigrationState::FAILED + } + + /// Returns `true` if either side of the migration has completed. 
+ pub fn either_side_completed(&self) -> bool { + self.source_state == MigrationState::COMPLETED + || self.target_state == MigrationState::COMPLETED + } } diff --git a/nexus/db-model/src/migration_state.rs b/nexus/db-model/src/migration_state.rs index 694198eb56..e1662f2c28 100644 --- a/nexus/db-model/src/migration_state.rs +++ b/nexus/db-model/src/migration_state.rs @@ -28,6 +28,18 @@ impl_enum_wrapper!( ); impl MigrationState { + pub const COMPLETED: MigrationState = + MigrationState(nexus::MigrationState::Completed); + pub const FAILED: MigrationState = + MigrationState(nexus::MigrationState::Failed); + pub const PENDING: MigrationState = + MigrationState(nexus::MigrationState::Pending); + pub const IN_PROGRESS: MigrationState = + MigrationState(nexus::MigrationState::InProgress); + + pub const TERMINAL_STATES: &'static [MigrationState] = + &[Self::COMPLETED, Self::FAILED]; + /// Returns `true` if this migration state means that the migration is no /// longer in progress (it has either succeeded or failed). #[must_use] diff --git a/nexus/db-model/src/schema.rs b/nexus/db-model/src/schema.rs index 246edecd33..845da13a44 100644 --- a/nexus/db-model/src/schema.rs +++ b/nexus/db-model/src/schema.rs @@ -425,6 +425,8 @@ table! { } } +joinable!(instance -> vmm (active_propolis_id)); + table! { vmm (id) { id -> Uuid, diff --git a/nexus/db-model/src/vmm_state.rs b/nexus/db-model/src/vmm_state.rs index 121daaf7dd..7d44bbedbd 100644 --- a/nexus/db-model/src/vmm_state.rs +++ b/nexus/db-model/src/vmm_state.rs @@ -8,7 +8,7 @@ use serde::Serialize; use std::fmt; impl_enum_type!( - #[derive(SqlType, Debug)] + #[derive(SqlType, Debug, Clone)] #[diesel(postgres_type(name = "vmm_state", schema = "public"))] pub struct VmmStateEnum; @@ -41,6 +41,11 @@ impl VmmState { VmmState::SagaUnwound => "saga_unwound", } } + + /// States in which it is safe to deallocate a VMM's sled resources and mark + /// it as deleted. 
+ pub const DESTROYABLE_STATES: &'static [Self] = + &[Self::Destroyed, Self::SagaUnwound]; } impl fmt::Display for VmmState { @@ -119,3 +124,8 @@ impl From for omicron_common::api::external::InstanceState { } } } + +impl diesel::query_builder::QueryId for VmmStateEnum { + type QueryId = (); + const HAS_STATIC_QUERY_ID: bool = false; +} diff --git a/nexus/db-queries/src/db/datastore/instance.rs b/nexus/db-queries/src/db/datastore/instance.rs index 9fb94f043e..455aa62192 100644 --- a/nexus/db-queries/src/db/datastore/instance.rs +++ b/nexus/db-queries/src/db/datastore/instance.rs @@ -22,10 +22,12 @@ use crate::db::model::Generation; use crate::db::model::Instance; use crate::db::model::InstanceRuntimeState; use crate::db::model::Migration; +use crate::db::model::MigrationState; use crate::db::model::Name; use crate::db::model::Project; use crate::db::model::Sled; use crate::db::model::Vmm; +use crate::db::model::VmmState; use crate::db::pagination::paginated; use crate::db::update_and_check::UpdateAndCheck; use crate::db::update_and_check::UpdateAndQueryResult; @@ -35,9 +37,9 @@ use chrono::Utc; use diesel::prelude::*; use nexus_db_model::ApplySledFilterExt; use nexus_db_model::Disk; -use nexus_db_model::VmmRuntimeState; use nexus_types::deployment::SledFilter; use omicron_common::api; +use omicron_common::api::external; use omicron_common::api::external::http_pagination::PaginatedBy; use omicron_common::api::external::CreateResult; use omicron_common::api::external::DataPageParams; @@ -46,8 +48,8 @@ use omicron_common::api::external::Error; use omicron_common::api::external::ListResultVec; use omicron_common::api::external::LookupResult; use omicron_common::api::external::LookupType; +use omicron_common::api::external::MessagePair; use omicron_common::api::external::ResourceType; -use omicron_common::api::internal::nexus::MigrationRuntimeState; use omicron_common::bail_unless; use omicron_uuid_kinds::GenericUuid; use omicron_uuid_kinds::InstanceUuid; @@ -59,8 +61,8 @@ use uuid::Uuid; /// Wraps a record of an `Instance` along with its active `Vmm`, if it has one. #[derive(Clone, Debug)] pub struct InstanceAndActiveVmm { - instance: Instance, - vmm: Option, + pub instance: Instance, + pub vmm: Option, } impl InstanceAndActiveVmm { @@ -76,13 +78,98 @@ impl InstanceAndActiveVmm { self.vmm.as_ref().map(|v| SledUuid::from_untyped_uuid(v.sled_id)) } - pub fn effective_state( - &self, - ) -> omicron_common::api::external::InstanceState { - if let Some(vmm) = &self.vmm { - vmm.runtime.state.into() - } else { - self.instance.runtime().nexus_state.into() + /// Returns the operator-visible [external API + /// `InstanceState`](external::InstanceState) for this instance and its + /// active VMM. + pub fn effective_state(&self) -> external::InstanceState { + Self::determine_effective_state(&self.instance, self.vmm.as_ref()) + } + + /// Returns the operator-visible [external API + /// `InstanceState`](external::InstanceState) for the provided [`Instance`] + /// and its active [`Vmm`], if one exists. + /// + /// # Arguments + /// + /// - `instance`: the instance + /// - `active_vmm`: the instance's active VMM, if one exists. + /// + /// # Notes + /// + /// Generally, the value of `active_vmm` should be + /// the VMM pointed to by `instance.runtime_state.propolis_id`. However, + /// this is not enforced by this function, as the `instance_migrate` saga + /// must in some cases determine an effective instance state from the + /// instance and *target* VMM states. 
+ pub fn determine_effective_state( + instance: &Instance, + active_vmm: Option<&Vmm>, + ) -> external::InstanceState { + use crate::db::model::InstanceState; + use crate::db::model::VmmState; + + let instance_state = instance.runtime_state.nexus_state; + let vmm_state = active_vmm.map(|vmm| vmm.runtime.state); + + // We want to only report that an instance is `Stopped` when a new + // `instance-start` saga is able to proceed. That means that: + match (instance_state, vmm_state) { + // - If there's an active migration ID for the instance, *always* + // treat its state as "migration" regardless of the VMM's state. + // + // This avoids an issue where an instance whose previous active + // VMM has been destroyed as a result of a successful migration + // out will appear to be "stopping" for the time between when that + // VMM was reported destroyed and when the instance record was + // updated to reflect the migration's completion. + // + // Instead, we'll continue to report the instance's state as + // "migrating" until an instance-update saga has resolved the + // outcome of the migration, since only the instance-update saga + // can complete the migration and update the instance record to + // point at its new active VMM. No new instance-migrate, + // instance-stop, or instance-delete saga can be started + // until this occurs. + // + // If the instance actually *has* stopped or failed before a + // successful migration out, this is fine, because an + // instance-update saga will come along and remove the active VMM + // and migration IDs. + // + (InstanceState::Vmm, Some(_)) + if instance.runtime_state.migration_id.is_some() => + { + external::InstanceState::Migrating + } + // - An instance with a "stopped" or "destroyed" VMM needs to be + // recast as a "stopping" instance, as the virtual provisioning + // resources for that instance have not been deallocated until the + // active VMM ID has been unlinked by an update saga. + ( + InstanceState::Vmm, + Some(VmmState::Stopped | VmmState::Destroyed), + ) => external::InstanceState::Stopping, + // - An instance with a "saga unwound" VMM, on the other hand, can + // be treated as "stopped", since --- unlike "destroyed" --- a new + // start saga can run at any time by just clearing out the old VMM + // ID. + (InstanceState::Vmm, Some(VmmState::SagaUnwound)) => { + external::InstanceState::Stopped + } + // - An instance with no VMM is always "stopped" (as long as it's + // not "starting" etc.) + (InstanceState::NoVmm, _vmm_state) => { + debug_assert_eq!(_vmm_state, None); + external::InstanceState::Stopped + } + // If there's a VMM state, and none of the above rules apply, use + // that. + (_instance_state, Some(vmm_state)) => { + debug_assert_eq!(_instance_state, InstanceState::Vmm); + vmm_state.into() + } + // If there's no VMM state, use the instance's state. 
+ (instance_state, None) => instance_state.into(), } } } @@ -93,18 +180,13 @@ impl From<(Instance, Option)> for InstanceAndActiveVmm { } } -impl From for omicron_common::api::external::Instance { +impl From for external::Instance { fn from(value: InstanceAndActiveVmm) -> Self { - let run_state: omicron_common::api::external::InstanceState; - let time_run_state_updated: chrono::DateTime; - (run_state, time_run_state_updated) = if let Some(vmm) = value.vmm { - (vmm.runtime.state.into(), vmm.runtime.time_state_updated) - } else { - ( - value.instance.runtime_state.nexus_state.into(), - value.instance.runtime_state.time_updated, - ) - }; + let time_run_state_updated = value + .vmm + .as_ref() + .map(|vmm| vmm.runtime.time_state_updated) + .unwrap_or(value.instance.runtime_state.time_updated); Self { identity: value.instance.identity(), @@ -116,21 +198,21 @@ impl From for omicron_common::api::external::Instance { .hostname .parse() .expect("found invalid hostname in the database"), - runtime: omicron_common::api::external::InstanceRuntimeState { - run_state, + runtime: external::InstanceRuntimeState { + run_state: value.effective_state(), time_run_state_updated, }, } } } -/// A complete snapshot of the database records describing the current state of +/// The totality of database records describing the current state of /// an instance: the [`Instance`] record itself, along with its active [`Vmm`], /// target [`Vmm`], and current [`Migration`], if they exist. /// /// This is returned by [`DataStore::instance_fetch_all`]. #[derive(Clone, Debug, serde::Serialize, serde::Deserialize)] -pub struct InstanceSnapshot { +pub struct InstanceGestalt { /// The instance record. pub instance: Instance, /// The [`Vmm`] record pointed to by the instance's `active_propolis_id`, if @@ -152,12 +234,14 @@ pub struct InstanceSnapshot { /// when the lock is released. #[derive(Debug, serde::Serialize, serde::Deserialize)] pub struct UpdaterLock { - saga_lock_id: Uuid, + pub updater_id: Uuid, locked_gen: Generation, } /// Errors returned by [`DataStore::instance_updater_lock`]. -#[derive(Debug, thiserror::Error, PartialEq)] +#[derive( + Debug, thiserror::Error, PartialEq, serde::Serialize, serde::Deserialize, +)] pub enum UpdaterLockError { /// The instance was already locked by another saga. #[error("instance already locked by another saga")] @@ -167,25 +251,6 @@ pub enum UpdaterLockError { Query(#[from] Error), } -/// The result of an [`DataStore::instance_and_vmm_update_runtime`] call, -/// indicating which records were updated. -#[derive(Copy, Clone, Debug)] -pub struct InstanceUpdateResult { - /// `true` if the instance record was updated, `false` otherwise. - pub instance_updated: bool, - /// `true` if the VMM record was updated, `false` otherwise. - pub vmm_updated: bool, - /// Indicates whether a migration record for this instance was updated, if a - /// [`MigrationRuntimeState`] was provided to - /// [`DataStore::instance_and_vmm_update_runtime`]. - /// - /// - `Some(true)` if a migration record was updated - /// - `Some(false)` if a [`MigrationRuntimeState`] was provided, but the - /// migration record was not updated - /// - `None` if no [`MigrationRuntimeState`] was provided - pub migration_updated: Option, -} - impl DataStore { /// Idempotently insert a database record for an Instance /// @@ -295,6 +360,74 @@ impl DataStore { .collect()) } + /// List all instances with active VMMs in the `Destroyed` state that don't + /// have currently-running instance-updater sagas. 
+ /// + /// This is used by the `instance_updater` background task to ensure that + /// update sagas are scheduled for these instances. + pub async fn find_instances_with_destroyed_active_vmms( + &self, + opctx: &OpContext, + ) -> ListResultVec { + use db::model::VmmState; + use db::schema::instance::dsl; + use db::schema::vmm::dsl as vmm_dsl; + + vmm_dsl::vmm + .filter(vmm_dsl::state.eq(VmmState::Destroyed)) + // If the VMM record has already been deleted, we don't need to do + // anything about it --- someone already has. + .filter(vmm_dsl::time_deleted.is_null()) + .inner_join( + dsl::instance.on(dsl::active_propolis_id + .eq(vmm_dsl::id.nullable()) + .and(dsl::time_deleted.is_null()) + .and(dsl::updater_id.is_null())), + ) + .select(Instance::as_select()) + .load_async::( + &*self.pool_connection_authorized(opctx).await?, + ) + .await + .map_err(|e| public_error_from_diesel(e, ErrorHandler::Server)) + } + + /// List all instances with active migrations that have terminated (either + /// completed or failed) and don't have currently-running instance-updater + /// sagas. + /// + /// This is used by the `instance_updater` background task to ensure that + /// update sagas are scheduled for these instances. + pub async fn find_instances_with_terminated_active_migrations( + &self, + opctx: &OpContext, + ) -> ListResultVec { + use db::model::MigrationState; + use db::schema::instance::dsl; + use db::schema::migration::dsl as migration_dsl; + + dsl::instance + .filter(dsl::time_deleted.is_null()) + .filter(dsl::migration_id.is_not_null()) + .filter(dsl::updater_id.is_null()) + .inner_join( + migration_dsl::migration.on(dsl::migration_id + .eq(migration_dsl::id.nullable()) + .and( + migration_dsl::target_state + .eq_any(MigrationState::TERMINAL_STATES) + .or(migration_dsl::source_state + .eq_any(MigrationState::TERMINAL_STATES)), + )), + ) + .select(Instance::as_select()) + .load_async::( + &*self.pool_connection_authorized(opctx).await?, + ) + .await + .map_err(|e| public_error_from_diesel(e, ErrorHandler::Server)) + } + /// Fetches information about an Instance that the caller has previously /// fetched /// @@ -359,7 +492,7 @@ impl DataStore { /// instance in a single atomic query. /// /// If an instance with the provided UUID exists, this method returns an - /// [`InstanceSnapshot`], which contains the following: + /// [`InstanceGestalt`], which contains the following: /// /// - The [`Instance`] record itself, /// - The instance's active [`Vmm`] record, if the `active_propolis_id` @@ -372,7 +505,7 @@ impl DataStore { &self, opctx: &OpContext, authz_instance: &authz::Instance, - ) -> LookupResult { + ) -> LookupResult { opctx.authorize(authz::Action::Read, authz_instance).await?; use db::schema::instance::dsl as instance_dsl; @@ -438,7 +571,7 @@ impl DataStore { ) })?; - Ok(InstanceSnapshot { instance, migration, active_vmm, target_vmm }) + Ok(InstanceGestalt { instance, migration, active_vmm, target_vmm }) } // TODO-design It's tempting to return the updated state of the Instance @@ -484,83 +617,180 @@ impl DataStore { Ok(updated) } - /// Updates an instance record and a VMM record with a single database - /// command. + /// Updates an instance record by setting the instance's migration ID to the + /// provided `migration_id` and the target VMM ID to the provided + /// `target_propolis_id`, if the instance does not currently have an active + /// migration, and the active VMM is in the [`VmmState::Running`] or + /// [`VmmState::Rebooting`] states. 
/// - /// This is intended to be used to apply updates from sled agent that - /// may change a VMM's runtime state (e.g. moving an instance from Running - /// to Stopped) and its corresponding instance's state (e.g. changing the - /// active Propolis ID to reflect a completed migration) in a single - /// transaction. The caller is responsible for ensuring the instance and - /// VMM states are consistent with each other before calling this routine. - /// - /// # Arguments - /// - /// - instance_id: The ID of the instance to update. - /// - new_instance: The new instance runtime state to try to write. - /// - vmm_id: The ID of the VMM to update. - /// - new_vmm: The new VMM runtime state to try to write. - /// - /// # Return value - /// - /// - `Ok(`[`InstanceUpdateResult`]`)` if the query was issued - /// successfully. The returned [`InstanceUpdateResult`] indicates which - /// database record(s) were updated. Note that an update can fail because - /// it was inapplicable (i.e. the database has state with a newer - /// generation already) or because the relevant record was not found. - /// - `Err` if another error occurred while accessing the database. - pub async fn instance_and_vmm_update_runtime( + /// Note that a non-NULL `target_propolis_id` will be overwritten, if (and + /// only if) the target VMM record is in [`VmmState::SagaUnwound`], + /// indicating that it was left behind by a failed `instance-migrate` saga + /// unwinding. + pub async fn instance_set_migration_ids( &self, - instance_id: &InstanceUuid, - new_instance: &InstanceRuntimeState, - vmm_id: &PropolisUuid, - new_vmm: &VmmRuntimeState, - migration: &Option, - ) -> Result { - let query = crate::db::queries::instance::InstanceAndVmmUpdate::new( - *instance_id, - new_instance.clone(), - *vmm_id, - new_vmm.clone(), - migration.clone(), - ); + opctx: &OpContext, + instance_id: InstanceUuid, + src_propolis_id: PropolisUuid, + migration_id: Uuid, + target_propolis_id: PropolisUuid, + ) -> Result { + use db::schema::instance::dsl; + use db::schema::migration::dsl as migration_dsl; + use db::schema::vmm::dsl as vmm_dsl; - // The InstanceAndVmmUpdate query handles and indicates failure to find - // either the instance or the VMM, so a query failure here indicates - // some kind of internal error and not a failed lookup. - let result = query - .execute_and_check(&*self.pool_connection_unauthorized().await?) + // Only allow migrating out if the active VMM is running or rebooting. + const ALLOWED_ACTIVE_VMM_STATES: &[VmmState] = + &[VmmState::Running, VmmState::Rebooting]; + + let instance_id = instance_id.into_untyped_uuid(); + let target_propolis_id = target_propolis_id.into_untyped_uuid(); + let src_propolis_id = src_propolis_id.into_untyped_uuid(); + + // Subquery for determining whether the active VMM is in a state where + // it can be migrated out of. This returns the VMM row's instance ID, so + // that we can use it in a `filter` on the update query. + let vmm_ok = vmm_dsl::vmm + .filter(vmm_dsl::id.eq(src_propolis_id)) + .filter(vmm_dsl::time_deleted.is_null()) + .filter(vmm_dsl::state.eq_any(ALLOWED_ACTIVE_VMM_STATES)) + .select(vmm_dsl::instance_id); + // Subquery for checking if a present target VMM ID points at a VMM + // that's in the saga-unwound state (in which it would be okay to clear + // out that VMM). 
+ let target_vmm_unwound = vmm_dsl::vmm + .filter(vmm_dsl::id.nullable().eq(dsl::target_propolis_id)) + // Don't filter out target VMMs with `time_deleted` set here --- we + // *shouldn't* have deleted the VMM without unlinking it from the + // instance record, but if something did, we should still allow the + // ID to be clobbered. + .filter(vmm_dsl::state.eq(VmmState::SagaUnwound)) + .select(vmm_dsl::instance_id); + // Subquery for checking if an already present migration ID points at a + // migration where both the source- and target-sides are marked as + // failed. If both are failed, *and* the target VMM is `SagaUnwound` as + // determined by the query above, then it's okay to clobber that + // migration, as it was left behind by a previous migrate saga unwinding. + let current_migration_failed = migration_dsl::migration + .filter(migration_dsl::id.nullable().eq(dsl::migration_id)) + .filter(migration_dsl::target_state.eq(MigrationState::FAILED)) + .filter(migration_dsl::source_state.eq(MigrationState::FAILED)) + .select(migration_dsl::instance_id); + + diesel::update(dsl::instance) + .filter(dsl::time_deleted.is_null()) + .filter(dsl::id.eq(instance_id)) + .filter( + // Update the row if and only if one of the following is true: + // + // - The migration and target VMM IDs are not present + (dsl::migration_id + .is_null() + .and(dsl::target_propolis_id.is_null())) + // - The migration and target VMM IDs are set to the values + // we are trying to set. + // + // This way, we can use a `RETURNING` clause to fetch the + // current state after the update, rather than + // `check_if_exists` which returns the prior state, and still + // fail to update the record if another migration/target VMM + // ID is already there. + .or(dsl::migration_id + .eq(Some(migration_id)) + .and(dsl::target_propolis_id.eq(Some(target_propolis_id)))) + // - The migration and target VMM IDs are set to another + // migration, but the target VMM state is `SagaUnwound` and + // the migration is `Failed` on both sides. + // + // This would indicate that the migration/VMM IDs are left + // behind by another migrate saga failing, and are okay to get + // rid of. + .or( + // Note that both of these queries return the instance ID + // from the VMM and migration records, so we check if one was + // found by comparing it to the actual instance ID. 
+ dsl::id + .eq_any(target_vmm_unwound) + .and(dsl::id.eq_any(current_migration_failed)), + ), + ) + .filter(dsl::active_propolis_id.eq(src_propolis_id)) + .filter(dsl::id.eq_any(vmm_ok)) + .set(( + dsl::migration_id.eq(Some(migration_id)), + dsl::target_propolis_id.eq(Some(target_propolis_id)), + // advance the generation + dsl::state_generation.eq(dsl::state_generation + 1), + dsl::time_state_updated.eq(Utc::now()), + )) + .returning(Instance::as_returning()) + .get_result_async::( + &*self.pool_connection_authorized(opctx).await?, + ) .await - .map_err(|e| public_error_from_diesel(e, ErrorHandler::Server))?; - - let instance_updated = match result.instance_status { - Some(UpdateStatus::Updated) => true, - Some(UpdateStatus::NotUpdatedButExists) => false, - None => false, - }; + .map_err(|error| Error::Conflict { + message: MessagePair::new_full( + "another migration is already in progress".to_string(), + format!( + "cannot set migration ID {migration_id} for instance \ + {instance_id} (perhaps another migration ID is \ + already present): {error:#}" + ), + ), + }) + } - let vmm_updated = match result.vmm_status { - Some(UpdateStatus::Updated) => true, - Some(UpdateStatus::NotUpdatedButExists) => false, - None => false, - }; + /// Unsets the migration IDs set by + /// [`DataStore::instance_set_migration_ids`]. + /// + /// This method will only unset the instance's migration IDs if they match + /// the provided ones. + /// # Returns + /// + /// - `Ok(true)` if the migration IDs were unset, + /// - `Ok(false)` if the instance IDs have *already* been unset (this method + /// is idempotent) + /// - `Err` if the database query returned an error. + pub async fn instance_unset_migration_ids( + &self, + opctx: &OpContext, + instance_id: InstanceUuid, + migration_id: Uuid, + target_propolis_id: PropolisUuid, + ) -> Result { + use db::schema::instance::dsl; - let migration_updated = if migration.is_some() { - Some(match result.migration_status { - Some(UpdateStatus::Updated) => true, - Some(UpdateStatus::NotUpdatedButExists) => false, - None => false, + let instance_id = instance_id.into_untyped_uuid(); + let target_propolis_id = target_propolis_id.into_untyped_uuid(); + let updated = diesel::update(dsl::instance) + .filter(dsl::time_deleted.is_null()) + .filter(dsl::id.eq(instance_id)) + .filter(dsl::migration_id.eq(migration_id)) + .filter(dsl::target_propolis_id.eq(target_propolis_id)) + .set(( + dsl::migration_id.eq(None::), + dsl::target_propolis_id.eq(None::), + // advance the generation + dsl::state_generation.eq(dsl::state_generation + 1), + dsl::time_state_updated.eq(Utc::now()), + )) + .check_if_exists::(instance_id.into_untyped_uuid()) + .execute_and_check(&*self.pool_connection_authorized(&opctx).await?) + .await + .map(|r| match r.status { + UpdateStatus::Updated => true, + UpdateStatus::NotUpdatedButExists => false, }) - } else { - debug_assert_eq!(result.migration_status, None); - None - }; - - Ok(InstanceUpdateResult { - instance_updated, - vmm_updated, - migration_updated, - }) + .map_err(|e| { + public_error_from_diesel( + e, + ErrorHandler::NotFoundByLookup( + ResourceType::Instance, + LookupType::ById(instance_id), + ), + ) + })?; + Ok(updated) } /// Lists all instances on in-service sleds with active Propolis VMM @@ -706,23 +936,28 @@ impl DataStore { } /// Attempts to lock an instance's record to apply state updates in an - /// instance-update saga, returning the state of the instance when the lock - /// was acquired. 
+ /// instance-update saga, returning an [`UpdaterLock`] if the lock is + /// successfully acquired. /// /// # Notes /// /// This method MUST only be called from the context of a saga! The /// calling saga must ensure that the reverse action for the action that /// acquires the lock must call [`DataStore::instance_updater_unlock`] to - /// ensure that the lock is always released if the saga unwinds. + /// ensure that the lock is always released if the saga unwinds. If the saga + /// locking the instance completes successfully, it must release the lock + /// using [`DataStore::instance_updater_unlock`], or use + /// [`DataStore::instance_commit_update`] to release the lock and write back + /// a new [`InstanceRuntimeState`] in a single atomic query. /// /// This method is idempotent: if the instance is already locked by the same /// saga, it will succeed, as though the lock was acquired. /// /// # Arguments /// - /// - `authz_instance`: the instance to attempt to lock to lock - /// - `saga_lock_id`: the UUID of the saga that's attempting to lock this + /// - `opctx`: the [`OpContext`] for this operation. + /// - `authz_instance`: the instance to attempt to lock. + /// - `updater_id`: the UUID of the saga that's attempting to lock this /// instance. /// /// # Returns @@ -737,7 +972,7 @@ impl DataStore { &self, opctx: &OpContext, authz_instance: &authz::Instance, - saga_lock_id: Uuid, + updater_id: Uuid, ) -> Result { use db::schema::instance::dsl; @@ -758,22 +993,21 @@ impl DataStore { // *same* instance at the same time. So, idempotency is probably more // important than handling that extremely unlikely edge case. let mut did_lock = false; + let mut locked_gen = instance.updater_gen; loop { match instance.updater_id { // If the `updater_id` field is not null and the ID equals this // saga's ID, we already have the lock. We're done here! - Some(lock_id) if lock_id == saga_lock_id => { - slog::info!( + Some(lock_id) if lock_id == updater_id => { + slog::debug!( &opctx.log, "instance updater lock acquired!"; "instance_id" => %instance_id, - "saga_id" => %saga_lock_id, + "updater_id" => %updater_id, + "locked_gen" => ?locked_gen, "already_locked" => !did_lock, ); - return Ok(UpdaterLock { - saga_lock_id, - locked_gen: instance.updater_gen, - }); + return Ok(UpdaterLock { updater_id, locked_gen }); } // The `updater_id` field is set, but it's not our ID. The instance // is locked by a different saga, so give up. @@ -783,7 +1017,7 @@ impl DataStore { "instance is locked by another saga"; "instance_id" => %instance_id, "locked_by" => %lock_id, - "saga_id" => %saga_lock_id, + "updater_id" => %updater_id, ); return Err(UpdaterLockError::AlreadyLocked); } @@ -794,11 +1028,12 @@ impl DataStore { // Okay, now attempt to acquire the lock let current_gen = instance.updater_gen; + locked_gen = Generation(current_gen.0.next()); slog::debug!( &opctx.log, "attempting to acquire instance updater lock"; "instance_id" => %instance_id, - "saga_id" => %saga_lock_id, + "updater_id" => %updater_id, "current_gen" => ?current_gen, ); @@ -816,8 +1051,8 @@ impl DataStore { // of a non-distributed, single-process mutex. .filter(dsl::updater_gen.eq(current_gen)) .set(( - dsl::updater_gen.eq(dsl::updater_gen + 1), - dsl::updater_id.eq(Some(saga_lock_id)), + dsl::updater_gen.eq(locked_gen), + dsl::updater_id.eq(Some(updater_id)), )) .check_if_exists::(instance_id) .execute_and_check( @@ -846,11 +1081,290 @@ impl DataStore { } } - /// Release the instance-updater lock acquired by - /// [`DataStore::instance_updater_lock`]. 
+ /// Attempts to "inherit" the lock acquired by + /// [`DataStore::instance_updater_lock`] by setting a new `child_lock_id` as + /// the current updater, if (and only if) the lock is held by the provided + /// `parent_lock`. + /// + /// This essentially performs the equivalent of a [compare-exchange] + /// operation on the instance record's lock ID field, which succeeds if the + /// current lock ID matches the parent. Using this method ensures that, if a + /// parent saga starts multiple child sagas, only one of them can + /// successfully acquire the lock. + /// + /// # Notes + /// + /// This method MUST only be called from the context of a saga! The + /// calling saga must ensure that the reverse action for the action that + /// acquires the lock must call [`DataStore::instance_updater_unlock`] to + /// ensure that the lock is always released if the saga unwinds. If the saga + /// locking the instance completes successfully, it must release the lock + /// using [`DataStore::instance_updater_unlock`], or use + /// [`DataStore::instance_commit_update`] to release the lock and write back + /// a new [`InstanceRuntimeState`] in a single atomic query. + + /// + /// This method is idempotent: if the instance is already locked by the same + /// saga, it will succeed, as though the lock was acquired. + /// + /// # Arguments + /// + /// - `opctx`: the [`OpContext`] for this operation. + /// - `authz_instance`: the instance to attempt to inherit the lock on. + /// - `parent_lock`: the [`UpdaterLock`] to attempt to inherit the lock + /// from. If the current updater UUID and generation matches this, the + /// lock can be inherited by `child_id`. + /// - `child_lock_id`: the UUID of the saga that's attempting to lock this + /// instance. + /// + /// # Returns + /// + /// - [`Ok`]`(`[`UpdaterLock`]`)` if the lock was successfully inherited. + /// - [`Err`]`([`UpdaterLockError::AlreadyLocked`])` if the instance was + /// locked by a different saga, other than the provided `parent_lock`. + /// - [`Err`]`([`UpdaterLockError::Query`]`(...))` if the query to fetch + /// the instance or lock it returned another error (such as if the + /// instance no longer exists, or if the database connection failed). + pub async fn instance_updater_inherit_lock( + &self, + opctx: &OpContext, + authz_instance: &authz::Instance, + parent_lock: UpdaterLock, + child_lock_id: Uuid, + ) -> Result { + use db::schema::instance::dsl; + let UpdaterLock { updater_id: parent_id, locked_gen } = parent_lock; + let instance_id = authz_instance.id(); + let new_gen = Generation(locked_gen.0.next()); + + let result = diesel::update(dsl::instance) + .filter(dsl::time_deleted.is_null()) + .filter(dsl::id.eq(instance_id)) + .filter(dsl::updater_gen.eq(locked_gen)) + .filter(dsl::updater_id.eq(parent_id)) + .set(( + dsl::updater_gen.eq(new_gen), + dsl::updater_id.eq(Some(child_lock_id)), + )) + .check_if_exists::(instance_id) + .execute_and_check(&*self.pool_connection_authorized(opctx).await?) + .await + .map_err(|e| { + public_error_from_diesel( + e, + ErrorHandler::NotFoundByLookup( + ResourceType::Instance, + LookupType::ById(instance_id), + ), + ) + })?; + + match result { + // If we updated the record, the lock has been successfully + // inherited! Return `Ok(true)` to indicate that we have acquired + // the lock successfully. + UpdateAndQueryResult { status: UpdateStatus::Updated, .. 
} => { + slog::debug!( + &opctx.log, + "inherited lock from {parent_id} to {child_lock_id}"; + "instance_id" => %instance_id, + "updater_id" => %child_lock_id, + "locked_gen" => ?new_gen, + "parent_id" => %parent_id, + "parent_gen" => ?locked_gen, + ); + Ok(UpdaterLock { + updater_id: child_lock_id, + locked_gen: new_gen, + }) + } + // The generation has advanced past the generation at which the + // lock was held. This means that we have already inherited the + // lock. Return `Ok(false)` here for idempotency. + UpdateAndQueryResult { + status: UpdateStatus::NotUpdatedButExists, + ref found, + } if found.updater_id == Some(child_lock_id) => { + slog::debug!( + &opctx.log, + "previously inherited lock from {parent_id} to \ + {child_lock_id}"; + "instance_id" => %instance_id, + "updater_id" => %child_lock_id, + "locked_gen" => ?found.updater_gen, + "parent_id" => %parent_id, + "parent_gen" => ?locked_gen, + ); + debug_assert_eq!(found.updater_gen, new_gen); + Ok(UpdaterLock { + updater_id: child_lock_id, + locked_gen: new_gen, + }) + } + // The instance exists, but it's locked by a different saga than the + // parent we were trying to inherit the lock from. We cannot acquire + // the lock at this time. + UpdateAndQueryResult { ref found, .. } => { + slog::debug!( + &opctx.log, + "cannot inherit instance-updater lock from {parent_id} to \ + {child_lock_id}: this instance is not locked by the \ + expected parent saga"; + "instance_id" => %instance_id, + "updater_id" => %child_lock_id, + "parent_id" => %parent_id, + "actual_lock_id" => ?found.updater_id, + ); + Err(UpdaterLockError::AlreadyLocked) + } + } + } + + /// Release the instance-updater lock on this instance, if (and only if) the + /// lock is currently held by the saga represented by the provided + /// [`UpdaterLock`] token. + pub async fn instance_updater_unlock( + &self, + opctx: &OpContext, + authz_instance: &authz::Instance, + lock: &UpdaterLock, + ) -> Result { + use db::schema::instance::dsl; + + let instance_id = authz_instance.id(); + let UpdaterLock { updater_id, locked_gen } = *lock; + + let result = diesel::update(dsl::instance) + .filter(dsl::time_deleted.is_null()) + .filter(dsl::id.eq(instance_id)) + // Only unlock the instance if: + // - the provided updater ID matches that of the saga that has + // currently locked this instance. + .filter(dsl::updater_id.eq(Some(updater_id))) + // - the provided updater generation matches the current updater + // generation. + .filter(dsl::updater_gen.eq(locked_gen)) + .set(( + dsl::updater_gen.eq(Generation(locked_gen.0.next())), + dsl::updater_id.eq(None::), + )) + .check_if_exists::(instance_id) + .execute_and_check(&*self.pool_connection_authorized(opctx).await?) + .await + .map_err(|e| { + public_error_from_diesel( + e, + ErrorHandler::NotFoundByLookup( + ResourceType::Instance, + LookupType::ById(instance_id), + ), + ) + })?; + + match result { + // If we updated the record, the lock has been released! Return + // `Ok(true)` to indicate that we released the lock successfully. + UpdateAndQueryResult { status: UpdateStatus::Updated, .. } => { + return Ok(true); + } + + // The instance exists, but we didn't unlock it. In almost all + // cases, that's actually *fine*, since this suggests we didn't + // actually have the lock to release, so we don't need to worry + // about unlocking the instance. However, depending on the + // particular reason we didn't actually unlock the instance, this + // may be more or less likely to indicate a bug. 
Remember that saga
+            // actions --- even unwind actions --- must be idempotent, so we
+            // *may* just be trying to unlock an instance we already
+            // successfully unlocked, which is fine.
+            UpdateAndQueryResult { ref found, .. }
+                if found.time_deleted().is_some() =>
+            {
+                debug!(
+                    &opctx.log,
+                    "attempted to unlock an instance that has been deleted";
+                    "instance_id" => %instance_id,
+                    "updater_id" => %updater_id,
+                    "time_deleted" => ?found.time_deleted(),
+                );
+                return Ok(false);
+            }
+
+            // If the instance is no longer locked by this saga, that's probably fine.
+            // We don't need to unlock it.
+            UpdateAndQueryResult { ref found, .. }
+                if found.updater_id != Some(updater_id) =>
+            {
+                if found.updater_gen > locked_gen {
+                    // The generation has advanced past the generation where we
+                    // acquired the lock. That's totally fine: a previous
+                    // execution of the same saga action must have unlocked it,
+                    // and now it is either unlocked, or locked by a different
+                    // saga.
+                    debug!(
+                        &opctx.log,
+                        "attempted to unlock an instance that is no longer \
+                         locked by this saga";
+                        "instance_id" => %instance_id,
+                        "updater_id" => %updater_id,
+                        "actual_id" => ?found.updater_id.as_ref(),
+                        "found_gen" => ?found.updater_gen,
+                        "locked_gen" => ?locked_gen,
+                    );
+                } else {
+                    // On the other hand, if the generation is less than or
+                    // equal to the generation at which we locked the instance,
+                    // that seems kinda suspicious --- perhaps we believed we
+                    // held the lock, but didn't actually, which could be
+                    // programmer error.
+                    //
+                    // However, this *could* conceivably happen: the same saga
+                    // node could have executed previously and released the
+                    // lock, and then the generation counter advanced enough
+                    // times to wrap around, and then the same action tried to
+                    // release its lock again. 64-bit generation counters
+                    // overflowing in an instance's lifetime seems unlikely, but
+                    // nothing is impossible...
+                    warn!(
+                        &opctx.log,
+                        "attempted to release a lock held by another saga \
+                         at the same generation! this seems suspicious...";
+                        "instance_id" => %instance_id,
+                        "updater_id" => %updater_id,
+                        "actual_id" => ?found.updater_id.as_ref(),
+                        "found_gen" => ?found.updater_gen,
+                        "locked_gen" => ?locked_gen,
+                    );
+                }
+
+                Ok(false)
+            }
+
+            // If we *are* still holding the lock, we must be trying to
+            // release it at the wrong generation. That seems quite
+            // suspicious.
+            UpdateAndQueryResult { ref found, .. } => {
+                warn!(
+                    &opctx.log,
+                    "attempted to release a lock at the wrong generation";
+                    "instance_id" => %instance_id,
+                    "updater_id" => %updater_id,
+                    "found_gen" => ?found.updater_gen,
+                    "locked_gen" => ?locked_gen,
+                );
+                Err(Error::internal_error(
+                    "instance is locked by this saga, but at a different \
+                     generation",
+                ))
+            }
+        }
+    }
+
+    /// Write the provided `new_runtime_state` for this instance, and release
+    /// the provided `lock`.
     ///
     /// This method will unlock the instance if (and only if) the lock is
-    /// currently held by the provided `saga_lock_id`. If the lock is held by a
+    /// currently held by the provided `updater_id`. If the lock is held by a
     /// different saga UUID, the instance will remain locked. If the instance
     /// has already been unlocked, this method will return `false`.
     ///
@@ -859,15 +1373,20 @@ impl DataStore {
     /// - `authz_instance`: the instance to attempt to unlock
     /// - `updater_lock`: an [`UpdaterLock`] token representing the acquired
     ///   lock to release.
- pub async fn instance_updater_unlock( + /// - `new_runtime`: an [`InstanceRuntimeState`] to write + /// back to the database when the lock is released. If this is [`None`], + /// the instance's runtime state will not be modified. + pub async fn instance_commit_update( &self, opctx: &OpContext, authz_instance: &authz::Instance, - UpdaterLock { saga_lock_id, locked_gen }: UpdaterLock, + lock: &UpdaterLock, + new_runtime: &InstanceRuntimeState, ) -> Result { use db::schema::instance::dsl; let instance_id = authz_instance.id(); + let UpdaterLock { updater_id, locked_gen } = *lock; let result = diesel::update(dsl::instance) .filter(dsl::time_deleted.is_null()) @@ -875,13 +1394,15 @@ impl DataStore { // Only unlock the instance if: // - the provided updater ID matches that of the saga that has // currently locked this instance. - .filter(dsl::updater_id.eq(Some(saga_lock_id))) + .filter(dsl::updater_id.eq(Some(updater_id))) // - the provided updater generation matches the current updater // generation. .filter(dsl::updater_gen.eq(locked_gen)) + .filter(dsl::state_generation.lt(new_runtime.r#gen)) .set(( dsl::updater_gen.eq(Generation(locked_gen.0.next())), dsl::updater_id.eq(None::), + new_runtime.clone(), )) .check_if_exists::(instance_id) .execute_and_check(&*self.pool_connection_authorized(opctx).await?) @@ -896,49 +1417,127 @@ impl DataStore { ) })?; + // The expected state generation number of the instance record *before* + // applying the update. + let prev_state_gen = u64::from(new_runtime.r#gen.0).saturating_sub(1); match result { // If we updated the record, the lock has been released! Return // `Ok(true)` to indicate that we released the lock successfully. UpdateAndQueryResult { status: UpdateStatus::Updated, .. } => { Ok(true) } - // The generation has advanced past the generation at which the - // lock was held. This means that we have already released the - // lock. Return `Ok(false)` here for idempotency. - UpdateAndQueryResult { - status: UpdateStatus::NotUpdatedButExists, - ref found, - } if found.updater_gen > locked_gen => Ok(false), - // The instance exists, but the lock ID doesn't match our lock ID. - // This means we were trying to release a lock we never held, whcih - // is almost certainly a programmer error. - UpdateAndQueryResult { ref found, .. } => { - match found.updater_id { - Some(lock_holder) => { - debug_assert_ne!(lock_holder, saga_lock_id); - Err(Error::internal_error( - "attempted to release a lock held by another saga! this is a bug!", - )) - }, - None => Err(Error::internal_error( - "attempted to release a lock on an instance that is not locked! this is a bug!", - )), - } + + // The instance has been marked as deleted, so no updates were + // committed! + UpdateAndQueryResult { ref found, .. } + if found.time_deleted().is_some() => + { + warn!( + &opctx.log, + "cannot commit instance update, as the instance no longer \ + exists"; + "instance_id" => %instance_id, + "updater_id" => %updater_id, + "time_deleted" => ?found.time_deleted() + ); + + Err(LookupType::ById(instance_id) + .into_not_found(ResourceType::Instance)) } - } - } -} -#[cfg(test)] -mod tests { - use super::*; - use crate::db::datastore::test_utils::datastore_test; - use crate::db::lookup::LookupPath; + // The instance exists, but both the lock generation *and* the state + // generation no longer matches ours. That's fine --- presumably, + // another execution of the same saga action has already updated the + // instance record. + UpdateAndQueryResult { ref found, .. 
} + if u64::from(found.runtime().r#gen.0) != prev_state_gen + && found.updater_gen != locked_gen => + { + debug_assert_ne!(found.updater_id, Some(updater_id)); + debug!( + &opctx.log, + "cannot commit instance updates, as the state generation \ + and lock generation have advanced: the required updates \ + have probably already been committed."; + "instance_id" => %instance_id, + "expected_state_gen" => ?new_runtime.r#gen, + "actual_state_gen" => ?found.runtime().r#gen, + "updater_id" => %updater_id, + "updater_gen" => ?locked_gen, + "actual_updater_gen" => ?found.updater_gen, + ); + Ok(false) + } + + // The state generation has advanced, but the instance is *still* + // locked by this saga. That's bad --- this update saga may no + // longer update the instance, as its state has changed, potentially + // invalidating the updates. We need to unwind. + UpdateAndQueryResult { ref found, .. } + if u64::from(found.runtime().r#gen.0) != prev_state_gen + && found.updater_gen == locked_gen + && found.updater_id == Some(updater_id) => + { + info!( + &opctx.log, + "cannot commit instance update, as the state generation \ + has advanced, potentially invalidating the update"; + "instance_id" => %instance_id, + "expected_state_gen" => ?new_runtime.r#gen, + "actual_state_gen" => ?found.runtime().r#gen, + ); + Err(Error::conflict("instance state has changed")) + } + + // The instance exists, but we could not update it because the lock + // did not match. + UpdateAndQueryResult { ref found, .. } => match found.updater_id { + Some(actual_id) => { + const MSG: &'static str = + "cannot commit instance updates: the instance is \ + locked by another saga!"; + error!( + &opctx.log, + "{MSG}"; + "instance_id" => %instance_id, + "updater_id" => %updater_id, + "actual_id" => %actual_id, + "found_gen" => ?found.updater_gen, + "locked_gen" => ?locked_gen, + ); + Err(Error::internal_error(MSG)) + } + None => { + const MSG: &'static str = + "cannot commit instance updates: the instance is \ + not locked"; + error!( + &opctx.log, + "{MSG}"; + "instance_id" => %instance_id, + "updater_id" => %updater_id, + "found_gen" => ?found.updater_gen, + "locked_gen" => ?locked_gen, + ); + Err(Error::internal_error(MSG)) + } + }, + } + } +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::db::datastore::test_utils::datastore_test; + use crate::db::lookup::LookupPath; use nexus_db_model::InstanceState; use nexus_db_model::Project; + use nexus_db_model::VmmRuntimeState; use nexus_db_model::VmmState; use nexus_test_utils::db::test_setup_database; use nexus_types::external_api::params; + use omicron_common::api::external; use omicron_common::api::external::ByteCount; use omicron_common::api::external::IdentityMetadataCreateParams; use omicron_test_utils::dev; @@ -1025,7 +1624,7 @@ mod tests { stringify!($id) )); assert_eq!( - lock.saga_lock_id, + lock.updater_id, $id, "instance's `updater_id` must be set to {}", stringify!($id), @@ -1055,7 +1654,7 @@ mod tests { // unlock the instance from saga 1 let unlocked = datastore - .instance_updater_unlock(&opctx, &authz_instance, lock1) + .instance_updater_unlock(&opctx, &authz_instance, &lock1) .await .expect("instance must be unlocked by saga 1"); assert!(unlocked, "instance must actually be unlocked"); @@ -1068,7 +1667,7 @@ mod tests { // unlock the instance from saga 2 let unlocked = datastore - .instance_updater_unlock(&opctx, &authz_instance, lock2) + .instance_updater_unlock(&opctx, &authz_instance, &lock2) .await .expect("instance must be unlocked by saga 2"); assert!(unlocked, 
"instance must actually be unlocked"); @@ -1095,7 +1694,7 @@ mod tests { .await ) .expect("instance should be locked"); - assert_eq!(lock1.saga_lock_id, saga1); + assert_eq!(lock1.updater_id, saga1); // doing it again should be fine. let lock2 = dbg!( @@ -1106,7 +1705,7 @@ mod tests { .expect( "instance_updater_lock should succeed again with the same saga ID", ); - assert_eq!(lock2.saga_lock_id, saga1); + assert_eq!(lock2.updater_id, saga1); // the generation should not have changed as a result of the second // update. assert_eq!(lock1.locked_gen, lock2.locked_gen); @@ -1114,7 +1713,7 @@ mod tests { // now, unlock the instance. let unlocked = dbg!( datastore - .instance_updater_unlock(&opctx, &authz_instance, lock1) + .instance_updater_unlock(&opctx, &authz_instance, &lock1) .await ) .expect("instance should unlock"); @@ -1123,7 +1722,7 @@ mod tests { // unlocking it again should also succeed... let unlocked = dbg!( datastore - .instance_updater_unlock(&opctx, &authz_instance, lock2) + .instance_updater_unlock(&opctx, &authz_instance, &lock2,) .await ) .expect("instance should unlock again"); @@ -1136,10 +1735,10 @@ mod tests { } #[tokio::test] - async fn test_instance_updater_unlocking_someone_elses_instance_errors() { + async fn test_instance_updater_cant_unlock_someone_elses_instance_() { // Setup let logctx = dev::test_setup_log( - "test_instance_updater_unlocking_someone_elses_instance_errors", + "test_instance_updater_cant_unlock_someone_elses_instance_", ); let mut db = test_setup_database(&logctx.log).await; let (opctx, datastore) = datastore_test(&logctx, &db).await; @@ -1155,8 +1754,8 @@ mod tests { ) .expect("instance should be locked"); - // attempting to unlock with a different saga ID should be an error. - let err = dbg!( + // attempting to unlock with a different saga ID shouldn't do anything. + let unlocked = dbg!( datastore .instance_updater_unlock( &opctx, @@ -1166,37 +1765,42 @@ mod tests { // what we're doing here. But this simulates a case where // an incorrect one is constructed, or a raw database query // attempts an invalid unlock operation. - UpdaterLock { - saga_lock_id: saga2, + &UpdaterLock { + updater_id: saga2, locked_gen: lock1.locked_gen, }, ) .await ) - .expect_err( - "unlocking the instance with someone else's ID should fail", - ); - assert_eq!( - err, - Error::internal_error( - "attempted to release a lock held by another saga! \ - this is a bug!", - ), - ); + .unwrap(); + assert!(!unlocked); + + let instance = + dbg!(datastore.instance_refetch(&opctx, &authz_instance).await) + .expect("instance should exist"); + assert_eq!(instance.updater_id, Some(saga1)); + assert_eq!(instance.updater_gen, lock1.locked_gen); + let next_gen = Generation(lock1.locked_gen.0.next()); // unlocking with the correct ID should succeed. let unlocked = dbg!( datastore - .instance_updater_unlock(&opctx, &authz_instance, lock1) + .instance_updater_unlock(&opctx, &authz_instance, &lock1) .await ) .expect("instance should unlock"); assert!(unlocked, "instance should have unlocked"); + let instance = + dbg!(datastore.instance_refetch(&opctx, &authz_instance).await) + .expect("instance should exist"); + assert_eq!(instance.updater_id, None); + assert_eq!(instance.updater_gen, next_gen); + // unlocking with the lock holder's ID *again* at a new generation - // (where the lock is no longer held) should fail. 
- let err = dbg!( + // (where the lock is no longer held) shouldn't do anything + let unlocked = dbg!( datastore .instance_updater_unlock( &opctx, @@ -1204,20 +1808,234 @@ mod tests { // Again, these fields are private specifically to prevent // you from doing this exact thing. But, we should still // test that we handle it gracefully. - UpdaterLock { saga_lock_id: saga1, locked_gen: next_gen }, + &UpdaterLock { updater_id: saga1, locked_gen: next_gen }, + ) + .await + ) + .unwrap(); + assert!(!unlocked); + + // Clean up. + db.cleanup().await.unwrap(); + logctx.cleanup_successful(); + } + + #[tokio::test] + async fn test_unlocking_a_deleted_instance_is_okay() { + // Setup + let logctx = + dev::test_setup_log("test_unlocking_a_deleted_instance_is_okay"); + let mut db = test_setup_database(&logctx.log).await; + let (opctx, datastore) = datastore_test(&logctx, &db).await; + let authz_instance = create_test_instance(&datastore, &opctx).await; + let saga1 = Uuid::new_v4(); + + // put the instance in a state where it will be okay to delete later... + datastore + .instance_update_runtime( + &InstanceUuid::from_untyped_uuid(authz_instance.id()), + &InstanceRuntimeState { + time_updated: Utc::now(), + r#gen: Generation(external::Generation::from_u32(2)), + propolis_id: None, + dst_propolis_id: None, + migration_id: None, + nexus_state: InstanceState::NoVmm, + }, + ) + .await + .expect("should update state successfully"); + + // lock the instance once. + let lock = dbg!( + datastore + .instance_updater_lock(&opctx, &authz_instance, saga1) + .await + ) + .expect("instance should be locked"); + + // mark the instance as deleted + dbg!(datastore.project_delete_instance(&opctx, &authz_instance).await) + .expect("instance should be deleted"); + + // unlocking should still succeed. + dbg!( + datastore + .instance_updater_unlock(&opctx, &authz_instance, &lock) + .await + ) + .expect("instance should unlock"); + + // Clean up. + db.cleanup().await.unwrap(); + logctx.cleanup_successful(); + } + + #[tokio::test] + async fn test_instance_commit_update_is_idempotent() { + // Setup + let logctx = + dev::test_setup_log("test_instance_commit_update_is_idempotent"); + let mut db = test_setup_database(&logctx.log).await; + let (opctx, datastore) = datastore_test(&logctx, &db).await; + let authz_instance = create_test_instance(&datastore, &opctx).await; + let saga1 = Uuid::new_v4(); + + // lock the instance once. + let lock = dbg!( + datastore + .instance_updater_lock(&opctx, &authz_instance, saga1) + .await + ) + .expect("instance should be locked"); + let new_runtime = &InstanceRuntimeState { + time_updated: Utc::now(), + r#gen: Generation(external::Generation::from_u32(2)), + propolis_id: Some(Uuid::new_v4()), + dst_propolis_id: None, + migration_id: None, + nexus_state: InstanceState::Vmm, + }; + + let updated = dbg!( + datastore + .instance_commit_update( + &opctx, + &authz_instance, + &lock, + &new_runtime + ) + .await + ) + .expect("instance_commit_update should succeed"); + assert!(updated, "it should be updated"); + + // okay, let's do it again at the same generation. 
+ let updated = dbg!( + datastore + .instance_commit_update( + &opctx, + &authz_instance, + &lock, + &new_runtime + ) + .await + ) + .expect("instance_commit_update should succeed"); + assert!(!updated, "it was already updated"); + let instance = + dbg!(datastore.instance_refetch(&opctx, &authz_instance).await) + .expect("instance should exist"); + assert_eq!(instance.runtime().propolis_id, new_runtime.propolis_id); + assert_eq!(instance.runtime().r#gen, new_runtime.r#gen); + + // Doing it again at the same generation with a *different* state + // shouldn't change the instance at all. + let updated = dbg!( + datastore + .instance_commit_update( + &opctx, + &authz_instance, + &lock, + &InstanceRuntimeState { + propolis_id: Some(Uuid::new_v4()), + migration_id: Some(Uuid::new_v4()), + dst_propolis_id: Some(Uuid::new_v4()), + ..new_runtime.clone() + } + ) + .await + ) + .expect("instance_commit_update should succeed"); + assert!(!updated, "it was already updated"); + let instance = + dbg!(datastore.instance_refetch(&opctx, &authz_instance).await) + .expect("instance should exist"); + assert_eq!(instance.runtime().propolis_id, new_runtime.propolis_id); + assert_eq!(instance.runtime().dst_propolis_id, None); + assert_eq!(instance.runtime().migration_id, None); + assert_eq!(instance.runtime().r#gen, new_runtime.r#gen); + + // Clean up. + db.cleanup().await.unwrap(); + logctx.cleanup_successful(); + } + + #[tokio::test] + async fn test_instance_update_invalidated_while_locked() { + // Setup + let logctx = dev::test_setup_log( + "test_instance_update_invalidated_while_locked", + ); + let mut db = test_setup_database(&logctx.log).await; + let (opctx, datastore) = datastore_test(&logctx, &db).await; + let authz_instance = create_test_instance(&datastore, &opctx).await; + let saga1 = Uuid::new_v4(); + + // Lock the instance + let lock = dbg!( + datastore + .instance_updater_lock(&opctx, &authz_instance, saga1) + .await + ) + .expect("instance should be locked"); + + // Mutate the instance state, invalidating the state when the lock was + // acquired. + let new_runtime = &InstanceRuntimeState { + time_updated: Utc::now(), + r#gen: Generation(external::Generation::from_u32(2)), + propolis_id: Some(Uuid::new_v4()), + dst_propolis_id: Some(Uuid::new_v4()), + migration_id: Some(Uuid::new_v4()), + nexus_state: InstanceState::Vmm, + }; + let updated = dbg!( + datastore + .instance_update_runtime( + &InstanceUuid::from_untyped_uuid(authz_instance.id()), + &new_runtime + ) + .await + ) + .expect("instance_update_runtime should succeed"); + assert!(updated, "it should be updated"); + + // Okay, now try to commit the result of an update saga. This must fail, + // because the state generation has changed while we had locked the + // instance. 
+ let _err = dbg!( + datastore + .instance_commit_update( + &opctx, + &authz_instance, + &lock, + &InstanceRuntimeState { + time_updated: Utc::now(), + r#gen: Generation(external::Generation::from_u32(2)), + propolis_id: None, + dst_propolis_id: None, + migration_id: None, + nexus_state: InstanceState::NoVmm, + }, ) .await ) .expect_err( - "unlocking the instance with someone else's ID should fail", + "instance_commit_update should fail if the state generation is \ + stale", ); + + let instance = + dbg!(datastore.instance_refetch(&opctx, &authz_instance).await) + .expect("instance should exist"); + assert_eq!(instance.runtime().propolis_id, new_runtime.propolis_id); assert_eq!( - err, - Error::internal_error( - "attempted to release a lock on an instance \ - that is not locked! this is a bug!" - ), + instance.runtime().dst_propolis_id, + new_runtime.dst_propolis_id ); + assert_eq!(instance.runtime().migration_id, new_runtime.migration_id); + assert_eq!(instance.runtime().nexus_state, new_runtime.nexus_state); // Clean up. db.cleanup().await.unwrap(); @@ -1395,4 +2213,264 @@ mod tests { db.cleanup().await.unwrap(); logctx.cleanup_successful(); } + + #[tokio::test] + async fn test_instance_set_migration_ids() { + // Setup + let logctx = dev::test_setup_log("test_instance_set_migration_ids"); + let mut db = test_setup_database(&logctx.log).await; + let (opctx, datastore) = datastore_test(&logctx, &db).await; + let authz_instance = create_test_instance(&datastore, &opctx).await; + + // Create the first VMM in a state where `set_migration_ids` should + // *fail* (Stopped). We will assert that we cannot set the migration + // IDs, and then advance it to Running, when we can start the migration. + let vmm1 = datastore + .vmm_insert( + &opctx, + Vmm { + id: Uuid::new_v4(), + time_created: Utc::now(), + time_deleted: None, + instance_id: authz_instance.id(), + sled_id: Uuid::new_v4(), + propolis_ip: "10.1.9.32".parse().unwrap(), + propolis_port: 420.into(), + runtime: VmmRuntimeState { + time_state_updated: Utc::now(), + r#gen: Generation::new(), + state: VmmState::Stopped, + }, + }, + ) + .await + .expect("active VMM should be inserted successfully!"); + + let instance_id = InstanceUuid::from_untyped_uuid(authz_instance.id()); + let instance = datastore + .instance_refetch(&opctx, &authz_instance) + .await + .expect("instance should be there"); + datastore + .instance_update_runtime( + &instance_id, + &InstanceRuntimeState { + time_updated: Utc::now(), + r#gen: Generation(instance.runtime_state.gen.0.next()), + nexus_state: InstanceState::Vmm, + propolis_id: Some(vmm1.id), + ..instance.runtime_state.clone() + }, + ) + .await + .expect("instance update should work"); + + let vmm2 = datastore + .vmm_insert( + &opctx, + Vmm { + id: Uuid::new_v4(), + time_created: Utc::now(), + time_deleted: None, + instance_id: authz_instance.id(), + sled_id: Uuid::new_v4(), + propolis_ip: "10.1.9.42".parse().unwrap(), + propolis_port: 420.into(), + runtime: VmmRuntimeState { + time_state_updated: Utc::now(), + r#gen: Generation::new(), + state: VmmState::Running, + }, + }, + ) + .await + .expect("second VMM should insert"); + + // make a migration... + let migration = datastore + .migration_insert( + &opctx, + Migration::new(Uuid::new_v4(), instance_id, vmm1.id, vmm2.id), + ) + .await + .expect("migration should be inserted successfully!"); + + // Our first attempt to set migration IDs should fail, because the + // active VMM is Stopped. 
+ let res = dbg!( + datastore + .instance_set_migration_ids( + &opctx, + instance_id, + PropolisUuid::from_untyped_uuid(vmm1.id), + migration.id, + PropolisUuid::from_untyped_uuid(vmm2.id), + ) + .await + ); + assert!(res.is_err()); + + // Okay, now, advance the active VMM to Running, and try again. + let updated = dbg!( + datastore + .vmm_update_runtime( + &PropolisUuid::from_untyped_uuid(vmm1.id), + &VmmRuntimeState { + time_state_updated: Utc::now(), + r#gen: Generation(vmm2.runtime.r#gen.0.next()), + state: VmmState::Running, + }, + ) + .await + ) + .expect("updating VMM state should be fine"); + assert!(updated); + + // Now, it should work! + let instance = dbg!( + datastore + .instance_set_migration_ids( + &opctx, + instance_id, + PropolisUuid::from_untyped_uuid(vmm1.id), + migration.id, + PropolisUuid::from_untyped_uuid(vmm2.id), + ) + .await + ) + .expect("setting migration IDs should succeed"); + assert_eq!(instance.runtime().dst_propolis_id, Some(vmm2.id)); + assert_eq!(instance.runtime().migration_id, Some(migration.id)); + + // Doing it again should be idempotent, and the instance record + // shouldn't change. + let instance2 = dbg!( + datastore + .instance_set_migration_ids( + &opctx, + instance_id, + PropolisUuid::from_untyped_uuid(vmm1.id), + migration.id, + PropolisUuid::from_untyped_uuid(vmm2.id), + ) + .await + ) + .expect("setting the same migration IDs a second time should succeed"); + assert_eq!( + instance.runtime().dst_propolis_id, + instance2.runtime().dst_propolis_id + ); + assert_eq!( + instance.runtime().migration_id, + instance2.runtime().migration_id + ); + + // Trying to set a new migration should fail, as long as the prior stuff + // is still in place. + let vmm3 = datastore + .vmm_insert( + &opctx, + Vmm { + id: Uuid::new_v4(), + time_created: Utc::now(), + time_deleted: None, + instance_id: authz_instance.id(), + sled_id: Uuid::new_v4(), + propolis_ip: "10.1.9.42".parse().unwrap(), + propolis_port: 420.into(), + runtime: VmmRuntimeState { + time_state_updated: Utc::now(), + r#gen: Generation::new(), + state: VmmState::Running, + }, + }, + ) + .await + .expect("third VMM should insert"); + let migration2 = datastore + .migration_insert( + &opctx, + Migration::new(Uuid::new_v4(), instance_id, vmm1.id, vmm3.id), + ) + .await + .expect("migration should be inserted successfully!"); + dbg!( + datastore + .instance_set_migration_ids( + &opctx, + instance_id, + PropolisUuid::from_untyped_uuid(vmm1.id), + migration2.id, + PropolisUuid::from_untyped_uuid(vmm3.id), + ) + .await + ) + .expect_err( + "trying to set migration IDs should fail when a previous \ + migration and VMM are still there", + ); + + // Pretend the previous migration saga has unwound the VMM + let updated = dbg!( + datastore + .vmm_update_runtime( + &PropolisUuid::from_untyped_uuid(vmm2.id), + &VmmRuntimeState { + time_state_updated: Utc::now(), + r#gen: Generation(vmm2.runtime.r#gen.0.next().next()), + state: VmmState::SagaUnwound, + }, + ) + .await + ) + .expect("updating VMM state should be fine"); + assert!(updated); + + // It should still fail, since the migration is still in progress. + dbg!( + datastore + .instance_set_migration_ids( + &opctx, + instance_id, + PropolisUuid::from_untyped_uuid(vmm1.id), + migration2.id, + PropolisUuid::from_untyped_uuid(vmm3.id), + ) + .await + ) + .expect_err( + "trying to set migration IDs should fail when a previous \ + migration ID is present and not marked as failed", + ); + + // Now, mark the previous migration as Failed. 
+ let updated = dbg!(datastore + .migration_mark_failed(&opctx, migration.id) + .await + .expect( + "we should be able to mark the previous migration as failed" + )); + assert!(updated); + + // If the current migration is failed on both sides *and* the current + // VMM is SagaUnwound, we should be able to clobber them with new IDs. + let instance = dbg!( + datastore + .instance_set_migration_ids( + &opctx, + instance_id, + PropolisUuid::from_untyped_uuid(vmm1.id), + migration2.id, + PropolisUuid::from_untyped_uuid(vmm3.id), + ) + .await + ) + .expect("replacing SagaUnwound VMM should work"); + assert_eq!(instance.runtime().migration_id, Some(migration2.id)); + assert_eq!(instance.runtime().dst_propolis_id, Some(vmm3.id)); + + // Clean up. + db.cleanup().await.unwrap(); + logctx.cleanup_successful(); + } } diff --git a/nexus/db-queries/src/db/datastore/migration.rs b/nexus/db-queries/src/db/datastore/migration.rs index 5efe88e83f..128239503c 100644 --- a/nexus/db-queries/src/db/datastore/migration.rs +++ b/nexus/db-queries/src/db/datastore/migration.rs @@ -6,12 +6,16 @@ use super::DataStore; use crate::context::OpContext; +use crate::db; use crate::db::error::public_error_from_diesel; use crate::db::error::ErrorHandler; -use crate::db::model::{Migration, MigrationState}; +use crate::db::model::Generation; +use crate::db::model::Migration; +use crate::db::model::MigrationState; use crate::db::pagination::paginated; use crate::db::schema::migration::dsl; use crate::db::update_and_check::UpdateAndCheck; +use crate::db::update_and_check::UpdateAndQueryResult; use crate::db::update_and_check::UpdateStatus; use async_bb8_diesel::AsyncRunQueryDsl; use chrono::Utc; @@ -23,6 +27,7 @@ use omicron_common::api::external::UpdateResult; use omicron_common::api::internal::nexus; use omicron_uuid_kinds::GenericUuid; use omicron_uuid_kinds::InstanceUuid; +use omicron_uuid_kinds::PropolisUuid; use uuid::Uuid; impl DataStore { @@ -76,24 +81,24 @@ impl DataStore { .map_err(|e| public_error_from_diesel(e, ErrorHandler::Server)) } - /// Marks a migration record as deleted if and only if both sides of the - /// migration are in a terminal state. - pub async fn migration_terminate( + /// Marks a migration record as failed. + pub async fn migration_mark_failed( &self, opctx: &OpContext, migration_id: Uuid, ) -> UpdateResult { - const TERMINAL_STATES: &[MigrationState] = &[ - MigrationState(nexus::MigrationState::Completed), - MigrationState(nexus::MigrationState::Failed), - ]; - + let failed = MigrationState(nexus::MigrationState::Failed); diesel::update(dsl::migration) .filter(dsl::id.eq(migration_id)) .filter(dsl::time_deleted.is_null()) - .filter(dsl::source_state.eq_any(TERMINAL_STATES)) - .filter(dsl::target_state.eq_any(TERMINAL_STATES)) - .set(dsl::time_deleted.eq(Utc::now())) + .set(( + dsl::source_state.eq(failed), + dsl::source_gen.eq(dsl::source_gen + 1), + dsl::time_source_updated.eq(Utc::now()), + dsl::target_state.eq(failed), + dsl::target_gen.eq(dsl::target_gen + 1), + dsl::time_target_updated.eq(Utc::now()), + )) .check_if_exists::(migration_id) .execute_and_check(&*self.pool_connection_authorized(opctx).await?) .await @@ -105,10 +110,6 @@ impl DataStore { } /// Unconditionally mark a migration record as deleted. - /// - /// This is distinct from [`DataStore::migration_terminate`], as it will - /// mark a migration as deleted regardless of the states of the source and - /// target VMMs. 
pub async fn migration_mark_deleted( &self, opctx: &OpContext, @@ -127,6 +128,50 @@ impl DataStore { }) .map_err(|e| public_error_from_diesel(e, ErrorHandler::Server)) } + + pub(crate) async fn migration_update_source_on_connection( + &self, + conn: &async_bb8_diesel::Connection, + vmm_id: &PropolisUuid, + migration: &nexus::MigrationRuntimeState, + ) -> Result, diesel::result::Error> { + let generation = Generation(migration.r#gen); + diesel::update(dsl::migration) + .filter(dsl::id.eq(migration.migration_id)) + .filter(dsl::time_deleted.is_null()) + .filter(dsl::source_gen.lt(generation)) + .filter(dsl::source_propolis_id.eq(vmm_id.into_untyped_uuid())) + .set(( + dsl::source_state.eq(MigrationState(migration.state)), + dsl::source_gen.eq(generation), + dsl::time_source_updated.eq(migration.time_updated), + )) + .check_if_exists::(migration.migration_id) + .execute_and_check(conn) + .await + } + + pub(crate) async fn migration_update_target_on_connection( + &self, + conn: &async_bb8_diesel::Connection, + vmm_id: &PropolisUuid, + migration: &nexus::MigrationRuntimeState, + ) -> Result, diesel::result::Error> { + let generation = Generation(migration.r#gen); + diesel::update(dsl::migration) + .filter(dsl::id.eq(migration.migration_id)) + .filter(dsl::time_deleted.is_null()) + .filter(dsl::target_gen.lt(generation)) + .filter(dsl::target_propolis_id.eq(vmm_id.into_untyped_uuid())) + .set(( + dsl::target_state.eq(MigrationState(migration.state)), + dsl::target_gen.eq(generation), + dsl::time_target_updated.eq(migration.time_updated), + )) + .check_if_exists::(migration.migration_id) + .execute_and_check(conn) + .await + } } #[cfg(test)] diff --git a/nexus/db-queries/src/db/datastore/mod.rs b/nexus/db-queries/src/db/datastore/mod.rs index 88e1f44cea..58259be7ee 100644 --- a/nexus/db-queries/src/db/datastore/mod.rs +++ b/nexus/db-queries/src/db/datastore/mod.rs @@ -111,7 +111,7 @@ mod zpool; pub use address_lot::AddressLotCreateResult; pub use dns::DataStoreDnsTest; pub use dns::DnsVersionUpdateBuilder; -pub use instance::InstanceAndActiveVmm; +pub use instance::{InstanceAndActiveVmm, InstanceGestalt}; pub use inventory::DataStoreInventoryTest; use nexus_db_model::AllSchemaVersions; pub use rack::RackInit; @@ -123,6 +123,7 @@ pub use sled::SledTransition; pub use sled::TransitionError; pub use switch_port::SwitchPortSettingsCombinedResult; pub use virtual_provisioning_collection::StorageType; +pub use vmm::VmmStateUpdateResult; pub use volume::read_only_resources_associated_with_volume; pub use volume::CrucibleResources; pub use volume::CrucibleTargets; diff --git a/nexus/db-queries/src/db/datastore/virtual_provisioning_collection.rs b/nexus/db-queries/src/db/datastore/virtual_provisioning_collection.rs index 247eefd3d5..7c3e1c4b8f 100644 --- a/nexus/db-queries/src/db/datastore/virtual_provisioning_collection.rs +++ b/nexus/db-queries/src/db/datastore/virtual_provisioning_collection.rs @@ -280,10 +280,7 @@ impl DataStore { } /// Transitively removes the CPU and memory charges for an instance from the - /// instance's project, silo, and fleet, provided that the instance's state - /// generation is less than `max_instance_gen`. This allows a caller who is - /// about to apply generation G to an instance to avoid deleting resources - /// if its update was superseded. + /// instance's project, silo, and fleet. 
pub async fn virtual_provisioning_collection_delete_instance( &self, opctx: &OpContext, @@ -291,12 +288,10 @@ impl DataStore { project_id: Uuid, cpus_diff: i64, ram_diff: ByteCount, - max_instance_gen: i64, ) -> Result, Error> { let provisions = VirtualProvisioningCollectionUpdate::new_delete_instance( id, - max_instance_gen, cpus_diff, ram_diff, project_id, @@ -518,8 +513,6 @@ mod test { // Delete the instance - // Make this value outrageously high, so that as a "max" it is ignored. - let max_instance_gen: i64 = 1000; datastore .virtual_provisioning_collection_delete_instance( &opctx, @@ -527,7 +520,6 @@ mod test { project_id, cpus, ram, - max_instance_gen, ) .await .unwrap(); @@ -614,10 +606,6 @@ mod test { // Delete the instance - // If the "instance gen" is too low, the delete operation should be - // dropped. This mimics circumstances where an instance update arrives - // late to the query. - let max_instance_gen = 0; datastore .virtual_provisioning_collection_delete_instance( &opctx, @@ -625,25 +613,6 @@ mod test { project_id, cpus, ram, - max_instance_gen, - ) - .await - .unwrap(); - for id in ids { - verify_collection_usage(&datastore, &opctx, id, 12, 1 << 30, 0) - .await; - } - - // Make this value outrageously high, so that as a "max" it is ignored. - let max_instance_gen = 1000; - datastore - .virtual_provisioning_collection_delete_instance( - &opctx, - instance_id, - project_id, - cpus, - ram, - max_instance_gen, ) .await .unwrap(); @@ -664,7 +633,6 @@ mod test { project_id, cpus, ram, - max_instance_gen, ) .await .unwrap(); diff --git a/nexus/db-queries/src/db/datastore/vmm.rs b/nexus/db-queries/src/db/datastore/vmm.rs index 7ce8c1551e..14c3405a70 100644 --- a/nexus/db-queries/src/db/datastore/vmm.rs +++ b/nexus/db-queries/src/db/datastore/vmm.rs @@ -7,6 +7,7 @@ use super::DataStore; use crate::authz; use crate::context::OpContext; +use crate::db; use crate::db::error::public_error_from_diesel; use crate::db::error::ErrorHandler; use crate::db::model::Vmm; @@ -15,23 +16,44 @@ use crate::db::model::VmmState as DbVmmState; use crate::db::pagination::paginated; use crate::db::schema::vmm::dsl; use crate::db::update_and_check::UpdateAndCheck; +use crate::db::update_and_check::UpdateAndQueryResult; use crate::db::update_and_check::UpdateStatus; +use crate::transaction_retry::OptionalError; use async_bb8_diesel::AsyncRunQueryDsl; use chrono::Utc; use diesel::prelude::*; use omicron_common::api::external::CreateResult; use omicron_common::api::external::DataPageParams; use omicron_common::api::external::Error; +use omicron_common::api::external::InternalContext; use omicron_common::api::external::ListResultVec; use omicron_common::api::external::LookupResult; use omicron_common::api::external::LookupType; use omicron_common::api::external::ResourceType; use omicron_common::api::external::UpdateResult; +use omicron_common::api::internal::nexus; +use omicron_common::api::internal::nexus::Migrations; use omicron_uuid_kinds::GenericUuid; use omicron_uuid_kinds::PropolisUuid; use std::net::SocketAddr; use uuid::Uuid; +/// The result of an [`DataStore::vmm_and_migration_update_runtime`] call, +/// indicating which records were updated. +#[derive(Copy, Clone, Debug)] +pub struct VmmStateUpdateResult { + /// `true` if the VMM record was updated, `false` otherwise. + pub vmm_updated: bool, + + /// `true` if a migration record was updated for the migration in, false if + /// no update was performed or no migration in was provided. 
+    pub migration_in_updated: bool,
+
+    /// `true` if a migration record was updated for the migration out, false if
+    /// no update was performed or no migration out was provided.
+    pub migration_out_updated: bool,
+}
+
 impl DataStore {
     pub async fn vmm_insert(
         &self,
@@ -116,29 +138,164 @@ impl DataStore {
         vmm_id: &PropolisUuid,
         new_runtime: &VmmRuntimeState,
     ) -> Result {
-        let updated = diesel::update(dsl::vmm)
+        self.vmm_update_runtime_on_connection(
+            &*self.pool_connection_unauthorized().await?,
+            vmm_id,
+            new_runtime,
+        )
+        .await
+        .map(|r| match r.status {
+            UpdateStatus::Updated => true,
+            UpdateStatus::NotUpdatedButExists => false,
+        })
+        .map_err(|e| {
+            public_error_from_diesel(
+                e,
+                ErrorHandler::NotFoundByLookup(
+                    ResourceType::Vmm,
+                    LookupType::ById(vmm_id.into_untyped_uuid()),
+                ),
+            )
+        })
+    }
+
+    async fn vmm_update_runtime_on_connection(
+        &self,
+        conn: &async_bb8_diesel::Connection,
+        vmm_id: &PropolisUuid,
+        new_runtime: &VmmRuntimeState,
+    ) -> Result, diesel::result::Error> {
+        diesel::update(dsl::vmm)
             .filter(dsl::time_deleted.is_null())
             .filter(dsl::id.eq(vmm_id.into_untyped_uuid()))
             .filter(dsl::state_generation.lt(new_runtime.gen))
             .set(new_runtime.clone())
             .check_if_exists::(vmm_id.into_untyped_uuid())
-            .execute_and_check(&*self.pool_connection_unauthorized().await?)
+            .execute_and_check(conn)
             .await
-            .map(|r| match r.status {
-                UpdateStatus::Updated => true,
-                UpdateStatus::NotUpdatedButExists => false,
-            })
-            .map_err(|e| {
-                public_error_from_diesel(
-                    e,
-                    ErrorHandler::NotFoundByLookup(
-                        ResourceType::Vmm,
-                        LookupType::ById(vmm_id.into_untyped_uuid()),
-                    ),
-                )
-            })?;
+    }
 
-        Ok(updated)
+    /// Updates a VMM record and associated migration record(s) with a single
+    /// database command.
+    ///
+    /// This is intended to be used to apply updates from sled agent that
+    /// may change a VMM's runtime state (e.g. moving an instance from Running
+    /// to Stopped) and the state of its current active migration in a single
+    /// transaction. The caller is responsible for ensuring the VMM and
+    /// migration states are consistent with each other before calling this
+    /// routine.
+    ///
+    /// # Arguments
+    ///
+    /// - `vmm_id`: The ID of the VMM to update.
+    /// - `new_runtime`: The new VMM runtime state to try to write.
+    /// - `migrations`: The (optional) migration-in and migration-out states to
+    ///   try to write.
+    ///
+    /// # Return value
+    ///
+    /// - `Ok(`[`VmmStateUpdateResult`]`)` if the query was issued
+    ///   successfully. The returned [`VmmStateUpdateResult`] indicates which
+    ///   database record(s) were updated. Note that an update can fail because
+    ///   it was inapplicable (i.e. the database has state with a newer
+    ///   generation already) or because the relevant record was not found.
+    /// - `Err` if another error occurred while accessing the database.
+    pub async fn vmm_and_migration_update_runtime(
+        &self,
+        opctx: &OpContext,
+        vmm_id: PropolisUuid,
+        new_runtime: &VmmRuntimeState,
+        Migrations { migration_in, migration_out }: Migrations<'_>,
+    ) -> Result {
+        fn migration_id(
+            m: Option<&nexus::MigrationRuntimeState>,
+        ) -> Option {
+            m.as_ref().map(|m| m.migration_id)
+        }
+
+        // If both a migration-in and migration-out update was provided for this
+        // VMM, they can't be from the same migration, since migrating from a
+        // VMM to itself wouldn't make sense...
+ let migration_out_id = migration_id(migration_out); + if migration_out_id.is_some() + && migration_out_id == migration_id(migration_in) + { + return Err(Error::conflict( + "migrating from a VMM to itself is nonsensical", + )) + .internal_context(format!("migration_in: {migration_in:?}; migration_out: {migration_out:?}")); + } + + let err = OptionalError::new(); + let conn = self.pool_connection_authorized(opctx).await?; + + self.transaction_retry_wrapper("vmm_and_migration_update_runtime") + .transaction(&conn, |conn| { + let err = err.clone(); + async move { + let vmm_updated = self + .vmm_update_runtime_on_connection( + &conn, + &vmm_id, + new_runtime, + ) + .await.map(|r| match r.status { UpdateStatus::Updated => true, UpdateStatus::NotUpdatedButExists => false })?; + let migration_out_updated = match migration_out { + Some(migration) => { + let r = self.migration_update_source_on_connection( + &conn, &vmm_id, migration, + ) + .await?; + match r.status { + UpdateStatus::Updated => true, + UpdateStatus::NotUpdatedButExists => match r.found { + m if m.time_deleted.is_some() => return Err(err.bail(Error::Gone)), + m if m.source_propolis_id != vmm_id.into_untyped_uuid() => { + return Err(err.bail(Error::invalid_value( + "source propolis UUID", + format!("{vmm_id} is not the source VMM of this migration"), + ))); + } + // Not updated, generation has advanced. + _ => false + }, + } + }, + None => false, + }; + let migration_in_updated = match migration_in { + Some(migration) => { + let r = self.migration_update_target_on_connection( + &conn, &vmm_id, migration, + ) + .await?; + match r.status { + UpdateStatus::Updated => true, + UpdateStatus::NotUpdatedButExists => match r.found { + m if m.time_deleted.is_some() => return Err(err.bail(Error::Gone)), + m if m.target_propolis_id != vmm_id.into_untyped_uuid() => { + return Err(err.bail(Error::invalid_value( + "target propolis UUID", + format!("{vmm_id} is not the target VMM of this migration"), + ))); + } + // Not updated, generation has advanced. + _ => false + }, + } + }, + None => false, + }; + Ok(VmmStateUpdateResult { + vmm_updated, + migration_in_updated, + migration_out_updated, + }) + }}) + .await + .map_err(|e| { + err.take().unwrap_or_else(|| public_error_from_diesel(e, ErrorHandler::Server)) + }) } /// Forcibly overwrites the Propolis IP/Port in the supplied VMM's record with @@ -176,7 +333,7 @@ impl DataStore { /// /// A VMM is considered "abandoned" if (and only if): /// - /// - It is in the `Destroyed` state. + /// - It is in the `Destroyed` or `SagaUnwound` state. /// - It is not currently running an instance, and it is also not the /// migration target of any instance (i.e. it is not pointed to by /// any instance record's `active_propolis_id` and `target_propolis_id` @@ -188,16 +345,15 @@ impl DataStore { pagparams: &DataPageParams<'_, Uuid>, ) -> ListResultVec { use crate::db::schema::instance::dsl as instance_dsl; - let destroyed = DbVmmState::Destroyed; + paginated(dsl::vmm, dsl::id, pagparams) // In order to be considered "abandoned", a VMM must be: - // - in the `Destroyed` state - .filter(dsl::state.eq(destroyed)) + // - in the `Destroyed` or `SagaUnwound` state + .filter(dsl::state.eq_any(DbVmmState::DESTROYABLE_STATES)) // - not deleted yet .filter(dsl::time_deleted.is_null()) // - not pointed to by any instance's `active_propolis_id` or // `target_propolis_id`. 
- // .left_join( // Left join with the `instance` table on the VMM's instance ID, so // that we can check if the instance pointed to by this VMM (if @@ -230,3 +386,295 @@ impl DataStore { .map_err(|e| public_error_from_diesel(e, ErrorHandler::Server)) } } + +#[cfg(test)] +mod tests { + use super::*; + use crate::db; + use crate::db::datastore::test_utils::datastore_test; + use crate::db::model::Generation; + use crate::db::model::Migration; + use crate::db::model::VmmRuntimeState; + use crate::db::model::VmmState; + use nexus_test_utils::db::test_setup_database; + use omicron_common::api::internal::nexus; + use omicron_test_utils::dev; + use omicron_uuid_kinds::InstanceUuid; + + #[tokio::test] + async fn test_vmm_and_migration_update_runtime() { + // Setup + let logctx = + dev::test_setup_log("test_vmm_and_migration_update_runtime"); + let mut db = test_setup_database(&logctx.log).await; + let (opctx, datastore) = datastore_test(&logctx, &db).await; + + let instance_id = InstanceUuid::from_untyped_uuid(Uuid::new_v4()); + let vmm1 = datastore + .vmm_insert( + &opctx, + Vmm { + id: Uuid::new_v4(), + time_created: Utc::now(), + time_deleted: None, + instance_id: instance_id.into_untyped_uuid(), + sled_id: Uuid::new_v4(), + propolis_ip: "10.1.9.32".parse().unwrap(), + propolis_port: 420.into(), + runtime: VmmRuntimeState { + time_state_updated: Utc::now(), + r#gen: Generation::new(), + state: VmmState::Running, + }, + }, + ) + .await + .expect("VMM 1 should be inserted successfully!"); + + let vmm2 = datastore + .vmm_insert( + &opctx, + Vmm { + id: Uuid::new_v4(), + time_created: Utc::now(), + time_deleted: None, + instance_id: instance_id.into_untyped_uuid(), + sled_id: Uuid::new_v4(), + propolis_ip: "10.1.9.42".parse().unwrap(), + propolis_port: 420.into(), + runtime: VmmRuntimeState { + time_state_updated: Utc::now(), + r#gen: Generation::new(), + state: VmmState::Running, + }, + }, + ) + .await + .expect("VMM 2 should be inserted successfully!"); + + let migration1 = datastore + .migration_insert( + &opctx, + Migration::new(Uuid::new_v4(), instance_id, vmm1.id, vmm2.id), + ) + .await + .expect("migration should be inserted successfully!"); + + info!( + &logctx.log, + "pretending to migrate from vmm1 to vmm2"; + "vmm1" => ?vmm1, + "vmm2" => ?vmm2, + "migration" => ?migration1, + ); + + let vmm1_migration_out = nexus::MigrationRuntimeState { + migration_id: migration1.id, + state: nexus::MigrationState::Completed, + r#gen: Generation::new().0.next(), + time_updated: Utc::now(), + }; + datastore + .vmm_and_migration_update_runtime( + &opctx, + PropolisUuid::from_untyped_uuid(vmm1.id), + &VmmRuntimeState { + time_state_updated: Utc::now(), + r#gen: Generation(vmm1.runtime.r#gen.0.next()), + state: VmmState::Stopping, + }, + Migrations { + migration_in: None, + migration_out: Some(&vmm1_migration_out), + }, + ) + .await + .expect("vmm1 state should update"); + let vmm2_migration_in = nexus::MigrationRuntimeState { + migration_id: migration1.id, + state: nexus::MigrationState::Completed, + r#gen: Generation::new().0.next(), + time_updated: Utc::now(), + }; + datastore + .vmm_and_migration_update_runtime( + &opctx, + PropolisUuid::from_untyped_uuid(vmm2.id), + &VmmRuntimeState { + time_state_updated: Utc::now(), + r#gen: Generation(vmm2.runtime.r#gen.0.next()), + state: VmmState::Running, + }, + Migrations { + migration_in: Some(&vmm2_migration_in), + migration_out: None, + }, + ) + .await + .expect("vmm1 state should update"); + + let all_migrations = datastore + .instance_list_migrations( + 
&opctx, + instance_id, + &DataPageParams::max_page(), + ) + .await + .expect("must list migrations"); + assert_eq!(all_migrations.len(), 1); + let db_migration1 = &all_migrations[0]; + assert_eq!( + db_migration1.source_state, + db::model::MigrationState::COMPLETED + ); + assert_eq!( + db_migration1.target_state, + db::model::MigrationState::COMPLETED + ); + assert_eq!( + db_migration1.source_gen, + Generation(Generation::new().0.next()), + ); + assert_eq!( + db_migration1.target_gen, + Generation(Generation::new().0.next()), + ); + + // now, let's simulate a second migration, out of vmm2. + let vmm3 = datastore + .vmm_insert( + &opctx, + Vmm { + id: Uuid::new_v4(), + time_created: Utc::now(), + time_deleted: None, + instance_id: instance_id.into_untyped_uuid(), + sled_id: Uuid::new_v4(), + propolis_ip: "10.1.9.69".parse().unwrap(), + propolis_port: 420.into(), + runtime: VmmRuntimeState { + time_state_updated: Utc::now(), + r#gen: Generation::new(), + state: VmmState::Running, + }, + }, + ) + .await + .expect("VMM 2 should be inserted successfully!"); + + let migration2 = datastore + .migration_insert( + &opctx, + Migration::new(Uuid::new_v4(), instance_id, vmm2.id, vmm3.id), + ) + .await + .expect("migration 2 should be inserted successfully!"); + info!( + &logctx.log, + "pretending to migrate from vmm2 to vmm3"; + "vmm2" => ?vmm2, + "vmm3" => ?vmm3, + "migration" => ?migration2, + ); + + let vmm2_migration_out = nexus::MigrationRuntimeState { + migration_id: migration2.id, + state: nexus::MigrationState::Completed, + r#gen: Generation::new().0.next(), + time_updated: Utc::now(), + }; + datastore + .vmm_and_migration_update_runtime( + &opctx, + PropolisUuid::from_untyped_uuid(vmm2.id), + &VmmRuntimeState { + time_state_updated: Utc::now(), + r#gen: Generation(vmm2.runtime.r#gen.0.next()), + state: VmmState::Destroyed, + }, + Migrations { + migration_in: Some(&vmm2_migration_in), + migration_out: Some(&vmm2_migration_out), + }, + ) + .await + .expect("vmm2 state should update"); + + let vmm3_migration_in = nexus::MigrationRuntimeState { + migration_id: migration2.id, + // Let's make this fail, just for fun... + state: nexus::MigrationState::Failed, + r#gen: Generation::new().0.next(), + time_updated: Utc::now(), + }; + datastore + .vmm_and_migration_update_runtime( + &opctx, + PropolisUuid::from_untyped_uuid(vmm3.id), + &VmmRuntimeState { + time_state_updated: Utc::now(), + r#gen: Generation(vmm3.runtime.r#gen.0.next()), + state: VmmState::Destroyed, + }, + Migrations { + migration_in: Some(&vmm3_migration_in), + migration_out: None, + }, + ) + .await + .expect("vmm3 state should update"); + + let all_migrations = datastore + .instance_list_migrations( + &opctx, + instance_id, + &DataPageParams::max_page(), + ) + .await + .expect("must list migrations"); + assert_eq!(all_migrations.len(), 2); + + // the previous migration should not have closed. 
+ let new_db_migration1 = all_migrations + .iter() + .find(|m| m.id == migration1.id) + .expect("query must include migration1"); + assert_eq!(new_db_migration1.source_state, db_migration1.source_state); + assert_eq!(new_db_migration1.source_gen, db_migration1.source_gen); + assert_eq!( + db_migration1.time_source_updated, + new_db_migration1.time_source_updated + ); + assert_eq!(new_db_migration1.target_state, db_migration1.target_state); + assert_eq!(new_db_migration1.target_gen, db_migration1.target_gen,); + assert_eq!( + new_db_migration1.time_target_updated, + db_migration1.time_target_updated, + ); + + let db_migration2 = all_migrations + .iter() + .find(|m| m.id == migration2.id) + .expect("query must include migration2"); + assert_eq!( + db_migration2.source_state, + db::model::MigrationState::COMPLETED + ); + assert_eq!( + db_migration2.target_state, + db::model::MigrationState::FAILED + ); + assert_eq!( + db_migration2.source_gen, + Generation(Generation::new().0.next()), + ); + assert_eq!( + db_migration2.target_gen, + Generation(Generation::new().0.next()), + ); + + // Clean up. + db.cleanup().await.unwrap(); + logctx.cleanup_successful(); + } +} diff --git a/nexus/db-queries/src/db/queries/instance.rs b/nexus/db-queries/src/db/queries/instance.rs deleted file mode 100644 index fded585b67..0000000000 --- a/nexus/db-queries/src/db/queries/instance.rs +++ /dev/null @@ -1,390 +0,0 @@ -// This Source Code Form is subject to the terms of the Mozilla Public -// License, v. 2.0. If a copy of the MPL was not distributed with this -// file, You can obtain one at https://mozilla.org/MPL/2.0/. - -//! Implement a query for updating an instance and VMM in a single CTE. - -use async_bb8_diesel::AsyncRunQueryDsl; -use diesel::prelude::QueryResult; -use diesel::query_builder::{Query, QueryFragment, QueryId}; -use diesel::result::Error as DieselError; -use diesel::sql_types::{Nullable, Uuid as SqlUuid}; -use diesel::{pg::Pg, query_builder::AstPass}; -use diesel::{Column, ExpressionMethods, QueryDsl, RunQueryDsl}; -use nexus_db_model::{ - schema::{ - instance::dsl as instance_dsl, migration::dsl as migration_dsl, - vmm::dsl as vmm_dsl, - }, - Generation, InstanceRuntimeState, MigrationState, VmmRuntimeState, -}; -use omicron_common::api::internal::nexus::{ - MigrationRole, MigrationRuntimeState, -}; -use omicron_uuid_kinds::{GenericUuid, InstanceUuid, PropolisUuid}; -use uuid::Uuid; - -use crate::db::pool::DbConnection; -use crate::db::update_and_check::UpdateStatus; - -/// A CTE that checks and updates the instance and VMM tables in a single -/// atomic operation. -// -// The single-table update-and-check CTE has the following form: -// -// WITH found AS (SELECT FROM T WHERE ) -// updated AS (UPDATE T SET RETURNING *) -// SELECT -// found. -// updated. -// found.* -// FROM -// found -// LEFT JOIN -// updated -// ON -// found. = updated.; -// -// The idea behind this query is to have separate "found" and "updated" -// subqueries for the instance and VMM tables, then use those to create two more -// subqueries that perform the joins and yield the results, along the following -// lines: -// -// WITH vmm_found AS (SELECT(SELECT id FROM vmm WHERE vmm.id = id) AS id), -// vmm_updated AS (UPDATE vmm SET ... RETURNING *), -// instance_found AS (SELECT( -// SELECT id FROM instance WHERE instance.id = id -// ) AS id), -// instance_updated AS (UPDATE instance SET ... 
RETURNING *), -// vmm_result AS ( -// SELECT vmm_found.id AS found, vmm_updated.id AS updated -// FROM vmm_found -// LEFT JOIN vmm_updated -// ON vmm_found.id = vmm_updated.id -// ), -// instance_result AS ( -// SELECT instance_found.id AS found, instance_updated.id AS updated -// FROM instance_found -// LEFT JOIN instance_updated -// ON instance_found.id = instance_updated.id -// ) -// SELECT vmm_result.found, vmm_result.updated, instance_result.found, -// instance_result.updated -// FROM vmm_result, instance_result; -/// -/// If a [`MigrationRuntimeState`] is provided, similar "found" and "update" -/// clauses are also added to join the `migration` record for the instance's -/// active migration, if one exists, and update the migration record. If no -/// migration record is provided, this part of the query is skipped, and the -/// `migration_found` and `migration_updated` portions are always `false`. -// -// The "wrapper" SELECTs when finding instances and VMMs are used to get a NULL -// result in the final output instead of failing the entire query if the target -// object is missing. This maximizes Nexus's flexibility when dealing with -// updates from sled agent that refer to one valid and one deleted object. (This -// can happen if, e.g., sled agent sends a message indicating that a retired VMM -// has finally been destroyed when its instance has since been deleted.) -pub struct InstanceAndVmmUpdate { - instance_find: Box + Send>, - vmm_find: Box + Send>, - instance_update: Box + Send>, - vmm_update: Box + Send>, - migration: Option, -} - -struct MigrationUpdate { - find: Box + Send>, - update: Box + Send>, -} - -/// Contains the result of a combined instance-and-VMM update operation. -#[derive(Copy, Clone, PartialEq, Debug)] -pub struct InstanceAndVmmUpdateResult { - /// `Some(status)` if the target instance was found; the wrapped - /// `UpdateStatus` indicates whether the row was updated. `None` if the - /// instance was not found. - pub instance_status: Option, - - /// `Some(status)` if the target VMM was found; the wrapped `UpdateStatus` - /// indicates whether the row was updated. `None` if the VMM was not found. - pub vmm_status: Option, - - /// `Some(status)` if the target migration was found; the wrapped `UpdateStatus` - /// indicates whether the row was updated. `None` if the migration was not - /// found, or no migration update was performed. - pub migration_status: Option, -} - -/// Computes the update status to return from the results of queries that find -/// and update an object with an ID of type `T`. -fn compute_update_status( - found: Option, - updated: Option, -) -> Option -where - T: PartialEq + std::fmt::Display, -{ - match (found, updated) { - // If both the "find" and "update" prongs returned an ID, the row was - // updated. The IDs should match in this case (if they don't then the - // query was constructed very strangely!). - (Some(found_id), Some(updated_id)) if found_id == updated_id => { - Some(UpdateStatus::Updated) - } - // If the "find" prong returned an ID but the "update" prong didn't, the - // row exists but wasn't updated. - (Some(_), None) => Some(UpdateStatus::NotUpdatedButExists), - // If neither prong returned anything, indicate the row is missing. - (None, None) => None, - // If both prongs returned an ID, but they don't match, something - // terrible has happened--the prongs must have referred to different - // IDs! 
- (Some(found_id), Some(mismatched_id)) => unreachable!( - "updated ID {} didn't match found ID {}", - mismatched_id, found_id - ), - // Similarly, if the target ID was not found but something was updated - // anyway, then something is wrong with the update query--either it has - // the wrong ID or did not filter rows properly. - (None, Some(updated_id)) => unreachable!( - "ID {} was updated but no found ID was supplied", - updated_id - ), - } -} - -impl InstanceAndVmmUpdate { - pub fn new( - instance_id: InstanceUuid, - new_instance_runtime_state: InstanceRuntimeState, - vmm_id: PropolisUuid, - new_vmm_runtime_state: VmmRuntimeState, - migration: Option, - ) -> Self { - let instance_find = Box::new( - instance_dsl::instance - .filter(instance_dsl::id.eq(instance_id.into_untyped_uuid())) - .select(instance_dsl::id), - ); - - let vmm_find = Box::new( - vmm_dsl::vmm - .filter(vmm_dsl::id.eq(vmm_id.into_untyped_uuid())) - .select(vmm_dsl::id), - ); - - let instance_update = Box::new( - diesel::update(instance_dsl::instance) - .filter(instance_dsl::time_deleted.is_null()) - .filter(instance_dsl::id.eq(instance_id.into_untyped_uuid())) - .filter( - instance_dsl::state_generation - .lt(new_instance_runtime_state.gen), - ) - .set(new_instance_runtime_state), - ); - - let vmm_update = Box::new( - diesel::update(vmm_dsl::vmm) - .filter(vmm_dsl::time_deleted.is_null()) - .filter(vmm_dsl::id.eq(vmm_id.into_untyped_uuid())) - .filter(vmm_dsl::state_generation.lt(new_vmm_runtime_state.gen)) - .set(new_vmm_runtime_state), - ); - - let migration = migration.map( - |MigrationRuntimeState { - role, - migration_id, - state, - gen, - time_updated, - }| { - let state = MigrationState::from(state); - let find = Box::new( - migration_dsl::migration - .filter(migration_dsl::id.eq(migration_id)) - .filter(migration_dsl::time_deleted.is_null()) - .select(migration_dsl::id), - ); - let gen = Generation::from(gen); - let update: Box + Send> = match role { - MigrationRole::Target => Box::new( - diesel::update(migration_dsl::migration) - .filter(migration_dsl::id.eq(migration_id)) - .filter( - migration_dsl::target_propolis_id - .eq(vmm_id.into_untyped_uuid()), - ) - .filter(migration_dsl::target_gen.lt(gen)) - .set(( - migration_dsl::target_state.eq(state), - migration_dsl::time_target_updated - .eq(time_updated), - )), - ), - MigrationRole::Source => Box::new( - diesel::update(migration_dsl::migration) - .filter(migration_dsl::id.eq(migration_id)) - .filter( - migration_dsl::source_propolis_id - .eq(vmm_id.into_untyped_uuid()), - ) - .filter(migration_dsl::source_gen.lt(gen)) - .set(( - migration_dsl::source_state.eq(state), - migration_dsl::time_source_updated - .eq(time_updated), - )), - ), - }; - MigrationUpdate { find, update } - }, - ); - - Self { instance_find, vmm_find, instance_update, vmm_update, migration } - } - - pub async fn execute_and_check( - self, - conn: &(impl async_bb8_diesel::AsyncConnection + Sync), - ) -> Result { - let ( - vmm_found, - vmm_updated, - instance_found, - instance_updated, - migration_found, - migration_updated, - ) = self - .get_result_async::<( - Option, - Option, - Option, - Option, - Option, - Option, - )>(conn) - .await?; - - let instance_status = - compute_update_status(instance_found, instance_updated); - let vmm_status = compute_update_status(vmm_found, vmm_updated); - let migration_status = - compute_update_status(migration_found, migration_updated); - - Ok(InstanceAndVmmUpdateResult { - instance_status, - vmm_status, - migration_status, - }) - } -} - -impl QueryId for 
InstanceAndVmmUpdate { - type QueryId = (); - const HAS_STATIC_QUERY_ID: bool = false; -} - -impl Query for InstanceAndVmmUpdate { - type SqlType = ( - Nullable, - Nullable, - Nullable, - Nullable, - Nullable, - Nullable, - ); -} - -impl RunQueryDsl for InstanceAndVmmUpdate {} - -impl QueryFragment for InstanceAndVmmUpdate { - fn walk_ast<'b>(&'b self, mut out: AstPass<'_, 'b, Pg>) -> QueryResult<()> { - out.push_sql("WITH instance_found AS (SELECT ("); - self.instance_find.walk_ast(out.reborrow())?; - out.push_sql(") AS id), "); - - out.push_sql("vmm_found AS (SELECT ("); - self.vmm_find.walk_ast(out.reborrow())?; - out.push_sql(") AS id), "); - - if let Some(MigrationUpdate { ref find, .. }) = self.migration { - out.push_sql("migration_found AS (SELECT ("); - find.walk_ast(out.reborrow())?; - out.push_sql(") AS id), "); - } - - out.push_sql("instance_updated AS ("); - self.instance_update.walk_ast(out.reborrow())?; - out.push_sql(" RETURNING id), "); - - out.push_sql("vmm_updated AS ("); - self.vmm_update.walk_ast(out.reborrow())?; - out.push_sql(" RETURNING id), "); - - if let Some(MigrationUpdate { ref update, .. }) = self.migration { - out.push_sql("migration_updated AS ("); - update.walk_ast(out.reborrow())?; - out.push_sql(" RETURNING id), "); - } - - out.push_sql("vmm_result AS ("); - out.push_sql("SELECT vmm_found."); - out.push_identifier(vmm_dsl::id::NAME)?; - out.push_sql(" AS found, vmm_updated."); - out.push_identifier(vmm_dsl::id::NAME)?; - out.push_sql(" AS updated"); - out.push_sql(" FROM vmm_found LEFT JOIN vmm_updated ON vmm_found."); - out.push_identifier(vmm_dsl::id::NAME)?; - out.push_sql(" = vmm_updated."); - out.push_identifier(vmm_dsl::id::NAME)?; - out.push_sql("), "); - - out.push_sql("instance_result AS ("); - out.push_sql("SELECT instance_found."); - out.push_identifier(instance_dsl::id::NAME)?; - out.push_sql(" AS found, instance_updated."); - out.push_identifier(instance_dsl::id::NAME)?; - out.push_sql(" AS updated"); - out.push_sql( - " FROM instance_found LEFT JOIN instance_updated ON instance_found.", - ); - out.push_identifier(instance_dsl::id::NAME)?; - out.push_sql(" = instance_updated."); - out.push_identifier(instance_dsl::id::NAME)?; - out.push_sql(")"); - - if self.migration.is_some() { - out.push_sql(", "); - out.push_sql("migration_result AS ("); - out.push_sql("SELECT migration_found."); - out.push_identifier(migration_dsl::id::NAME)?; - out.push_sql(" AS found, migration_updated."); - out.push_identifier(migration_dsl::id::NAME)?; - out.push_sql(" AS updated"); - out.push_sql( - " FROM migration_found LEFT JOIN migration_updated ON migration_found.", - ); - out.push_identifier(migration_dsl::id::NAME)?; - out.push_sql(" = migration_updated."); - out.push_identifier(migration_dsl::id::NAME)?; - out.push_sql(")"); - } - out.push_sql(" "); - - out.push_sql("SELECT vmm_result.found, vmm_result.updated, "); - out.push_sql("instance_result.found, instance_result.updated, "); - if self.migration.is_some() { - out.push_sql("migration_result.found, migration_result.updated "); - } else { - out.push_sql("NULL, NULL "); - } - out.push_sql("FROM vmm_result, instance_result"); - if self.migration.is_some() { - out.push_sql(", migration_result"); - } - - Ok(()) - } -} diff --git a/nexus/db-queries/src/db/queries/mod.rs b/nexus/db-queries/src/db/queries/mod.rs index a1022f9187..f88b8fab6d 100644 --- a/nexus/db-queries/src/db/queries/mod.rs +++ b/nexus/db-queries/src/db/queries/mod.rs @@ -7,7 +7,6 @@ pub mod disk; pub mod external_ip; -pub mod instance; pub mod 
ip_pool; #[macro_use] mod next_item; diff --git a/nexus/db-queries/src/db/queries/virtual_provisioning_collection_update.rs b/nexus/db-queries/src/db/queries/virtual_provisioning_collection_update.rs index fd86912107..902d955a79 100644 --- a/nexus/db-queries/src/db/queries/virtual_provisioning_collection_update.rs +++ b/nexus/db-queries/src/db/queries/virtual_provisioning_collection_update.rs @@ -81,17 +81,9 @@ pub fn from_diesel(e: DieselError) -> external::Error { #[derive(Clone)] enum UpdateKind { InsertStorage(VirtualProvisioningResource), - DeleteStorage { - id: uuid::Uuid, - disk_byte_diff: ByteCount, - }, + DeleteStorage { id: uuid::Uuid, disk_byte_diff: ByteCount }, InsertInstance(VirtualProvisioningResource), - DeleteInstance { - id: uuid::Uuid, - max_instance_gen: i64, - cpus_diff: i64, - ram_diff: ByteCount, - }, + DeleteInstance { id: uuid::Uuid, cpus_diff: i64, ram_diff: ByteCount }, } type SelectableSql = < @@ -246,15 +238,7 @@ WITH ),") .bind::(id) }, - UpdateKind::DeleteInstance { id, max_instance_gen, .. } => { - // The filter condition here ensures that the provisioning record is - // only deleted if the corresponding instance has a generation - // number less than the supplied `max_instance_gen`. This allows a - // caller that is about to apply an instance update that will stop - // the instance and that bears generation G to avoid deleting - // resources if the instance generation was already advanced to or - // past G. - // + UpdateKind::DeleteInstance { id, .. } => { // If the relevant instance ID is not in the database, then some // other operation must have ensured the instance was previously // stopped (because that's the only way it could have been deleted), @@ -279,14 +263,13 @@ WITH FROM instance WHERE - instance.id = ").param().sql(" AND instance.state_generation < ").param().sql(" + instance.id = ").param().sql(" LIMIT 1 ) AS update ),") .bind::(id) .bind::(id) - .bind::(max_instance_gen) }, }; @@ -477,7 +460,6 @@ FROM pub fn new_delete_instance( id: InstanceUuid, - max_instance_gen: i64, cpus_diff: i64, ram_diff: ByteCount, project_id: uuid::Uuid, @@ -485,7 +467,6 @@ FROM Self::apply_update( UpdateKind::DeleteInstance { id: id.into_untyped_uuid(), - max_instance_gen, cpus_diff, ram_diff, }, @@ -567,14 +548,9 @@ mod test { let project_id = Uuid::nil(); let cpus_diff = 4; let ram_diff = 2048.try_into().unwrap(); - let max_instance_gen = 0; let query = VirtualProvisioningCollectionUpdate::new_delete_instance( - id, - max_instance_gen, - cpus_diff, - ram_diff, - project_id, + id, cpus_diff, ram_diff, project_id, ); expectorate_query_contents( @@ -678,17 +654,12 @@ mod test { let conn = pool.pool().get().await.unwrap(); let id = InstanceUuid::nil(); - let max_instance_gen = 0; let project_id = Uuid::nil(); let cpus_diff = 16.try_into().unwrap(); let ram_diff = 2048.try_into().unwrap(); let query = VirtualProvisioningCollectionUpdate::new_delete_instance( - id, - max_instance_gen, - cpus_diff, - ram_diff, - project_id, + id, cpus_diff, ram_diff, project_id, ); let _ = query .explain_async(&conn) diff --git a/nexus/db-queries/tests/output/virtual_provisioning_collection_update_delete_instance.sql b/nexus/db-queries/tests/output/virtual_provisioning_collection_update_delete_instance.sql index 3c97b7efc7..69b2e017fd 100644 --- a/nexus/db-queries/tests/output/virtual_provisioning_collection_update_delete_instance.sql +++ b/nexus/db-queries/tests/output/virtual_provisioning_collection_update_delete_instance.sql @@ -40,9 +40,7 @@ WITH 1 ) = 1 - AND EXISTS( - SELECT 1 
FROM instance WHERE instance.id = $5 AND instance.state_generation < $6 LIMIT 1 - ) + AND EXISTS(SELECT 1 FROM instance WHERE instance.id = $5 LIMIT 1) AS update ), unused_cte_arm @@ -50,7 +48,7 @@ WITH DELETE FROM virtual_provisioning_resource WHERE - virtual_provisioning_resource.id = $7 AND (SELECT do_update.update FROM do_update LIMIT 1) + virtual_provisioning_resource.id = $6 AND (SELECT do_update.update FROM do_update LIMIT 1) RETURNING virtual_provisioning_resource.id, virtual_provisioning_resource.time_modified, @@ -65,8 +63,8 @@ WITH virtual_provisioning_collection SET time_modified = current_timestamp(), - cpus_provisioned = virtual_provisioning_collection.cpus_provisioned - $8, - ram_provisioned = virtual_provisioning_collection.ram_provisioned - $9 + cpus_provisioned = virtual_provisioning_collection.cpus_provisioned - $7, + ram_provisioned = virtual_provisioning_collection.ram_provisioned - $8 WHERE virtual_provisioning_collection.id = ANY (SELECT all_collections.id FROM all_collections) AND (SELECT do_update.update FROM do_update LIMIT 1) diff --git a/nexus/db-queries/tests/output/vmm_and_migration_update_vmm_and_both_migrations.sql b/nexus/db-queries/tests/output/vmm_and_migration_update_vmm_and_both_migrations.sql new file mode 100644 index 0000000000..bb460ff713 --- /dev/null +++ b/nexus/db-queries/tests/output/vmm_and_migration_update_vmm_and_both_migrations.sql @@ -0,0 +1,93 @@ +WITH + migration_in_found + AS ( + SELECT + ( + SELECT + migration.id + FROM + migration + WHERE + migration.id = $1 AND (migration.time_deleted IS NULL) + ) + AS id + ), + migration_in_updated + AS ( + UPDATE + migration + SET + target_state = $2, time_target_updated = $3, target_gen = $4 + WHERE + (migration.id = $5 AND migration.target_propolis_id = $6) AND migration.target_gen < $7 + RETURNING + id + ), + migration_in_result + AS ( + SELECT + migration_in_found.id AS found, migration_in_updated.id AS updated + FROM + migration_in_found + LEFT JOIN migration_in_updated ON migration_in_found.id = migration_in_updated.id + ), + migration_out_found + AS ( + SELECT + ( + SELECT + migration.id + FROM + migration + WHERE + migration.id = $8 AND (migration.time_deleted IS NULL) + ) + AS id + ), + migration_out_updated + AS ( + UPDATE + migration + SET + source_state = $9, time_source_updated = $10, source_gen = $11 + WHERE + (migration.id = $12 AND migration.source_propolis_id = $13) AND migration.source_gen < $14 + RETURNING + id + ), + migration_out_result + AS ( + SELECT + migration_out_found.id AS found, migration_out_updated.id AS updated + FROM + migration_out_found + LEFT JOIN migration_out_updated ON migration_out_found.id = migration_out_updated.id + ), + vmm_found AS (SELECT (SELECT vmm.id FROM vmm WHERE vmm.id = $15) AS id), + vmm_updated + AS ( + UPDATE + vmm + SET + time_state_updated = $16, state_generation = $17, state = $18 + WHERE + ((vmm.time_deleted IS NULL) AND vmm.id = $19) AND vmm.state_generation < $20 + RETURNING + id + ), + vmm_result + AS ( + SELECT + vmm_found.id AS found, vmm_updated.id AS updated + FROM + vmm_found LEFT JOIN vmm_updated ON vmm_found.id = vmm_updated.id + ) +SELECT + vmm_result.found, + vmm_result.updated, + migration_in_result.found, + migration_in_result.updated, + migration_out_result.found, + migration_out_result.updated +FROM + vmm_result, migration_in_result, migration_out_result diff --git a/nexus/db-queries/tests/output/vmm_and_migration_update_vmm_and_migration_in.sql 
b/nexus/db-queries/tests/output/vmm_and_migration_update_vmm_and_migration_in.sql new file mode 100644 index 0000000000..3fec792c6f --- /dev/null +++ b/nexus/db-queries/tests/output/vmm_and_migration_update_vmm_and_migration_in.sql @@ -0,0 +1,61 @@ +WITH + migration_in_found + AS ( + SELECT + ( + SELECT + migration.id + FROM + migration + WHERE + migration.id = $1 AND (migration.time_deleted IS NULL) + ) + AS id + ), + migration_in_updated + AS ( + UPDATE + migration + SET + target_state = $2, time_target_updated = $3, target_gen = $4 + WHERE + (migration.id = $5 AND migration.target_propolis_id = $6) AND migration.target_gen < $7 + RETURNING + id + ), + migration_in_result + AS ( + SELECT + migration_in_found.id AS found, migration_in_updated.id AS updated + FROM + migration_in_found + LEFT JOIN migration_in_updated ON migration_in_found.id = migration_in_updated.id + ), + vmm_found AS (SELECT (SELECT vmm.id FROM vmm WHERE vmm.id = $8) AS id), + vmm_updated + AS ( + UPDATE + vmm + SET + time_state_updated = $9, state_generation = $10, state = $11 + WHERE + ((vmm.time_deleted IS NULL) AND vmm.id = $12) AND vmm.state_generation < $13 + RETURNING + id + ), + vmm_result + AS ( + SELECT + vmm_found.id AS found, vmm_updated.id AS updated + FROM + vmm_found LEFT JOIN vmm_updated ON vmm_found.id = vmm_updated.id + ) +SELECT + vmm_result.found, + vmm_result.updated, + migration_in_result.found, + migration_in_result.updated, + NULL, + NULL +FROM + vmm_result, migration_in_result diff --git a/nexus/db-queries/tests/output/vmm_and_migration_update_vmm_and_migration_out.sql b/nexus/db-queries/tests/output/vmm_and_migration_update_vmm_and_migration_out.sql new file mode 100644 index 0000000000..7adeff48da --- /dev/null +++ b/nexus/db-queries/tests/output/vmm_and_migration_update_vmm_and_migration_out.sql @@ -0,0 +1,61 @@ +WITH + migration_out_found + AS ( + SELECT + ( + SELECT + migration.id + FROM + migration + WHERE + migration.id = $1 AND (migration.time_deleted IS NULL) + ) + AS id + ), + migration_out_updated + AS ( + UPDATE + migration + SET + source_state = $2, time_source_updated = $3, source_gen = $4 + WHERE + (migration.id = $5 AND migration.source_propolis_id = $6) AND migration.source_gen < $7 + RETURNING + id + ), + migration_out_result + AS ( + SELECT + migration_out_found.id AS found, migration_out_updated.id AS updated + FROM + migration_out_found + LEFT JOIN migration_out_updated ON migration_out_found.id = migration_out_updated.id + ), + vmm_found AS (SELECT (SELECT vmm.id FROM vmm WHERE vmm.id = $8) AS id), + vmm_updated + AS ( + UPDATE + vmm + SET + time_state_updated = $9, state_generation = $10, state = $11 + WHERE + ((vmm.time_deleted IS NULL) AND vmm.id = $12) AND vmm.state_generation < $13 + RETURNING + id + ), + vmm_result + AS ( + SELECT + vmm_found.id AS found, vmm_updated.id AS updated + FROM + vmm_found LEFT JOIN vmm_updated ON vmm_found.id = vmm_updated.id + ) +SELECT + vmm_result.found, + vmm_result.updated, + NULL, + NULL, + migration_out_result.found, + migration_out_result.updated +FROM + vmm_result, migration_out_result diff --git a/nexus/db-queries/tests/output/vmm_and_migration_update_vmm_only.sql b/nexus/db-queries/tests/output/vmm_and_migration_update_vmm_only.sql new file mode 100644 index 0000000000..cfe56740fe --- /dev/null +++ b/nexus/db-queries/tests/output/vmm_and_migration_update_vmm_only.sql @@ -0,0 +1,24 @@ +WITH + vmm_found AS (SELECT (SELECT vmm.id FROM vmm WHERE vmm.id = $1) AS id), + vmm_updated + AS ( + UPDATE + vmm + SET + time_state_updated = $2, 
state_generation = $3, state = $4 + WHERE + ((vmm.time_deleted IS NULL) AND vmm.id = $5) AND vmm.state_generation < $6 + RETURNING + id + ), + vmm_result + AS ( + SELECT + vmm_found.id AS found, vmm_updated.id AS updated + FROM + vmm_found LEFT JOIN vmm_updated ON vmm_found.id = vmm_updated.id + ) +SELECT + vmm_result.found, vmm_result.updated, NULL, NULL, NULL, NULL +FROM + vmm_result diff --git a/nexus/examples/config-second.toml b/nexus/examples/config-second.toml index 40f5d95a5f..754f37c064 100644 --- a/nexus/examples/config-second.toml +++ b/nexus/examples/config-second.toml @@ -132,6 +132,8 @@ region_replacement.period_secs = 30 region_replacement_driver.period_secs = 10 # How frequently to query the status of active instances. instance_watcher.period_secs = 30 +# How frequently to schedule new instance update sagas. +instance_updater.period_secs = 30 service_firewall_propagation.period_secs = 300 v2p_mapping_propagation.period_secs = 30 abandoned_vmm_reaper.period_secs = 60 diff --git a/nexus/examples/config.toml b/nexus/examples/config.toml index b194ecf1b6..bd50e846bd 100644 --- a/nexus/examples/config.toml +++ b/nexus/examples/config.toml @@ -118,6 +118,8 @@ region_replacement.period_secs = 30 region_replacement_driver.period_secs = 10 # How frequently to query the status of active instances. instance_watcher.period_secs = 30 +# How frequently to schedule new instance update sagas. +instance_updater.period_secs = 30 service_firewall_propagation.period_secs = 300 v2p_mapping_propagation.period_secs = 30 abandoned_vmm_reaper.period_secs = 60 diff --git a/nexus/src/app/background/init.rs b/nexus/src/app/background/init.rs index 2f1c4cd738..850e63443a 100644 --- a/nexus/src/app/background/init.rs +++ b/nexus/src/app/background/init.rs @@ -98,6 +98,7 @@ use super::tasks::dns_config; use super::tasks::dns_propagation; use super::tasks::dns_servers; use super::tasks::external_endpoints; +use super::tasks::instance_updater; use super::tasks::instance_watcher; use super::tasks::inventory_collection; use super::tasks::lookup_region_port; @@ -154,6 +155,7 @@ pub struct BackgroundTasks { pub task_region_replacement: Activator, pub task_region_replacement_driver: Activator, pub task_instance_watcher: Activator, + pub task_instance_updater: Activator, pub task_service_firewall_propagation: Activator, pub task_abandoned_vmm_reaper: Activator, pub task_vpc_route_manager: Activator, @@ -234,6 +236,7 @@ impl BackgroundTasksInitializer { task_region_replacement: Activator::new(), task_region_replacement_driver: Activator::new(), task_instance_watcher: Activator::new(), + task_instance_updater: Activator::new(), task_service_firewall_propagation: Activator::new(), task_abandoned_vmm_reaper: Activator::new(), task_vpc_route_manager: Activator::new(), @@ -294,6 +297,7 @@ impl BackgroundTasksInitializer { task_region_replacement, task_region_replacement_driver, task_instance_watcher, + task_instance_updater, task_service_firewall_propagation, task_abandoned_vmm_reaper, task_vpc_route_manager, @@ -613,10 +617,9 @@ impl BackgroundTasksInitializer { { let watcher = instance_watcher::InstanceWatcher::new( datastore.clone(), - resolver.clone(), + sagas.clone(), producer_registry, instance_watcher::WatcherIdentity { nexus_id, rack_id }, - task_v2p_manager.clone(), ); driver.register(TaskDefinition { name: "instance_watcher", @@ -629,6 +632,25 @@ impl BackgroundTasksInitializer { }) }; + // Background task: schedule update sagas for instances in need of + // state updates. 
+ { + let updater = instance_updater::InstanceUpdater::new( + datastore.clone(), + sagas.clone(), + config.instance_updater.disable, + ); + driver.register(TaskDefinition { + name: "instance_updater", + description: "detects if instances require update sagas and schedules them", + period: config.instance_updater.period_secs, + task_impl: Box::new(updater), + opctx: opctx.child(BTreeMap::new()), + watchers: vec![], + activator: task_instance_updater, + }); + } + // Background task: service firewall rule propagation driver.register(TaskDefinition { name: "service_firewall_rule_propagation", diff --git a/nexus/src/app/background/tasks/instance_updater.rs b/nexus/src/app/background/tasks/instance_updater.rs new file mode 100644 index 0000000000..46a3bead21 --- /dev/null +++ b/nexus/src/app/background/tasks/instance_updater.rs @@ -0,0 +1,270 @@ +// This Source Code Form is subject to the terms of the Mozilla Public +// License, v. 2.0. If a copy of the MPL was not distributed with this +// file, You can obtain one at https://mozilla.org/MPL/2.0/. + +//! Background task for detecting instances in need of update sagas. + +use crate::app::background::BackgroundTask; +use crate::app::saga::StartSaga; +use crate::app::sagas::instance_update; +use crate::app::sagas::NexusSaga; +use anyhow::Context; +use futures::future::BoxFuture; +use futures::FutureExt; +use nexus_db_model::Instance; +use nexus_db_queries::context::OpContext; +use nexus_db_queries::db::lookup::LookupPath; +use nexus_db_queries::db::DataStore; +use nexus_db_queries::{authn, authz}; +use nexus_types::identity::Resource; +use omicron_common::api::external::ListResultVec; +use serde_json::json; +use std::future::Future; +use std::sync::Arc; +use tokio::task::JoinSet; + +pub struct InstanceUpdater { + datastore: Arc<DataStore>, + sagas: Arc<dyn StartSaga>, + disable: bool, +} + +impl InstanceUpdater { + pub fn new( + datastore: Arc<DataStore>, + sagas: Arc<dyn StartSaga>, + disable: bool, + ) -> Self { + InstanceUpdater { datastore, sagas, disable } + } + + async fn actually_activate( + &mut self, + opctx: &OpContext, + stats: &mut ActivationStats, + ) -> Result<(), anyhow::Error> { + async fn find_instances( + what: &'static str, + log: &slog::Logger, + last_err: &mut Result<(), anyhow::Error>, + query: impl Future<Output = ListResultVec<Instance>>, + ) -> Vec<Instance> { + slog::debug!(&log, "looking for instances with {what}..."); + match query.await { + Ok(list) => { + slog::info!( + &log, + "listed instances with {what}"; + "count" => list.len(), + ); + list + } + Err(error) => { + slog::error!( + &log, + "failed to list instances with {what}"; + "error" => %error, + ); + *last_err = Err(error).with_context(|| { + format!("failed to find instances with {what}",) + }); + Vec::new() + } + } + } + + let mut last_err = Ok(()); + let mut sagas = JoinSet::new(); + + // NOTE(eliza): These don't, strictly speaking, need to be two separate + // queries; they probably could instead be `OR`ed together in SQL. I + // just thought it was nice to be able to record the number of instances + // found separately for each state.
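+ // First, look for instances whose active VMMs have been destroyed + // and start update sagas for them; then, do the same for instances + // with terminated active migrations. The sagas started here are + // collected in the `JoinSet` above and awaited below, so that + // completions and failures can be tallied in `stats`.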
+ let destroyed_active_vmms = find_instances( + "destroyed active VMMs", + &opctx.log, + &mut last_err, + self.datastore.find_instances_with_destroyed_active_vmms(opctx), + ) + .await; + stats.destroyed_active_vmms = destroyed_active_vmms.len(); + self.start_sagas( + &opctx, + stats, + &mut last_err, + &mut sagas, + destroyed_active_vmms, + ) + .await; + + let terminated_active_migrations = find_instances( + "terminated active migrations", + &opctx.log, + &mut last_err, + self.datastore + .find_instances_with_terminated_active_migrations(opctx), + ) + .await; + stats.terminated_active_migrations = terminated_active_migrations.len(); + self.start_sagas( + &opctx, + stats, + &mut last_err, + &mut sagas, + terminated_active_migrations, + ) + .await; + + // Now, wait for the sagas to complete. + while let Some(saga_result) = sagas.join_next().await { + match saga_result { + Err(err) => { + debug_assert!( + false, + "since nexus is compiled with `panic=\"abort\"`, and \ + we never cancel the tasks on the `JoinSet`, a \ + `JoinError` should never be observed!", + ); + stats.sagas_failed += 1; + last_err = Err(err.into()); + } + Ok(Err(err)) => { + warn!(opctx.log, "update saga failed!"; "error" => %err); + stats.sagas_failed += 1; + last_err = Err(err); + } + Ok(Ok(())) => stats.sagas_completed += 1, + } + } + + last_err + } + + async fn start_sagas( + &self, + opctx: &OpContext, + stats: &mut ActivationStats, + last_err: &mut Result<(), anyhow::Error>, + sagas: &mut JoinSet>, + instances: impl IntoIterator, + ) { + let serialized_authn = authn::saga::Serialized::for_opctx(opctx); + for instance in instances { + let instance_id = instance.id(); + let saga = async { + let (.., authz_instance) = + LookupPath::new(&opctx, &self.datastore) + .instance_id(instance_id) + .lookup_for(authz::Action::Modify) + .await?; + instance_update::SagaInstanceUpdate::prepare( + &instance_update::Params { + serialized_authn: serialized_authn.clone(), + authz_instance, + }, + ) + .with_context(|| { + format!("failed to prepare instance-update saga for {instance_id}") + }) + } + .await; + match saga { + Ok(saga) => { + let start_saga = self.sagas.clone(); + sagas.spawn(async move { + start_saga.saga_start(saga).await.with_context(|| { + format!("update saga for {instance_id} failed") + }) + }); + stats.sagas_started += 1; + } + Err(err) => { + warn!( + opctx.log, + "failed to start instance-update saga!"; + "instance_id" => %instance_id, + "error" => %err, + ); + stats.saga_start_failures += 1; + *last_err = Err(err); + } + } + } + } +} + +#[derive(Default)] +struct ActivationStats { + destroyed_active_vmms: usize, + terminated_active_migrations: usize, + sagas_started: usize, + sagas_completed: usize, + sagas_failed: usize, + saga_start_failures: usize, +} + +impl BackgroundTask for InstanceUpdater { + fn activate<'a>( + &'a mut self, + opctx: &'a OpContext, + ) -> BoxFuture<'a, serde_json::Value> { + async { + let mut stats = ActivationStats::default(); + + let error = if self.disable { + slog::info!(&opctx.log, "background instance updater explicitly disabled"); + None + } else { + match self.actually_activate(opctx, &mut stats).await { + Ok(()) => { + slog::info!( + &opctx.log, + "instance updater activation completed"; + "destroyed_active_vmms" => stats.destroyed_active_vmms, + "terminated_active_migrations" => stats.terminated_active_migrations, + "update_sagas_started" => stats.sagas_started, + "update_sagas_completed" => stats.sagas_completed, + ); + debug_assert_eq!( + stats.sagas_failed, + 0, + "if the task 
completed successfully, then no sagas \ + should have failed", + ); + debug_assert_eq!( + stats.saga_start_failures, + 0, + "if the task completed successfully, all sagas \ + should have started successfully" + ); + None + } + Err(error) => { + slog::warn!( + &opctx.log, + "instance updater activation failed!"; + "error" => %error, + "destroyed_active_vmms" => stats.destroyed_active_vmms, + "terminated_active_migrations" => stats.terminated_active_migrations, + "update_sagas_started" => stats.sagas_started, + "update_sagas_completed" => stats.sagas_completed, + "update_sagas_failed" => stats.sagas_failed, + "update_saga_start_failures" => stats.saga_start_failures, + ); + Some(error.to_string()) + } + } + }; + json!({ + "destroyed_active_vmms": stats.destroyed_active_vmms, + "terminated_active_migrations": stats.terminated_active_migrations, + "sagas_started": stats.sagas_started, + "sagas_completed": stats.sagas_completed, + "sagas_failed": stats.sagas_failed, + "saga_start_failures": stats.saga_start_failures, + "error": error, + }) + } + .boxed() + } +} diff --git a/nexus/src/app/background/tasks/instance_watcher.rs b/nexus/src/app/background/tasks/instance_watcher.rs index 8a41e2d062..f63c21105e 100644 --- a/nexus/src/app/background/tasks/instance_watcher.rs +++ b/nexus/src/app/background/tasks/instance_watcher.rs @@ -4,8 +4,8 @@ //! Background task for pulling instance state from sled-agents. -use crate::app::background::Activator; use crate::app::background::BackgroundTask; +use crate::app::saga::StartSaga; use futures::{future::BoxFuture, FutureExt}; use http::StatusCode; use nexus_db_model::Instance; @@ -17,6 +17,7 @@ use nexus_db_queries::db::pagination::Paginator; use nexus_db_queries::db::DataStore; use nexus_types::identity::Asset; use nexus_types::identity::Resource; +use omicron_common::api::external::Error; use omicron_common::api::external::InstanceState; use omicron_common::api::internal::nexus::SledInstanceState; use omicron_uuid_kinds::GenericUuid; @@ -37,10 +38,9 @@ use virtual_machine::VirtualMachine; /// Background task that periodically checks instance states. 
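+/// +/// In addition to recording the result of each check, the watcher now holds +/// a `StartSaga` handle and queues an `instance-update` saga whenever a +/// check observes a VMM or migration state that requires one.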
pub(crate) struct InstanceWatcher { datastore: Arc, - resolver: internal_dns::resolver::Resolver, + sagas: Arc, metrics: Arc>, id: WatcherIdentity, - v2p_manager: Activator, } const MAX_SLED_AGENTS: NonZeroU32 = unsafe { @@ -51,16 +51,15 @@ const MAX_SLED_AGENTS: NonZeroU32 = unsafe { impl InstanceWatcher { pub(crate) fn new( datastore: Arc, - resolver: internal_dns::resolver::Resolver, + sagas: Arc, producer_registry: &ProducerRegistry, id: WatcherIdentity, - v2p_manager: Activator, ) -> Self { let metrics = Arc::new(Mutex::new(metrics::Metrics::default())); producer_registry .register_producer(metrics::Producer(metrics.clone())) .unwrap(); - Self { datastore, resolver, metrics, id, v2p_manager } + Self { datastore, sagas, metrics, id } } fn check_instance( @@ -70,7 +69,7 @@ impl InstanceWatcher { target: VirtualMachine, ) -> impl Future + Send + 'static { let datastore = self.datastore.clone(); - let resolver = self.resolver.clone(); + let sagas = self.sagas.clone(); let opctx = opctx.child( std::iter::once(( @@ -80,7 +79,6 @@ impl InstanceWatcher { .collect(), ); let client = client.clone(); - let v2p_manager = self.v2p_manager.clone(); async move { slog::trace!(opctx.log, "checking on instance..."); @@ -89,8 +87,12 @@ impl InstanceWatcher { target.instance_id, )) .await; - let mut check = - Check { target, outcome: Default::default(), result: Ok(()) }; + let mut check = Check { + target, + outcome: Default::default(), + result: Ok(()), + update_saga_queued: false, + }; let state = match rsp { Ok(rsp) => rsp.into_inner(), Err(ClientError::ErrorResponse(rsp)) => { @@ -152,50 +154,37 @@ impl InstanceWatcher { let new_runtime_state: SledInstanceState = state.into(); check.outcome = CheckOutcome::Success(new_runtime_state.vmm_state.state.into()); - slog::debug!( + debug!( opctx.log, "updating instance state"; "state" => ?new_runtime_state.vmm_state.state, ); - check.result = crate::app::instance::notify_instance_updated( + match crate::app::instance::notify_instance_updated( &datastore, - &resolver, - &opctx, &opctx, - &opctx.log, - &InstanceUuid::from_untyped_uuid(target.instance_id), + InstanceUuid::from_untyped_uuid(target.instance_id), &new_runtime_state, - &v2p_manager, ) .await - .map_err(|e| { - slog::warn!( - opctx.log, - "error updating instance"; - "error" => ?e, - "state" => ?new_runtime_state.vmm_state.state, - ); - Incomplete::UpdateFailed - }) - .and_then(|updated| { - updated.ok_or_else(|| { - slog::warn!( - opctx.log, - "error updating instance: not found in database"; - "state" => ?new_runtime_state.vmm_state.state, - ); - Incomplete::InstanceNotFound - }) - }) - .map(|updated| { - slog::debug!( - opctx.log, - "update successful"; - "instance_updated" => updated.instance_updated, - "vmm_updated" => updated.vmm_updated, - "state" => ?new_runtime_state.vmm_state.state, - ); - }); + { + Err(e) => { + warn!(opctx.log, "error updating instance"; "error" => %e); + check.result = match e { + Error::ObjectNotFound { .. } => { + Err(Incomplete::InstanceNotFound) + } + _ => Err(Incomplete::UpdateFailed), + }; + } + Ok(Some(saga)) => { + check.update_saga_queued = true; + if let Err(e) = sagas.saga_start(saga).await { + warn!(opctx.log, "update saga failed"; "error" => ?e); + check.result = Err(Incomplete::UpdateFailed); + } + } + Ok(None) => {} + }; check } @@ -259,6 +248,8 @@ struct Check { /// Depending on when the error occurred, the `outcome` field may also /// be populated. 
result: Result<(), Incomplete>, + + update_saga_queued: bool, } #[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Default)] @@ -418,6 +409,7 @@ impl BackgroundTask for InstanceWatcher { // Now, wait for the check results to come back. let mut total: usize = 0; + let mut update_sagas_queued: usize = 0; let mut instance_states: BTreeMap = BTreeMap::new(); let mut check_failures: BTreeMap = @@ -446,7 +438,11 @@ impl BackgroundTask for InstanceWatcher { if let Err(ref reason) = check.result { *check_errors.entry(reason.as_str().into_owned()).or_default() += 1; } + if check.update_saga_queued { + update_sagas_queued += 1; + } self.metrics.lock().unwrap().record_check(check); + } // All requests completed! Prune any old instance metrics for @@ -460,6 +456,7 @@ impl BackgroundTask for InstanceWatcher { "total_completed" => instance_states.len() + check_failures.len(), "total_failed" => check_failures.len(), "total_incomplete" => check_errors.len(), + "update_sagas_queued" => update_sagas_queued, "pruned_instances" => pruned, ); serde_json::json!({ @@ -467,6 +464,7 @@ impl BackgroundTask for InstanceWatcher { "instance_states": instance_states, "failed_checks": check_failures, "incomplete_checks": check_errors, + "update_sagas_queued": update_sagas_queued, "pruned_instances": pruned, }) } diff --git a/nexus/src/app/background/tasks/mod.rs b/nexus/src/app/background/tasks/mod.rs index 5062799bdb..fe041a6daa 100644 --- a/nexus/src/app/background/tasks/mod.rs +++ b/nexus/src/app/background/tasks/mod.rs @@ -14,6 +14,7 @@ pub mod dns_config; pub mod dns_propagation; pub mod dns_servers; pub mod external_endpoints; +pub mod instance_updater; pub mod instance_watcher; pub mod inventory_collection; pub mod lookup_region_port; diff --git a/nexus/src/app/instance.rs b/nexus/src/app/instance.rs index e6866bfab6..344d2688f7 100644 --- a/nexus/src/app/instance.rs +++ b/nexus/src/app/instance.rs @@ -13,12 +13,12 @@ use super::MAX_SSH_KEYS_PER_INSTANCE; use super::MAX_VCPU_PER_INSTANCE; use super::MIN_MEMORY_BYTES_PER_INSTANCE; use crate::app::sagas; +use crate::app::sagas::NexusSaga; use crate::cidata::InstanceCiData; use crate::external_api::params; use cancel_safe_futures::prelude::*; use futures::future::Fuse; use futures::{FutureExt, SinkExt, StreamExt}; -use nexus_db_model::InstanceState as DbInstanceState; use nexus_db_model::IpAttachState; use nexus_db_model::IpKind; use nexus_db_model::Vmm as DbVmm; @@ -27,7 +27,6 @@ use nexus_db_queries::authn; use nexus_db_queries::authz; use nexus_db_queries::context::OpContext; use nexus_db_queries::db; -use nexus_db_queries::db::datastore::instance::InstanceUpdateResult; use nexus_db_queries::db::datastore::InstanceAndActiveVmm; use nexus_db_queries::db::identity::Resource; use nexus_db_queries::db::lookup; @@ -47,7 +46,6 @@ use omicron_common::api::external::LookupResult; use omicron_common::api::external::NameOrId; use omicron_common::api::external::UpdateResult; use omicron_common::api::internal::nexus; -use omicron_common::api::internal::nexus::VmmState; use omicron_common::api::internal::shared::SourceNatConfig; use omicron_uuid_kinds::GenericUuid; use omicron_uuid_kinds::InstanceUuid; @@ -60,10 +58,8 @@ use propolis_client::support::InstanceSerialConsoleHelper; use propolis_client::support::WSClientOffset; use propolis_client::support::WebSocketStream; use sagas::instance_common::ExternalIpAttach; -use sled_agent_client::types::InstanceMigrationSourceParams; use sled_agent_client::types::InstanceMigrationTargetParams; use 
sled_agent_client::types::InstanceProperties; -use sled_agent_client::types::InstancePutMigrationIdsBody; use sled_agent_client::types::InstancePutStateBody; use std::matches; use std::net::SocketAddr; @@ -530,144 +526,6 @@ impl super::Nexus { self.db_datastore.instance_fetch_with_vmm(opctx, &authz_instance).await } - /// Attempts to set the migration IDs for the supplied instance via the - /// sled specified in `db_instance`. - /// - /// The caller is assumed to have fetched the current instance record from - /// the DB and verified that the record has no migration IDs. - /// - /// Returns `Ok` and the updated instance record if this call successfully - /// updated the instance with the sled agent and that update was - /// successfully reflected into CRDB. Returns `Err` with an appropriate - /// error otherwise. - /// - /// # Panics - /// - /// Asserts that `db_instance` has no migration ID or destination Propolis - /// ID set. - pub(crate) async fn instance_set_migration_ids( - &self, - opctx: &OpContext, - instance_id: InstanceUuid, - sled_id: SledUuid, - prev_instance_runtime: &db::model::InstanceRuntimeState, - migration_params: InstanceMigrationSourceParams, - ) -> UpdateResult { - assert!(prev_instance_runtime.migration_id.is_none()); - assert!(prev_instance_runtime.dst_propolis_id.is_none()); - - let (.., authz_instance) = LookupPath::new(opctx, &self.db_datastore) - .instance_id(instance_id.into_untyped_uuid()) - .lookup_for(authz::Action::Modify) - .await?; - - let sa = self.sled_client(&sled_id).await?; - let instance_put_result = sa - .instance_put_migration_ids( - &instance_id, - &InstancePutMigrationIdsBody { - old_runtime: prev_instance_runtime.clone().into(), - migration_params: Some(migration_params), - }, - ) - .await - .map(|res| Some(res.into_inner().into())) - .map_err(|e| SledAgentInstancePutError(e)); - - // Write the updated instance runtime state back to CRDB. If this - // outright fails, this operation fails. If the operation nominally - // succeeds but nothing was updated, this action is outdated and the - // caller should not proceed with migration. - let InstanceUpdateResult { instance_updated, .. } = - match instance_put_result { - Ok(state) => { - self.write_returned_instance_state(&instance_id, state) - .await? - } - Err(e) => { - if e.instance_unhealthy() { - let _ = self - .mark_instance_failed( - &instance_id, - &prev_instance_runtime, - &e, - ) - .await; - } - return Err(e.into()); - } - }; - - if instance_updated { - Ok(self - .db_datastore - .instance_refetch(opctx, &authz_instance) - .await?) - } else { - Err(Error::conflict( - "instance is already migrating, or underwent an operation that \ - prevented this migration from proceeding" - )) - } - } - - /// Attempts to clear the migration IDs for the supplied instance via the - /// sled specified in `db_instance`. - /// - /// The supplied instance record must contain valid migration IDs. - /// - /// Returns `Ok` if sled agent accepted the request to clear migration IDs - /// and the resulting attempt to write instance runtime state back to CRDB - /// succeeded. This routine returns `Ok` even if the update was not actually - /// applied (due to a separate generation number change). - /// - /// # Panics - /// - /// Asserts that `db_instance` has a migration ID and destination Propolis - /// ID set. 
- pub(crate) async fn instance_clear_migration_ids( - &self, - instance_id: InstanceUuid, - sled_id: SledUuid, - prev_instance_runtime: &db::model::InstanceRuntimeState, - ) -> Result<(), Error> { - assert!(prev_instance_runtime.migration_id.is_some()); - assert!(prev_instance_runtime.dst_propolis_id.is_some()); - - let sa = self.sled_client(&sled_id).await?; - let instance_put_result = sa - .instance_put_migration_ids( - &instance_id, - &InstancePutMigrationIdsBody { - old_runtime: prev_instance_runtime.clone().into(), - migration_params: None, - }, - ) - .await - .map(|res| Some(res.into_inner().into())) - .map_err(|e| SledAgentInstancePutError(e)); - - match instance_put_result { - Ok(state) => { - self.write_returned_instance_state(&instance_id, state).await?; - } - Err(e) => { - if e.instance_unhealthy() { - let _ = self - .mark_instance_failed( - &instance_id, - &prev_instance_runtime, - &e, - ) - .await; - } - return Err(e.into()); - } - } - - Ok(()) - } - /// Reboot the specified instance. pub(crate) async fn instance_reboot( &self, @@ -836,11 +694,10 @@ impl super::Nexus { vmm_state: &Option, requested: &InstanceStateChangeRequest, ) -> Result { - let effective_state = if let Some(vmm) = vmm_state { - vmm.runtime.state.into() - } else { - instance_state.runtime().nexus_state.into() - }; + let effective_state = InstanceAndActiveVmm::determine_effective_state( + instance_state, + vmm_state.as_ref(), + ); // Requests that operate on active instances have to be directed to the // instance's current sled agent. If there is none, the request needs to @@ -992,13 +849,13 @@ impl super::Nexus { // the caller to let it decide how to handle it. // // When creating the zone for the first time, we just get - // Ok(None) here, which is a no-op in write_returned_instance_state. + // Ok(None) here, in which case, there's nothing to write back. match instance_put_result { - Ok(state) => self - .write_returned_instance_state(&instance_id, state) + Ok(Some(ref state)) => self + .notify_instance_updated(opctx, instance_id, state) .await - .map(|_| ()) .map_err(Into::into), + Ok(None) => Ok(()), Err(e) => Err(InstanceStateChangeError::SledAgent(e)), } } @@ -1279,12 +1136,13 @@ impl super::Nexus { }, ) .await - .map(|res| Some(res.into_inner().into())) + .map(|res| res.into_inner().into()) .map_err(|e| SledAgentInstancePutError(e)); match instance_register_result { Ok(state) => { - self.write_returned_instance_state(&instance_id, state).await?; + self.notify_instance_updated(opctx, instance_id, &state) + .await?; } Err(e) => { if e.instance_unhealthy() { @@ -1303,59 +1161,6 @@ impl super::Nexus { Ok(()) } - /// Takes an updated instance state returned from a call to sled agent and - /// writes it back to the database. - /// - /// # Return value - /// - /// - `Ok((instance_updated, vmm_updated))` if no failures occurred. The - /// tuple fields indicate which database records (if any) were updated. - /// Note that it is possible for sled agent not to return an updated - /// instance state from a particular API call. In that case, the `state` - /// parameter is `None` and this routine returns `Ok((false, false))`. - /// - `Err` if an error occurred while writing state to the database. A - /// database operation that succeeds but doesn't update anything (e.g. - /// owing to an outdated generation number) will return `Ok`. 
- async fn write_returned_instance_state( - &self, - instance_id: &InstanceUuid, - state: Option, - ) -> Result { - slog::debug!(&self.log, - "writing instance state returned from sled agent"; - "instance_id" => %instance_id, - "new_state" => ?state); - - if let Some(state) = state { - let update_result = self - .db_datastore - .instance_and_vmm_update_runtime( - instance_id, - &state.instance_state.into(), - &state.propolis_id, - &state.vmm_state.into(), - &state.migration_state, - ) - .await; - - slog::debug!(&self.log, - "attempted to write instance state from sled agent"; - "instance_id" => %instance_id, - "propolis_id" => %state.propolis_id, - "result" => ?update_result); - - update_result - } else { - // There was no instance state to write back, so --- perhaps - // obviously --- nothing happened. - Ok(InstanceUpdateResult { - instance_updated: false, - vmm_updated: false, - migration_updated: None, - }) - } - } - /// Attempts to move an instance from `prev_instance_runtime` to the /// `Failed` state in response to an error returned from a call to a sled /// agent instance API, supplied in `reason`. @@ -1519,21 +1324,74 @@ impl super::Nexus { pub(crate) async fn notify_instance_updated( &self, opctx: &OpContext, - instance_id: &InstanceUuid, + instance_id: InstanceUuid, new_runtime_state: &nexus::SledInstanceState, ) -> Result<(), Error> { - notify_instance_updated( - &self.datastore(), - self.resolver(), - &self.opctx_alloc, + let saga = notify_instance_updated( + &self.db_datastore, opctx, - &self.log, instance_id, new_runtime_state, - &self.background_tasks.task_v2p_manager, ) .await?; - self.vpc_needed_notify_sleds(); + + // We don't need to wait for the instance update saga to run to + // completion to return OK to the sled-agent --- all it needs to care + // about is that the VMM/migration state in the database was updated. + // Even if we fail to successfully start an update saga, the + // instance-updater background task will eventually see that the + // instance is in a state which requires an update saga, and ensure that + // one is eventually executed. + // + // Therefore, just spawn the update saga in a new task, and return. + if let Some(saga) = saga { + info!(opctx.log, "starting update saga for {instance_id}"; + "instance_id" => %instance_id, + "vmm_state" => ?new_runtime_state.vmm_state, + "migration_state" => ?new_runtime_state.migrations(), + ); + let sagas = self.sagas.clone(); + let task_instance_updater = + self.background_tasks.task_instance_updater.clone(); + let log = opctx.log.clone(); + tokio::spawn(async move { + // TODO(eliza): maybe we should use the lower level saga API so + // we can see if the saga failed due to the lock being held and + // retry it immediately? + let running_saga = async move { + let runnable_saga = sagas.saga_prepare(saga).await?; + runnable_saga.start().await + } + .await; + let result = match running_saga { + Err(error) => { + error!(&log, "failed to start update saga for {instance_id}"; + "instance_id" => %instance_id, + "error" => %error, + ); + // If we couldn't start the update saga for this + // instance, kick the instance-updater background task + // to try and start it again in a timely manner. 
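+ // (The updater task's queries for destroyed active VMMs and + // terminated active migrations will rediscover this instance, so + // activating it here just asks it to run sooner than its + // configured period.)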
+ task_instance_updater.activate(); + return; + } + Ok(saga) => { + saga.wait_until_stopped().await.into_omicron_result() + } + }; + if let Err(error) = result { + error!(&log, "update saga for {instance_id} failed"; + "instance_id" => %instance_id, + "error" => %error, + ); + // If we couldn't complete the update saga for this + // instance, kick the instance-updater background task + // to try and start it again in a timely manner. + task_instance_updater.activate(); + } + }); + } + Ok(()) } @@ -1973,194 +1831,56 @@ impl super::Nexus { } /// Invoked by a sled agent to publish an updated runtime state for an -/// Instance. -#[allow(clippy::too_many_arguments)] // :( +/// Instance, returning an update saga for that instance (if one must be +/// executed). pub(crate) async fn notify_instance_updated( datastore: &DataStore, - resolver: &internal_dns::resolver::Resolver, - opctx_alloc: &OpContext, opctx: &OpContext, - log: &slog::Logger, - instance_id: &InstanceUuid, + instance_id: InstanceUuid, new_runtime_state: &nexus::SledInstanceState, - v2p_manager: &crate::app::background::Activator, -) -> Result, Error> { - let propolis_id = new_runtime_state.propolis_id; - - info!(log, "received new runtime state from sled agent"; - "instance_id" => %instance_id, - "instance_state" => ?new_runtime_state.instance_state, - "propolis_id" => %propolis_id, - "vmm_state" => ?new_runtime_state.vmm_state, - "migration_state" => ?new_runtime_state.migration_state); - - // Grab the current state of the instance in the DB to reason about - // whether this update is stale or not. - let (.., authz_instance, db_instance) = LookupPath::new(&opctx, &datastore) - .instance_id(instance_id.into_untyped_uuid()) - .fetch() - .await?; +) -> Result, Error> { + use sagas::instance_update; - // Update OPTE and Dendrite if the instance's active sled assignment - // changed or a migration was retired. If these actions fail, sled agent - // is expected to retry this update. - // - // This configuration must be updated before updating any state in CRDB - // so that, if the instance was migrating or has shut down, it will not - // appear to be able to migrate or start again until the appropriate - // networking state has been written. Without this interlock, another - // thread or another Nexus can race with this routine to write - // conflicting configuration. - // - // In the future, this should be replaced by a call to trigger a - // networking state update RPW. - super::instance_network::ensure_updated_instance_network_config( - datastore, - log, - resolver, - opctx, - opctx_alloc, - &authz_instance, - db_instance.runtime(), - &new_runtime_state.instance_state, - v2p_manager, - ) - .await?; - - // If the supplied instance state indicates that the instance no longer - // has an active VMM, attempt to delete the virtual provisioning record, - // and the assignment of the Propolis metric producer to an oximeter - // collector. - // - // As with updating networking state, this must be done before - // committing the new runtime state to the database: once the DB is - // written, a new start saga can arrive and start the instance, which - // will try to create its own virtual provisioning charges, which will - // race with this operation. 
- if new_runtime_state.instance_state.propolis_id.is_none() { - datastore - .virtual_provisioning_collection_delete_instance( - opctx, - *instance_id, - db_instance.project_id, - i64::from(db_instance.ncpus.0 .0), - db_instance.memory, - (&new_runtime_state.instance_state.gen).into(), - ) - .await?; - - // TODO-correctness: The `notify_instance_updated` method can run - // concurrently with itself in some situations, such as where a - // sled-agent attempts to update Nexus about a stopped instance; - // that times out; and it makes another request to a different - // Nexus. The call to `unassign_producer` is racy in those - // situations, and we may end with instances with no metrics. - // - // This unfortunate case should be handled as part of - // instance-lifecycle improvements, notably using a reliable - // persistent workflow to correctly update the oximete assignment as - // an instance's state changes. - // - // Tracked in https://github.com/oxidecomputer/omicron/issues/3742. - super::oximeter::unassign_producer( - datastore, - log, - opctx, - &instance_id.into_untyped_uuid(), - ) - .await?; - } + let migrations = new_runtime_state.migrations(); + let propolis_id = new_runtime_state.propolis_id; + info!(opctx.log, "received new VMM runtime state from sled agent"; + "instance_id" => %instance_id, + "propolis_id" => %propolis_id, + "vmm_state" => ?new_runtime_state.vmm_state, + "migration_state" => ?migrations, + ); - // Write the new instance and VMM states back to CRDB. This needs to be - // done before trying to clean up the VMM, since the datastore will only - // allow a VMM to be marked as deleted if it is already in a terminal - // state. let result = datastore - .instance_and_vmm_update_runtime( - instance_id, - &db::model::InstanceRuntimeState::from( - new_runtime_state.instance_state.clone(), - ), - &propolis_id, - &db::model::VmmRuntimeState::from( - new_runtime_state.vmm_state.clone(), - ), - &new_runtime_state.migration_state, + .vmm_and_migration_update_runtime( + &opctx, + propolis_id, + // TODO(eliza): probably should take this by value... + &new_runtime_state.vmm_state.clone().into(), + migrations, ) - .await; - - // If the VMM is now in a terminal state, make sure its resources get - // cleaned up. - // - // For idempotency, only check to see if the update was successfully - // processed and ignore whether the VMM record was actually updated. - // This is required to handle the case where this routine is called - // once, writes the terminal VMM state, fails before all per-VMM - // resources are released, returns a retriable error, and is retried: - // the per-VMM resources still need to be cleaned up, but the DB update - // will return Ok(_, false) because the database was already updated. - // - // Unlike the pre-update cases, it is legal to do this cleanup *after* - // committing state to the database, because a terminated VMM cannot be - // reused (restarting or migrating its former instance will use new VMM - // IDs). - if result.is_ok() { - let propolis_terminated = matches!( - new_runtime_state.vmm_state.state, - VmmState::Destroyed | VmmState::Failed - ); - - if propolis_terminated { - info!(log, "vmm is terminated, cleaning up resources"; - "instance_id" => %instance_id, - "propolis_id" => %propolis_id); - - datastore - .sled_reservation_delete(opctx, propolis_id.into_untyped_uuid()) - .await?; - - if !datastore.vmm_mark_deleted(opctx, &propolis_id).await? 
{ - warn!(log, "failed to mark vmm record as deleted"; - "instance_id" => %instance_id, - "propolis_id" => %propolis_id, - "vmm_state" => ?new_runtime_state.vmm_state); - } - } - } - - match result { - Ok(result) => { - info!(log, "instance and vmm updated by sled agent"; - "instance_id" => %instance_id, - "propolis_id" => %propolis_id, - "instance_updated" => result.instance_updated, - "vmm_updated" => result.vmm_updated, - "migration_updated" => ?result.migration_updated); - Ok(Some(result)) - } - - // The update command should swallow object-not-found errors and - // return them back as failures to update, so this error case is - // unexpected. There's no work to do if this occurs, however. - Err(Error::ObjectNotFound { .. }) => { - error!(log, "instance/vmm update unexpectedly returned \ - an object not found error"; - "instance_id" => %instance_id, - "propolis_id" => %propolis_id); - Ok(None) - } + .await?; - // If the datastore is unavailable, propagate that to the caller. - // TODO-robustness Really this should be any _transient_ error. How - // can we distinguish? Maybe datastore should emit something - // different from Error with an Into. - Err(error) => { - warn!(log, "failed to update instance from sled agent"; - "instance_id" => %instance_id, - "propolis_id" => %propolis_id, - "error" => ?error); - Err(error) - } + // If an instance-update saga must be executed as a result of this update, + // prepare and return it. + if instance_update::update_saga_needed( + &opctx.log, + instance_id, + new_runtime_state, + &result, + ) { + let (.., authz_instance) = LookupPath::new(&opctx, datastore) + .instance_id(instance_id.into_untyped_uuid()) + .lookup_for(authz::Action::Modify) + .await?; + let saga = instance_update::SagaInstanceUpdate::prepare( + &instance_update::Params { + serialized_authn: authn::saga::Serialized::for_opctx(opctx), + authz_instance, + }, + )?; + Ok(Some(saga)) + } else { + Ok(None) } } @@ -2178,83 +1898,69 @@ fn instance_start_allowed( // // If the instance doesn't have an active VMM, see if the instance state // permits it to start. - if let Some(vmm) = vmm { - match vmm.runtime.state { - // If the VMM is already starting or is in another "active" - // state, succeed to make successful start attempts idempotent. - DbVmmState::Starting - | DbVmmState::Running - | DbVmmState::Rebooting - | DbVmmState::Migrating => { - debug!(log, "asked to start an active instance"; - "instance_id" => %instance.id()); - - Ok(InstanceStartDisposition::AlreadyStarted) - } - // If a previous start saga failed and left behind a VMM in the - // SagaUnwound state, allow a new start saga to try to overwrite - // it. - DbVmmState::SagaUnwound => { - debug!( - log, - "instance's last VMM's start saga unwound, OK to start"; - "instance_id" => %instance.id() - ); - - Ok(InstanceStartDisposition::Start) - } - // When sled agent publishes a Stopped state, Nexus should clean - // up the instance/VMM pointer. 
- DbVmmState::Stopped => { - let propolis_id = instance - .runtime() - .propolis_id - .expect("needed a VMM ID to fetch a VMM record"); - error!(log, - "instance is stopped but still has an active VMM"; - "instance_id" => %instance.id(), - "propolis_id" => %propolis_id); - - Err(Error::internal_error( - "instance is stopped but still has an active VMM", - )) - } - _ => Err(Error::conflict(&format!( - "instance is in state {} but must be {} to be started", - vmm.runtime.state, - InstanceState::Stopped - ))), + match state.effective_state() { + // If the VMM is already starting or is in another "active" + // state, succeed to make successful start attempts idempotent. + s @ InstanceState::Starting + | s @ InstanceState::Running + | s @ InstanceState::Rebooting + | s @ InstanceState::Migrating => { + debug!(log, "asked to start an active instance"; + "instance_id" => %instance.id(), + "state" => ?s); + + Ok(InstanceStartDisposition::AlreadyStarted) } - } else { - match instance.runtime_state.nexus_state { - // If the instance is in a known-good no-VMM state, it can - // start. - DbInstanceState::NoVmm => { - debug!(log, "instance has no VMM, OK to start"; - "instance_id" => %instance.id()); - - Ok(InstanceStartDisposition::Start) + InstanceState::Stopped => { + match vmm.as_ref() { + // If a previous start saga failed and left behind a VMM in the + // SagaUnwound state, allow a new start saga to try to overwrite + // it. + Some(vmm) if vmm.runtime.state == DbVmmState::SagaUnwound => { + debug!( + log, + "instance's last VMM's start saga unwound, OK to start"; + "instance_id" => %instance.id() + ); + + Ok(InstanceStartDisposition::Start) + } + // This shouldn't happen: `InstanceAndVmm::effective_state` should + // only return `Stopped` if there is no active VMM or if the VMM is + // `SagaUnwound`. + Some(vmm) => { + error!(log, + "instance is stopped but still has an active VMM"; + "instance_id" => %instance.id(), + "propolis_id" => %vmm.id, + "propolis_state" => ?vmm.runtime.state); + + Err(Error::internal_error( + "instance is stopped but still has an active VMM", + )) + } + // Ah, it's actually stopped. We can restart it. + None => Ok(InstanceStartDisposition::Start), } - // If the instance isn't ready yet or has been destroyed, it - // can't start. - // - // TODO(#2825): If the "Failed" state could be interpreted to - // mean "stopped abnormally" and not just "Nexus doesn't know - // what state the instance is in," it would be fine to start the - // instance here. See RFD 486. - DbInstanceState::Creating - | DbInstanceState::Failed - | DbInstanceState::Destroyed => Err(Error::conflict(&format!( - "instance is in state {} but must be {} to be started", - instance.runtime_state.nexus_state, + } + InstanceState::Stopping => { + let (propolis_id, propolis_state) = match vmm.as_ref() { + Some(vmm) => (Some(vmm.id), Some(vmm.runtime.state)), + None => (None, None), + }; + debug!(log, "instance's VMM is still in the process of stopping"; + "instance_id" => %instance.id(), + "propolis_id" => ?propolis_id, + "propolis_state" => ?propolis_state); + Err(Error::conflict( + "instance must finish stopping before it can be started", + )) + } + s => { + return Err(Error::conflict(&format!( + "instance is in state {s} but it must be {} to be started", InstanceState::Stopped - ))), - // If the instance is in the Vmm state, there should have been - // an active Propolis ID and a VMM record to read, so this - // branch shouldn't have been reached. 
- DbInstanceState::Vmm => Err(Error::internal_error( - "instance is in state Vmm but has no active VMM", - )), + ))) } } } @@ -2265,7 +1971,10 @@ mod tests { use super::*; use core::time::Duration; use futures::{SinkExt, StreamExt}; - use nexus_db_model::{Instance as DbInstance, VmmInitialState}; + use nexus_db_model::{ + Instance as DbInstance, InstanceState as DbInstanceState, + VmmInitialState, VmmState as DbVmmState, + }; use omicron_common::api::external::{ Hostname, IdentityMetadataCreateParams, InstanceCpuCount, Name, }; diff --git a/nexus/src/app/instance_network.rs b/nexus/src/app/instance_network.rs index 5f5274dea2..8cd0a34fbf 100644 --- a/nexus/src/app/instance_network.rs +++ b/nexus/src/app/instance_network.rs @@ -4,7 +4,6 @@ //! Routines that manage instance-related networking state. -use crate::app::background; use crate::app::switch_port; use ipnetwork::IpNetwork; use nexus_db_model::ExternalIp; @@ -14,11 +13,9 @@ use nexus_db_model::Ipv4NatValues; use nexus_db_model::Vni as DbVni; use nexus_db_queries::authz; use nexus_db_queries::context::OpContext; -use nexus_db_queries::db; use nexus_db_queries::db::lookup::LookupPath; use nexus_db_queries::db::DataStore; use omicron_common::api::external::Error; -use omicron_common::api::internal::nexus; use omicron_common::api::internal::shared::NetworkInterface; use omicron_common::api::internal::shared::SwitchLocation; use omicron_uuid_kinds::GenericUuid; @@ -230,175 +227,6 @@ pub(crate) async fn boundary_switches( Ok(boundary_switches) } -/// Given old and new instance runtime states, determines the desired -/// networking configuration for a given instance and ensures it has been -/// propagated to all relevant sleds. -/// -/// # Arguments -/// -/// - `datastore`: the datastore to use for lookups and updates. -/// - `log`: the [`slog::Logger`] to log to. -/// - `resolver`: an internal DNS resolver to look up DPD service addresses. -/// - `opctx`: An operation context for this operation. -/// - `opctx_alloc`: An operational context list permissions for all sleds. When -/// called by methods on the [`Nexus`] type, this is the `OpContext` used for -/// instance allocation. In a background task, this may be the background -/// task's operational context; nothing stops you from passing the same -/// `OpContext` as both `opctx` and `opctx_alloc`. -/// - `authz_instance``: A resolved authorization context for the instance of -/// interest. -/// - `prev_instance_state``: The most-recently-recorded instance runtime -/// state for this instance. -/// - `new_instance_state`: The instance state that the caller of this routine -/// has observed and that should be used to set up this instance's -/// networking state. -/// -/// # Return value -/// -/// `Ok(())` if this routine completed all the operations it wanted to -/// complete, or an appropriate `Err` otherwise. -#[allow(clippy::too_many_arguments)] // Yeah, I know, I know, Clippy... 
-pub(crate) async fn ensure_updated_instance_network_config( - datastore: &DataStore, - log: &slog::Logger, - resolver: &internal_dns::resolver::Resolver, - opctx: &OpContext, - opctx_alloc: &OpContext, - authz_instance: &authz::Instance, - prev_instance_state: &db::model::InstanceRuntimeState, - new_instance_state: &nexus::InstanceRuntimeState, - v2p_manager: &background::Activator, -) -> Result<(), Error> { - let instance_id = InstanceUuid::from_untyped_uuid(authz_instance.id()); - - // If this instance update is stale, do nothing, since the superseding - // update may have allowed the instance's location to change further. - if prev_instance_state.gen >= new_instance_state.gen.into() { - debug!(log, - "instance state generation already advanced, \ - won't touch network config"; - "instance_id" => %instance_id); - - return Ok(()); - } - - // If this update will retire the instance's active VMM, delete its - // networking state. It will be re-established the next time the - // instance starts. - if new_instance_state.propolis_id.is_none() { - info!(log, - "instance cleared its Propolis ID, cleaning network config"; - "instance_id" => %instance_id, - "propolis_id" => ?prev_instance_state.propolis_id); - - clear_instance_networking_state( - datastore, - log, - resolver, - opctx, - opctx_alloc, - authz_instance, - v2p_manager, - ) - .await?; - return Ok(()); - } - - // If the instance still has a migration in progress, don't change - // any networking state until an update arrives that retires that - // migration. - // - // This is needed to avoid the following race: - // - // 1. Migration from S to T completes. - // 2. Migration source sends an update that changes the instance's - // active VMM but leaves the migration ID in place. - // 3. Meanwhile, migration target sends an update that changes the - // instance's active VMM and clears the migration ID. - // 4. The migration target's call updates networking state and commits - // the new instance record. - // 5. The instance migrates from T to T' and Nexus applies networking - // configuration reflecting that the instance is on T'. - // 6. The update in step 2 applies configuration saying the instance - // is on sled T. - if new_instance_state.migration_id.is_some() { - debug!(log, - "instance still has a migration in progress, won't touch \ - network config"; - "instance_id" => %instance_id, - "migration_id" => ?new_instance_state.migration_id); - - return Ok(()); - } - - let new_propolis_id = new_instance_state.propolis_id.unwrap(); - - // Updates that end live migration need to push OPTE V2P state even if - // the instance's active sled did not change (see below). - let migration_retired = prev_instance_state.migration_id.is_some() - && new_instance_state.migration_id.is_none(); - - if (prev_instance_state.propolis_id - == new_instance_state.propolis_id.map(GenericUuid::into_untyped_uuid)) - && !migration_retired - { - debug!(log, "instance didn't move, won't touch network config"; - "instance_id" => %instance_id); - - return Ok(()); - } - - // Either the instance moved from one sled to another, or it attempted - // to migrate and failed. Ensure the correct networking configuration - // exists for its current home. - // - // TODO(#3107) This is necessary even if the instance didn't move, - // because registering a migration target on a sled creates OPTE ports - // for its VNICs, and that creates new V2P mappings on that sled that - // place the relevant virtual IPs on the local sled. 
Once OPTE stops - // creating these mappings, this path only needs to be taken if an - // instance has changed sleds. - let new_sled_id = match datastore - .vmm_fetch(&opctx, authz_instance, &new_propolis_id) - .await - { - Ok(vmm) => vmm.sled_id, - - // A VMM in the active position should never be destroyed. If the - // sled sending this message is the owner of the instance's last - // active VMM and is destroying it, it should also have retired that - // VMM. - Err(Error::ObjectNotFound { .. }) => { - error!(log, "instance's active vmm unexpectedly not found"; - "instance_id" => %instance_id, - "propolis_id" => %new_propolis_id); - - return Ok(()); - } - - Err(e) => return Err(e), - }; - - v2p_manager.activate(); - - let (.., sled) = - LookupPath::new(opctx, datastore).sled_id(new_sled_id).fetch().await?; - - instance_ensure_dpd_config( - datastore, - log, - resolver, - opctx, - opctx_alloc, - instance_id, - &sled.address(), - None, - ) - .await?; - - Ok(()) -} - /// Ensures that the Dendrite configuration for the supplied instance is /// up-to-date. /// @@ -685,43 +513,6 @@ pub(crate) async fn probe_ensure_dpd_config( Ok(()) } -/// Deletes an instance's OPTE V2P mappings and the boundary switch NAT -/// entries for its external IPs. -/// -/// This routine returns immediately upon encountering any errors (and will -/// not try to destroy any more objects after the point of failure). -async fn clear_instance_networking_state( - datastore: &DataStore, - log: &slog::Logger, - resolver: &internal_dns::resolver::Resolver, - opctx: &OpContext, - opctx_alloc: &OpContext, - authz_instance: &authz::Instance, - v2p_manager: &background::Activator, -) -> Result<(), Error> { - v2p_manager.activate(); - - instance_delete_dpd_config( - datastore, - log, - resolver, - opctx, - opctx_alloc, - authz_instance, - ) - .await?; - - notify_dendrite_nat_state( - datastore, - log, - resolver, - opctx_alloc, - Some(InstanceUuid::from_untyped_uuid(authz_instance.id())), - true, - ) - .await -} - /// Attempts to delete all of the Dendrite NAT configuration for the /// instance identified by `authz_instance`. /// diff --git a/nexus/src/app/saga.rs b/nexus/src/app/saga.rs index 2b510a0f12..fcdbb0db59 100644 --- a/nexus/src/app/saga.rs +++ b/nexus/src/app/saga.rs @@ -371,12 +371,6 @@ pub(crate) struct StoppedSaga { impl StoppedSaga { /// Fetches the raw Steno result for the saga's execution - /// - /// This is a test-only routine meant for use in tests that need to examine - /// the details of a saga's final state (e.g., examining the exact point at - /// which it failed). Non-test callers should use `into_omicron_result` - /// instead. 
- #[cfg(test)] pub(crate) fn into_raw_result(self) -> SagaResult { self.result } diff --git a/nexus/src/app/sagas/instance_create.rs b/nexus/src/app/sagas/instance_create.rs index 4f0ec7c0c6..d19230892f 100644 --- a/nexus/src/app/sagas/instance_create.rs +++ b/nexus/src/app/sagas/instance_create.rs @@ -1065,7 +1065,7 @@ pub mod test { app::sagas::instance_create::SagaInstanceCreate, app::sagas::test_helpers, external_api::params, }; - use async_bb8_diesel::{AsyncRunQueryDsl, AsyncSimpleConnection}; + use async_bb8_diesel::AsyncRunQueryDsl; use diesel::{ ExpressionMethods, OptionalExtension, QueryDsl, SelectableHelper, }; @@ -1201,39 +1201,6 @@ pub mod test { .is_none() } - async fn no_sled_resource_instance_records_exist( - datastore: &DataStore, - ) -> bool { - use nexus_db_queries::db::model::SledResource; - use nexus_db_queries::db::schema::sled_resource::dsl; - - let conn = datastore.pool_connection_for_tests().await.unwrap(); - - datastore - .transaction_retry_wrapper( - "no_sled_resource_instance_records_exist", - ) - .transaction(&conn, |conn| async move { - conn.batch_execute_async( - nexus_test_utils::db::ALLOW_FULL_TABLE_SCAN_SQL, - ) - .await - .unwrap(); - - Ok(dsl::sled_resource - .filter(dsl::kind.eq( - nexus_db_queries::db::model::SledResourceKind::Instance, - )) - .select(SledResource::as_select()) - .get_results_async::(&conn) - .await - .unwrap() - .is_empty()) - }) - .await - .unwrap() - } - async fn disk_is_detached(datastore: &DataStore) -> bool { use nexus_db_queries::db::model::Disk; use nexus_db_queries::db::schema::disk::dsl; @@ -1267,7 +1234,10 @@ pub mod test { assert!(no_instance_records_exist(datastore).await); assert!(no_network_interface_records_exist(datastore).await); assert!(no_external_ip_records_exist(datastore).await); - assert!(no_sled_resource_instance_records_exist(datastore).await); + assert!( + test_helpers::no_sled_resource_instance_records_exist(cptestctx) + .await + ); assert!( test_helpers::no_virtual_provisioning_resource_records_exist( cptestctx diff --git a/nexus/src/app/sagas/instance_migrate.rs b/nexus/src/app/sagas/instance_migrate.rs index b8599feb04..bb4bf282e4 100644 --- a/nexus/src/app/sagas/instance_migrate.rs +++ b/nexus/src/app/sagas/instance_migrate.rs @@ -16,9 +16,7 @@ use nexus_db_queries::{authn, authz, db}; use omicron_uuid_kinds::{GenericUuid, InstanceUuid, PropolisUuid, SledUuid}; use serde::Deserialize; use serde::Serialize; -use sled_agent_client::types::{ - InstanceMigrationSourceParams, InstanceMigrationTargetParams, -}; +use sled_agent_client::types::InstanceMigrationTargetParams; use slog::warn; use std::net::{Ipv6Addr, SocketAddr}; use steno::ActionError; @@ -72,22 +70,44 @@ declare_saga_actions! { CREATE_MIGRATION_RECORD -> "migration_record" { + sim_create_migration_record - - sim_delete_migration_record + - sim_fail_migration_record } - // This step the instance's migration ID and destination Propolis ID - // fields. Because the instance is active, its current sled agent maintains - // its most recent runtime state, so to update it, the saga calls into the - // sled and asks it to produce an updated instance record with the - // appropriate migration IDs and a new generation number. + // fields in the database. + // + // If the instance's migration ID has already been set when we attempt to + // set ours, that means we have probably raced with another migrate saga for + // the same instance. If this is the case, this action will fail and the + // saga will unwind. 
+ // + // Yes, it's a bit unfortunate that our attempt to compare-and-swap in a + // migration ID happens only after we've created VMM and migration records, + // and that we'll have to destroy them as we unwind. However, the + // alternative, setting the migration IDs *before* records for the target + // VMM and the migration are created, would mean that there is a period of + // time during which the instance record contains foreign keys into the + // `vmm` and `migration` tables that don't have corresponding records to + // those tables. Because the `instance` table is queried in the public API, + // we take care to ensure that it doesn't have "dangling pointers" to + // records in the `vmm` and `migration` tables that don't exist yet. + // + // Note that unwinding this action does *not* clear the migration IDs from + // the instance record. This is to avoid a potential race with the instance + // update saga where: // - // The source sled agent synchronizes concurrent attempts to set these IDs. - // Setting a new migration ID and re-setting an existing ID are allowed, but - // trying to set an ID when a different ID is already present fails. + // - a `instance-migrate` saga sets the migration IDs at instance state + // generation _N_ + // - an `instance-update` saga increments the instance's state generation to + // _N_ + 1 + // - the `instance-migrate` saga unwinds and attempts to clear the migration + // IDs, but can't, because the state generation has advanced. + // + // Instead, we leave the migration IDs in place and rely on setting the VMM + // state to `SagaUnwound` to indicate to other future `instance-migrate` + // sagas that it's okay to start a new migration. SET_MIGRATION_IDS -> "set_migration_ids" { + sim_set_migration_ids - - sim_clear_migration_ids } // This step registers the instance with the destination sled. Care is @@ -239,7 +259,7 @@ async fn sim_create_migration_record( .map_err(ActionError::action_failed) } -async fn sim_delete_migration_record( +async fn sim_fail_migration_record( sagactx: NexusActionContext, ) -> Result<(), anyhow::Error> { let osagactx: &std::sync::Arc = @@ -251,9 +271,24 @@ async fn sim_delete_migration_record( ); let migration_id = sagactx.lookup::("migrate_id")?; - info!(osagactx.log(), "deleting migration record"; - "migration_id" => %migration_id); - osagactx.datastore().migration_mark_deleted(&opctx, migration_id).await?; + info!( + osagactx.log(), + "migration saga unwinding, marking migration record as failed"; + "instance_id" => %params.instance.id(), + "migration_id" => %migration_id, + ); + // If the migration record wasn't updated, this means it's already deleted, + // which...seems weird, but isn't worth getting the whole saga unwind stuck over. 
+ if let Err(e) = + osagactx.datastore().migration_mark_failed(&opctx, migration_id).await + { + warn!(osagactx.log(), + "Error marking migration record as failed during rollback"; + "instance_id" => %params.instance.id(), + "migration_id" => %migration_id, + "error" => ?e); + } + Ok(()) } @@ -323,75 +358,28 @@ async fn sim_set_migration_ids( let db_instance = ¶ms.instance; let instance_id = InstanceUuid::from_untyped_uuid(db_instance.id()); - let src_sled_id = SledUuid::from_untyped_uuid(params.src_vmm.sled_id); + let src_propolis_id = PropolisUuid::from_untyped_uuid(params.src_vmm.id); let migration_id = sagactx.lookup::("migrate_id")?; let dst_propolis_id = sagactx.lookup::("dst_propolis_id")?; - info!(osagactx.log(), "setting migration IDs on migration source sled"; + info!(osagactx.log(), "setting instance migration IDs"; "instance_id" => %db_instance.id(), - "sled_id" => %src_sled_id, "migration_id" => %migration_id, + "src_propolis_id" => %src_propolis_id, "dst_propolis_id" => %dst_propolis_id, "prev_runtime_state" => ?db_instance.runtime()); - let updated_record = osagactx - .nexus() + osagactx + .datastore() .instance_set_migration_ids( &opctx, instance_id, - src_sled_id, - db_instance.runtime(), - InstanceMigrationSourceParams { dst_propolis_id, migration_id }, - ) - .await - .map_err(ActionError::action_failed)?; - - Ok(updated_record) -} - -async fn sim_clear_migration_ids( - sagactx: NexusActionContext, -) -> Result<(), anyhow::Error> { - let osagactx = sagactx.user_data(); - let params = sagactx.saga_params::()?; - let src_sled_id = SledUuid::from_untyped_uuid(params.src_vmm.sled_id); - let db_instance = - sagactx.lookup::("set_migration_ids")?; - let instance_id = InstanceUuid::from_untyped_uuid(db_instance.id()); - - info!(osagactx.log(), "clearing migration IDs for saga unwind"; - "instance_id" => %db_instance.id(), - "sled_id" => %src_sled_id, - "prev_runtime_state" => ?db_instance.runtime()); - - // Because the migration never actually started (and thus didn't finish), - // the instance should be at the same Propolis generation as it was when - // migration IDs were set, which means sled agent should accept a request to - // clear them. The only exception is if the instance stopped, but that also - // clears its migration IDs; in that case there is no work to do here. - // - // Other failures to clear migration IDs are handled like any other failure - // to update an instance's state: the callee attempts to mark the instance - // as failed; if the failure occurred because the instance changed state - // such that sled agent could not fulfill the request, the callee will - // produce a stale generation number and will not actually mark the instance - // as failed. 
- if let Err(e) = osagactx - .nexus() - .instance_clear_migration_ids( - instance_id, - src_sled_id, - db_instance.runtime(), + src_propolis_id, + migration_id, + dst_propolis_id, ) .await - { - warn!(osagactx.log(), - "Error clearing migration IDs during rollback"; - "instance_id" => %instance_id, - "error" => ?e); - } - - Ok(()) + .map_err(ActionError::action_failed) } async fn sim_ensure_destination_propolis( @@ -575,21 +563,16 @@ async fn sim_instance_migrate( #[cfg(test)] mod tests { + use super::*; use crate::app::sagas::test_helpers; - use camino::Utf8Path; use dropshot::test_util::ClientTestContext; - use nexus_test_interface::NexusServer; use nexus_test_utils::resource_helpers::{ create_default_ip_pool, create_project, object_create, }; - use nexus_test_utils::start_sled_agent; use nexus_test_utils_macros::nexus_test; use omicron_common::api::external::{ ByteCount, IdentityMetadataCreateParams, InstanceCpuCount, }; - use omicron_sled_agent::sim::Server; - - use super::*; type ControlPlaneTestContext = nexus_test_utils::ControlPlaneTestContext; @@ -603,35 +586,6 @@ mod tests { project.identity.id } - async fn add_sleds( - cptestctx: &ControlPlaneTestContext, - num_sleds: usize, - ) -> Vec<(SledUuid, Server)> { - let mut sas = Vec::with_capacity(num_sleds); - for _ in 0..num_sleds { - let sa_id = SledUuid::new_v4(); - let log = - cptestctx.logctx.log.new(o!("sled_id" => sa_id.to_string())); - let addr = - cptestctx.server.get_http_server_internal_address().await; - - info!(&cptestctx.logctx.log, "Adding simulated sled"; "sled_id" => %sa_id); - let update_dir = Utf8Path::new("/should/be/unused"); - let sa = start_sled_agent( - log, - addr, - sa_id, - &update_dir, - omicron_sled_agent::sim::SimMode::Explicit, - ) - .await - .unwrap(); - sas.push((sa_id, sa)); - } - - sas - } - async fn create_instance( client: &ClientTestContext, ) -> omicron_common::api::external::Instance { @@ -659,32 +613,11 @@ mod tests { .await } - fn select_first_alternate_sled( - db_vmm: &db::model::Vmm, - other_sleds: &[(SledUuid, Server)], - ) -> SledUuid { - let default_sled_uuid: SledUuid = - nexus_test_utils::SLED_AGENT_UUID.parse().unwrap(); - if other_sleds.is_empty() { - panic!("need at least one other sled"); - } - - if other_sleds.iter().any(|sled| sled.0 == default_sled_uuid) { - panic!("default test sled agent was in other_sleds"); - } - - if db_vmm.sled_id == default_sled_uuid.into_untyped_uuid() { - other_sleds[0].0 - } else { - default_sled_uuid - } - } - #[nexus_test(server = crate::Server)] async fn test_saga_basic_usage_succeeds( cptestctx: &ControlPlaneTestContext, ) { - let other_sleds = add_sleds(cptestctx, 1).await; + let other_sleds = test_helpers::add_sleds(cptestctx, 1).await; let client = &cptestctx.external_client; let nexus = &cptestctx.server.server_context().nexus; let _project_id = setup_test_project(&client).await; @@ -698,7 +631,8 @@ mod tests { let state = test_helpers::instance_fetch(cptestctx, instance_id).await; let vmm = state.vmm().as_ref().unwrap(); - let dst_sled_id = select_first_alternate_sled(vmm, &other_sleds); + let dst_sled_id = + test_helpers::select_first_alternate_sled(vmm, &other_sleds[..]); let params = Params { serialized_authn: authn::saga::Serialized::for_opctx(&opctx), instance: state.instance().clone(), @@ -731,7 +665,7 @@ mod tests { cptestctx: &ControlPlaneTestContext, ) { let log = &cptestctx.logctx.log; - let other_sleds = add_sleds(cptestctx, 1).await; + let other_sleds = test_helpers::add_sleds(cptestctx, 1).await; let client = 
&cptestctx.external_client; let nexus = &cptestctx.server.server_context().nexus; let _project_id = setup_test_project(&client).await; @@ -756,8 +690,10 @@ mod tests { .as_ref() .expect("instance should have a vmm before migrating"); - let dst_sled_id = - select_first_alternate_sled(old_vmm, &other_sleds); + let dst_sled_id = test_helpers::select_first_alternate_sled( + old_vmm, + &other_sleds[..], + ); info!(log, "setting up new migration saga"; "old_instance" => ?old_instance, @@ -781,24 +717,44 @@ mod tests { let after_saga = || -> futures::future::BoxFuture<'_, ()> { Box::pin({ async { - // Unwinding at any step should clear the migration IDs from - // the instance record and leave the instance's location - // otherwise untouched. - let new_state = - test_helpers::instance_fetch(cptestctx, instance_id) - .await; - - let new_instance = new_state.instance(); - let new_vmm = - new_state.vmm().as_ref().expect("vmm should be active"); + let new_state = test_helpers::instance_fetch_all( + cptestctx, + instance_id, + ) + .await; + + let new_instance = new_state.instance; + let new_vmm = new_state + .active_vmm + .as_ref() + .expect("vmm should be active"); - assert!(new_instance.runtime().migration_id.is_none()); - assert!(new_instance.runtime().dst_propolis_id.is_none()); assert_eq!( new_instance.runtime().propolis_id.unwrap(), new_vmm.id ); + // If the instance has had migration IDs set, then both + // sides of the migration should be marked as failed. + if let Some(migration) = new_state.migration { + assert_eq!( + migration.source_state, + db::model::MigrationState::FAILED + ); + assert_eq!( + migration.target_state, + db::model::MigrationState::FAILED + ); + } + // If the instance has a target VMM ID left behind by the + // unwinding saga, that VMM must be in the `SagaUnwound` state. + if let Some(target_vmm) = new_state.target_vmm { + assert_eq!( + target_vmm.runtime.state, + db::model::VmmState::SagaUnwound + ); + } + info!( &log, "migration saga unwind: stopping instance after failed \ @@ -812,17 +768,19 @@ mod tests { test_helpers::instance_stop(cptestctx, &instance_id).await; test_helpers::instance_simulate(cptestctx, &instance_id) .await; - - let new_state = - test_helpers::instance_fetch(cptestctx, instance_id) - .await; + // Wait until the instance has advanced to the `NoVmm` + // state. This may not happen immediately, as an + // instance-update saga must complete to update the + // instance's state. + let new_state = test_helpers::instance_wait_for_state( + cptestctx, + instance_id, + nexus_db_model::InstanceState::NoVmm, + ) + .await; let new_instance = new_state.instance(); let new_vmm = new_state.vmm().as_ref(); - assert_eq!( - new_instance.runtime().nexus_state, - nexus_db_model::InstanceState::NoVmm, - ); assert!(new_instance.runtime().propolis_id.is_none()); assert!(new_vmm.is_none()); diff --git a/nexus/src/app/sagas/instance_start.rs b/nexus/src/app/sagas/instance_start.rs index adde040a77..9e4e010eea 100644 --- a/nexus/src/app/sagas/instance_start.rs +++ b/nexus/src/app/sagas/instance_start.rs @@ -235,21 +235,38 @@ async fn sis_move_to_starting( // For idempotency, refetch the instance to see if this step already applied // its desired update. - let (.., db_instance) = LookupPath::new(&opctx, &datastore) + let (_, _, authz_instance, ..) 
= LookupPath::new(&opctx, &datastore) .instance_id(instance_id.into_untyped_uuid()) .fetch_for(authz::Action::Modify) .await .map_err(ActionError::action_failed)?; + let state = datastore + .instance_fetch_with_vmm(&opctx, &authz_instance) + .await + .map_err(ActionError::action_failed)?; + + let db_instance = state.instance(); - match db_instance.runtime().propolis_id { + // If `true`, we have unlinked a Propolis ID left behind by a previous + // unwinding start saga, and we should activate the activate the abandoned + // VMM reaper background task once we've written back the instance record. + let mut abandoned_unwound_vmm = false; + match state.vmm() { // If this saga's Propolis ID is already written to the record, then // this step must have completed already and is being retried, so // proceed. - Some(db_id) if db_id == propolis_id.into_untyped_uuid() => { + Some(vmm) if vmm.id == propolis_id.into_untyped_uuid() => { info!(osagactx.log(), "start saga: Propolis ID already set"; "instance_id" => %instance_id); - Ok(db_instance) + return Ok(db_instance.clone()); + } + + // If the instance has a Propolis ID, but the Propolis was left behind + // by a previous start saga unwinding, that's fine, we can just clear it + // out and proceed as though there was no Propolis ID here. + Some(vmm) if vmm.runtime.state == db::model::VmmState::SagaUnwound => { + abandoned_unwound_vmm = true; } // If the instance has a different Propolis ID, a competing start saga @@ -266,33 +283,38 @@ async fn sis_move_to_starting( // this point causes the VMM's state, which is Starting, to supersede // the instance's state, so this won't cause the instance to appear to // be running before Propolis thinks it has started.) - None => { - let new_runtime = db::model::InstanceRuntimeState { - nexus_state: db::model::InstanceState::Vmm, - propolis_id: Some(propolis_id.into_untyped_uuid()), - time_updated: Utc::now(), - gen: db_instance.runtime().gen.next().into(), - ..db_instance.runtime_state - }; - - // Bail if another actor managed to update the instance's state in - // the meantime. - if !osagactx - .datastore() - .instance_update_runtime(&instance_id, &new_runtime) - .await - .map_err(ActionError::action_failed)? - { - return Err(ActionError::action_failed(Error::conflict( - "instance changed state before it could be started", - ))); - } + None => {} + } - let mut new_record = db_instance.clone(); - new_record.runtime_state = new_runtime; - Ok(new_record) - } + let new_runtime = db::model::InstanceRuntimeState { + nexus_state: db::model::InstanceState::Vmm, + propolis_id: Some(propolis_id.into_untyped_uuid()), + time_updated: Utc::now(), + gen: db_instance.runtime().gen.next().into(), + ..db_instance.runtime_state + }; + + // Bail if another actor managed to update the instance's state in + // the meantime. + if !osagactx + .datastore() + .instance_update_runtime(&instance_id, &new_runtime) + .await + .map_err(ActionError::action_failed)? + { + return Err(ActionError::action_failed(Error::conflict( + "instance changed state before it could be started", + ))); + } + + // Don't fear the reaper! 
+ if abandoned_unwound_vmm { + osagactx.nexus().background_tasks.task_abandoned_vmm_reaper.activate(); } + + let mut new_record = db_instance.clone(); + new_record.runtime_state = new_runtime; + Ok(new_record) } async fn sis_move_to_starting_undo( @@ -363,9 +385,6 @@ async fn sis_account_virtual_resources_undo( ¶ms.serialized_authn, ); - let started_record = - sagactx.lookup::("started_record")?; - osagactx .datastore() .virtual_provisioning_collection_delete_instance( @@ -374,11 +393,6 @@ async fn sis_account_virtual_resources_undo( params.db_instance.project_id, i64::from(params.db_instance.ncpus.0 .0), nexus_db_model::ByteCount(*params.db_instance.memory), - // Use the next instance generation number as the generation limit - // to ensure the provisioning counters are released. (The "mark as - // starting" undo step will "publish" this new state generation when - // it moves the instance back to Stopped.) - (&started_record.runtime().gen.next()).into(), ) .await .map_err(ActionError::action_failed)?; @@ -810,28 +824,23 @@ mod test { }) }, || { - Box::pin({ - async { - let new_db_instance = test_helpers::instance_fetch( - cptestctx, - instance_id, - ) - .await.instance().clone(); - - info!(log, - "fetched instance runtime state after saga execution"; - "instance_id" => %instance.identity.id, - "instance_runtime" => ?new_db_instance.runtime()); - - assert!(new_db_instance.runtime().propolis_id.is_none()); - assert_eq!( - new_db_instance.runtime().nexus_state, - nexus_db_model::InstanceState::NoVmm - ); - - assert!(test_helpers::no_virtual_provisioning_resource_records_exist(cptestctx).await); - assert!(test_helpers::no_virtual_provisioning_collection_records_using_instances(cptestctx).await); - } + Box::pin(async { + let new_db_state = test_helpers::instance_wait_for_state( + cptestctx, + instance_id, + nexus_db_model::InstanceState::NoVmm, + ).await; + let new_db_instance = new_db_state.instance(); + + info!(log, + "fetched instance runtime state after saga execution"; + "instance_id" => %instance.identity.id, + "instance_runtime" => ?new_db_instance.runtime()); + + assert!(new_db_instance.runtime().propolis_id.is_none()); + + assert!(test_helpers::no_virtual_provisioning_resource_records_exist(cptestctx).await); + assert!(test_helpers::no_virtual_provisioning_collection_records_using_instances(cptestctx).await); }) }, log, diff --git a/nexus/src/app/sagas/instance_update/destroyed.rs b/nexus/src/app/sagas/instance_update/destroyed.rs new file mode 100644 index 0000000000..243f952c8b --- /dev/null +++ b/nexus/src/app/sagas/instance_update/destroyed.rs @@ -0,0 +1,127 @@ +// This Source Code Form is subject to the terms of the Mozilla Public +// License, v. 2.0. If a copy of the MPL was not distributed with this +// file, You can obtain one at https://mozilla.org/MPL/2.0/. + +use super::{ + declare_saga_actions, ActionRegistry, DagBuilder, NexusActionContext, + NexusSaga, SagaInitError, +}; +use crate::app::sagas::ActionError; +use nexus_db_queries::authn; +use omicron_common::api::external::Error; +use omicron_uuid_kinds::GenericUuid; +use omicron_uuid_kinds::InstanceUuid; +use omicron_uuid_kinds::PropolisUuid; +use serde::{Deserialize, Serialize}; + +// destroy VMM subsaga: input parameters + +#[derive(Debug, Deserialize, Serialize)] +pub(super) struct Params { + /// Authentication context to use to fetch the instance's current state from + /// the database. + pub(super) serialized_authn: authn::saga::Serialized, + + /// Instance UUID of the instance being updated. 
This is only just used + /// for logging, so we just use the instance ID here instead of serializing + /// a whole instance record. + pub(super) instance_id: InstanceUuid, + + /// UUID of the VMM to destroy. + pub(super) vmm_id: PropolisUuid, +} + +// destroy VMM subsaga: actions + +declare_saga_actions! { + destroy_vmm; + + // Deallocate physical sled resources reserved for the destroyed VMM, as it + // is no longer using them. + RELEASE_SLED_RESOURCES -> "release_sled_resources" { + + siu_destroyed_release_sled_resources + } + + // Mark the VMM record as deleted. + MARK_VMM_DELETED -> "mark_vmm_deleted" { + + siu_destroyed_mark_vmm_deleted + } +} + +// destroy VMM subsaga: definition + +#[derive(Debug)] +pub(super) struct SagaDestroyVmm; +impl NexusSaga for SagaDestroyVmm { + const NAME: &'static str = "destroy-vmm"; + type Params = Params; + + fn register_actions(registry: &mut ActionRegistry) { + destroy_vmm_register_actions(registry) + } + + fn make_saga_dag( + _: &Self::Params, + mut builder: DagBuilder, + ) -> Result { + builder.append(release_sled_resources_action()); + builder.append(mark_vmm_deleted_action()); + Ok(builder.build()?) + } +} + +async fn siu_destroyed_release_sled_resources( + sagactx: NexusActionContext, +) -> Result<(), ActionError> { + let osagactx = sagactx.user_data(); + let Params { ref serialized_authn, instance_id, vmm_id, .. } = + sagactx.saga_params::()?; + + let opctx = + crate::context::op_context_for_saga_action(&sagactx, serialized_authn); + + info!( + osagactx.log(), + "instance update (VMM destroyed): deallocating sled resource reservation"; + "instance_id" => %instance_id, + "propolis_id" => %vmm_id, + ); + + osagactx + .datastore() + .sled_reservation_delete(&opctx, vmm_id.into_untyped_uuid()) + .await + .or_else(|err| { + // Necessary for idempotency + match err { + Error::ObjectNotFound { .. } => Ok(()), + _ => Err(err), + } + }) + .map_err(ActionError::action_failed) +} + +async fn siu_destroyed_mark_vmm_deleted( + sagactx: NexusActionContext, +) -> Result<(), ActionError> { + let osagactx = sagactx.user_data(); + let Params { ref serialized_authn, instance_id, vmm_id, .. } = + sagactx.saga_params::()?; + + let opctx = + crate::context::op_context_for_saga_action(&sagactx, serialized_authn); + + info!( + osagactx.log(), + "instance update (VMM destroyed): marking VMM record deleted"; + "instance_id" => %instance_id, + "propolis_id" => %vmm_id, + ); + + osagactx + .datastore() + .vmm_mark_deleted(&opctx, &vmm_id) + .await + .map(|_| ()) + .map_err(ActionError::action_failed) +} diff --git a/nexus/src/app/sagas/instance_update/mod.rs b/nexus/src/app/sagas/instance_update/mod.rs new file mode 100644 index 0000000000..71abe63bbd --- /dev/null +++ b/nexus/src/app/sagas/instance_update/mod.rs @@ -0,0 +1,2778 @@ +// This Source Code Form is subject to the terms of the Mozilla Public +// License, v. 2.0. If a copy of the MPL was not distributed with this +// file, You can obtain one at https://mozilla.org/MPL/2.0/. + +//! Instance Update Saga +//! +//! ## Background +//! +//! The state of a VM instance, as understood by Nexus, consists of a +//! combination of database tables: +//! +//! - The `instance` table, owned exclusively by Nexus itself, represents the +//! user-facing "logical" VM instance. +//! - The `vmm` table, which represents a "physical" Propolis VMM process on +//! which a running instance is incarnated. +//! - The `migration` table, which represents the state of an in-progress live +//! migration of an instance between two VMMs. +//! 
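+//! As a rough sketch (not the actual `nexus-db-model` definitions, just the
+//! foreign keys described in the paragraph below), the `instance` record points
+//! at the other two tables like this:
+//!
+//! ```rust
+//! use uuid::Uuid;
+//!
+//! // Illustrative only --- not the real model types.
+//! struct Instance {
+//!     // `vmm` row for the currently-active Propolis, if the instance is running.
+//!     propolis_id: Option<Uuid>,
+//!     // `vmm` row for the migration-*target* Propolis, if the instance is migrating.
+//!     dst_propolis_id: Option<Uuid>,
+//!     // `migration` row tracking that live migration, if any.
+//!     migration_id: Option<Uuid>,
+//!     // ...plus the rest of the instance record.
+//! }
+//! ```
+//!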
+//! When an instance is incarnated on a sled, the `propolis_id` field in an +//! `instance` record contains a UUID foreign key into the `vmm` table that +//! points to the `vmm` record for the Propolis process on which the instance is +//! currently running. If an instance is undergoing live migration, its record +//! additionally contains a `dst_propolis_id` foreign key pointing at the `vmm` +//! row representing the *target* Propolis process that it is migrating to, and +//! a `migration_id` foreign key into the `migration` table record tracking the +//! state of that migration. +//! +//! Sled-agents report the state of the VMMs they manage to Nexus. This occurs +//! when a VMM state transition occurs and the sled-agent *pushes* an update to +//! Nexus' `cpapi_instances_put` internal API endpoint, when a Nexus' +//! `instance-watcher` background task *pulls* instance states from sled-agents +//! periodically, or as the return value of an API call from Nexus to a +//! sled-agent. When a Nexus receives a new [`SledInstanceState`] from a +//! sled-agent through any of these mechanisms, the Nexus will write any changed +//! state to the `vmm` and/or `migration` tables directly on behalf of the +//! sled-agent. +//! +//! Although Nexus is technically the party responsible for the database query +//! that writes VMM and migration state updates received from sled-agent, it is +//! the sled-agent that *logically* "owns" these records. A row in the `vmm` +//! table represents a particular Propolis process running on a particular sled, +//! and that sled's sled-agent is the sole source of truth for that process. The +//! generation number for a `vmm` record is the property of the sled-agent +//! responsible for that VMM. Similarly, a `migration` record has separate +//! generation numbers for the source and target VMM states, which are owned by +//! the sled-agents responsible for the source and target Propolis processes, +//! respectively. If a sled-agent pushes a state update to a particular Nexus +//! instance and that Nexus fails to write the state to the database, that isn't +//! the end of the world: the sled-agent can simply retry with a different +//! Nexus, and the generation number, which is incremented exclusively by the +//! sled-agent, ensures that state changes are idempotent and ordered. If a +//! faulty Nexus were to return 200 OK to a sled-agent's call to +//! `cpapi_instances_put` but choose to simply eat the received instance state +//! update rather than writing it to the database, even *that* wouldn't +//! necessarily mean that the state change was gone forever: the +//! `instance-watcher` background task on another Nexus instance would +//! eventually poll the sled-agent's state and observe any changes that were +//! accidentally missed. This is all very neat and tidy, and we should feel +//! proud of ourselves for having designed such a nice little system. +//! +//! Unfortunately, when we look beyond the `vmm` and `migration` tables, things +//! rapidly become interesting (in the "may you live in interesting times" +//! sense). The `instance` record *cannot* be owned exclusively by anyone. The +//! logical instance state it represents is a gestalt that may consist of state +//! that exists in multiple VMM processes on multiple sleds, as well as +//! control-plane operations triggered by operator inputs and performed by +//! multiple Nexus instances. This is, as they say, "hairy". The neat and tidy +//! 
little state updates published by sled-agents to Nexus in the previous
+//! paragraph may, in some cases, represent a state transition that also
+//! requires changes to the `instance` table: for instance, a live migration may
+//! have completed, necessitating a change in the instance's `propolis_id` to
+//! point to the new VMM.
+//!
+//! Oh, and one other thing: the `instance` table record in turn logically
+//! "owns" other resources, such as the virtual-provisioning counters that
+//! represent rack-level resources allocated to the instance, and the instance's
+//! network configuration. When the instance's state changes, these resources
+//! owned by the instance may also need to be updated, such as changing the
+//! network configuration to point at an instance's new home after a successful
+//! migration, or deallocating virtual provisioning counters when an instance is
+//! destroyed. Naturally, these updates must also be performed reliably and
+//! inconsistent states must be avoided.
+//!
+//! Thus, we arrive here, at the instance-update saga.
+//!
+//! ## Theory of Operation
+//!
+//! In order to ensure that changes to the state of an instance are handled
+//! reliably, we require that all multi-stage operations on an instance ---
+//! i.e., operations which cannot be done atomically in a single database query
+//! --- are performed by a saga. The following sagas currently touch the
+//! `instance` record:
+//!
+//! - [`instance_start`]
+//! - [`instance_migrate`]
+//! - [`instance_delete`]
+//! - `instance_update` (this saga)
+//!
+//! For most of these sagas, the instance state machine itself guards against
+//! potential race conditions. By considering the valid and invalid flows
+//! through an instance's state machine, we arrive at some ground rules:
+//!
+//! - The `instance_migrate` and `instance_delete` sagas will only modify the
+//!   instance record if the instance *has* an active Propolis ID.
+//! - The `instance_start` and `instance_delete` sagas will only modify the
+//!   instance record if the instance does *not* have an active VMM.
+//! - The presence of a migration ID prevents an `instance_migrate` saga from
+//!   succeeding until the current migration is resolved (either completes or
+//!   fails).
+//! - Only the `instance_start` saga can set the instance's *active* Propolis
+//!   ID, and it can only do this if there is currently no active Propolis.
+//! - Only the `instance_migrate` saga can set the instance's *target* Propolis
+//!   ID and migration ID, and it can only do that if these fields are unset, or
+//!   were left behind by a failed `instance_migrate` saga unwinding.
+//! - Only the `instance_update` saga can unset a migration ID and target
+//!   Propolis ID, which it will do when handling an update from sled-agent that
+//!   indicates that a migration has succeeded or failed.
+//! - Only the `instance_update` saga can unset an instance's active Propolis
+//!   ID, which it will do when handling an update from sled-agent that
+//!   indicates that the VMM has been destroyed (peacefully or violently).
+//!
+//! For the most part, this state machine prevents race conditions where
+//! multiple sagas mutate the same fields in the instance record, because the
+//! states from which a particular transition may start are limited. However,
+//! this is not the case for the `instance-update` saga, which may need to run
+//! any time a sled-agent publishes a new instance state. Therefore, this saga
+//! 
ensures mutual exclusion using one of the only distributed locking schemes
+//! in Omicron: the "instance updater lock".
+//!
+//! ### The Instance-Updater Lock, or, "Distributed RAII"
+//!
+//! Distributed locks [are scary][dist-locking]. One of the *scariest* things
+//! about distributed locks is that a process can die[^1] while holding a lock,
+//! which results in the protected resource (in this case, the `instance`
+//! record) being locked forever.[^2] It would be good for that to not happen.
+//! Fortunately, *if* (and only if) we promise to *only* ever acquire the
+//! instance-updater lock inside of a saga, we can guarantee forward progress:
+//! should a saga fail while holding the lock, it will unwind into a reverse
+//! action that releases the lock. This is essentially the distributed
+//! equivalent to holding a RAII guard in a Rust program: if the thread holding
+//! the lock panics, it unwinds its stack, drops the [`std::sync::MutexGuard`],
+//! and the rest of the system is not left in a deadlocked state (a short local
+//! illustration of this appears at the end of this overview). As long as we
+//! ensure that the instance-updater lock is only ever acquired by sagas, and
+//! that any saga holding a lock will reliably release it when it unwinds, we're
+//! ... *probably* ... okay.
+//!
+//! When an `instance-update` saga is started, it attempts to [acquire the
+//! updater lock][instance_updater_lock]. If the lock is already held by another
+//! update saga, then the update saga completes immediately. Otherwise, the saga
+//! then queries CRDB for the current state of the `instance` record, the active
+//! and migration-target `vmm` records (if any exist), and the current
+//! `migration` record (if one exists). This snapshot represents the state from
+//! which the update will be applied, and must be read only after locking the
+//! instance to ensure that it cannot race with another saga.
+//!
+//! This is where another of this saga's weird quirks shows up: the shape of the
+//! saga DAG we wish to execute depends on this snapshot of the instance, its
+//! active VMM, target VMM, and migration. But, because that state may only be
+//! read once the lock is acquired, and --- as we discussed above --- the
+//! instance-updater lock may only ever be acquired within a saga, we arrive
+//! at a bit of a weird impasse: we can't determine what saga DAG to build
+//! without looking at the initial state, but we can't load the state until
+//! we've already started a saga. To solve this, we've split this saga into two
+//! pieces: the first, `start-instance-update`, is a very small saga that just
+//! tries to lock the instance, and upon doing so, loads the instance state from
+//! the database and prepares and executes the "real" instance update saga. Once
+//! the "real" saga starts, it "inherits" the lock from the start saga by
+//! performing [the SQL equivalent of a compare-and-swap
+//! operation][instance_updater_inherit_lock] with its own UUID.
+//!
+//! The DAG for the "real" update saga depends on the state read within the
+//! lock, and since the lock was never released, that state remains valid for
+//! its execution. As the final action of the update saga, the instance record's
+//! new runtime state is written back to the database and the lock is released,
+//! in a [single atomic operation][instance_updater_unlock]. Should the update
+//! saga fail, it will release the inherited lock. And, if the unwinding update
+//! saga unwinds into the start saga, that's fine, because a double-unlock is
+//! 
prevented by the saga ID having changed in the "inherit lock" operation. +//! +//! ### Interaction With Other Sagas +//! +//! The instance-updater lock only provides mutual exclusion with regards to +//! *other `instance-update` sagas*. It does *not* prevent modifications to the +//! instance record by other sagas, such as `instance-start`, +//! `instance-migrate`, and `instance-delete`. Instead, mutual exclusion between +//! the `instance-update` saga and `instance-start` and `instance-delete` sagas +//! is ensured by the actual state of the instance record, as discussed above: +//! start and delete sagas can be started only when the instance has no active +//! VMM, and the `instance-update` saga will only run when an instance *does* +//! have an active VMM that has transitioned to a state where it must be +//! unlinked from the instance. The update saga unlinks the VMM from the +//! instance record as its last action, which allows the instance to be a valid +//! target for a start or delete saga. +//! +//! On the other hand, an `instance-migrate` saga can, potentially, mutate the +//! instance record while an update saga is running, if it attempts to start a +//! migration while an update is still being processed. If the migrate saga +//! starts during an update and completes before the update saga, the update +//! saga writing back an updated instance state to the instance record could +//! result in an [ABA problem]-like issue, where the changes made by the migrate +//! saga are clobbered by the update saga. These issues are instead guarded +//! against by the instance record's state generation number: the update saga +//! determines the generation for the updated instance record by incrementing +//! the generation number observed when the initial state for the update is +//! read. The query that writes back the instance's runtime state fails if the +//! generation number has changed since the state was read at the beginning of +//! the saga, which causes the saga to unwind. An unwinding saga activates the +//! `instance-updater` background task, which may in turn start a new saga if +//! the instance's current state still requires an update. +//! +//! To avoid unnecessarily changing an instance's state generation and +//! invalidating in-progress update sagas, unwinding `instance-start` and +//! `instance-migrate` sagas don't remove the VMMs and migrations they create +//! from the instance's `propolis_id`, `target_propolis_id`, and `migration_id` +//! fields. Instead, they transition the `vmm` records to +//! [`VmmState::SagaUnwound`], which is treated as equivalent to having no VMM +//! in that position by other instances of those sagas. +//! +//! ### Avoiding Missed Updates, or, "The `InstanceRuntimeState` Will Always Get Through" +//! +//! The lock operation we've described above is really more of a "try-lock" +//! operation: if the lock is already held, the saga trying to acquire it just +//! ends immediately, rather than waiting for the lock to be released. This begs +//! the question, "what happens if an instance update comes in while the lock is +//! held?" Do we just...leave it on the floor? Wasn't the whole point of this +//! Rube Goldberg mechanism of sagas to *prevent* instance state changes from +//! being missed? +//! +//! We solve this using an ~~even more layers of complexity~~defense-in-depth +//! approach. Together, a number of mechanisms exist to ensure that (a) an +//! instance whose VMM and migration states require an update saga will always +//! 
have an update saga run eventually, and (b) update sagas are run in as
+//! timely a manner as possible.
+//!
+//! The first of these ~~layers of nonsense~~redundant systems to prevent missed
+//! updates is perhaps the simplest one: _avoiding unnecessary update sagas_.
+//! The `cpapi_instances_put` API endpoint and the `instance-watcher` background
+//! task handle changes to VMM and migration states by calling the
+//! [`notify_instance_updated`] method, which writes the new states to the
+//! database and (potentially) starts an update saga. Naively, this method would
+//! *always* start an update saga, but remember that --- as we discussed
+//! [above](#background) --- many VMM/migration state changes don't actually
+//! require modifying the instance record. For example, if an instance's VMM
+//! transitions from [`VmmState::Starting`] to [`VmmState::Running`], that
+//! changes the instance's externally-visible effective state, but it does *not*
+//! require an instance record update. By not starting an update saga unless one
+//! is actually required, we reduce updater lock contention, so that the lock is
+//! less likely to be held when VMM and migration states that actually *do*
+//! require an update saga are published. The [`update_saga_needed`] function in
+//! this module contains the logic for determining whether an update saga is
+//! required.
+//!
+//! The second mechanism for ensuring updates are performed in a timely manner
+//! is what I'm calling _saga chaining_. When the final action in an
+//! instance-update saga writes back the instance record and releases the
+//! updater lock, it will then perform a second query to read the instance, VMM,
+//! and migration records. If the current state of the instance indicates that
+//! another update saga is needed, then the completing saga will execute a new
+//! start saga as its final action.
+//!
+//! The last line of defense is the `instance-updater` background task. This
+//! task periodically queries the database to list instances which require
+//! update sagas (either their active VMM is `Destroyed` or their active
+//! migration has terminated) and are not currently locked by another update
+//! saga. A new update saga is started for any such instances found. Because
+//! this task runs periodically, it ensures that eventually, an update saga will
+//! be started for any instance that requires one.[^3]
+//!
+//! The background task ensures that sagas are started eventually, but because
+//! it only runs occasionally, update sagas started by it may be somewhat
+//! delayed. To improve the timeliness of update sagas, we will also explicitly
+//! activate the background task at any point where we know that an update saga
+//! *should* run but we were not able to run it. If an update saga cannot be
+//! started, whether by [`notify_instance_updated`], a `start-instance-update`
+//! saga attempting to start its real saga, or an `instance-update` saga
+//! chaining into a new one as its last action, the `instance-updater`
+//! background task is activated. Similarly, when a `start-instance-update` saga
+//! fails to acquire the lock and exits, it activates the background task as
+//! well. This ensures that we will attempt the update again.
+//!
+//! ### On Unwinding
+//!
+//! Typically, when a Nexus saga unwinds, each node's reverse action undoes any
+//! changes made by the forward action. The `instance-update` saga, however, is
+//! a bit different: most of its nodes don't have reverse actions that undo the
+//! 
+//! action they performed. This is because, unlike `instance-start`,
+//! `instance-migrate`, or `instance-delete`, the instance-update saga is
+//! **not** attempting to perform a state change for the instance that was
+//! requested by a user. Instead, it is attempting to update the
+//! database and networking configuration *to match a state change that has
+//! already occurred.*
+//!
+//! Consider the following: if we run an `instance-start` saga, and the instance
+//! cannot actually be started, of course we would want the unwinding saga to
+//! undo any database changes it has made, because the instance was not actually
+//! started. Failing to undo those changes when an `instance-start` saga unwinds
+//! would mean the database is left in a state that does not reflect reality. On
+//! the other hand, suppose an instance's active VMM shuts down and we start an
+//! `instance-update` saga to move it to the `Destroyed` state. Even if some
+//! action along the way fails, the VMM is still `Destroyed`; that state
+//! transition has *already happened* on the sled, and unwinding the update saga
+//! cannot and should not un-destroy the VMM.
+//!
+//! So, unlike other sagas, we want to leave basically anything we've
+//! successfully done in place when unwinding, because even if the update is
+//! incomplete, we have still brought Nexus' understanding of the instance
+//! *closer* to reality. If there was something we weren't able to do, one of
+//! the instance-update-related RPWs[^rpws] will start a new update saga to try
+//! it again. Because saga actions are idempotent, attempting to do something
+//! that was already successfully performed a second time isn't a problem, and
+//! we don't need to undo it.
+//!
+//! The one exception to this is, as [discussed
+//! above](#the-instance-updater-lock-or-distributed-raii), unwinding instance
+//! update sagas MUST always release the instance-updater lock, so that a
+//! subsequent saga can update the instance. Thus, the saga actions which lock
+//! the instance have reverse actions that release the updater lock.
+//!
+//! [`instance_start`]: super::instance_start
+//! [`instance_migrate`]: super::instance_migrate
+//! [`instance_delete`]: super::instance_delete
+//! [instance_updater_lock]:
+//! crate::app::db::datastore::DataStore::instance_updater_lock
+//! [instance_updater_inherit_lock]:
+//! crate::app::db::datastore::DataStore::instance_updater_inherit_lock
+//! [instance_updater_unlock]:
+//! crate::app::db::datastore::DataStore::instance_updater_unlock
+//! [`notify_instance_updated`]: crate::app::Nexus::notify_instance_updated
+//!
+//! [dist-locking]:
+//! https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html
+//! [ABA problem]: https://en.wikipedia.org/wiki/ABA_problem
+//!
+//! [^1]: And, if a process *can* die, well...we can assume it *will*.
+//! [^2]: Barring human intervention.
+//! [^3]: Even if the Nexus instance that processed the state update died
+//! between when it wrote the state to CRDB and when it started the
+//! requisite update saga!
+//! [^rpws]: Either the `instance-updater` or `abandoned-vmm-reaper` background
+//! tasks, as appropriate.
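+//!
+//! ### Example: Starting an Update Saga From a State Write
+//!
+//! As a rough, non-authoritative sketch of how the pieces above fit together,
+//! the following shows how a caller that has just written a VMM/migration
+//! state update to CRDB might decide whether to start an update saga, and fall
+//! back to activating the `instance-updater` background task if the saga
+//! cannot be started. This mirrors the flow described above (see
+//! [`notify_instance_updated`] for the real implementation); `nexus`, `opctx`,
+//! `log`, `authz_instance`, `instance_id`, `sled_state`, and `write_result`
+//! are assumed to already be in scope.
+//!
+//! ```rust,ignore
+//! if update_saga_needed(&log, instance_id, &sled_state, &write_result) {
+//!     let saga_params = Params {
+//!         serialized_authn: authn::saga::Serialized::for_opctx(&opctx),
+//!         authz_instance,
+//!     };
+//!     let started = async {
+//!         // Build the `start-instance-update` DAG, prepare it, and run it.
+//!         let dag = SagaInstanceUpdate::prepare(&saga_params)
+//!             .context("failed to build update saga DAG")?;
+//!         let saga = nexus
+//!             .sagas
+//!             .saga_prepare(dag)
+//!             .await
+//!             .context("failed to prepare update saga")?;
+//!         // Note: we only wait for the saga to *start*, not to complete.
+//!         saga.start().await.context("failed to start update saga")
+//!     }
+//!     .await;
+//!     if let Err(error) = started {
+//!         // Don't drop the update on the floor: let the `instance-updater`
+//!         // background task pick this instance up on its next activation.
+//!         warn!(log, "failed to start instance-update saga"; "error" => %error);
+//!         nexus.background_tasks.task_instance_updater.activate();
+//!     }
+//! }
+//! ```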
+
+use super::{
+    ActionRegistry, NexusActionContext, NexusSaga, SagaInitError,
+    ACTION_GENERATE_ID,
+};
+use crate::app::db::datastore::instance;
+use crate::app::db::datastore::InstanceGestalt;
+use crate::app::db::datastore::VmmStateUpdateResult;
+use crate::app::db::lookup::LookupPath;
+use crate::app::db::model::ByteCount;
+use crate::app::db::model::Generation;
+use crate::app::db::model::InstanceRuntimeState;
+use crate::app::db::model::InstanceState;
+use crate::app::db::model::MigrationState;
+use crate::app::db::model::Vmm;
+use crate::app::db::model::VmmState;
+use crate::app::sagas::declare_saga_actions;
+use anyhow::Context;
+use chrono::Utc;
+use nexus_db_queries::{authn, authz};
+use nexus_types::identity::Resource;
+use omicron_common::api::external::Error;
+use omicron_common::api::internal::nexus;
+use omicron_common::api::internal::nexus::SledInstanceState;
+use omicron_uuid_kinds::GenericUuid;
+use omicron_uuid_kinds::InstanceUuid;
+use omicron_uuid_kinds::PropolisUuid;
+use serde::{Deserialize, Serialize};
+use steno::{ActionError, DagBuilder, Node};
+use uuid::Uuid;
+
+// The public interface to this saga is actually a smaller saga that starts the
+// "real" update saga, which inherits the lock from the start saga. This is
+// because the decision of which subsaga(s) to run depends on the state of the
+// instance record read from the database *once the lock has been acquired*,
+// and the saga DAG for the "real" instance update saga may be constructed only
+// after the instance state has been fetched. However, since the instance
+// state must be read inside the lock, that *also* needs to happen in a saga,
+// so that the lock is always dropped when unwinding. Thus, we have a second,
+// smaller saga which starts our real saga, and then the real saga, which
+// decides what DAG to build based on the instance fetched by the start saga.
+//
+// Don't worry, this won't be on the test.
+mod start;
+pub(crate) use self::start::{Params, SagaInstanceUpdate};
+
+mod destroyed;
+
+/// Returns `true` if an `instance-update` saga should be executed as a result
+/// of writing the provided [`SledInstanceState`] to the database with the
+/// provided [`VmmStateUpdateResult`].
+///
+/// We determine this only after actually updating the database records,
+/// because we don't know whether a particular VMM or migration state is
+/// *new* or not until we know whether the corresponding database record has
+/// actually changed (determined by the generation number). For example, when
+/// an instance has migrated into a Propolis process, Propolis will continue
+/// to report the inbound migration in the `Completed` state as part of all state
+/// updates regarding that instance, but we no longer need to act on it if
+/// the migration record has already been updated to reflect that the
+/// migration has completed.
+///
+/// Once we know what rows have been updated, we can inspect the states
+/// written to the DB to determine whether an instance-update saga is
+/// required to bring the instance record's state in line with the new
+/// VMM/migration states.
+pub fn update_saga_needed(
+    log: &slog::Logger,
+    instance_id: InstanceUuid,
+    state: &SledInstanceState,
+    result: &VmmStateUpdateResult,
+) -> bool {
+    // Currently, an instance-update saga is required if (and only if):
+    //
+    // - The instance's active VMM has transitioned to `Destroyed`. We don't
+    //   actually know whether the VMM whose state was updated here was the
+    //   active VMM or not, so we will always attempt to run an instance-update
+    //   saga if the VMM was `Destroyed`.
+    let vmm_needs_update = result.vmm_updated
+        && state.vmm_state.state == nexus::VmmState::Destroyed;
+    // - A migration in to this VMM has transitioned to a terminal state
+    //   (`Failed` or `Completed`).
+    let migrations = state.migrations();
+    let migration_in_needs_update = result.migration_in_updated
+        && migrations
+            .migration_in
+            .map(|migration| migration.state.is_terminal())
+            .unwrap_or(false);
+    // - A migration out from this VMM has transitioned to a terminal state
+    //   (`Failed` or `Completed`).
+    let migration_out_needs_update = result.migration_out_updated
+        && migrations
+            .migration_out
+            .map(|migration| migration.state.is_terminal())
+            .unwrap_or(false);
+    // If any of the above conditions are true, prepare an instance-update saga
+    // for this instance.
+    let needed = vmm_needs_update
+        || migration_in_needs_update
+        || migration_out_needs_update;
+    if needed {
+        debug!(log,
+            "new VMM runtime state from sled agent requires an \
+             instance-update saga";
+            "instance_id" => %instance_id,
+            "propolis_id" => %state.propolis_id,
+            "vmm_needs_update" => vmm_needs_update,
+            "migration_in_needs_update" => migration_in_needs_update,
+            "migration_out_needs_update" => migration_out_needs_update,
+        );
+    }
+    needed
+}
+
+/// The set of updates to the instance and its owned resources to perform in
+/// response to a VMM/migration state update.
+///
+/// Depending on the current state of the instance and its VMM(s) and migration,
+/// an update saga may perform a variety of operations. Which operations need to
+/// be performed for the current state snapshot of the instance, VMM, and
+/// migration records is determined by the [`UpdatesRequired::for_instance`]
+/// function.
+#[derive(Debug, Deserialize, Serialize)]
+struct UpdatesRequired {
+    /// The new runtime state that must be written back to the database when the
+    /// saga completes.
+    new_runtime: InstanceRuntimeState,
+
+    /// If this is [`Some`], the instance's active VMM with this UUID has
+    /// transitioned to [`VmmState::Destroyed`], and its resources must be
+    /// cleaned up by a [`destroyed`] subsaga.
+    destroy_active_vmm: Option<PropolisUuid>,
+
+    /// If this is [`Some`], the instance's migration target VMM with this UUID
+    /// has transitioned to [`VmmState::Destroyed`], and its resources must be
+    /// cleaned up by a [`destroyed`] subsaga.
+    destroy_target_vmm: Option<PropolisUuid>,
+
+    /// If this is [`Some`], the instance no longer has an active VMM, and its
+    /// virtual provisioning resource records and Oximeter producer should be
+    /// deallocated.
+    deprovision: Option<Deprovision>,
+
+    /// If this is [`Some`], then a network configuration update must be
+    /// performed: either updating NAT configuration and V2P mappings when the
+    /// instance has moved to a new sled, or deleting them if it is no longer
+    /// incarnated.
+    network_config: Option<NetworkConfigUpdate>,
+}
+
+#[derive(Debug, Deserialize, Serialize)]
+enum NetworkConfigUpdate {
+    Delete,
+    Update { active_propolis_id: PropolisUuid, new_sled_id: Uuid },
+}
+
+/// Virtual provisioning counters to release when an instance no longer has a
+/// VMM.
+#[derive(Debug, Deserialize, Serialize)] +struct Deprovision { + project_id: Uuid, + cpus_diff: i64, + ram_diff: ByteCount, +} + +impl UpdatesRequired { + fn for_instance( + log: &slog::Logger, + snapshot: &InstanceGestalt, + ) -> Option { + let mut new_runtime = snapshot.instance.runtime().clone(); + new_runtime.gen = Generation(new_runtime.gen.next()); + new_runtime.time_updated = Utc::now(); + let instance_id = snapshot.instance.id(); + + let mut update_required = false; + let mut network_config = None; + + // Has the active VMM been destroyed? + let destroy_active_vmm = + snapshot.active_vmm.as_ref().and_then(|active_vmm| { + if active_vmm.runtime.state == VmmState::Destroyed { + let id = PropolisUuid::from_untyped_uuid(active_vmm.id); + // Unlink the active VMM ID. If the active VMM was destroyed + // because a migration out completed, the next block, which + // handles migration updates, will set this to the new VMM's ID, + // instead. + new_runtime.propolis_id = None; + update_required = true; + Some(id) + } else { + None + } + }); + + // Okay, what about the target? + let destroy_target_vmm = + snapshot.target_vmm.as_ref().and_then(|target_vmm| { + if target_vmm.runtime.state == VmmState::Destroyed { + // Unlink the target VMM ID. + new_runtime.dst_propolis_id = None; + update_required = true; + Some(PropolisUuid::from_untyped_uuid(target_vmm.id)) + } else { + None + } + }); + + // If there's an active migration, determine how to update the instance + // record to reflect the current migration state. + if let Some(ref migration) = snapshot.migration { + if migration.either_side_failed() { + // If the migration has failed, clear the instance record's + // migration IDs so that a new migration can begin. + info!( + log, + "instance update (migration failed): clearing migration IDs"; + "instance_id" => %instance_id, + "migration_id" => %migration.id, + "src_propolis_id" => %migration.source_propolis_id, + "target_propolis_id" => %migration.target_propolis_id, + ); + new_runtime.migration_id = None; + new_runtime.dst_propolis_id = None; + update_required = true; + // If the active VMM was destroyed, the network config must be + // deleted (which was determined above). Otherwise, if the + // migration failed but the active VMM was still there, we must + // still ensure the correct networking configuration + // exists for its current home. + // + // TODO(#3107) This is necessary even if the instance didn't move, + // because registering a migration target on a sled creates OPTE ports + // for its VNICs, and that creates new V2P mappings on that sled that + // place the relevant virtual IPs on the local sled. Once OPTE stops + // creating these mappings, this path only needs to be taken if an + // instance has changed sleds. + if destroy_active_vmm.is_none() { + if let Some(ref active_vmm) = snapshot.active_vmm { + info!( + log, + "instance update (migration failed): pointing network \ + config back at current VMM"; + "instance_id" => %instance_id, + "migration_id" => %migration.id, + "src_propolis_id" => %migration.source_propolis_id, + "target_propolis_id" => %migration.target_propolis_id, + ); + network_config = + Some(NetworkConfigUpdate::to_vmm(active_vmm)); + } else { + // Otherwise, the active VMM has already been destroyed, + // and the target is reporting a failure because of + // that. Just delete the network config. 
+ } + } + } else if migration.either_side_completed() { + // If either side reports that the migration has completed, set + // the instance record's active Propolis ID to point at the new + // VMM, and update the network configuration to point at that VMM. + if new_runtime.propolis_id != Some(migration.target_propolis_id) + { + info!( + log, + "instance update (migration completed): setting active \ + VMM ID to target and updating network config"; + "instance_id" => %instance_id, + "migration_id" => %migration.id, + "src_propolis_id" => %migration.source_propolis_id, + "target_propolis_id" => %migration.target_propolis_id, + ); + let new_vmm = snapshot.target_vmm.as_ref().expect( + "if we have gotten here, there must be a target VMM", + ); + debug_assert_eq!(new_vmm.id, migration.target_propolis_id); + new_runtime.propolis_id = + Some(migration.target_propolis_id); + network_config = Some(NetworkConfigUpdate::to_vmm(new_vmm)); + update_required = true; + } + + // Welp, the migration has succeeded, but the target Propolis + // has also gone away. This is functionally equivalent to having + // the active VMM go to `Destroyed`, so now we have no active + // VMM anymore. + if destroy_target_vmm.is_some() { + info!( + log, + "instance update (migration completed): target VMM \ + has gone away, destroying it!"; + "instance_id" => %instance_id, + "migration_id" => %migration.id, + "src_propolis_id" => %migration.source_propolis_id, + "target_propolis_id" => %migration.target_propolis_id, + ); + new_runtime.propolis_id = None; + update_required = true; + } + + // If the target reports that the migration has completed, + // unlink the migration (allowing a new one to begin). This has + // to wait until the target has reported completion to ensure a + // migration out of the target can't start until the migration + // in has definitely finished. + if migration.target_state == MigrationState::COMPLETED { + info!( + log, + "instance update (migration target completed): \ + clearing migration IDs"; + "instance_id" => %instance_id, + "migration_id" => %migration.id, + "src_propolis_id" => %migration.source_propolis_id, + "target_propolis_id" => %migration.target_propolis_id, + ); + new_runtime.migration_id = None; + new_runtime.dst_propolis_id = None; + update_required = true; + } + } + } + + // If the *new* state no longer has a `propolis_id` field, that means + // that the active VMM was destroyed without a successful migration out + // (or, we migrated out to a target VMM that was immediately destroyed, + // which could happen if a running VM shut down immediately after + // migrating). In that case, the instance is no longer incarnated on a + // sled, and we must update the state of the world to reflect that. + let deprovision = if new_runtime.propolis_id.is_none() { + // N.B. that this does *not* set `update_required`, because + // `new_runtime.propolis_id` might be `None` just because there was, + // already, no VMM there. `update_required` gets set above if there + // was any actual state change. + + // We no longer have a VMM. + new_runtime.nexus_state = InstanceState::NoVmm; + // If the active VMM was destroyed and the instance has not migrated + // out of it, we must delete the instance's network configuration. + // + // This clobbers a previously-set network config update to a new + // VMM, because if we set one above, we must have subsequently + // discovered that there actually *is* no new VMM anymore! 
+ network_config = Some(NetworkConfigUpdate::Delete); + // The instance's virtual provisioning records must be deallocated, + // as it is no longer consuming any virtual resources. Providing a + // set of virtual provisioning counters to deallocate also indicates + // that the instance's oximeter producer should be cleaned up. + Some(Deprovision { + project_id: snapshot.instance.project_id, + cpus_diff: i64::from(snapshot.instance.ncpus.0 .0), + ram_diff: snapshot.instance.memory, + }) + } else { + None + }; + + if !update_required { + return None; + } + + Some(Self { + new_runtime, + destroy_active_vmm, + destroy_target_vmm, + deprovision, + network_config, + }) + } +} + +impl NetworkConfigUpdate { + fn to_vmm(vmm: &Vmm) -> Self { + Self::Update { + active_propolis_id: PropolisUuid::from_untyped_uuid(vmm.id), + new_sled_id: vmm.sled_id, + } + } +} + +/// Parameters to the "real" instance update saga. +#[derive(Debug, Deserialize, Serialize)] +struct RealParams { + serialized_authn: authn::saga::Serialized, + + authz_instance: authz::Instance, + + update: UpdatesRequired, + + orig_lock: instance::UpdaterLock, +} + +const INSTANCE_LOCK_ID: &str = "saga_instance_lock_id"; +const INSTANCE_LOCK: &str = "updater_lock"; +const NETWORK_CONFIG_UPDATE: &str = "network_config_update"; + +// instance update saga: actions + +declare_saga_actions! { + instance_update; + + // Become the instance updater. + // + // This action inherits the instance-updater lock from the + // `start-instance-update` saga, which attempts to compare-and-swap in a new + // saga UUID. This ensuring that only one child update saga is + // actually allowed to proceed, even if the `start-instance-update` saga's + // "fetch_instance_and_start_real_saga" executes multiple times, avoiding + // duplicate work. + // + // Unwinding this action releases the updater lock. In addition, it + // activates the `instance-updater` background task to ensure that a new + // update saga is started in a timely manner, to perform the work that the + // unwinding saga was *supposed* to do. Since this action only succeeds if + // the lock was acquired, and this saga is only started if updates are + // required, having this action activate the background task when unwinding + // avoids unneeded activations when a saga fails just because it couldn't + // get the lock. + BECOME_UPDATER -> "updater_lock" { + + siu_become_updater + - siu_unbecome_updater + } + + // Update network configuration. + UPDATE_NETWORK_CONFIG -> "update_network_config" { + + siu_update_network_config + } + + // Deallocate virtual provisioning resources reserved by the instance, as it + // is no longer running. + RELEASE_VIRTUAL_PROVISIONING -> "release_virtual_provisioning" { + + siu_release_virtual_provisioning + } + + // Unassign the instance's Oximeter producer. + UNASSIGN_OXIMETER_PRODUCER -> "unassign_oximeter_producer" { + + siu_unassign_oximeter_producer + } + + // Write back the new instance record, releasing the instance updater lock, + // and re-fetch the VMM and migration states. If they have changed in a way + // that requires an additional update saga, attempt to execute an additional + // update saga immediately. + // + // Writing back the updated instance runtime state is conditional on both + // the instance updater lock *and* the instance record's state generation + // number. 
If the state generation has advanced since this update saga
+    // began, writing the new runtime state will fail, as the update was
+    // performed based on an initial state that is no longer current. In that
+    // case, this action will fail, causing the saga to unwind, release the
+    // updater lock, and activate the `instance-updater` background task to
+    // schedule a new update saga if one is still required.
+    COMMIT_INSTANCE_UPDATES -> "commit_instance_updates" {
+        + siu_commit_instance_updates
+    }
+
+}
+
+// instance update saga: definition
+struct SagaDoActualInstanceUpdate;
+
+impl NexusSaga for SagaDoActualInstanceUpdate {
+    const NAME: &'static str = "instance-update";
+    type Params = RealParams;
+
+    fn register_actions(registry: &mut ActionRegistry) {
+        instance_update_register_actions(registry);
+    }
+
+    fn make_saga_dag(
+        params: &Self::Params,
+        mut builder: DagBuilder,
+    ) -> Result<steno::Dag, SagaInitError> {
+        // Helper function for constructing a constant node.
+        fn const_node(
+            name: impl AsRef<str>,
+            value: &impl serde::Serialize,
+        ) -> Result<Node, SagaInitError> {
+            let value = serde_json::to_value(value).map_err(|e| {
+                SagaInitError::SerializeError(name.as_ref().to_string(), e)
+            })?;
+            Ok(Node::constant(name, value))
+        }
+
+        // Generate a new ID and attempt to inherit the lock from the start saga.
+        builder.append(Node::action(
+            INSTANCE_LOCK_ID,
+            "GenerateInstanceLockId",
+            ACTION_GENERATE_ID.as_ref(),
+        ));
+        builder.append(become_updater_action());
+
+        // If a network config update is required, do that.
+        if let Some(ref update) = params.update.network_config {
+            builder.append(const_node(NETWORK_CONFIG_UPDATE, update)?);
+            builder.append(update_network_config_action());
+        }
+
+        // If the instance now has no active VMM, release its virtual
+        // provisioning resources and unassign its Oximeter producer.
+        if params.update.deprovision.is_some() {
+            builder.append(release_virtual_provisioning_action());
+            builder.append(unassign_oximeter_producer_action());
+        }
+
+        // Once we've finished mutating everything owned by the instance, we can
+        // write back the updated state and release the instance lock.
+        builder.append(commit_instance_updates_action());
+
+        // If either VMM linked to this instance has been destroyed, append
+        // subsagas to clean up the VMMs' resources and mark them as deleted.
+        //
+        // Note that we must not mark the VMMs as deleted until *after* we have
+        // written back the updated instance record. Otherwise, if we mark a VMM
+        // as deleted while the instance record still references its ID, we will
+        // have created a state where the instance record contains a "dangling
+        // pointer" (database version) where the foreign key points to a record
+        // that no longer exists. Other consumers of the instance record may be
+        // unpleasantly surprised by this, so we avoid marking these rows as
+        // deleted until they've been unlinked from the instance by the
+        // `update_and_unlock_instance` action.
+        let mut append_destroyed_vmm_subsaga =
+            |vmm_id: PropolisUuid, which_vmm: &'static str| {
+                let params = destroyed::Params {
+                    vmm_id,
+                    instance_id: InstanceUuid::from_untyped_uuid(
+                        params.authz_instance.id(),
+                    ),
+                    serialized_authn: params.serialized_authn.clone(),
+                };
+                let name = format!("destroy_{which_vmm}_vmm");
+
+                let subsaga = destroyed::SagaDestroyVmm::make_saga_dag(
+                    &params,
+                    DagBuilder::new(steno::SagaName::new(&name)),
+                )?;
+
+                let params_name = format!("{name}_params");
+                builder.append(const_node(&params_name, &params)?);
+
+                let output_name = format!("{which_vmm}_vmm_destroyed");
+                builder.append(Node::subsaga(
+                    output_name.as_str(),
+                    subsaga,
+                    &params_name,
+                ));
+
+                Ok::<(), SagaInitError>(())
+            };
+
+        if let Some(vmm_id) = params.update.destroy_active_vmm {
+            append_destroyed_vmm_subsaga(vmm_id, "active")?;
+        }
+
+        if let Some(vmm_id) = params.update.destroy_target_vmm {
+            append_destroyed_vmm_subsaga(vmm_id, "target")?;
+        }
+
+        Ok(builder.build()?)
+    }
+}
+
+async fn siu_become_updater(
+    sagactx: NexusActionContext,
+) -> Result<instance::UpdaterLock, ActionError> {
+    let RealParams {
+        ref serialized_authn, ref authz_instance, orig_lock, ..
+    } = sagactx.saga_params::<RealParams>()?;
+    let saga_id = sagactx.lookup::<Uuid>(INSTANCE_LOCK_ID)?;
+    let opctx =
+        crate::context::op_context_for_saga_action(&sagactx, serialized_authn);
+    let osagactx = sagactx.user_data();
+    let log = osagactx.log();
+
+    debug!(
+        log,
+        "instance update: trying to become instance updater...";
+        "instance_id" => %authz_instance.id(),
+        "saga_id" => %saga_id,
+        "parent_lock" => ?orig_lock,
+    );
+
+    let lock = osagactx
+        .datastore()
+        .instance_updater_inherit_lock(
+            &opctx,
+            &authz_instance,
+            orig_lock,
+            saga_id,
+        )
+        .await
+        .map_err(ActionError::action_failed)?;
+
+    info!(
+        log,
+        "instance_update: Now, I am become Updater, the destroyer of VMMs.";
+        "instance_id" => %authz_instance.id(),
+        "saga_id" => %saga_id,
+    );
+
+    Ok(lock)
+}
+
+async fn siu_unbecome_updater(
+    sagactx: NexusActionContext,
+) -> Result<(), anyhow::Error> {
+    let RealParams { ref serialized_authn, ref authz_instance, .. } =
+        sagactx.saga_params::<RealParams>()?;
+    let lock = sagactx.lookup::<instance::UpdaterLock>(INSTANCE_LOCK)?;
+
+    unwind_instance_lock(lock, serialized_authn, authz_instance, &sagactx)
+        .await;
+
+    // Now that we've released the lock, activate the `instance-updater`
+    // background task to make sure that a new instance update saga is started
+    // if the instance still needs to be updated.
+    sagactx
+        .user_data()
+        .nexus()
+        .background_tasks
+        .task_instance_updater
+        .activate();
+
+    Ok(())
+}
+
+async fn siu_update_network_config(
+    sagactx: NexusActionContext,
+) -> Result<(), ActionError> {
+    let RealParams { ref serialized_authn, ref authz_instance, ..
} = + sagactx.saga_params::()?; + + let update = + sagactx.lookup::(NETWORK_CONFIG_UPDATE)?; + + let opctx = + crate::context::op_context_for_saga_action(&sagactx, serialized_authn); + + let osagactx = sagactx.user_data(); + let nexus = osagactx.nexus(); + let log = osagactx.log(); + + let instance_id = InstanceUuid::from_untyped_uuid(authz_instance.id()); + + match update { + NetworkConfigUpdate::Delete => { + info!( + log, + "instance update: deleting network config"; + "instance_id" => %instance_id, + ); + nexus + .instance_delete_dpd_config(&opctx, authz_instance) + .await + .map_err(ActionError::action_failed)?; + } + NetworkConfigUpdate::Update { active_propolis_id, new_sled_id } => { + info!( + log, + "instance update: ensuring updated instance network config"; + "instance_id" => %instance_id, + "active_propolis_id" => %active_propolis_id, + "new_sled_id" => %new_sled_id, + ); + + let (.., sled) = LookupPath::new(&opctx, osagactx.datastore()) + .sled_id(new_sled_id) + .fetch() + .await + .map_err(ActionError::action_failed)?; + + nexus + .instance_ensure_dpd_config( + &opctx, + instance_id, + &sled.address(), + None, + ) + .await + .map_err(ActionError::action_failed)?; + } + } + + Ok(()) +} + +async fn siu_release_virtual_provisioning( + sagactx: NexusActionContext, +) -> Result<(), ActionError> { + let osagactx = sagactx.user_data(); + let RealParams { + ref serialized_authn, ref authz_instance, ref update, .. + } = sagactx.saga_params::()?; + let Some(Deprovision { project_id, cpus_diff, ram_diff }) = + update.deprovision + else { + return Err(ActionError::action_failed( + "a `siu_release_virtual_provisioning` action should never have \ + been added to the DAG if the update does not contain virtual \ + resources to deprovision" + .to_string(), + )); + }; + let instance_id = InstanceUuid::from_untyped_uuid(authz_instance.id()); + + let log = osagactx.log(); + let opctx = + crate::context::op_context_for_saga_action(&sagactx, serialized_authn); + + let result = osagactx + .datastore() + .virtual_provisioning_collection_delete_instance( + &opctx, + instance_id, + project_id, + cpus_diff, + ram_diff, + ) + .await; + match result { + Ok(deleted) => { + info!( + log, + "instance update (no VMM): deallocated virtual provisioning \ + resources"; + "instance_id" => %instance_id, + "records_deleted" => ?deleted, + ); + } + // Necessary for idempotency --- the virtual provisioning resources may + // have been deleted already, that's fine. + Err(Error::ObjectNotFound { .. }) => { + info!( + log, + "instance update (no VMM): virtual provisioning record not \ + found; perhaps it has already been deleted?"; + "instance_id" => %instance_id, + ); + } + Err(err) => return Err(ActionError::action_failed(err)), + }; + + Ok(()) +} + +async fn siu_unassign_oximeter_producer( + sagactx: NexusActionContext, +) -> Result<(), ActionError> { + let osagactx = sagactx.user_data(); + let RealParams { ref serialized_authn, ref authz_instance, .. 
} = + sagactx.saga_params::()?; + + let opctx = + crate::context::op_context_for_saga_action(&sagactx, serialized_authn); + let log = osagactx.log(); + + info!( + log, + "instance update (no VMM): unassigning oximeter producer"; + "instance_id" => %authz_instance.id(), + ); + crate::app::oximeter::unassign_producer( + osagactx.datastore(), + log, + &opctx, + &authz_instance.id(), + ) + .await + .map_err(ActionError::action_failed) +} + +async fn siu_commit_instance_updates( + sagactx: NexusActionContext, +) -> Result<(), ActionError> { + let osagactx = sagactx.user_data(); + let RealParams { serialized_authn, authz_instance, ref update, .. } = + sagactx.saga_params::()?; + let lock = sagactx.lookup::(INSTANCE_LOCK)?; + + let opctx = + crate::context::op_context_for_saga_action(&sagactx, &serialized_authn); + let log = osagactx.log(); + let nexus = osagactx.nexus(); + + let instance_id = authz_instance.id(); + + debug!( + log, + "instance update: committing new runtime state and unlocking..."; + "instance_id" => %instance_id, + "new_runtime" => ?update.new_runtime, + "lock" => ?lock, + ); + + let did_unlock = osagactx + .datastore() + .instance_commit_update( + &opctx, + &authz_instance, + &lock, + &update.new_runtime, + ) + .await + .map_err(ActionError::action_failed)?; + + info!( + log, + "instance update: committed update new runtime state!"; + "instance_id" => %instance_id, + "new_runtime" => ?update.new_runtime, + "did_unlock" => ?did_unlock, + ); + + if update.network_config.is_some() { + // If the update we performed changed networking configuration, activate + // the V2P manager and VPC router RPWs, to ensure that the V2P mapping + // and VPC for this instance are up to date. + // + // We do this here, rather than in the network config update action, so + // that the instance's state in the database reflects the new rather + // than the old state. Otherwise, if the networking RPW ran *before* + // writing the new state to CRDB, it will run with the old VMM, rather + // than the new one, and probably do nothing. Then, the networking + // config update would be delayed until the *next* background task + // activation. This way, we ensure that the RPW runs *after* we are in + // the new state. + + nexus.background_tasks.task_v2p_manager.activate(); + nexus.vpc_needed_notify_sleds(); + } + + // Check if the VMM or migration state has changed while the update saga was + // running and whether an additional update saga is now required. If one is + // required, try to start it. + // + // TODO(eliza): it would be nice if we didn't release the lock, determine + // the needed updates, and then start a new start-instance-update saga that + // re-locks the instance --- instead, perhaps we could keep the lock, and + // try to start a new "actual" instance update saga that inherits our lock. + // This way, we could also avoid computing updates required twice. + // But, I'm a bit sketched out by the implications of not committing update + // and dropping the lock in the same operation. This deserves more thought... + if let Err(error) = + chain_update_saga(&sagactx, authz_instance, serialized_authn).await + { + // If starting the new update saga failed, DO NOT unwind this saga and + // undo all the work we've done successfully! Instead, just kick the + // instance-updater background task to try and start a new saga + // eventually, and log a warning. 
+ warn!( + log, + "instance update: failed to start successor saga!"; + "instance_id" => %instance_id, + "error" => %error, + ); + nexus.background_tasks.task_instance_updater.activate(); + } + + Ok(()) +} + +async fn chain_update_saga( + sagactx: &NexusActionContext, + authz_instance: authz::Instance, + serialized_authn: authn::saga::Serialized, +) -> Result<(), anyhow::Error> { + let opctx = + crate::context::op_context_for_saga_action(sagactx, &serialized_authn); + let osagactx = sagactx.user_data(); + let log = osagactx.log(); + + let instance_id = authz_instance.id(); + + // Fetch the state from the database again to see if we should immediately + // run a new saga. + let new_state = osagactx + .datastore() + .instance_fetch_all(&opctx, &authz_instance) + .await + .context("failed to fetch latest snapshot for instance")?; + + if let Some(update) = UpdatesRequired::for_instance(log, &new_state) { + debug!( + log, + "instance update: additional updates required, preparing a \ + successor update saga..."; + "instance_id" => %instance_id, + "update.new_runtime_state" => ?update.new_runtime, + "update.network_config_update" => ?update.network_config, + "update.destroy_active_vmm" => ?update.destroy_active_vmm, + "update.destroy_target_vmm" => ?update.destroy_target_vmm, + "update.deprovision" => ?update.deprovision, + ); + let saga_dag = SagaInstanceUpdate::prepare(&Params { + serialized_authn, + authz_instance, + }) + .context("failed to build new update saga DAG")?; + let saga = osagactx + .nexus() + .sagas + .saga_prepare(saga_dag) + .await + .context("failed to prepare new update saga")?; + saga.start().await.context("failed to start successor update saga")?; + // N.B. that we don't wait for the successor update saga to *complete* + // here. We just want to make sure it starts. + info!( + log, + "instance update: successor update saga started!"; + "instance_id" => %instance_id, + ); + } + + Ok(()) +} + +/// Unlock the instance record while unwinding. +/// +/// This is factored out of the actual reverse action, because the `Params` type +/// differs between the start saga and the actual instance update sagas, both of +/// which must unlock the instance in their reverse actions. +async fn unwind_instance_lock( + lock: instance::UpdaterLock, + serialized_authn: &authn::saga::Serialized, + authz_instance: &authz::Instance, + sagactx: &NexusActionContext, +) { + // /!\ EXTREMELY IMPORTANT WARNING /!\ + // + // This comment is a message, and part of a system of messages. Pay + // attention to it! The message is a warning about danger. + // + // The danger is still present in your time, as it was in ours. The danger + // is to the instance record, and it can deadlock. + // + // When unwinding, unlocking an instance MUST succeed at all costs. This is + // of the upmost importance. It's fine for unlocking an instance in a + // forward action to fail, since the reverse action will still unlock the + // instance when the saga is unwound. However, when unwinding, we must + // ensure the instance is unlocked, no matter what. This is because a + // failure to unlock the instance will leave the instance record in a + // PERMANENTLY LOCKED state, since no other update saga will ever be + // able to lock it again. If we can't unlock the instance here, our death + // will ruin the instance record forever and it will only be able to be + // removed by manual operator intervention. That would be...not great. 
+ // + // Therefore, this action will retry the attempt to unlock the instance + // until it either: + // + // - succeeds, and we know the instance is now unlocked. + // - fails *because the instance doesn't exist*, in which case we can die + // happily because it doesn't matter if the instance is actually unlocked. + use dropshot::HttpError; + use futures::{future, TryFutureExt}; + use omicron_common::backoff; + + let osagactx = sagactx.user_data(); + let log = osagactx.log(); + let instance_id = authz_instance.id(); + let opctx = + crate::context::op_context_for_saga_action(sagactx, &serialized_authn); + + debug!( + log, + "instance update: unlocking instance on unwind"; + "instance_id" => %instance_id, + "lock" => ?lock, + ); + + const WARN_DURATION: std::time::Duration = + std::time::Duration::from_secs(20); + + let did_unlock = backoff::retry_notify_ext( + // This is an internal service query to CockroachDB. + backoff::retry_policy_internal_service(), + || { + osagactx + .datastore() + .instance_updater_unlock(&opctx, authz_instance, &lock) + .or_else(|err| future::ready(match err { + // The instance record was not found. It's probably been + // deleted. That's fine, we can now die happily, since we won't + // be leaving the instance permanently locked. + Error::ObjectNotFound { .. } => { + info!( + log, + "instance update: giving up on unlocking instance, \ + as it no longer exists"; + "instance_id" => %instance_id, + "lock" => ?lock, + ); + + Ok(false) + }, + // All other errors should be retried. + _ => Err(backoff::BackoffError::transient(err)), + })) + }, + |error, call_count, total_duration| { + let http_error = HttpError::from(error.clone()); + if http_error.status_code.is_client_error() { + error!( + log, + "instance update: client error while unlocking instance \ + (likely requires operator intervention), retrying anyway"; + "instance_id" => %instance_id, + "lock" => ?lock, + "error" => &error, + "call_count" => call_count, + "total_duration" => ?total_duration, + ); + } else if total_duration > WARN_DURATION { + warn!( + log, + "instance update: server error while unlocking instance, \ + retrying"; + "instance_id" => %instance_id, + "lock" => ?lock, + "error" => &error, + "call_count" => call_count, + "total_duration" => ?total_duration, + ); + } else { + info!( + log, + "instance update: server error while unlocking instance, \ + retrying"; + "instance_id" => %instance_id, + "lock" => ?lock, + "error" => &error, + "call_count" => call_count, + "total_duration" => ?total_duration, + ); + } + }, + ) + .await + .expect("errors should be retried indefinitely"); + + info!( + log, + "instance update: unlocked instance while unwinding"; + "instance_id" => %instance_id, + "lock" => ?lock, + "did_unlock" => did_unlock, + ); +} + +#[cfg(test)] +mod test { + use super::*; + use crate::app::db::model::Instance; + use crate::app::db::model::VmmRuntimeState; + use crate::app::saga::create_saga_dag; + use crate::app::sagas::test_helpers; + use crate::app::OpContext; + use crate::external_api::params; + use chrono::Utc; + use dropshot::test_util::ClientTestContext; + use nexus_db_queries::db::datastore::InstanceAndActiveVmm; + use nexus_db_queries::db::lookup::LookupPath; + use nexus_test_utils::resource_helpers::{ + create_default_ip_pool, create_project, object_create, + }; + use nexus_test_utils_macros::nexus_test; + use omicron_common::api::internal::nexus::{ + MigrationRuntimeState, MigrationState, Migrations, + }; + use omicron_uuid_kinds::GenericUuid; + use 
omicron_uuid_kinds::PropolisUuid; + use omicron_uuid_kinds::SledUuid; + use std::sync::Arc; + use std::sync::Mutex; + use uuid::Uuid; + + type ControlPlaneTestContext = + nexus_test_utils::ControlPlaneTestContext; + + const PROJECT_NAME: &str = "test-project"; + const INSTANCE_NAME: &str = "test-instance"; + + // Most Nexus sagas have test suites that follow a simple formula: there's + // usually a `test_saga_basic_usage_succeeds` that just makes sure the saga + // basically works, and then a `test_actions_succeed_idempotently` test that + // does the same thing, but runs every action twice. Then, there's usually a + // `test_action_failures_can_unwind` test, and often also a + // `test_action_failures_can_unwind_idempotently` test. + // + // For the instance-update saga, the test suite is a bit more complicated. + // This saga will do a number of different things depending on the ways in + // which the instance's migration and VMM records have changed since the + // last update. Therefore, we want to test all of the possible branches + // through the saga: + // + // 1. active VMM destroyed + // 2. migration source completed + // 3. migration target completed + // 4. migration source VMM completed and was destroyed, + // 5. migration target failed + // 6. migration source failed + + async fn setup_test_project(client: &ClientTestContext) -> Uuid { + create_default_ip_pool(&client).await; + let project = create_project(&client, PROJECT_NAME).await; + project.identity.id + } + + async fn create_instance( + client: &ClientTestContext, + ) -> omicron_common::api::external::Instance { + use omicron_common::api::external::{ + ByteCount, IdentityMetadataCreateParams, InstanceCpuCount, + }; + let instances_url = format!("/v1/instances?project={}", PROJECT_NAME); + object_create( + client, + &instances_url, + ¶ms::InstanceCreate { + identity: IdentityMetadataCreateParams { + name: INSTANCE_NAME.parse().unwrap(), + description: format!("instance {:?}", INSTANCE_NAME), + }, + ncpus: InstanceCpuCount(1), + memory: ByteCount::from_gibibytes_u32(1), + hostname: INSTANCE_NAME.parse().unwrap(), + user_data: b"#cloud-config".to_vec(), + ssh_public_keys: Some(Vec::new()), + network_interfaces: + params::InstanceNetworkInterfaceAttachment::None, + external_ips: vec![], + disks: vec![], + start: true, + }, + ) + .await + } + + #[track_caller] + fn assert_instance_unlocked(instance: &Instance) { + assert_eq!( + instance.updater_id, None, + "instance updater lock should have been released" + ) + } + + // Asserts that an instance record is in a consistent state (e.g., that all + // state changes performed by the update saga are either applied atomically, + // or have not been applied). This is particularly important to check when a + // saga unwinds. 
+ #[track_caller] + fn assert_instance_record_is_consistent(instance: &Instance) { + let run_state = instance.runtime(); + match run_state.nexus_state { + InstanceState::Vmm => assert!( + run_state.propolis_id.is_some(), + "if the instance record is in the `Vmm` state, it must have \ + an active VMM\ninstance: {instance:#?}", + ), + state => assert_eq!( + run_state.propolis_id, None, + "if the instance record is in the `{state:?}` state, it must \ + not have an active VMM\ninstance: {instance:#?}", + ), + } + + if run_state.dst_propolis_id.is_some() { + assert!( + run_state.migration_id.is_some(), + "if the instance record has a target VMM ID, then it must \ + also have a migration\ninstance: {instance:#?}", + ); + } + + if run_state.migration_id.is_some() { + assert_eq!( + run_state.nexus_state, + InstanceState::Vmm, + "if an instance is migrating, it must be in the VMM state\n\ + instance: {instance:#?}", + ); + } + } + + async fn after_unwinding( + parent_saga_id: Option, + cptestctx: &ControlPlaneTestContext, + ) { + let state = test_helpers::instance_fetch_by_name( + cptestctx, + INSTANCE_NAME, + PROJECT_NAME, + ) + .await; + let instance = state.instance(); + + // Unlike most other sagas, we actually don't unwind the work performed + // by an update saga, as we would prefer that at least some of it + // succeeds. The only thing that *needs* to be rolled back when an + // instance-update saga fails is that the updater lock *MUST* either + // remain locked by the parent start saga, or have been released so that + // a subsequent saga can run. See the section "on unwinding" in the + // documentation comment at the top of the instance-update module for + // details. + if let Some(parent_saga_id) = parent_saga_id { + if let Some(actual_lock_id) = instance.updater_id { + assert_eq!( + actual_lock_id, parent_saga_id, + "if the instance is locked after unwinding, it must be \ + locked by the `start-instance-update` saga, and not the \ + unwinding child saga!" + ); + } + } else { + assert_instance_unlocked(instance); + } + + // Additionally, we assert that the instance record is in a + // consistent state, ensuring that all changes to the instance record + // are atomic. This is important *because* we won't roll back changes + // to the instance: if we're going to leave them in place, they can't + // be partially applied, even if we unwound partway through the saga. + assert_instance_record_is_consistent(instance); + + // Throw away the instance so that subsequent unwinding + // tests also operate on an instance in the correct + // preconditions to actually run the saga path we mean + // to test. + let instance_id = InstanceUuid::from_untyped_uuid(instance.id()); + // Depending on where we got to in the update saga, the + // sled-agent may or may not actually be willing to stop + // the instance, so just manually update the DB record + // into a state where we can delete it to make sure + // everything is cleaned up for the next run. 
+ cptestctx + .server + .server_context() + .nexus + .datastore() + .instance_update_runtime( + &instance_id, + &InstanceRuntimeState { + time_updated: Utc::now(), + gen: Generation(instance.runtime().gen.0.next()), + propolis_id: None, + dst_propolis_id: None, + migration_id: None, + nexus_state: InstanceState::NoVmm, + }, + ) + .await + .unwrap(); + + test_helpers::instance_delete_by_name( + cptestctx, + INSTANCE_NAME, + PROJECT_NAME, + ) + .await; + } + + // === Active VMM destroyed tests === + + #[nexus_test(server = crate::Server)] + async fn test_active_vmm_destroyed_succeeds( + cptestctx: &ControlPlaneTestContext, + ) { + let _project_id = setup_test_project(&cptestctx.external_client).await; + let (state, params) = setup_active_vmm_destroyed_test(cptestctx).await; + + // Run the instance-update saga. + let nexus = &cptestctx.server.server_context().nexus; + nexus + .sagas + .saga_execute::(params) + .await + .expect("update saga should succeed"); + + // Assert that the saga properly cleaned up the active VMM's resources. + verify_active_vmm_destroyed(cptestctx, state.instance().id()).await; + } + + #[nexus_test(server = crate::Server)] + async fn test_active_vmm_destroyed_actions_succeed_idempotently( + cptestctx: &ControlPlaneTestContext, + ) { + let _project_id = setup_test_project(&cptestctx.external_client).await; + let (state, params) = setup_active_vmm_destroyed_test(cptestctx).await; + + // Build the saga DAG with the provided test parameters + let real_params = make_real_params( + cptestctx, + &test_helpers::test_opctx(cptestctx), + params, + ) + .await; + let dag = + create_saga_dag::(real_params).unwrap(); + + crate::app::sagas::test_helpers::actions_succeed_idempotently( + &cptestctx.server.server_context().nexus, + dag, + ) + .await; + + // Assert that the saga properly cleaned up the active VMM's resources. + verify_active_vmm_destroyed(cptestctx, state.instance().id()).await; + } + + #[nexus_test(server = crate::Server)] + async fn test_active_vmm_destroyed_action_failure_can_unwind( + cptestctx: &ControlPlaneTestContext, + ) { + let _project_id = setup_test_project(&cptestctx.external_client).await; + let nexus = &cptestctx.server.server_context().nexus; + let opctx = test_helpers::test_opctx(cptestctx); + // Stupid side channel for passing the expected parent start saga's lock + // ID into the "after unwinding" method, so that it can check that the + // lock is either released or was never acquired. + let parent_saga_id = Arc::new(Mutex::new(None)); + + test_helpers::action_failure_can_unwind::< + SagaDoActualInstanceUpdate, + _, + _, + >( + nexus, + || { + let parent_saga_id = parent_saga_id.clone(); + let opctx = &opctx; + Box::pin(async move { + let (_, start_saga_params) = + setup_active_vmm_destroyed_test(cptestctx).await; + + // Since the unwinding test will test unwinding from each + // individual saga node *in the saga DAG constructed by the + // provided params*, we need to give it the "real saga"'s + // params rather than the start saga's params. 
Otherwise, + // we're just testing the unwinding behavior of the trivial + // two-node start saga + let real_params = + make_real_params(cptestctx, opctx, start_saga_params) + .await; + *parent_saga_id.lock().unwrap() = + Some(real_params.orig_lock.updater_id); + real_params + }) + }, + || { + let parent_saga_id = parent_saga_id.clone(); + Box::pin(async move { + let parent_saga_id = + parent_saga_id.lock().unwrap().take().expect( + "parent saga's lock ID must have been set by the \ + `before_saga` function; this is a test bug", + ); + after_unwinding(Some(parent_saga_id), cptestctx).await + }) + }, + &cptestctx.logctx.log, + ) + .await; + } + + // === idempotency and unwinding tests for the start saga === + + // We only do these tests with an "active VMM destroyed" precondition, since + // the behavior of the `start-instance-update` saga does *not* depend on the + // specific update to perform, and it seems unnecessary to run the start + // saga's tests against every possible migration outcome combination tested + // below. + + #[nexus_test(server = crate::Server)] + async fn test_start_saga_actions_succeed_idempotently( + cptestctx: &ControlPlaneTestContext, + ) { + let _project_id = setup_test_project(&cptestctx.external_client).await; + let (state, params) = setup_active_vmm_destroyed_test(cptestctx).await; + let dag = create_saga_dag::(params).unwrap(); + + crate::app::sagas::test_helpers::actions_succeed_idempotently( + &cptestctx.server.server_context().nexus, + dag, + ) + .await; + + // Assert that the saga properly cleaned up the active VMM's resources. + verify_active_vmm_destroyed(cptestctx, state.instance().id()).await; + } + + #[nexus_test(server = crate::Server)] + async fn test_start_saga_action_failure_can_unwind( + cptestctx: &ControlPlaneTestContext, + ) { + let _project_id = setup_test_project(&cptestctx.external_client).await; + let nexus = &cptestctx.server.server_context().nexus; + + test_helpers::action_failure_can_unwind::( + nexus, + || { + Box::pin(async { + let (_, params) = + setup_active_vmm_destroyed_test(cptestctx).await; + params + }) + }, + // Don't pass a parent saga ID here because the instance MUST be + // unlocked if the whole start saga unwinds. + || Box::pin(after_unwinding(None, cptestctx)), + &cptestctx.logctx.log, + ) + .await; + } + + // --- test helpers --- + + async fn setup_active_vmm_destroyed_test( + cptestctx: &ControlPlaneTestContext, + ) -> (InstanceAndActiveVmm, Params) { + let client = &cptestctx.external_client; + let nexus = &cptestctx.server.server_context().nexus; + let datastore = nexus.datastore().clone(); + + let opctx = test_helpers::test_opctx(cptestctx); + let instance = create_instance(client).await; + let instance_id = InstanceUuid::from_untyped_uuid(instance.identity.id); + + // Poke the instance to get it into the Running state. + test_helpers::instance_simulate(cptestctx, &instance_id).await; + + let state = test_helpers::instance_fetch(cptestctx, instance_id).await; + // The instance should have an active VMM. + let instance_runtime = state.instance().runtime(); + assert_eq!(instance_runtime.nexus_state, InstanceState::Vmm); + assert!(instance_runtime.propolis_id.is_some()); + // Once we destroy the active VMM, we'll assert that the virtual + // provisioning and sled resource records it owns have been deallocated. + // In order to ensure we're actually testing the correct thing, let's + // make sure that those records exist now --- if not, the assertions + // later won't mean anything! 
+ assert!( + !test_helpers::no_virtual_provisioning_resource_records_exist( + cptestctx + ) + .await, + "we can't assert that a destroyed VMM instance update deallocates \ + virtual provisioning records if none exist!", + ); + assert!( + !test_helpers::no_virtual_provisioning_collection_records_using_instances(cptestctx) + .await, + "we can't assert that a destroyed VMM instance update deallocates \ + virtual provisioning records if none exist!", + ); + assert!( + !test_helpers::no_sled_resource_instance_records_exist(cptestctx) + .await, + "we can't assert that a destroyed VMM instance update deallocates \ + sled resource records if none exist!" + ); + + // Now, destroy the active VMM + let vmm = state.vmm().as_ref().unwrap(); + let vmm_id = PropolisUuid::from_untyped_uuid(vmm.id); + datastore + .vmm_update_runtime( + &vmm_id, + &VmmRuntimeState { + time_state_updated: Utc::now(), + gen: Generation(vmm.runtime.gen.0.next()), + state: VmmState::Destroyed, + }, + ) + .await + .unwrap(); + + let (_, _, authz_instance, ..) = LookupPath::new(&opctx, &datastore) + .instance_id(instance_id.into_untyped_uuid()) + .fetch() + .await + .expect("test instance should be present in datastore"); + let params = Params { + authz_instance, + serialized_authn: authn::saga::Serialized::for_opctx(&opctx), + }; + (state, params) + } + + async fn verify_active_vmm_destroyed( + cptestctx: &ControlPlaneTestContext, + instance_id: Uuid, + ) { + let state = test_helpers::instance_fetch( + cptestctx, + InstanceUuid::from_untyped_uuid(instance_id), + ) + .await; + + // The instance's active VMM has been destroyed, so its state should + // transition to `NoVmm`, and its active VMM ID should be unlinked. The + // virtual provisioning and sled resources allocated to the instance + // should be deallocated. 
+ assert_instance_unlocked(state.instance()); + assert!(state.vmm().is_none()); + let instance_runtime = state.instance().runtime(); + assert_eq!(instance_runtime.nexus_state, InstanceState::NoVmm); + assert!(instance_runtime.propolis_id.is_none()); + assert!( + test_helpers::no_virtual_provisioning_resource_records_exist( + cptestctx + ) + .await + ); + assert!(test_helpers::no_virtual_provisioning_collection_records_using_instances(cptestctx).await); + assert!( + test_helpers::no_sled_resource_instance_records_exist(cptestctx) + .await + ); + } + + // === migration source completed tests === + + #[nexus_test(server = crate::Server)] + async fn test_migration_source_completed_succeeds( + cptestctx: &ControlPlaneTestContext, + ) { + let _project_id = setup_test_project(&cptestctx.external_client).await; + let other_sleds = test_helpers::add_sleds(cptestctx, 1).await; + MigrationOutcome::default() + .source(MigrationState::Completed, VmmState::Stopping) + .setup_test(cptestctx, &other_sleds) + .await + .run_saga_basic_usage_succeeds_test(cptestctx) + .await; + } + + #[nexus_test(server = crate::Server)] + async fn test_migration_source_completed_actions_succeed_idempotently( + cptestctx: &ControlPlaneTestContext, + ) { + let _project_id = setup_test_project(&cptestctx.external_client).await; + let other_sleds = test_helpers::add_sleds(cptestctx, 1).await; + + MigrationOutcome::default() + .source(MigrationState::Completed, VmmState::Stopping) + .setup_test(cptestctx, &other_sleds) + .await + .run_actions_succeed_idempotently_test(cptestctx) + .await; + } + + #[nexus_test(server = crate::Server)] + async fn test_migration_source_completed_can_unwind( + cptestctx: &ControlPlaneTestContext, + ) { + MigrationOutcome::default() + .source(MigrationState::Completed, VmmState::Stopping) + .run_unwinding_test(cptestctx) + .await; + } + + // === migration target completed tests === + + #[nexus_test(server = crate::Server)] + async fn test_migration_target_completed_succeeds( + cptestctx: &ControlPlaneTestContext, + ) { + let _project_id = setup_test_project(&cptestctx.external_client).await; + let other_sleds = test_helpers::add_sleds(cptestctx, 1).await; + + MigrationOutcome::default() + .target(MigrationState::Completed, VmmState::Running) + .setup_test(cptestctx, &other_sleds) + .await + .run_saga_basic_usage_succeeds_test(cptestctx) + .await; + } + + #[nexus_test(server = crate::Server)] + async fn test_migration_target_completed_actions_succeed_idempotently( + cptestctx: &ControlPlaneTestContext, + ) { + let _project_id = setup_test_project(&cptestctx.external_client).await; + let other_sleds = test_helpers::add_sleds(cptestctx, 1).await; + + MigrationOutcome::default() + .target(MigrationState::Completed, VmmState::Running) + .setup_test(cptestctx, &other_sleds) + .await + .run_actions_succeed_idempotently_test(cptestctx) + .await; + } + + #[nexus_test(server = crate::Server)] + async fn test_migration_target_completed_can_unwind( + cptestctx: &ControlPlaneTestContext, + ) { + MigrationOutcome::default() + .target(MigrationState::Completed, VmmState::Running) + .run_unwinding_test(cptestctx) + .await; + } + + // === migration completed and source destroyed tests === + + #[nexus_test(server = crate::Server)] + async fn test_migration_completed_source_destroyed_succeeds( + cptestctx: &ControlPlaneTestContext, + ) { + let _project_id = setup_test_project(&cptestctx.external_client).await; + let other_sleds = test_helpers::add_sleds(cptestctx, 1).await; + + MigrationOutcome::default() + 
.target(MigrationState::Completed, VmmState::Running) + .source(MigrationState::Completed, VmmState::Destroyed) + .setup_test(cptestctx, &other_sleds) + .await + .run_saga_basic_usage_succeeds_test(cptestctx) + .await; + } + + #[nexus_test(server = crate::Server)] + async fn test_migration_completed_source_destroyed_actions_succeed_idempotently( + cptestctx: &ControlPlaneTestContext, + ) { + let _project_id = setup_test_project(&cptestctx.external_client).await; + let other_sleds = test_helpers::add_sleds(cptestctx, 1).await; + + MigrationOutcome::default() + .target(MigrationState::Completed, VmmState::Running) + .source(MigrationState::Completed, VmmState::Destroyed) + .setup_test(cptestctx, &other_sleds) + .await + .run_actions_succeed_idempotently_test(cptestctx) + .await; + } + + #[nexus_test(server = crate::Server)] + async fn test_migration_completed_source_destroyed_can_unwind( + cptestctx: &ControlPlaneTestContext, + ) { + MigrationOutcome::default() + .target(MigrationState::Completed, VmmState::Running) + .source(MigrationState::Completed, VmmState::Destroyed) + .run_unwinding_test(cptestctx) + .await; + } + + // === migration failed, target not destroyed === + + #[nexus_test(server = crate::Server)] + async fn test_migration_target_failed_succeeds( + cptestctx: &ControlPlaneTestContext, + ) { + let _project_id = setup_test_project(&cptestctx.external_client).await; + let other_sleds = test_helpers::add_sleds(cptestctx, 1).await; + + MigrationOutcome::default() + .target(MigrationState::Failed, VmmState::Failed) + .source(MigrationState::Failed, VmmState::Running) + .setup_test(cptestctx, &other_sleds) + .await + .run_saga_basic_usage_succeeds_test(cptestctx) + .await; + } + + #[nexus_test(server = crate::Server)] + async fn test_migration_target_failed_actions_succeed_idempotently( + cptestctx: &ControlPlaneTestContext, + ) { + let _project_id = setup_test_project(&cptestctx.external_client).await; + let other_sleds = test_helpers::add_sleds(cptestctx, 1).await; + + MigrationOutcome::default() + .target(MigrationState::Failed, VmmState::Failed) + .source(MigrationState::Failed, VmmState::Running) + .setup_test(cptestctx, &other_sleds) + .await + .run_actions_succeed_idempotently_test(cptestctx) + .await; + } + + #[nexus_test(server = crate::Server)] + async fn test_migration_target_failed_can_unwind( + cptestctx: &ControlPlaneTestContext, + ) { + MigrationOutcome::default() + .target(MigrationState::Failed, VmmState::Failed) + .source(MigrationState::Failed, VmmState::Running) + .run_unwinding_test(cptestctx) + .await; + } + + // === migration failed, migration target destroyed tests === + + #[nexus_test(server = crate::Server)] + async fn test_migration_target_failed_destroyed_succeeds( + cptestctx: &ControlPlaneTestContext, + ) { + let _project_id = setup_test_project(&cptestctx.external_client).await; + let other_sleds = test_helpers::add_sleds(cptestctx, 1).await; + + MigrationOutcome::default() + .target(MigrationState::Failed, VmmState::Destroyed) + .source(MigrationState::Failed, VmmState::Running) + .setup_test(cptestctx, &other_sleds) + .await + .run_saga_basic_usage_succeeds_test(cptestctx) + .await; + } + + #[nexus_test(server = crate::Server)] + async fn test_migration_target_failed_destroyed_actions_succeed_idempotently( + cptestctx: &ControlPlaneTestContext, + ) { + let _project_id = setup_test_project(&cptestctx.external_client).await; + let other_sleds = test_helpers::add_sleds(cptestctx, 1).await; + + MigrationOutcome::default() + 
.target(MigrationState::Failed, VmmState::Destroyed) + .source(MigrationState::Failed, VmmState::Running) + .setup_test(cptestctx, &other_sleds) + .await + .run_actions_succeed_idempotently_test(cptestctx) + .await; + } + + #[nexus_test(server = crate::Server)] + async fn test_migration_target_failed_destroyed_can_unwind( + cptestctx: &ControlPlaneTestContext, + ) { + MigrationOutcome::default() + .target(MigrationState::Failed, VmmState::Destroyed) + .source(MigrationState::Failed, VmmState::Running) + .run_unwinding_test(cptestctx) + .await; + } + + // === migration failed, migration source destroyed tests === + + #[nexus_test(server = crate::Server)] + async fn test_migration_source_failed_destroyed_succeeds( + cptestctx: &ControlPlaneTestContext, + ) { + let _project_id = setup_test_project(&cptestctx.external_client).await; + let other_sleds = test_helpers::add_sleds(cptestctx, 1).await; + + MigrationOutcome::default() + .target(MigrationState::InProgress, VmmState::Running) + .source(MigrationState::Failed, VmmState::Destroyed) + .setup_test(cptestctx, &other_sleds) + .await + .run_saga_basic_usage_succeeds_test(cptestctx) + .await; + } + + #[nexus_test(server = crate::Server)] + async fn test_migration_source_failed_destroyed_actions_succeed_idempotently( + cptestctx: &ControlPlaneTestContext, + ) { + let _project_id = setup_test_project(&cptestctx.external_client).await; + let other_sleds = test_helpers::add_sleds(cptestctx, 1).await; + + MigrationOutcome::default() + .target(MigrationState::InProgress, VmmState::Running) + .source(MigrationState::Failed, VmmState::Destroyed) + .setup_test(cptestctx, &other_sleds) + .await + .run_actions_succeed_idempotently_test(cptestctx) + .await; + } + + #[nexus_test(server = crate::Server)] + async fn test_migration_source_failed_destroyed_can_unwind( + cptestctx: &ControlPlaneTestContext, + ) { + MigrationOutcome::default() + .target(MigrationState::InProgress, VmmState::Running) + .source(MigrationState::Failed, VmmState::Destroyed) + .run_unwinding_test(cptestctx) + .await; + } + + // === migration failed, source and target both destroyed === + + #[nexus_test(server = crate::Server)] + async fn test_migration_failed_everyone_died_succeeds( + cptestctx: &ControlPlaneTestContext, + ) { + let _project_id = setup_test_project(&cptestctx.external_client).await; + let other_sleds = test_helpers::add_sleds(cptestctx, 1).await; + + MigrationOutcome::default() + .target(MigrationState::Failed, VmmState::Destroyed) + .source(MigrationState::Failed, VmmState::Destroyed) + .setup_test(cptestctx, &other_sleds) + .await + .run_saga_basic_usage_succeeds_test(cptestctx) + .await; + } + + #[nexus_test(server = crate::Server)] + async fn test_migration_failed_everyone_died_actions_succeed_idempotently( + cptestctx: &ControlPlaneTestContext, + ) { + let _project_id = setup_test_project(&cptestctx.external_client).await; + let other_sleds = test_helpers::add_sleds(cptestctx, 1).await; + + MigrationOutcome::default() + .target(MigrationState::Failed, VmmState::Destroyed) + .source(MigrationState::Failed, VmmState::Destroyed) + .setup_test(cptestctx, &other_sleds) + .await + .run_actions_succeed_idempotently_test(cptestctx) + .await; + } + + #[nexus_test(server = crate::Server)] + async fn test_migration_failed_everyone_died_can_unwind( + cptestctx: &ControlPlaneTestContext, + ) { + MigrationOutcome::default() + .target(MigrationState::Failed, VmmState::Destroyed) + .source(MigrationState::Failed, VmmState::Destroyed) + .run_unwinding_test(cptestctx) + .await; + 
} + + // === migration completed, but then the target was destroyed === + + #[nexus_test(server = crate::Server)] + async fn test_migration_completed_but_target_destroyed_succeeds( + cptestctx: &ControlPlaneTestContext, + ) { + let _project_id = setup_test_project(&cptestctx.external_client).await; + let other_sleds = test_helpers::add_sleds(cptestctx, 1).await; + + MigrationOutcome::default() + .target(MigrationState::Completed, VmmState::Destroyed) + .source(MigrationState::Completed, VmmState::Stopping) + .setup_test(cptestctx, &other_sleds) + .await + .run_saga_basic_usage_succeeds_test(cptestctx) + .await; + } + + #[nexus_test(server = crate::Server)] + async fn test_migration_completed_but_target_destroyed_actions_succeed_idempotently( + cptestctx: &ControlPlaneTestContext, + ) { + let _project_id = setup_test_project(&cptestctx.external_client).await; + let other_sleds = test_helpers::add_sleds(cptestctx, 1).await; + + MigrationOutcome::default() + .target(MigrationState::Completed, VmmState::Destroyed) + .source(MigrationState::Completed, VmmState::Stopping) + .setup_test(cptestctx, &other_sleds) + .await + .run_actions_succeed_idempotently_test(cptestctx) + .await; + } + + #[nexus_test(server = crate::Server)] + async fn test_migration_completed_but_target_destroyed_can_unwind( + cptestctx: &ControlPlaneTestContext, + ) { + MigrationOutcome::default() + .target(MigrationState::Completed, VmmState::Destroyed) + .source(MigrationState::Completed, VmmState::Stopping) + .run_unwinding_test(cptestctx) + .await; + } + + #[derive(Clone, Copy, Default)] + struct MigrationOutcome { + source: Option<(MigrationState, VmmState)>, + target: Option<(MigrationState, VmmState)>, + failed: bool, + } + + impl MigrationOutcome { + fn source(self, migration: MigrationState, vmm: VmmState) -> Self { + let failed = self.failed + || migration == MigrationState::Failed + || vmm == VmmState::Failed; + Self { source: Some((migration, vmm)), failed, ..self } + } + + fn target(self, migration: MigrationState, vmm: VmmState) -> Self { + let failed = self.failed + || migration == MigrationState::Failed + || vmm == VmmState::Failed; + Self { target: Some((migration, vmm)), failed, ..self } + } + + async fn setup_test( + self, + cptestctx: &ControlPlaneTestContext, + other_sleds: &[(SledUuid, omicron_sled_agent::sim::Server)], + ) -> MigrationTest { + MigrationTest::setup(self, cptestctx, other_sleds).await + } + + async fn run_unwinding_test( + &self, + cptestctx: &ControlPlaneTestContext, + ) { + let nexus = &cptestctx.server.server_context().nexus; + let other_sleds = test_helpers::add_sleds(cptestctx, 1).await; + let _project_id = + setup_test_project(&cptestctx.external_client).await; + let opctx = test_helpers::test_opctx(&cptestctx); + + // Stupid side channel for passing the expected parent start saga's lock + // ID into the "after unwinding" method, so that it can check that the + // lock is either released or was never acquired. + let parent_saga_id = Arc::new(Mutex::new(None)); + + test_helpers::action_failure_can_unwind::< + SagaDoActualInstanceUpdate, + _, + _, + >( + nexus, + || { + let parent_saga_id = parent_saga_id.clone(); + let other_sleds = &other_sleds; + let opctx = &opctx; + Box::pin(async move { + // Since the unwinding test will test unwinding from each + // individual saga node *in the saga DAG constructed by the + // provided params*, we need to give it the "real saga"'s + // params rather than the start saga's params. 
Otherwise, + // we're just testing the unwinding behavior of the trivial + // two-node start saga. + let start_saga_params = self + .setup_test(cptestctx, other_sleds) + .await + .start_saga_params(); + let real_params = make_real_params( + cptestctx, + opctx, + start_saga_params, + ) + .await; + *parent_saga_id.lock().unwrap() = + Some(real_params.orig_lock.updater_id); + real_params + }) + }, + || { + let parent_saga_id = parent_saga_id.clone(); + Box::pin(async move { + let parent_saga_id = + parent_saga_id.lock().unwrap().take().expect( + "parent saga's lock ID must have been set by \ + the `before_saga` function; this is a test \ + bug", + ); + after_unwinding(Some(parent_saga_id), cptestctx).await + }) + }, + &cptestctx.logctx.log, + ) + .await; + } + } + + struct MigrationTest { + outcome: MigrationOutcome, + instance_id: InstanceUuid, + initial_state: InstanceGestalt, + authz_instance: authz::Instance, + opctx: OpContext, + } + + impl MigrationTest { + fn target_vmm_id(&self) -> Uuid { + self.initial_state + .target_vmm + .as_ref() + .expect("migrating instance must have a target VMM") + .id + } + + fn src_vmm_id(&self) -> Uuid { + self.initial_state + .active_vmm + .as_ref() + .expect("migrating instance must have a source VMM") + .id + } + + async fn setup( + outcome: MigrationOutcome, + cptestctx: &ControlPlaneTestContext, + other_sleds: &[(SledUuid, omicron_sled_agent::sim::Server)], + ) -> Self { + use crate::app::sagas::instance_migrate; + + let client = &cptestctx.external_client; + let nexus = &cptestctx.server.server_context().nexus; + let datastore = nexus.datastore(); + + let opctx = test_helpers::test_opctx(cptestctx); + let instance = create_instance(client).await; + let instance_id = + InstanceUuid::from_untyped_uuid(instance.identity.id); + + // Poke the instance to get it into the Running state. + let state = + test_helpers::instance_fetch(cptestctx, instance_id).await; + test_helpers::instance_simulate(cptestctx, &instance_id).await; + + let vmm = state.vmm().as_ref().unwrap(); + let dst_sled_id = + test_helpers::select_first_alternate_sled(vmm, other_sleds); + let params = instance_migrate::Params { + serialized_authn: authn::saga::Serialized::for_opctx(&opctx), + instance: state.instance().clone(), + src_vmm: vmm.clone(), + migrate_params: params::InstanceMigrate { + dst_sled_id: dst_sled_id.into_untyped_uuid(), + }, + }; + + nexus + .sagas + .saga_execute::(params) + .await + .expect("Migration saga should succeed"); + + // Poke the destination sled just enough to make it appear to have a VMM. + test_helpers::instance_single_step_on_sled( + cptestctx, + &instance_id, + &dst_sled_id, + ) + .await; + + let (_, _, authz_instance, ..) 
=
+                LookupPath::new(&opctx, &datastore)
+                    .instance_id(instance_id.into_untyped_uuid())
+                    .fetch()
+                    .await
+                    .expect("test instance should be present in datastore");
+            let initial_state = datastore
+                .instance_fetch_all(&opctx, &authz_instance)
+                .await
+                .expect("test instance should be present in datastore");
+
+            let this = Self {
+                authz_instance,
+                initial_state,
+                outcome,
+                opctx,
+                instance_id,
+            };
+            if let Some((migration_state, vmm_state)) = this.outcome.source {
+                this.update_src_state(cptestctx, vmm_state, migration_state)
+                    .await;
+            }
+
+            if let Some((migration_state, vmm_state)) = this.outcome.target {
+                this.update_target_state(cptestctx, vmm_state, migration_state)
+                    .await;
+            }
+
+            this
+        }
+
+        async fn run_saga_basic_usage_succeeds_test(
+            &self,
+            cptestctx: &ControlPlaneTestContext,
+        ) {
+            // Run the instance-update saga.
+            let nexus = &cptestctx.server.server_context().nexus;
+            nexus
+                .sagas
+                .saga_execute::<SagaInstanceUpdate>(self.start_saga_params())
+                .await
+                .expect("update saga should succeed");
+
+            // Check the results
+            self.verify(cptestctx).await;
+        }
+
+        async fn run_actions_succeed_idempotently_test(
+            &self,
+            cptestctx: &ControlPlaneTestContext,
+        ) {
+            let params = make_real_params(
+                cptestctx,
+                &self.opctx,
+                self.start_saga_params(),
+            )
+            .await;
+
+            // Build the saga DAG with the provided test parameters
+            let dag =
+                create_saga_dag::<SagaDoActualInstanceUpdate>(params).unwrap();
+
+            // Run the actions-succeed-idempotently test
+            test_helpers::actions_succeed_idempotently(
+                &cptestctx.server.server_context().nexus,
+                dag,
+            )
+            .await;
+
+            // Check the results
+            self.verify(cptestctx).await;
+        }
+
+        async fn update_src_state(
+            &self,
+            cptestctx: &ControlPlaneTestContext,
+            vmm_state: VmmState,
+            migration_state: MigrationState,
+        ) {
+            let src_vmm = self
+                .initial_state
+                .active_vmm
+                .as_ref()
+                .expect("must have an active VMM");
+            let vmm_id = PropolisUuid::from_untyped_uuid(src_vmm.id);
+            let new_runtime = nexus_db_model::VmmRuntimeState {
+                time_state_updated: Utc::now(),
+                gen: Generation(src_vmm.runtime.gen.0.next()),
+                state: vmm_state,
+            };
+
+            let migration = self
+                .initial_state
+                .migration
+                .as_ref()
+                .expect("must have an active migration");
+            let migration_out = MigrationRuntimeState {
+                migration_id: migration.id,
+                state: migration_state,
+                gen: migration.source_gen.0.next(),
+                time_updated: Utc::now(),
+            };
+            let migrations = Migrations {
+                migration_in: None,
+                migration_out: Some(&migration_out),
+            };
+
+            info!(
+                cptestctx.logctx.log,
+                "updating source VMM state...";
+                "propolis_id" => %vmm_id,
+                "new_runtime" => ?new_runtime,
+                "migration_out" => ?migration_out,
+            );
+
+            cptestctx
+                .server
+                .server_context()
+                .nexus
+                .datastore()
+                .vmm_and_migration_update_runtime(
+                    &self.opctx,
+                    vmm_id,
+                    &new_runtime,
+                    migrations,
+                )
+                .await
+                .expect("updating migration source state should succeed");
+        }
+
+        async fn update_target_state(
+            &self,
+            cptestctx: &ControlPlaneTestContext,
+            vmm_state: VmmState,
+            migration_state: MigrationState,
+        ) {
+            let target_vmm = self
+                .initial_state
+                .target_vmm
+                .as_ref()
+                .expect("must have a target VMM");
+            let vmm_id = PropolisUuid::from_untyped_uuid(target_vmm.id);
+            let new_runtime = nexus_db_model::VmmRuntimeState {
+                time_state_updated: Utc::now(),
+                gen: Generation(target_vmm.runtime.gen.0.next()),
+                state: vmm_state,
+            };
+
+            let migration = self
+                .initial_state
+                .migration
+                .as_ref()
+                .expect("must have an active migration");
+            let migration_in = MigrationRuntimeState {
+                migration_id: migration.id,
+
state: migration_state, + gen: migration.target_gen.0.next(), + time_updated: Utc::now(), + }; + let migrations = Migrations { + migration_in: Some(&migration_in), + migration_out: None, + }; + + info!( + cptestctx.logctx.log, + "updating target VMM state..."; + "propolis_id" => %vmm_id, + "new_runtime" => ?new_runtime, + "migration_in" => ?migration_in, + ); + + cptestctx + .server + .server_context() + .nexus + .datastore() + .vmm_and_migration_update_runtime( + &self.opctx, + vmm_id, + &new_runtime, + migrations, + ) + .await + .expect("updating migration target state should succeed"); + } + + fn start_saga_params(&self) -> Params { + Params { + authz_instance: self.authz_instance.clone(), + serialized_authn: authn::saga::Serialized::for_opctx( + &self.opctx, + ), + } + } + + async fn verify(&self, cptestctx: &ControlPlaneTestContext) { + info!( + cptestctx.logctx.log, + "checking update saga results after migration"; + "source_outcome" => ?dbg!(self.outcome.source.as_ref()), + "target_outcome" => ?dbg!(self.outcome.target.as_ref()), + "migration_failed" => dbg!(self.outcome.failed), + ); + + use test_helpers::*; + let state = + test_helpers::instance_fetch(cptestctx, self.instance_id).await; + let instance = state.instance(); + let instance_runtime = instance.runtime(); + + let active_vmm_id = instance_runtime.propolis_id; + + assert_instance_unlocked(instance); + assert_instance_record_is_consistent(instance); + + let target_destroyed = self + .outcome + .target + .as_ref() + .map(|(_, state)| state == &VmmState::Destroyed) + .unwrap_or(false); + + if self.outcome.failed { + assert_eq!( + instance_runtime.migration_id, None, + "migration ID must be unset when a migration has failed" + ); + assert_eq!( + instance_runtime.dst_propolis_id, None, + "target VMM ID must be unset when a migration has failed" + ); + } else { + if dbg!(target_destroyed) { + assert_eq!( + active_vmm_id, None, + "if the target VMM was destroyed, it should be unset, \ + even if a migration succeeded", + ); + assert_eq!( + instance_runtime.nexus_state, + InstanceState::NoVmm + ); + } else { + assert_eq!( + active_vmm_id, + Some(self.target_vmm_id()), + "target VMM must be in the active VMM position after \ + migration success", + ); + + assert_eq!( + instance_runtime.nexus_state, + InstanceState::Vmm + ); + } + if self + .outcome + .target + .as_ref() + .map(|(state, _)| state == &MigrationState::Completed) + .unwrap_or(false) + { + assert_eq!( + instance_runtime.dst_propolis_id, None, + "target VMM ID must be unset once target VMM reports success", + ); + assert_eq!( + instance_runtime.migration_id, None, + "migration ID must be unset once target VMM reports success", + ); + } else { + assert_eq!( + instance_runtime.dst_propolis_id, + Some(self.target_vmm_id()), + "target VMM ID must remain set until the target VMM reports success", + ); + assert_eq!( + instance_runtime.migration_id, + self.initial_state.instance.runtime().migration_id, + "migration ID must remain set until target VMM reports success", + ); + } + } + + let src_destroyed = self + .outcome + .source + .as_ref() + .map(|(_, state)| state == &VmmState::Destroyed) + .unwrap_or(false); + assert_eq!( + self.src_resource_records_exist(cptestctx).await, + !src_destroyed, + "source VMM should exist if and only if the source hasn't been destroyed", + ); + + assert_eq!( + self.target_resource_records_exist(cptestctx).await, + !target_destroyed, + "target VMM should exist if and only if the target hasn't been destroyed", + ); + + // VThe instance has a VMM if 
(and only if): + let has_vmm = if self.outcome.failed { + // If the migration failed, the instance should have a VMM if + // and only if the source VMM is still okay. It doesn't matter + // whether the target is still there or not, because we didn't + // migrate to it successfully. + !src_destroyed + } else { + // Otherwise, if the migration succeeded, the instance should be + // on the target VMM, and virtual provisioning records should + // exist as long as the + !target_destroyed + }; + + assert_eq!( + no_virtual_provisioning_resource_records_exist(cptestctx).await, + !has_vmm, + "virtual provisioning resource records must exist as long as \ + the instance has a VMM", + ); + assert_eq!( + no_virtual_provisioning_collection_records_using_instances( + cptestctx + ) + .await, + !has_vmm, + "virtual provisioning collection records must exist as long \ + as the instance has a VMM", + ); + + let instance_state = + if has_vmm { InstanceState::Vmm } else { InstanceState::NoVmm }; + assert_eq!(instance_runtime.nexus_state, instance_state); + } + + async fn src_resource_records_exist( + &self, + cptestctx: &ControlPlaneTestContext, + ) -> bool { + test_helpers::sled_resources_exist_for_vmm( + cptestctx, + PropolisUuid::from_untyped_uuid(self.src_vmm_id()), + ) + .await + } + + async fn target_resource_records_exist( + &self, + cptestctx: &ControlPlaneTestContext, + ) -> bool { + test_helpers::sled_resources_exist_for_vmm( + cptestctx, + PropolisUuid::from_untyped_uuid(self.target_vmm_id()), + ) + .await + } + } + + async fn make_real_params( + cptestctx: &ControlPlaneTestContext, + opctx: &OpContext, + Params { authz_instance, serialized_authn }: Params, + ) -> RealParams { + let nexus = &cptestctx.server.server_context().nexus; + let datastore = nexus.datastore(); + let log = &cptestctx.logctx.log; + + let lock_id = Uuid::new_v4(); + let orig_lock = datastore + .instance_updater_lock(opctx, &authz_instance, lock_id) + .await + .expect("must lock instance"); + let state = datastore + .instance_fetch_all(&opctx, &authz_instance) + .await + .expect("instance must exist"); + let update = UpdatesRequired::for_instance(&log, &state) + .expect("the test's precondition should require updates"); + + info!( + log, + "made params for real saga"; + "instance" => ?state.instance, + "active_vmm" => ?state.active_vmm, + "target_vmm" => ?state.target_vmm, + "migration" => ?state.migration, + "update.new_runtime" => ?update.new_runtime, + "update.destroy_active_vmm" => ?update.destroy_active_vmm, + "update.destroy_target_vmm" => ?update.destroy_target_vmm, + "update.deprovision" => ?update.deprovision, + "update.network_config" => ?update.network_config, + ); + + RealParams { authz_instance, serialized_authn, update, orig_lock } + } +} diff --git a/nexus/src/app/sagas/instance_update/start.rs b/nexus/src/app/sagas/instance_update/start.rs new file mode 100644 index 0000000000..fbd8cbffc2 --- /dev/null +++ b/nexus/src/app/sagas/instance_update/start.rs @@ -0,0 +1,308 @@ +// This Source Code Form is subject to the terms of the Mozilla Public +// License, v. 2.0. If a copy of the MPL was not distributed with this +// file, You can obtain one at https://mozilla.org/MPL/2.0/. 
+
+// instance update start saga
+
+use super::{
+    ActionRegistry, NexusActionContext, NexusSaga, RealParams,
+    SagaDoActualInstanceUpdate, SagaInitError, UpdatesRequired,
+    ACTION_GENERATE_ID, INSTANCE_LOCK, INSTANCE_LOCK_ID,
+};
+use crate::app::saga;
+use crate::app::sagas::declare_saga_actions;
+use nexus_db_queries::db::datastore::instance;
+use nexus_db_queries::{authn, authz};
+use serde::{Deserialize, Serialize};
+use steno::{ActionError, DagBuilder, Node, SagaResultErr};
+use uuid::Uuid;
+
+/// Parameters to the start instance update saga.
+#[derive(Debug, Deserialize, Serialize)]
+pub(crate) struct Params {
+    /// Authentication context to use to fetch the instance's current state
+    /// from the database.
+    pub(crate) serialized_authn: authn::saga::Serialized,
+
+    pub(crate) authz_instance: authz::Instance,
+}
+
+// instance update saga: actions
+
+declare_saga_actions! {
+    start_instance_update;
+
+    // Acquire the "instance updater" lock with this saga's ID if no other saga
+    // is currently updating the instance.
+    LOCK_INSTANCE -> "updater_lock" {
+        + siu_lock_instance
+        - siu_lock_instance_undo
+    }
+
+    // Fetch the instance's and VMM's state, and start the "real" instance
+    // update saga. N.B. that this must be performed as a separate action from
+    // `LOCK_INSTANCE`, so that if the lookup fails, we will still unwind the
+    // `LOCK_INSTANCE` action and release the lock.
+    FETCH_STATE_AND_START_REAL_SAGA -> "state" {
+        + siu_fetch_state_and_start_real_saga
+    }
+}
+
+// instance update saga: definition
+
+#[derive(Debug)]
+pub(crate) struct SagaInstanceUpdate;
+impl NexusSaga for SagaInstanceUpdate {
+    const NAME: &'static str = "start-instance-update";
+    type Params = Params;
+
+    fn register_actions(registry: &mut ActionRegistry) {
+        start_instance_update_register_actions(registry);
+        super::SagaDoActualInstanceUpdate::register_actions(registry);
+        super::destroyed::SagaDestroyVmm::register_actions(registry);
+    }
+
+    fn make_saga_dag(
+        _params: &Self::Params,
+        mut builder: DagBuilder,
+    ) -> Result<steno::Dag, SagaInitError> {
+        builder.append(Node::action(
+            INSTANCE_LOCK_ID,
+            "GenerateInstanceLockId",
+            ACTION_GENERATE_ID.as_ref(),
+        ));
+        builder.append(lock_instance_action());
+        builder.append(fetch_state_and_start_real_saga_action());
+
+        Ok(builder.build()?)
+    }
+}
+
+// start instance update saga: action implementations
+
+async fn siu_lock_instance(
+    sagactx: NexusActionContext,
+) -> Result<Option<instance::UpdaterLock>, ActionError> {
+    let osagactx = sagactx.user_data();
+    let Params { ref serialized_authn, ref authz_instance, .. } =
+        sagactx.saga_params::<Params>()?;
+    let lock_id = sagactx.lookup::<Uuid>(INSTANCE_LOCK_ID)?;
+    let opctx =
+        crate::context::op_context_for_saga_action(&sagactx, serialized_authn);
+
+    info!(
+        osagactx.log(),
+        "instance update: attempting to lock instance";
+        "instance_id" => %authz_instance.id(),
+        "saga_id" => %lock_id,
+    );
+
+    let locked = osagactx
+        .datastore()
+        .instance_updater_lock(&opctx, authz_instance, lock_id)
+        .await;
+    match locked {
+        Ok(lock) => Ok(Some(lock)),
+        // Don't return an error if we can't take the lock. This saga will
+        // simply not start the real instance update saga, rather than having
+        // to unwind.
+        Err(instance::UpdaterLockError::AlreadyLocked) => Ok(None),
+        // Okay, that's a real error. Time to die!
+        Err(instance::UpdaterLockError::Query(e)) => {
+            Err(ActionError::action_failed(e))
+        }
+    }
+}
+
+async fn siu_lock_instance_undo(
+    sagactx: NexusActionContext,
+) -> Result<(), anyhow::Error> {
+    let Params { ref serialized_authn, ref authz_instance, .. } =
+        sagactx.saga_params::<Params>()?;
+
+    // If the instance lock node in the saga context was `None`, that means
+    // we didn't acquire the lock, and we can die happily without having to
+    // worry about unlocking the instance. It would be pretty surprising if
+    // this saga unwound without having acquired the lock, but...whatever.
+    if let Some(lock) =
+        sagactx.lookup::<Option<instance::UpdaterLock>>(INSTANCE_LOCK)?
+    {
+        super::unwind_instance_lock(
+            lock,
+            serialized_authn,
+            authz_instance,
+            &sagactx,
+        )
+        .await;
+    }
+
+    Ok(())
+}
+
+async fn siu_fetch_state_and_start_real_saga(
+    sagactx: NexusActionContext,
+) -> Result<(), ActionError> {
+    let Params { serialized_authn, authz_instance, .. } =
+        sagactx.saga_params::<Params>()?;
+    let osagactx = sagactx.user_data();
+    let lock_id = sagactx.lookup::<Uuid>(INSTANCE_LOCK_ID)?;
+    let instance_id = authz_instance.id();
+    let log = osagactx.log();
+
+    // Did we get the lock? If so, we can start the next saga; otherwise, just
+    // exit gracefully.
+    let Some(orig_lock) =
+        sagactx.lookup::<Option<instance::UpdaterLock>>(INSTANCE_LOCK)?
+    else {
+        info!(
+            log,
+            "instance update: instance is already locked! doing nothing...";
+            "instance_id" => %instance_id,
+            "saga_id" => %lock_id,
+        );
+        return Ok(());
+    };
+
+    let opctx =
+        crate::context::op_context_for_saga_action(&sagactx, &serialized_authn);
+    let datastore = osagactx.datastore();
+    let nexus = osagactx.nexus();
+
+    let state = datastore
+        .instance_fetch_all(&opctx, &authz_instance)
+        .await
+        .map_err(ActionError::action_failed)?;
+
+    // Determine what updates are required based on the instance's current
+    // state snapshot. If there are updates to perform, execute the "real"
+    // update saga. Otherwise, if we don't need to do anything else, simply
+    // release the lock and finish this saga.
+    if let Some(update) = UpdatesRequired::for_instance(log, &state) {
+        info!(
+            log,
+            "instance update: starting real update saga...";
+            "instance_id" => %instance_id,
+            "current.runtime_state" => ?state.instance.runtime(),
+            "current.migration" => ?state.migration,
+            "current.active_vmm" => ?state.active_vmm,
+            "current.target_vmm" => ?state.target_vmm,
+            "update.new_runtime_state" => ?update.new_runtime,
+            "update.network_config_update" => ?update.network_config,
+            "update.destroy_active_vmm" => ?update.destroy_active_vmm,
+            "update.destroy_target_vmm" => ?update.destroy_target_vmm,
+            "update.deprovision" => update.deprovision.is_some(),
+        );
+        // Prepare the child saga.
+        //
+        // /!\ WARNING /!\ This is really finicky: whether or not the start
+        // saga should unwind depends on whether the child `instance-update`
+        // saga has advanced far enough to have inherited the lock or not. If
+        // the child has not inherited the lock, we *must* unwind to ensure
+        // the lock is dropped.
+        //
+        // Note that we *don't* use `SagaExecutor::saga_execute`, which
+        // prepares the child saga and waits for it to complete. That function
+        // wraps all the errors returned by this whole process in an external
+        // API error, which makes it difficult for us to figure out *why* the
+        // child saga failed, and whether we should unwind or not.
+
+        let dag =
+            saga::create_saga_dag::<SagaDoActualInstanceUpdate>(RealParams {
+                serialized_authn,
+                authz_instance,
+                update,
+                orig_lock,
+            })
+            // If we can't build a DAG for the child saga, we should unwind, so
+            // that we release the lock.
+            .map_err(|e| {
+                nexus.background_tasks.task_instance_updater.activate();
+                ActionError::action_failed(e)
+            })?;
+        let child_result = nexus
+            .sagas
+            .saga_prepare(dag)
+            .await
+            // Similarly, if we can't prepare the child saga, we need to
+            // unwind and release the lock.
+            .map_err(|e| {
+                nexus.background_tasks.task_instance_updater.activate();
+                ActionError::action_failed(e)
+            })?
+            .start()
+            .await
+            // And, if we can't start it, we need to unwind.
+            .map_err(|e| {
+                nexus.background_tasks.task_instance_updater.activate();
+                ActionError::action_failed(e)
+            })?
+            .wait_until_stopped()
+            .await
+            .into_raw_result();
+        match child_result.kind {
+            Ok(_) => {
+                debug!(
+                    log,
+                    "instance update: child saga completed successfully";
+                    "instance_id" => %instance_id,
+                    "child_saga_id" => %child_result.saga_id,
+                )
+            }
+            // Check if the child saga failed to inherit the updater lock from
+            // this saga.
+            Err(SagaResultErr {
+                error_node_name,
+                error_source: ActionError::ActionFailed { source_error },
+                ..
+            }) if error_node_name.as_ref() == super::INSTANCE_LOCK => {
+                if let Ok(instance::UpdaterLockError::AlreadyLocked) =
+                    serde_json::from_value(source_error)
+                {
+                    // Inheriting the lock failed because the lock was held by
+                    // another saga. That's fine: this action must have
+                    // executed more than once, and created multiple child
+                    // sagas. No big deal.
+                    return Ok(());
+                } else {
+                    // Otherwise, the child saga could not inherit the lock for
+                    // some other reason. That means we MUST unwind to ensure
+                    // the lock is released.
+                    return Err(ActionError::action_failed(
+                        "child saga failed to inherit lock".to_string(),
+                    ));
+                }
+            }
+            Err(error) => {
+                warn!(
+                    log,
+                    "instance update: child saga failed, unwinding...";
+                    "instance_id" => %instance_id,
+                    "child_saga_id" => %child_result.saga_id,
+                    "error" => ?error,
+                );
+
+                // If the real saga failed, kick the background task. If the
+                // real saga failed because this action was executed twice and
+                // the second child saga couldn't lock the instance, that's
+                // fine, because the background task will only start new sagas
+                // for instances whose DB state actually *needs* an update.
+ nexus.background_tasks.task_instance_updater.activate(); + return Err(error.error_source); + } + } + } else { + info!( + log, + "instance update: no updates required, releasing lock."; + "instance_id" => %authz_instance.id(), + "current.runtime_state" => ?state.instance.runtime(), + "current.migration" => ?state.migration, + "current.active_vmm" => ?state.active_vmm, + "current.target_vmm" => ?state.target_vmm, + ); + datastore + .instance_updater_unlock(&opctx, &authz_instance, &orig_lock) + .await + .map_err(ActionError::action_failed)?; + } + + Ok(()) +} diff --git a/nexus/src/app/sagas/mod.rs b/nexus/src/app/sagas/mod.rs index 17f43b4950..0c57a5b2dc 100644 --- a/nexus/src/app/sagas/mod.rs +++ b/nexus/src/app/sagas/mod.rs @@ -33,6 +33,7 @@ pub mod instance_ip_attach; pub mod instance_ip_detach; pub mod instance_migrate; pub mod instance_start; +pub mod instance_update; pub mod project_create; pub mod region_replacement_drive; pub mod region_replacement_finish; @@ -156,6 +157,9 @@ fn make_action_registry() -> ActionRegistry { ::register_actions( &mut registry, ); + ::register_actions( + &mut registry, + ); ::register_actions( &mut registry, ); diff --git a/nexus/src/app/sagas/snapshot_create.rs b/nexus/src/app/sagas/snapshot_create.rs index 76a82e7491..eeb14091b2 100644 --- a/nexus/src/app/sagas/snapshot_create.rs +++ b/nexus/src/app/sagas/snapshot_create.rs @@ -2308,6 +2308,20 @@ mod test { PROJECT_NAME, ) .await; + // Wait until the instance has advanced to the `NoVmm` + // state before deleting it. This may not happen + // immediately, as the `Nexus::cpapi_instances_put` API + // endpoint simply writes the new VMM state to the + // database and *starts* an `instance-update` saga, and + // the instance record isn't updated until that saga + // completes. 
+ test_helpers::instance_wait_for_state_by_name( + cptestctx, + INSTANCE_NAME, + PROJECT_NAME, + nexus_db_model::InstanceState::NoVmm, + ) + .await; test_helpers::instance_delete_by_name( cptestctx, INSTANCE_NAME, diff --git a/nexus/src/app/sagas/test_helpers.rs b/nexus/src/app/sagas/test_helpers.rs index a5d9d0a843..b9388a1116 100644 --- a/nexus/src/app/sagas/test_helpers.rs +++ b/nexus/src/app/sagas/test_helpers.rs @@ -11,21 +11,31 @@ use crate::{ Nexus, }; use async_bb8_diesel::{AsyncRunQueryDsl, AsyncSimpleConnection}; +use camino::Utf8Path; use diesel::{ BoolExpressionMethods, ExpressionMethods, QueryDsl, SelectableHelper, }; use futures::future::BoxFuture; +use nexus_db_model::InstanceState; use nexus_db_queries::{ authz, context::OpContext, - db::{datastore::InstanceAndActiveVmm, lookup::LookupPath, DataStore}, + db::{ + datastore::{InstanceAndActiveVmm, InstanceGestalt}, + lookup::LookupPath, + DataStore, + }, }; +use nexus_test_interface::NexusServer; +use nexus_test_utils::start_sled_agent; use nexus_types::identity::Resource; +use omicron_common::api::external::Error; use omicron_common::api::external::NameOrId; -use omicron_uuid_kinds::{GenericUuid, InstanceUuid}; +use omicron_test_utils::dev::poll; +use omicron_uuid_kinds::{GenericUuid, InstanceUuid, PropolisUuid, SledUuid}; use sled_agent_client::TestInterfaces as _; use slog::{info, warn, Logger}; -use std::{num::NonZeroU32, sync::Arc}; +use std::{num::NonZeroU32, sync::Arc, time::Duration}; use steno::SagaDag; type ControlPlaneTestContext = @@ -136,6 +146,26 @@ pub(crate) async fn instance_simulate( sa.instance_finish_transition(instance_id.into_untyped_uuid()).await; } +pub(crate) async fn instance_single_step_on_sled( + cptestctx: &ControlPlaneTestContext, + instance_id: &InstanceUuid, + sled_id: &SledUuid, +) { + info!( + &cptestctx.logctx.log, + "Single-stepping simulated instance on sled"; + "instance_id" => %instance_id, + "sled_id" => %sled_id, + ); + let nexus = &cptestctx.server.server_context().nexus; + let sa = nexus + .sled_client(sled_id) + .await + .expect("sled must exist to simulate a state change"); + + sa.instance_single_step(instance_id.into_untyped_uuid()).await; +} + pub(crate) async fn instance_simulate_by_name( cptestctx: &ControlPlaneTestContext, name: &str, @@ -188,9 +218,169 @@ pub async fn instance_fetch( db_state } +pub async fn instance_fetch_all( + cptestctx: &ControlPlaneTestContext, + instance_id: InstanceUuid, +) -> InstanceGestalt { + let datastore = cptestctx.server.server_context().nexus.datastore().clone(); + let opctx = test_opctx(&cptestctx); + let (.., authz_instance) = LookupPath::new(&opctx, &datastore) + .instance_id(instance_id.into_untyped_uuid()) + .lookup_for(authz::Action::Read) + .await + .expect("test instance should be present in datastore"); + + let db_state = datastore + .instance_fetch_all(&opctx, &authz_instance) + .await + .expect("test instance's info should be fetchable"); + + info!(&cptestctx.logctx.log, "refetched all instance info from db"; + "instance_id" => %instance_id, + "instance" => ?db_state.instance, + "active_vmm" => ?db_state.active_vmm, + "target_vmm" => ?db_state.target_vmm, + "migration" => ?db_state.migration, + ); + + db_state +} +pub async fn instance_fetch_by_name( + cptestctx: &ControlPlaneTestContext, + name: &str, + project_name: &str, +) -> InstanceAndActiveVmm { + let nexus = &cptestctx.server.server_context().nexus; + let datastore = nexus.datastore(); + let opctx = test_opctx(&cptestctx); + let instance_selector = + 
nexus_types::external_api::params::InstanceSelector { + project: Some(project_name.to_string().try_into().unwrap()), + instance: name.to_string().try_into().unwrap(), + }; + + let instance_lookup = + nexus.instance_lookup(&opctx, instance_selector).unwrap(); + let (_, _, authz_instance, ..) = instance_lookup.fetch().await.unwrap(); + + let db_state = datastore + .instance_fetch_with_vmm(&opctx, &authz_instance) + .await + .expect("test instance's info should be fetchable"); + + info!(&cptestctx.logctx.log, "refetched instance info from db"; + "instance_name" => name, + "project_name" => project_name, + "instance_id" => %authz_instance.id(), + "instance_and_vmm" => ?db_state, + ); + + db_state +} + +pub(crate) async fn instance_wait_for_state( + cptestctx: &ControlPlaneTestContext, + instance_id: InstanceUuid, + desired_state: InstanceState, +) -> InstanceAndActiveVmm { + let opctx = test_opctx(&cptestctx); + let datastore = cptestctx.server.server_context().nexus.datastore(); + let (.., authz_instance) = LookupPath::new(&opctx, &datastore) + .instance_id(instance_id.into_untyped_uuid()) + .lookup_for(authz::Action::Read) + .await + .expect("test instance should be present in datastore"); + instance_poll_state(cptestctx, &opctx, authz_instance, desired_state).await +} + +pub async fn instance_wait_for_state_by_name( + cptestctx: &ControlPlaneTestContext, + name: &str, + project_name: &str, + desired_state: InstanceState, +) -> InstanceAndActiveVmm { + let nexus = &cptestctx.server.server_context().nexus; + let opctx = test_opctx(&cptestctx); + let instance_selector = + nexus_types::external_api::params::InstanceSelector { + project: Some(project_name.to_string().try_into().unwrap()), + instance: name.to_string().try_into().unwrap(), + }; + + let instance_lookup = + nexus.instance_lookup(&opctx, instance_selector).unwrap(); + let (_, _, authz_instance, ..) 
= instance_lookup.fetch().await.unwrap(); + + instance_poll_state(cptestctx, &opctx, authz_instance, desired_state).await +} + +async fn instance_poll_state( + cptestctx: &ControlPlaneTestContext, + opctx: &OpContext, + authz_instance: authz::Instance, + desired_state: InstanceState, +) -> InstanceAndActiveVmm { + const MAX_WAIT: Duration = Duration::from_secs(120); + + let datastore = cptestctx.server.server_context().nexus.datastore(); + let log = &cptestctx.logctx.log; + let instance_id = authz_instance.id(); + + info!( + log, + "waiting for instance {instance_id} to transition to {desired_state}..."; + "instance_id" => %instance_id, + ); + let result = poll::wait_for_condition( + || async { + let db_state = datastore + .instance_fetch_with_vmm(&opctx, &authz_instance) + .await + .map_err(poll::CondCheckError::::Failed)?; + + if db_state.instance.runtime().nexus_state == desired_state { + info!( + log, + "instance {instance_id} transitioned to {desired_state}"; + "instance_id" => %instance_id, + "instance" => ?db_state.instance(), + "active_vmm" => ?db_state.vmm(), + ); + Ok(db_state) + } else { + info!( + log, + "instance {instance_id} has not yet transitioned to {desired_state}"; + "instance_id" => %instance_id, + "instance" => ?db_state.instance(), + "active_vmm" => ?db_state.vmm(), + ); + Err(poll::CondCheckError::::NotYet) + } + }, + &Duration::from_secs(1), + &MAX_WAIT, + ) + .await; + + match result { + Ok(i) => i, + Err(e) => panic!( + "instance {instance_id} did not transition to {desired_state} \ + after {MAX_WAIT:?}: {e}" + ), + } +} + pub async fn no_virtual_provisioning_resource_records_exist( cptestctx: &ControlPlaneTestContext, ) -> bool { + count_virtual_provisioning_resource_records(cptestctx).await == 0 +} + +pub async fn count_virtual_provisioning_resource_records( + cptestctx: &ControlPlaneTestContext, +) -> usize { use nexus_db_queries::db::model::VirtualProvisioningResource; use nexus_db_queries::db::schema::virtual_provisioning_resource::dsl; @@ -198,7 +388,7 @@ pub async fn no_virtual_provisioning_resource_records_exist( let conn = datastore.pool_connection_for_tests().await.unwrap(); datastore - .transaction_retry_wrapper("no_virtual_provisioning_resource_records_exist") + .transaction_retry_wrapper("count_virtual_provisioning_resource_records") .transaction(&conn, |conn| async move { conn .batch_execute_async(nexus_test_utils::db::ALLOW_FULL_TABLE_SCAN_SQL) @@ -212,7 +402,7 @@ pub async fn no_virtual_provisioning_resource_records_exist( .get_results_async::(&conn) .await .unwrap() - .is_empty() + .len() ) }).await.unwrap() } @@ -220,6 +410,14 @@ pub async fn no_virtual_provisioning_resource_records_exist( pub async fn no_virtual_provisioning_collection_records_using_instances( cptestctx: &ControlPlaneTestContext, ) -> bool { + count_virtual_provisioning_collection_records_using_instances(cptestctx) + .await + == 0 +} + +pub async fn count_virtual_provisioning_collection_records_using_instances( + cptestctx: &ControlPlaneTestContext, +) -> usize { use nexus_db_queries::db::model::VirtualProvisioningCollection; use nexus_db_queries::db::schema::virtual_provisioning_collection::dsl; @@ -228,7 +426,7 @@ pub async fn no_virtual_provisioning_collection_records_using_instances( datastore .transaction_retry_wrapper( - "no_virtual_provisioning_collection_records_using_instances", + "count_virtual_provisioning_collection_records_using_instances", ) .transaction(&conn, |conn| async move { conn.batch_execute_async( @@ -244,12 +442,70 @@ pub async fn 
no_virtual_provisioning_collection_records_using_instances( .get_results_async::(&conn) .await .unwrap() + .len()) + }) + .await + .unwrap() +} + +pub async fn no_sled_resource_instance_records_exist( + cptestctx: &ControlPlaneTestContext, +) -> bool { + use nexus_db_queries::db::model::SledResource; + use nexus_db_queries::db::model::SledResourceKind; + use nexus_db_queries::db::schema::sled_resource::dsl; + + let datastore = cptestctx.server.server_context().nexus.datastore(); + let conn = datastore.pool_connection_for_tests().await.unwrap(); + + datastore + .transaction_retry_wrapper("no_sled_resource_instance_records_exist") + .transaction(&conn, |conn| async move { + conn.batch_execute_async( + nexus_test_utils::db::ALLOW_FULL_TABLE_SCAN_SQL, + ) + .await + .unwrap(); + + Ok(dsl::sled_resource + .filter(dsl::kind.eq(SledResourceKind::Instance)) + .select(SledResource::as_select()) + .get_results_async::(&conn) + .await + .unwrap() .is_empty()) }) .await .unwrap() } +pub async fn sled_resources_exist_for_vmm( + cptestctx: &ControlPlaneTestContext, + vmm_id: PropolisUuid, +) -> bool { + use nexus_db_queries::db::model::SledResource; + use nexus_db_queries::db::model::SledResourceKind; + use nexus_db_queries::db::schema::sled_resource::dsl; + + let datastore = cptestctx.server.server_context().nexus.datastore(); + let conn = datastore.pool_connection_for_tests().await.unwrap(); + + let results = dsl::sled_resource + .filter(dsl::kind.eq(SledResourceKind::Instance)) + .filter(dsl::id.eq(vmm_id.into_untyped_uuid())) + .select(SledResource::as_select()) + .load_async(&*conn) + .await + .unwrap(); + info!( + cptestctx.logctx.log, + "queried sled reservation records for VMM"; + "vmm_id" => %vmm_id, + "results" => ?results, + ); + !results.is_empty() +} + /// Tests that the saga described by `dag` succeeds if each of its nodes is /// repeated. 
/// @@ -532,3 +788,51 @@ pub(crate) async fn assert_no_failed_undo_steps( assert!(saga_node_events.is_empty()); } + +pub(crate) async fn add_sleds( + cptestctx: &ControlPlaneTestContext, + num_sleds: usize, +) -> Vec<(SledUuid, omicron_sled_agent::sim::Server)> { + let mut sas = Vec::with_capacity(num_sleds); + for _ in 0..num_sleds { + let sa_id = SledUuid::new_v4(); + let log = cptestctx.logctx.log.new(o!("sled_id" => sa_id.to_string())); + let addr = cptestctx.server.get_http_server_internal_address().await; + + info!(&cptestctx.logctx.log, "Adding simulated sled"; "sled_id" => %sa_id); + let update_dir = Utf8Path::new("/should/be/unused"); + let sa = start_sled_agent( + log, + addr, + sa_id, + &update_dir, + omicron_sled_agent::sim::SimMode::Explicit, + ) + .await + .unwrap(); + sas.push((sa_id, sa)); + } + + sas +} + +pub(crate) fn select_first_alternate_sled( + db_vmm: &crate::app::db::model::Vmm, + other_sleds: &[(SledUuid, omicron_sled_agent::sim::Server)], +) -> SledUuid { + let default_sled_uuid: SledUuid = + nexus_test_utils::SLED_AGENT_UUID.parse().unwrap(); + if other_sleds.is_empty() { + panic!("need at least one other sled"); + } + + if other_sleds.iter().any(|sled| sled.0 == default_sled_uuid) { + panic!("default test sled agent was in other_sleds"); + } + + if db_vmm.sled_id == default_sled_uuid.into_untyped_uuid() { + other_sleds[0].0 + } else { + default_sled_uuid + } +} diff --git a/nexus/src/internal_api/http_entrypoints.rs b/nexus/src/internal_api/http_entrypoints.rs index 28ff712c24..33b626a7fc 100644 --- a/nexus/src/internal_api/http_entrypoints.rs +++ b/nexus/src/internal_api/http_entrypoints.rs @@ -177,7 +177,7 @@ impl NexusInternalApi for NexusInternalApiImpl { nexus .notify_instance_updated( &opctx, - &InstanceUuid::from_untyped_uuid(path.instance_id), + InstanceUuid::from_untyped_uuid(path.instance_id), &new_state, ) .await?; diff --git a/nexus/tests/config.test.toml b/nexus/tests/config.test.toml index e231f665fa..8f65a73204 100644 --- a/nexus/tests/config.test.toml +++ b/nexus/tests/config.test.toml @@ -124,6 +124,19 @@ v2p_mapping_propagation.period_secs = 30 abandoned_vmm_reaper.period_secs = 60 saga_recovery.period_secs = 600 lookup_region_port.period_secs = 60 +# The purpose of the `instance-updater` background task is to ensure that update +# sagas are always *eventually* started for instances whose database state has +# changed, even if the update saga was not started by the Nexus replica handling +# an update from sled-agent. This is to ensure that updates are performed even +# in cases where a Nexus crashes or otherwise disappears between when the +# updated VMM and migration state is written to CRDB and when the resulting +# update saga actually starts executing. However, we would prefer update sagas +# to be executed in a timely manner, so for integration tests, we don't want to +# *rely* on the instance-updater background task for running these sagas. +# +# Therefore, disable the background task during tests. +instance_updater.disable = true +instance_updater.period_secs = 60 [default_region_allocation_strategy] # we only have one sled in the test environment, so we need to use the diff --git a/nexus/tests/integration_tests/disks.rs b/nexus/tests/integration_tests/disks.rs index ded4a346fb..234ab5f382 100644 --- a/nexus/tests/integration_tests/disks.rs +++ b/nexus/tests/integration_tests/disks.rs @@ -4,6 +4,7 @@ //! 
Tests basic disk support in the API +use super::instances::instance_wait_for_state; use super::metrics::{get_latest_silo_metric, query_for_metrics}; use chrono::Utc; use dropshot::test_util::ClientTestContext; @@ -37,6 +38,7 @@ use omicron_common::api::external::Disk; use omicron_common::api::external::DiskState; use omicron_common::api::external::IdentityMetadataCreateParams; use omicron_common::api::external::Instance; +use omicron_common::api::external::InstanceState; use omicron_common::api::external::Name; use omicron_common::api::external::NameOrId; use omicron_nexus::app::{MAX_DISK_SIZE_BYTES, MIN_DISK_SIZE_BYTES}; @@ -236,18 +238,15 @@ async fn test_disk_create_attach_detach_delete( // Create an instance to attach the disk. let instance = create_instance(&client, PROJECT_NAME, INSTANCE_NAME).await; + let instance_id = InstanceUuid::from_untyped_uuid(instance.identity.id); // TODO(https://github.com/oxidecomputer/omicron/issues/811): // // Instances must be stopped before disks can be attached - this // is an artificial limitation without hotplug support. - let instance_next = - set_instance_state(&client, INSTANCE_NAME, "stop").await; - instance_simulate( - nexus, - &InstanceUuid::from_untyped_uuid(instance_next.identity.id), - ) - .await; + set_instance_state(&client, INSTANCE_NAME, "stop").await; + instance_simulate(nexus, &instance_id).await; + instance_wait_for_state(client, instance_id, InstanceState::Stopped).await; // Verify that there are no disks attached to the instance, and specifically // that our disk is not attached to this instance. @@ -395,6 +394,8 @@ async fn test_disk_slot_assignment(cptestctx: &ControlPlaneTestContext) { let instance_id = InstanceUuid::from_untyped_uuid(instance.identity.id); set_instance_state(&client, INSTANCE_NAME, "stop").await; instance_simulate(nexus, &instance_id).await; + instance_wait_for_state(&client, instance_id, InstanceState::Stopped).await; + let url_instance_disks = get_instance_disks_url(instance.identity.name.as_str()); let listed_disks = disks_list(&client, &url_instance_disks).await; @@ -504,6 +505,7 @@ async fn test_disk_move_between_instances(cptestctx: &ControlPlaneTestContext) { // is an artificial limitation without hotplug support. set_instance_state(&client, INSTANCE_NAME, "stop").await; instance_simulate(nexus, &instance_id).await; + instance_wait_for_state(&client, instance_id, InstanceState::Stopped).await; // Verify that there are no disks attached to the instance, and specifically // that our disk is not attached to this instance. 
@@ -541,6 +543,8 @@ async fn test_disk_move_between_instances(cptestctx: &ControlPlaneTestContext) { let instance2_id = InstanceUuid::from_untyped_uuid(instance2.identity.id); set_instance_state(&client, "instance2", "stop").await; instance_simulate(nexus, &instance2_id).await; + instance_wait_for_state(&client, instance2_id, InstanceState::Stopped) + .await; let url_instance2_attach_disk = get_disk_attach_url(&instance2.identity.id.into()); diff --git a/nexus/tests/integration_tests/external_ips.rs b/nexus/tests/integration_tests/external_ips.rs index 2789318855..0940c8675b 100644 --- a/nexus/tests/integration_tests/external_ips.rs +++ b/nexus/tests/integration_tests/external_ips.rs @@ -9,6 +9,7 @@ use std::net::Ipv4Addr; use crate::integration_tests::instances::fetch_instance_external_ips; use crate::integration_tests::instances::instance_simulate; +use crate::integration_tests::instances::instance_wait_for_state; use dropshot::test_util::ClientTestContext; use dropshot::HttpErrorResponseBody; use http::Method; @@ -47,6 +48,7 @@ use omicron_common::api::external::IdentityMetadataCreateParams; use omicron_common::api::external::IdentityMetadataUpdateParams; use omicron_common::api::external::Instance; use omicron_common::api::external::InstanceCpuCount; +use omicron_common::api::external::InstanceState; use omicron_common::api::external::Name; use omicron_common::api::external::NameOrId; use omicron_uuid_kinds::GenericUuid; @@ -696,6 +698,7 @@ async fn test_floating_ip_create_attachment( .unwrap(); instance_simulate(nexus, &instance_id).await; + instance_wait_for_state(client, instance_id, InstanceState::Stopped).await; NexusRequest::object_delete( &client, diff --git a/nexus/tests/integration_tests/instances.rs b/nexus/tests/integration_tests/instances.rs index 9c965ccf8a..2e41fac3a4 100644 --- a/nexus/tests/integration_tests/instances.rs +++ b/nexus/tests/integration_tests/instances.rs @@ -421,8 +421,9 @@ async fn test_instances_create_reboot_halt( let instance = instance_next; instance_simulate(nexus, &instance_id).await; - let instance_next = instance_get(&client, &instance_url).await; - assert_eq!(instance_next.runtime.run_state, InstanceState::Stopped); + let instance_next = + instance_wait_for_state(client, instance_id, InstanceState::Stopped) + .await; assert!( instance_next.runtime.time_run_state_updated > instance.runtime.time_run_state_updated @@ -516,8 +517,9 @@ async fn test_instances_create_reboot_halt( // assert_eq!(error.message, "cannot reboot instance in state \"stopping\""); let instance = instance_next; instance_simulate(nexus, &instance_id).await; - let instance_next = instance_get(&client, &instance_url).await; - assert_eq!(instance_next.runtime.run_state, InstanceState::Stopped); + let instance_next = + instance_wait_for_state(client, instance_id, InstanceState::Stopped) + .await; assert!( instance_next.runtime.time_run_state_updated > instance.runtime.time_run_state_updated @@ -629,8 +631,7 @@ async fn test_instance_start_creates_networking_state( instance_simulate(nexus, &instance_id).await; instance_post(&client, instance_name, InstanceOp::Stop).await; instance_simulate(nexus, &instance_id).await; - let instance = instance_get(&client, &instance_url).await; - assert_eq!(instance.runtime.run_state, InstanceState::Stopped); + instance_wait_for_state(client, instance_id, InstanceState::Stopped).await; // Forcibly clear the instance's V2P mappings to simulate what happens when // the control plane comes up when an instance is stopped. 
@@ -837,18 +838,56 @@ async fn test_instance_migrate(cptestctx: &ControlPlaneTestContext) { assert_eq!(migration.target_state, MigrationState::Pending.into()); assert_eq!(migration.source_state, MigrationState::Pending.into()); - // Explicitly simulate the migration action on the target. Simulated - // migrations always succeed. The state transition on the target is - // sufficient to move the instance back into a Running state (strictly - // speaking no further updates from the source are required if the target - // successfully takes over). - instance_simulate_on_sled(cptestctx, nexus, dst_sled_id, instance_id).await; - // Ensure that both sled agents report that the migration has completed. - instance_simulate_on_sled(cptestctx, nexus, original_sled, instance_id) + // Simulate the migration. We will use `instance_single_step_on_sled` to + // single-step both sled-agents through the migration state machine and + // ensure that the migration state looks nice at each step. + instance_simulate_migration_source( + cptestctx, + nexus, + original_sled, + instance_id, + migration_id, + ) + .await; + + // Move source to "migrating". + instance_single_step_on_sled(cptestctx, nexus, original_sled, instance_id) + .await; + instance_single_step_on_sled(cptestctx, nexus, original_sled, instance_id) .await; + let migration = dbg!(migration_fetch(cptestctx, migration_id).await); + assert_eq!(migration.source_state, MigrationState::InProgress.into()); + assert_eq!(migration.target_state, MigrationState::Pending.into()); let instance = instance_get(&client, &instance_url).await; - assert_eq!(instance.runtime.run_state, InstanceState::Running); + assert_eq!(instance.runtime.run_state, InstanceState::Migrating); + + // Move target to "migrating". + instance_single_step_on_sled(cptestctx, nexus, dst_sled_id, instance_id) + .await; + instance_single_step_on_sled(cptestctx, nexus, dst_sled_id, instance_id) + .await; + + let migration = dbg!(migration_fetch(cptestctx, migration_id).await); + assert_eq!(migration.source_state, MigrationState::InProgress.into()); + assert_eq!(migration.target_state, MigrationState::InProgress.into()); + let instance = instance_get(&client, &instance_url).await; + assert_eq!(instance.runtime.run_state, InstanceState::Migrating); + + // Move the source to "completed" + instance_simulate_on_sled(cptestctx, nexus, original_sled, instance_id) + .await; + + let migration = dbg!(migration_fetch(cptestctx, migration_id).await); + assert_eq!(migration.source_state, MigrationState::Completed.into()); + assert_eq!(migration.target_state, MigrationState::InProgress.into()); + let instance = dbg!(instance_get(&client, &instance_url).await); + assert_eq!(instance.runtime.run_state, InstanceState::Migrating); + + // Move the target to "completed". 
+ instance_simulate_on_sled(cptestctx, nexus, dst_sled_id, instance_id).await; + + instance_wait_for_state(&client, instance_id, InstanceState::Running).await; let current_sled = nexus .instance_sled_id(&instance_id) @@ -973,9 +1012,40 @@ async fn test_instance_migrate_v2p_and_routes( .parsed_body::() .unwrap(); + let migration_id = { + let datastore = apictx.nexus.datastore(); + let opctx = OpContext::for_tests( + cptestctx.logctx.log.new(o!()), + datastore.clone(), + ); + let (.., authz_instance) = LookupPath::new(&opctx, &datastore) + .instance_id(instance.identity.id) + .lookup_for(nexus_db_queries::authz::Action::Read) + .await + .unwrap(); + datastore + .instance_refetch(&opctx, &authz_instance) + .await + .unwrap() + .runtime_state + .migration_id + .expect("since we've started a migration, the instance record must have a migration id!") + }; + + // Tell both sled-agents to pretend to do the migration. + instance_simulate_migration_source( + cptestctx, + nexus, + original_sled_id, + instance_id, + migration_id, + ) + .await; + instance_simulate_on_sled(cptestctx, nexus, original_sled_id, instance_id) + .await; instance_simulate_on_sled(cptestctx, nexus, dst_sled_id, instance_id).await; - let instance = instance_get(&client, &instance_url).await; - assert_eq!(instance.runtime.run_state, InstanceState::Running); + instance_wait_for_state(&client, instance_id, InstanceState::Running).await; + let current_sled = nexus .instance_sled_id(&instance_id) .await @@ -1186,9 +1256,7 @@ async fn test_instance_metrics(cptestctx: &ControlPlaneTestContext) { instance_post(&client, instance_name, InstanceOp::Stop).await; let instance_id = InstanceUuid::from_untyped_uuid(instance.identity.id); instance_simulate(nexus, &instance_id).await; - let instance = - instance_get(&client, &get_instance_url(&instance_name)).await; - assert_eq!(instance.runtime.run_state, InstanceState::Stopped); + instance_wait_for_state(client, instance_id, InstanceState::Stopped).await; let virtual_provisioning_collection = datastore .virtual_provisioning_collection_get(&opctx, project_id) @@ -1328,14 +1396,54 @@ async fn test_instance_metrics_with_migration( .parsed_body::() .unwrap(); + let migration_id = { + let datastore = apictx.nexus.datastore(); + let opctx = OpContext::for_tests( + cptestctx.logctx.log.new(o!()), + datastore.clone(), + ); + let (.., authz_instance) = LookupPath::new(&opctx, &datastore) + .instance_id(instance.identity.id) + .lookup_for(nexus_db_queries::authz::Action::Read) + .await + .unwrap(); + datastore + .instance_refetch(&opctx, &authz_instance) + .await + .unwrap() + .runtime_state + .migration_id + .expect("since we've started a migration, the instance record must have a migration id!") + }; + + // Wait for the instance to be in the `Migrating` state. Otherwise, the + // subsequent `instance_wait_for_state(..., Running)` may see the `Running` + // state from the *old* VMM, rather than waiting for the migration to + // complete. + instance_simulate_migration_source( + cptestctx, + nexus, + original_sled, + instance_id, + migration_id, + ) + .await; + instance_single_step_on_sled(cptestctx, nexus, original_sled, instance_id) + .await; + instance_single_step_on_sled(cptestctx, nexus, dst_sled_id, instance_id) + .await; + instance_wait_for_state(&client, instance_id, InstanceState::Migrating) + .await; + check_provisioning_state(4, 1).await; // Complete migration on the target. Simulated migrations always succeed. 
// After this the instance should be running and should continue to appear // to be provisioned. + instance_simulate_on_sled(cptestctx, nexus, original_sled, instance_id) + .await; instance_simulate_on_sled(cptestctx, nexus, dst_sled_id, instance_id).await; - let instance = instance_get(&client, &instance_url).await; - assert_eq!(instance.runtime.run_state, InstanceState::Running); + instance_wait_for_state(&client, instance_id, InstanceState::Running).await; check_provisioning_state(4, 1).await; @@ -1347,9 +1455,7 @@ async fn test_instance_metrics_with_migration( // logical states of instances ignoring migration). instance_post(&client, instance_name, InstanceOp::Stop).await; instance_simulate(nexus, &instance_id).await; - let instance = - instance_get(&client, &get_instance_url(&instance_name)).await; - assert_eq!(instance.runtime.run_state, InstanceState::Stopped); + instance_wait_for_state(&client, instance_id, InstanceState::Stopped).await; check_provisioning_state(0, 0).await; } @@ -1449,8 +1555,7 @@ async fn test_instances_delete_fails_when_running_succeeds_when_stopped( // Stop the instance instance_post(&client, instance_name, InstanceOp::Stop).await; instance_simulate(nexus, &instance_id).await; - let instance = instance_get(&client, &instance_url).await; - assert_eq!(instance.runtime.run_state, InstanceState::Stopped); + instance_wait_for_state(&client, instance_id, InstanceState::Stopped).await; // Now deletion should succeed. NexusRequest::object_delete(&client, &instance_url) @@ -2051,6 +2156,7 @@ async fn test_instance_create_delete_network_interface( let instance = instance_post(client, instance_name, InstanceOp::Stop).await; let instance_id = InstanceUuid::from_untyped_uuid(instance.identity.id); instance_simulate(nexus, &instance_id).await; + instance_wait_for_state(client, instance_id, InstanceState::Stopped).await; // Verify we can now make the requests again let mut interfaces = Vec::with_capacity(2); @@ -2120,6 +2226,7 @@ async fn test_instance_create_delete_network_interface( // Stop the instance and verify we can delete the interface instance_post(client, instance_name, InstanceOp::Stop).await; instance_simulate(nexus, &instance_id).await; + instance_wait_for_state(client, instance_id, InstanceState::Stopped).await; // We should not be able to delete the primary interface, while the // secondary still exists @@ -2258,6 +2365,7 @@ async fn test_instance_update_network_interfaces( let instance = instance_post(client, instance_name, InstanceOp::Stop).await; let instance_id = InstanceUuid::from_untyped_uuid(instance.identity.id); instance_simulate(nexus, &instance_id).await; + instance_wait_for_state(client, instance_id, InstanceState::Stopped).await; // Create the first interface on the instance. let primary_iface = NexusRequest::objects_post( @@ -2318,6 +2426,8 @@ async fn test_instance_update_network_interfaces( // Stop the instance again, and now verify that the update works. instance_post(client, instance_name, InstanceOp::Stop).await; instance_simulate(nexus, &instance_id).await; + instance_wait_for_state(client, instance_id, InstanceState::Stopped).await; + let updated_primary_iface = NexusRequest::object_put( client, &format!("/v1/network-interfaces/{}", primary_iface.identity.id), @@ -2451,6 +2561,7 @@ async fn test_instance_update_network_interfaces( // Stop the instance again. 
instance_post(client, instance_name, InstanceOp::Stop).await; instance_simulate(nexus, &instance_id).await; + instance_wait_for_state(client, instance_id, InstanceState::Stopped).await; // Verify that we can set the secondary as the new primary, and that nothing // else changes about the NICs. @@ -3231,8 +3342,7 @@ async fn test_disks_detached_when_instance_destroyed( instance_post(&client, instance_name, InstanceOp::Stop).await; instance_simulate(nexus, &instance_id).await; - let instance = instance_get(&client, &instance_url).await; - assert_eq!(instance.runtime.run_state, InstanceState::Stopped); + instance_wait_for_state(&client, instance_id, InstanceState::Stopped).await; NexusRequest::object_delete(&client, &instance_url) .authn_as(AuthnMode::PrivilegedUser) @@ -3750,6 +3860,8 @@ async fn test_cannot_provision_instance_beyond_cpu_capacity( instance_simulate(nexus, &instance_id).await; instances[1] = instance_post(client, configs[1].0, InstanceOp::Stop).await; instance_simulate(nexus, &instance_id).await; + instance_wait_for_state(client, instance_id, InstanceState::Stopped).await; + expect_instance_start_ok(client, configs[2].0).await; } @@ -3857,6 +3969,8 @@ async fn test_cannot_provision_instance_beyond_ram_capacity( instance_simulate(nexus, &instance_id).await; instances[1] = instance_post(client, configs[1].0, InstanceOp::Stop).await; instance_simulate(nexus, &instance_id).await; + instance_wait_for_state(client, instance_id, InstanceState::Stopped).await; + expect_instance_start_ok(client, configs[2].0).await; } @@ -3979,8 +4093,9 @@ async fn test_instance_serial(cptestctx: &ControlPlaneTestContext) { let instance = instance_next; instance_simulate(nexus, &instance_id).await; - let instance_next = instance_get(&client, &instance_url).await; - assert_eq!(instance_next.runtime.run_state, InstanceState::Stopped); + let instance_next = + instance_wait_for_state(&client, instance_id, InstanceState::Stopped) + .await; assert!( instance_next.runtime.time_run_state_updated > instance.runtime.time_run_state_updated @@ -4146,12 +4261,10 @@ async fn stop_and_delete_instance( let client = &cptestctx.external_client; let instance = instance_post(&client, instance_name, InstanceOp::Stop).await; + let instance_id = InstanceUuid::from_untyped_uuid(instance.identity.id); let nexus = &cptestctx.server.server_context().nexus; - instance_simulate( - nexus, - &InstanceUuid::from_untyped_uuid(instance.identity.id), - ) - .await; + instance_simulate(nexus, &instance_id).await; + instance_wait_for_state(client, instance_id, InstanceState::Stopped).await; let url = format!("/v1/instances/{}?project={}", instance_name, PROJECT_NAME); object_delete(client, &url).await; @@ -4577,6 +4690,13 @@ async fn test_instance_create_in_silo(cptestctx: &ControlPlaneTestContext) { .expect("Failed to stop the instance"); instance_simulate_with_opctx(nexus, &instance_id, &opctx).await; + instance_wait_for_state_as( + client, + AuthnMode::SiloUser(user_id), + instance_id, + InstanceState::Stopped, + ) + .await; // Delete the instance NexusRequest::object_delete(client, &instance_url) @@ -4664,6 +4784,7 @@ async fn test_instance_v2p_mappings(cptestctx: &ControlPlaneTestContext) { instance_simulate(nexus, &instance_id).await; instance_post(&client, instance_name, InstanceOp::Stop).await; instance_simulate(nexus, &instance_id).await; + instance_wait_for_state(client, instance_id, InstanceState::Stopped).await; let instance_url = get_instance_url(instance_name); NexusRequest::object_delete(client, &instance_url) @@ -4730,6 
+4851,73 @@ pub enum InstanceOp { Reboot, } +pub async fn instance_wait_for_state( + client: &ClientTestContext, + instance_id: InstanceUuid, + state: omicron_common::api::external::InstanceState, +) -> Instance { + instance_wait_for_state_as( + client, + AuthnMode::PrivilegedUser, + instance_id, + state, + ) + .await +} + +/// Like [`instance_wait_for_state`], but with an [`AuthnMode`] parameter for +/// the instance lookup requests. +pub async fn instance_wait_for_state_as( + client: &ClientTestContext, + authn_as: AuthnMode, + instance_id: InstanceUuid, + state: omicron_common::api::external::InstanceState, +) -> Instance { + const MAX_WAIT: Duration = Duration::from_secs(120); + + slog::info!( + &client.client_log, + "waiting for instance {instance_id} to transition to {state}..."; + ); + let url = format!("/v1/instances/{instance_id}"); + let result = wait_for_condition( + || async { + let instance: Instance = NexusRequest::object_get(client, &url) + .authn_as(authn_as.clone()) + .execute() + .await? + .parsed_body()?; + if instance.runtime.run_state == state { + Ok(instance) + } else { + slog::info!( + &client.client_log, + "instance {instance_id} has not transitioned to {state}"; + "instance_id" => %instance.identity.id, + "instance_runtime_state" => ?instance.runtime, + ); + Err(CondCheckError::<anyhow::Error>::NotYet) + } + }, + &Duration::from_secs(1), + &MAX_WAIT, + ) + .await; + match result { + Ok(instance) => { + slog::info!( + &client.client_log, + "instance {instance_id} has transitioned to {state}" + ); + instance + } + Err(e) => panic!( + "instance {instance_id} did not transition to {state:?} \ + after {MAX_WAIT:?}: {e}" + ), + } +} + pub async fn instance_post( client: &ClientTestContext, instance_name: &str, @@ -4896,6 +5084,22 @@ pub async fn instance_simulate(nexus: &Arc<Nexus>, id: &InstanceUuid) { sa.instance_finish_transition(id.into_untyped_uuid()).await; } +/// Simulates one step of an ongoing instance state transition. To do this, we +/// have to look up the instance, then get the sled agent associated with that +/// instance, and then tell it to simulate a single step of whatever async +/// transition is going on. +async fn instance_single_step_on_sled( + cptestctx: &ControlPlaneTestContext, + nexus: &Arc<Nexus>, + sled_id: SledUuid, + instance_id: InstanceUuid, +) { + info!(&cptestctx.logctx.log, "Single-stepping simulated instance on sled"; + "instance_id" => %instance_id, "sled_id" => %sled_id); + let sa = nexus.sled_client(&sled_id).await.unwrap(); + sa.instance_single_step(instance_id.into_untyped_uuid()).await; +} + pub async fn instance_simulate_with_opctx( nexus: &Arc<Nexus>, id: &InstanceUuid, @@ -4923,3 +5127,30 @@ async fn instance_simulate_on_sled( let sa = nexus.sled_client(&sled_id).await.unwrap(); sa.instance_finish_transition(instance_id.into_untyped_uuid()).await; } + +/// Simulates a migration source for the provided instance ID, sled ID, and +/// migration ID.
+async fn instance_simulate_migration_source( + cptestctx: &ControlPlaneTestContext, + nexus: &Arc<Nexus>, + sled_id: SledUuid, + instance_id: InstanceUuid, + migration_id: Uuid, +) { + info!( + &cptestctx.logctx.log, + "Simulating migration source sled"; + "instance_id" => %instance_id, + "sled_id" => %sled_id, + "migration_id" => %migration_id, + ); + let sa = nexus.sled_client(&sled_id).await.unwrap(); + sa.instance_simulate_migration_source( + instance_id.into_untyped_uuid(), + sled_agent_client::SimulateMigrationSource { + migration_id, + result: sled_agent_client::SimulatedMigrationResult::Success, + }, + ) + .await; +} diff --git a/nexus/tests/integration_tests/ip_pools.rs b/nexus/tests/integration_tests/ip_pools.rs index d044eb735c..e872cc6fe3 100644 --- a/nexus/tests/integration_tests/ip_pools.rs +++ b/nexus/tests/integration_tests/ip_pools.rs @@ -6,6 +6,7 @@ use std::net::Ipv4Addr; +use crate::integration_tests::instances::instance_wait_for_state; use dropshot::test_util::ClientTestContext; use dropshot::HttpErrorResponseBody; use dropshot::ResultsPage; @@ -54,6 +55,7 @@ use nexus_types::external_api::views::SiloIpPool; use nexus_types::identity::Resource; use omicron_common::address::Ipv6Range; use omicron_common::api::external::IdentityMetadataUpdateParams; +use omicron_common::api::external::InstanceState; use omicron_common::api::external::LookupType; use omicron_common::api::external::NameOrId; use omicron_common::api::external::SimpleIdentity; @@ -1348,6 +1350,7 @@ async fn test_ip_range_delete_with_allocated_external_ip_fails( .unwrap() .expect("running instance should be on a sled"); sa.instance_finish_transition(instance.identity.id).await; + instance_wait_for_state(client, instance_id, InstanceState::Stopped).await; // Delete the instance NexusRequest::object_delete(client, &instance_url) diff --git a/nexus/tests/integration_tests/pantry.rs b/nexus/tests/integration_tests/pantry.rs index 29e590b1a9..d77ad49db6 100644 --- a/nexus/tests/integration_tests/pantry.rs +++ b/nexus/tests/integration_tests/pantry.rs @@ -4,6 +4,7 @@ //! Tests Nexus' interactions with Crucible's pantry +use crate::integration_tests::instances::instance_wait_for_state; use dropshot::test_util::ClientTestContext; use http::method::Method; use http::StatusCode; @@ -24,6 +25,7 @@ use omicron_common::api::external::Disk; use omicron_common::api::external::DiskState; use omicron_common::api::external::IdentityMetadataCreateParams; use omicron_common::api::external::Instance; +use omicron_common::api::external::InstanceState; use omicron_nexus::Nexus; use omicron_nexus::TestInterfaces as _; use omicron_uuid_kinds::GenericUuid; @@ -157,6 +159,7 @@ async fn create_instance_and_attach_disk( // is an artificial limitation without hotplug support.
set_instance_state(&client, INSTANCE_NAME, "stop").await; instance_simulate(nexus, &instance_id).await; + instance_wait_for_state(&client, instance_id, InstanceState::Stopped).await; let url_instance_attach_disk = get_disk_attach_url(instance.identity.name.as_str()); diff --git a/nexus/tests/integration_tests/vpc_subnets.rs b/nexus/tests/integration_tests/vpc_subnets.rs index b12c43aecc..f063c7e9a2 100644 --- a/nexus/tests/integration_tests/vpc_subnets.rs +++ b/nexus/tests/integration_tests/vpc_subnets.rs @@ -4,6 +4,7 @@ use crate::integration_tests::instances::instance_post; use crate::integration_tests::instances::instance_simulate; +use crate::integration_tests::instances::instance_wait_for_state; use crate::integration_tests::instances::InstanceOp; use dropshot::HttpErrorResponseBody; use http::method::Method; @@ -20,6 +21,7 @@ use nexus_test_utils_macros::nexus_test; use nexus_types::external_api::{params, views::VpcSubnet}; use omicron_common::api::external::IdentityMetadataCreateParams; use omicron_common::api::external::IdentityMetadataUpdateParams; +use omicron_common::api::external::InstanceState; use omicron_common::api::external::Ipv6NetExt; use omicron_uuid_kinds::GenericUuid; use omicron_uuid_kinds::InstanceUuid; @@ -80,6 +82,7 @@ async fn test_delete_vpc_subnet_with_interfaces_fails( // Stop and then delete the instance instance_post(client, instance_name, InstanceOp::Stop).await; instance_simulate(&nexus, &instance_id).await; + instance_wait_for_state(client, instance_id, InstanceState::Stopped).await; NexusRequest::object_delete(&client, &instance_url) .authn_as(AuthnMode::PrivilegedUser) .execute() diff --git a/openapi/nexus-internal.json b/openapi/nexus-internal.json index 912ccbcf00..7e4d6e6c02 100644 --- a/openapi/nexus-internal.json +++ b/openapi/nexus-internal.json @@ -3183,53 +3183,6 @@ } ] }, - "InstanceRuntimeState": { - "description": "The dynamic runtime properties of an instance: its current VMM ID (if any), migration information (if any), and the instance state to report if there is no active VMM.", - "type": "object", - "properties": { - "dst_propolis_id": { - "nullable": true, - "description": "If a migration is active, the ID of the target VMM.", - "allOf": [ - { - "$ref": "#/components/schemas/TypedUuidForPropolisKind" - } - ] - }, - "gen": { - "description": "Generation number for this state.", - "allOf": [ - { - "$ref": "#/components/schemas/Generation" - } - ] - }, - "migration_id": { - "nullable": true, - "description": "If a migration is active, the ID of that migration.", - "type": "string", - "format": "uuid" - }, - "propolis_id": { - "nullable": true, - "description": "The instance's currently active VMM ID.", - "allOf": [ - { - "$ref": "#/components/schemas/TypedUuidForPropolisKind" - } - ] - }, - "time_updated": { - "description": "Timestamp for this information.", - "type": "string", - "format": "date-time" - } - }, - "required": [ - "gen", - "time_updated" - ] - }, "IpNet": { "x-rust-type": { "crate": "oxnet", @@ -3471,24 +3424,6 @@ "minLength": 5, "maxLength": 17 }, - "MigrationRole": { - "oneOf": [ - { - "description": "This update concerns the source VMM of a migration.", - "type": "string", - "enum": [ - "source" - ] - }, - { - "description": "This update concerns the target VMM of a migration.", - "type": "string", - "enum": [ - "target" - ] - } - ] - }, "MigrationRuntimeState": { "description": "An update from a sled regarding the state of a migration, indicating the role of the VMM whose migration state was updated.", "type": "object", 
@@ -3500,9 +3435,6 @@ "type": "string", "format": "uuid" }, - "role": { - "$ref": "#/components/schemas/MigrationRole" - }, "state": { "$ref": "#/components/schemas/MigrationState" }, @@ -3515,7 +3447,6 @@ "required": [ "gen", "migration_id", - "role", "state", "time_updated" ] @@ -4716,17 +4647,18 @@ "description": "A wrapper type containing a sled's total knowledge of the state of a specific VMM and the instance it incarnates.", "type": "object", "properties": { - "instance_state": { - "description": "The sled's conception of the state of the instance.", + "migration_in": { + "nullable": true, + "description": "The current state of any inbound migration to this VMM.", "allOf": [ { - "$ref": "#/components/schemas/InstanceRuntimeState" + "$ref": "#/components/schemas/MigrationRuntimeState" } ] }, - "migration_state": { + "migration_out": { "nullable": true, - "description": "The current state of any in-progress migration for this instance, as understood by this sled.", + "description": "The state of any outbound migration from this VMM.", "allOf": [ { "$ref": "#/components/schemas/MigrationRuntimeState" @@ -4751,7 +4683,6 @@ } }, "required": [ - "instance_state", "propolis_id", "vmm_state" ] diff --git a/openapi/sled-agent.json b/openapi/sled-agent.json index 3e96ab3a0c..ecaff33042 100644 --- a/openapi/sled-agent.json +++ b/openapi/sled-agent.json @@ -419,49 +419,6 @@ } } }, - "/instances/{instance_id}/migration-ids": { - "put": { - "operationId": "instance_put_migration_ids", - "parameters": [ - { - "in": "path", - "name": "instance_id", - "required": true, - "schema": { - "$ref": "#/components/schemas/TypedUuidForInstanceKind" - } - } - ], - "requestBody": { - "content": { - "application/json": { - "schema": { - "$ref": "#/components/schemas/InstancePutMigrationIdsBody" - } - } - }, - "required": true - }, - "responses": { - "200": { - "description": "successful operation", - "content": { - "application/json": { - "schema": { - "$ref": "#/components/schemas/SledInstanceState" - } - } - } - }, - "4XX": { - "$ref": "#/components/responses/Error" - }, - "5XX": { - "$ref": "#/components/responses/Error" - } - } - } - }, "/instances/{instance_id}/state": { "get": { "operationId": "instance_get_state", @@ -3063,23 +3020,6 @@ "silo_id" ] }, - "InstanceMigrationSourceParams": { - "description": "Instance runtime state to update for a migration.", - "type": "object", - "properties": { - "dst_propolis_id": { - "$ref": "#/components/schemas/TypedUuidForPropolisKind" - }, - "migration_id": { - "type": "string", - "format": "uuid" - } - }, - "required": [ - "dst_propolis_id", - "migration_id" - ] - }, "InstanceMigrationTargetParams": { "description": "Parameters used when directing Propolis to initialize itself via live migration.", "type": "object", @@ -3124,32 +3064,6 @@ "ncpus" ] }, - "InstancePutMigrationIdsBody": { - "description": "The body of a request to set or clear the migration identifiers from a sled agent's instance state records.", - "type": "object", - "properties": { - "migration_params": { - "nullable": true, - "description": "The migration identifiers to set. If `None`, this operation clears the migration IDs.", - "allOf": [ - { - "$ref": "#/components/schemas/InstanceMigrationSourceParams" - } - ] - }, - "old_runtime": { - "description": "The last instance runtime state known to this requestor. 
This request will succeed if either (a) the state generation in the sled agent's runtime state matches the generation in this record, or (b) the sled agent's runtime state matches what would result from applying this request to the caller's runtime state. This latter condition provides idempotency.", - "allOf": [ - { - "$ref": "#/components/schemas/InstanceRuntimeState" - } - ] - } - }, - "required": [ - "old_runtime" - ] - }, "InstancePutStateBody": { "description": "The body of a request to move a previously-ensured instance into a specific runtime state.", "type": "object", @@ -3500,24 +3414,6 @@ "minLength": 5, "maxLength": 17 }, - "MigrationRole": { - "oneOf": [ - { - "description": "This update concerns the source VMM of a migration.", - "type": "string", - "enum": [ - "source" - ] - }, - { - "description": "This update concerns the target VMM of a migration.", - "type": "string", - "enum": [ - "target" - ] - } - ] - }, "MigrationRuntimeState": { "description": "An update from a sled regarding the state of a migration, indicating the role of the VMM whose migration state was updated.", "type": "object", @@ -3529,9 +3425,6 @@ "type": "string", "format": "uuid" }, - "role": { - "$ref": "#/components/schemas/MigrationRole" - }, "state": { "$ref": "#/components/schemas/MigrationState" }, @@ -3544,7 +3437,6 @@ "required": [ "gen", "migration_id", - "role", "state", "time_updated" ] @@ -4615,17 +4507,18 @@ "description": "A wrapper type containing a sled's total knowledge of the state of a specific VMM and the instance it incarnates.", "type": "object", "properties": { - "instance_state": { - "description": "The sled's conception of the state of the instance.", + "migration_in": { + "nullable": true, + "description": "The current state of any inbound migration to this VMM.", "allOf": [ { - "$ref": "#/components/schemas/InstanceRuntimeState" + "$ref": "#/components/schemas/MigrationRuntimeState" } ] }, - "migration_state": { + "migration_out": { "nullable": true, - "description": "The current state of any in-progress migration for this instance, as understood by this sled.", + "description": "The state of any outbound migration from this VMM.", "allOf": [ { "$ref": "#/components/schemas/MigrationRuntimeState" @@ -4650,7 +4543,6 @@ } }, "required": [ - "instance_state", "propolis_id", "vmm_state" ] diff --git a/sled-agent/src/common/instance.rs b/sled-agent/src/common/instance.rs index 0fe2e27698..adbeb9158f 100644 --- a/sled-agent/src/common/instance.rs +++ b/sled-agent/src/common/instance.rs @@ -4,26 +4,26 @@ //! Describes the states of VM instances. -use crate::params::InstanceMigrationSourceParams; use chrono::{DateTime, Utc}; use omicron_common::api::external::Generation; use omicron_common::api::internal::nexus::{ - InstanceRuntimeState, MigrationRole, MigrationRuntimeState, MigrationState, - SledInstanceState, VmmRuntimeState, VmmState, + MigrationRuntimeState, MigrationState, SledInstanceState, VmmRuntimeState, + VmmState, }; use omicron_uuid_kinds::PropolisUuid; use propolis_client::types::{ - InstanceState as PropolisApiState, InstanceStateMonitorResponse, - MigrationState as PropolisMigrationState, + InstanceMigrationStatus, InstanceState as PropolisApiState, + InstanceStateMonitorResponse, MigrationState as PropolisMigrationState, }; +use uuid::Uuid; /// The instance and VMM state that sled agent maintains on a per-VMM basis. 
#[derive(Clone, Debug)] pub struct InstanceStates { - instance: InstanceRuntimeState, vmm: VmmRuntimeState, propolis_id: PropolisUuid, - migration: Option<MigrationRuntimeState>, + migration_in: Option<MigrationRuntimeState>, + migration_out: Option<MigrationRuntimeState>, } /// Newtype to allow conversion from Propolis API states (returned by the @@ -101,9 +101,8 @@ pub(crate) struct ObservedPropolisState { /// The state reported by Propolis's instance state monitor API. pub vmm_state: PropolisInstanceState, - /// Information about whether the state observer queried migration status at - /// all and, if so, what response it got from Propolis. - pub migration_status: ObservedMigrationStatus, + pub migration_in: Option<ObservedMigrationState>, + pub migration_out: Option<ObservedMigrationState>, /// The approximate time at which this observation was made. pub time: DateTime<Utc>, @@ -111,68 +110,43 @@ impl ObservedPropolisState { /// Constructs a Propolis state observation from an instance's current - /// runtime state and an instance state monitor response received from + /// state and an instance state monitor response received from /// Propolis. - pub fn new( - instance_runtime: &InstanceRuntimeState, - propolis_state: &InstanceStateMonitorResponse, - ) -> Self { - // If there's no migration currently registered with this sled, report - // the current state and that no migration is currently in progress, - // even if Propolis has some migration data to share. (This case arises - // when Propolis returns state from a previous migration that sled agent - // has already retired.) - // - // N.B. This needs to be read from the instance runtime state and not - // the migration runtime state to ensure that, once a migration in - // completes, the "completed" observation is reported to - // `InstanceStates::apply_propolis_observation` exactly once. - // Otherwise that routine will try to apply the "inbound migration - // complete" instance state transition twice. - let Some(migration_id) = instance_runtime.migration_id else { - return Self { - vmm_state: PropolisInstanceState(propolis_state.state), - migration_status: ObservedMigrationStatus::NoMigration, - time: Utc::now(), - }; - }; - - // Sled agent believes a live migration may be in progress. See if - // either of the Propolis migrations corresponds to it. - let propolis_migration = match ( - &propolis_state.migration.migration_in, - &propolis_state.migration.migration_out, - ) { - (Some(inbound), _) if inbound.id == migration_id => inbound, - (_, Some(outbound)) if outbound.id == migration_id => outbound, - _ => { - // Sled agent believes this instance should be migrating, but - // Propolis isn't reporting a matching migration yet, so assume - // the migration is still pending.
- return Self { - vmm_state: PropolisInstanceState(propolis_state.state), - migration_status: ObservedMigrationStatus::Pending, - time: Utc::now(), - }; - } - }; - + pub fn new(propolis_state: &InstanceStateMonitorResponse) -> Self { Self { vmm_state: PropolisInstanceState(propolis_state.state), - migration_status: match propolis_migration.state { - PropolisMigrationState::Finish => { - ObservedMigrationStatus::Succeeded - } - PropolisMigrationState::Error => { - ObservedMigrationStatus::Failed - } - _ => ObservedMigrationStatus::InProgress, - }, + migration_in: propolis_state + .migration + .migration_in + .as_ref() + .map(ObservedMigrationState::from), + migration_out: propolis_state + .migration + .migration_out + .as_ref() + .map(ObservedMigrationState::from), time: Utc::now(), } } } +#[derive(Copy, Clone, Debug)] +pub struct ObservedMigrationState { + state: MigrationState, + id: Uuid, +} + +impl From<&'_ InstanceMigrationStatus> for ObservedMigrationState { + fn from(observed: &InstanceMigrationStatus) -> Self { + let state = match observed.state { + PropolisMigrationState::Error => MigrationState::Failed, + PropolisMigrationState::Finish => MigrationState::Completed, + _ => MigrationState::InProgress, + }; + Self { state, id: observed.id } + } +} + /// The set of instance states that sled agent can publish to Nexus. This is /// a subset of the instance states Nexus knows about: the Creating and /// Destroyed states are reserved for Nexus to use for instances that are being @@ -191,20 +165,6 @@ impl From for VmmState { } } -/// The possible roles a VMM can have vis-a-vis an instance. -#[derive(Clone, Copy, Debug, PartialEq)] -enum PropolisRole { - /// The VMM is its instance's current active VMM. - Active, - - /// The VMM is its instance's migration target VMM. - MigrationTarget, - - /// The instance does not refer to this VMM (but it may have done so in the - /// past). - Retired, -} - /// Action to be taken on behalf of state transition. #[derive(Clone, Copy, Debug, PartialEq)] pub enum Action { @@ -214,30 +174,20 @@ pub enum Action { impl InstanceStates { pub fn new( - instance: InstanceRuntimeState, vmm: VmmRuntimeState, propolis_id: PropolisUuid, + migration_id: Option, ) -> Self { - let migration = instance.migration_id.map(|migration_id| { - let dst_propolis_id = instance.dst_propolis_id.expect("if an instance has a migration ID, it should also have a target VMM ID"); - let role = if dst_propolis_id == propolis_id { - MigrationRole::Target - } else { - MigrationRole::Source - }; - MigrationRuntimeState { + // If this instance is created with a migration ID, we are the intended + // target of a migration in. Set that up now. 
+ let migration_in = + migration_id.map(|migration_id| MigrationRuntimeState { migration_id, - state: MigrationState::InProgress, - role, + state: MigrationState::Pending, gen: Generation::new(), time_updated: Utc::now(), - } - }); - InstanceStates { instance, vmm, propolis_id, migration } - } - - pub fn instance(&self) -> &InstanceRuntimeState { - &self.instance + }); + InstanceStates { vmm, propolis_id, migration_in, migration_out: None } } pub fn vmm(&self) -> &VmmRuntimeState { @@ -248,8 +198,12 @@ impl InstanceStates { self.propolis_id } - pub(crate) fn migration(&self) -> Option<&MigrationRuntimeState> { - self.migration.as_ref() + pub fn migration_in(&self) -> Option<&MigrationRuntimeState> { + self.migration_in.as_ref() + } + + pub fn migration_out(&self) -> Option<&MigrationRuntimeState> { + self.migration_out.as_ref() } /// Creates a `SledInstanceState` structure containing the entirety of this @@ -257,28 +211,10 @@ impl InstanceStates { /// use the `instance` or `vmm` accessors instead. pub fn sled_instance_state(&self) -> SledInstanceState { SledInstanceState { - instance_state: self.instance.clone(), vmm_state: self.vmm.clone(), propolis_id: self.propolis_id, - migration_state: self.migration.clone(), - } - } - - fn transition_migration( - &mut self, - state: MigrationState, - time_updated: DateTime, - ) { - let migration = self.migration.as_mut().expect( - "an ObservedMigrationState should only be constructed when the \ - VMM has an active migration", - ); - // Don't generate spurious state updates if the migration is already in - // the state we're transitioning to. - if migration.state != state { - migration.state = state; - migration.time_updated = time_updated; - migration.gen = migration.gen.next(); + migration_in: self.migration_in.clone(), + migration_out: self.migration_out.clone(), } } @@ -288,6 +224,52 @@ impl InstanceStates { &mut self, observed: &ObservedPropolisState, ) -> Option { + fn transition_migration( + current: &mut Option, + ObservedMigrationState { id, state }: ObservedMigrationState, + now: DateTime, + ) { + if let Some(ref mut m) = current { + // Don't generate spurious state updates if the migration is already in + // the state we're transitioning to. + if m.migration_id == id && m.state == state { + return; + } + m.state = state; + if m.migration_id == id { + m.gen = m.gen.next(); + } else { + m.migration_id = id; + m.gen = Generation::new().next(); + } + m.time_updated = now; + } else { + *current = Some(MigrationRuntimeState { + migration_id: id, + // We are creating a new migration record, but the state + // will not be `Pending`, because we've actually gotten a + // migration observation from Propolis. Therefore, we have + // to advance the initial generation once to be ahead of + // what the generation in the database is when Nexus creates + // the initial migration record at generation 1. + gen: Generation::new().next(), + state, + time_updated: now, + }); + } + } + + fn destroy_migration( + migration: &mut MigrationRuntimeState, + now: DateTime, + ) { + if !migration.state.is_terminal() { + migration.gen = migration.gen.next(); + migration.time_updated = now; + migration.state = MigrationState::Failed; + } + } + let vmm_gone = matches!( observed.vmm_state.0, PropolisApiState::Destroyed | PropolisApiState::Failed @@ -303,78 +285,11 @@ impl InstanceStates { // Update the instance record to reflect the result of any completed // migration. 
- match observed.migration_status { - ObservedMigrationStatus::Succeeded => { - self.transition_migration( - MigrationState::Completed, - observed.time, - ); - match self.propolis_role() { - // This is a successful migration out. Point the instance to the - // target VMM, but don't clear migration IDs; let the target do - // that so that the instance will continue to appear to be - // migrating until it is safe to migrate again. - PropolisRole::Active => { - self.switch_propolis_id_to_target(observed.time); - - assert_eq!(self.propolis_role(), PropolisRole::Retired); - } - - // This is a successful migration in. Point the instance to the - // target VMM and clear migration IDs so that another migration - // in can begin. Propolis will continue reporting that this - // migration was successful, but because its ID has been - // discarded the observed migration status will change from - // Succeeded to NoMigration. - // - // Note that these calls increment the instance's generation - // number twice. This is by design and allows the target's - // migration-ID-clearing update to overtake the source's update. - PropolisRole::MigrationTarget => { - self.switch_propolis_id_to_target(observed.time); - self.clear_migration_ids(observed.time); - - assert_eq!(self.propolis_role(), PropolisRole::Active); - } - - // This is a migration source that previously reported success - // and removed itself from the active Propolis position. Don't - // touch the instance. - PropolisRole::Retired => {} - } - } - ObservedMigrationStatus::Failed => { - self.transition_migration( - MigrationState::Failed, - observed.time, - ); - - match self.propolis_role() { - // This is a failed migration out. CLear migration IDs so that - // Nexus can try again. - PropolisRole::Active => { - self.clear_migration_ids(observed.time); - } - - // This is a failed migration in. Leave the migration IDs alone - // so that the migration won't appear to have concluded until - // the source is ready to start a new one. - PropolisRole::MigrationTarget => {} - - // This VMM was part of a failed migration and was subsequently - // removed from the instance record entirely. There's nothing to - // update. - PropolisRole::Retired => {} - } - } - ObservedMigrationStatus::InProgress => { - self.transition_migration( - MigrationState::InProgress, - observed.time, - ); - } - ObservedMigrationStatus::NoMigration - | ObservedMigrationStatus::Pending => {} + if let Some(m) = observed.migration_in { + transition_migration(&mut self.migration_in, m, observed.time); + } + if let Some(m) = observed.migration_out { + transition_migration(&mut self.migration_out, m, observed.time); } // If this Propolis has exited, tear down its zone. If it was in the @@ -389,19 +304,13 @@ impl InstanceStates { // been transferred to the target, and what was once an active VMM // is now retired.) if vmm_gone { - if self.propolis_role() == PropolisRole::Active { - self.clear_migration_ids(observed.time); - self.retire_active_propolis(observed.time); - } // If there's an active migration and the VMM is suddenly gone, // that should constitute a migration failure! 
- if let Some(MigrationState::Pending | MigrationState::InProgress) = - self.migration.as_ref().map(|m| m.state) - { - self.transition_migration( - MigrationState::Failed, - observed.time, - ); + if let Some(ref mut m) = self.migration_in { + destroy_migration(m, observed.time); + } + if let Some(ref mut m) = self.migration_out { + destroy_migration(m, observed.time); } Some(Action::Destroy) } else { @@ -409,54 +318,6 @@ impl InstanceStates { } } - /// Yields the role that this structure's VMM has given the structure's - /// current instance state. - fn propolis_role(&self) -> PropolisRole { - if let Some(active_id) = self.instance.propolis_id { - if active_id == self.propolis_id { - return PropolisRole::Active; - } - } - - if let Some(dst_id) = self.instance.dst_propolis_id { - if dst_id == self.propolis_id { - return PropolisRole::MigrationTarget; - } - } - - PropolisRole::Retired - } - - /// Sets the no-VMM fallback state of the current instance to reflect the - /// state of its terminated VMM and clears the instance's current Propolis - /// ID. Note that this routine does not touch any migration IDs. - /// - /// This should only be called by the state block for an active VMM and only - /// when that VMM is in a terminal state (Destroyed or Failed). - fn retire_active_propolis(&mut self, now: DateTime) { - assert!(self.propolis_role() == PropolisRole::Active); - - self.instance.propolis_id = None; - self.instance.gen = self.instance.gen.next(); - self.instance.time_updated = now; - } - - /// Moves the instance's destination Propolis ID into the current active - /// position and updates the generation number, but does not clear the - /// destination ID or the active migration ID. This promotes a migration - /// target VMM into the active position without actually allowing a new - /// migration to begin. - /// - /// This routine should only be called when - /// `instance.dst_propolis_id.is_some()`. - fn switch_propolis_id_to_target(&mut self, now: DateTime) { - assert!(self.instance.dst_propolis_id.is_some()); - - self.instance.propolis_id = self.instance.dst_propolis_id; - self.instance.gen = self.instance.gen.next(); - self.instance.time_updated = now; - } - /// Forcibly transitions this instance's VMM into the specified `next` /// state and updates its generation number. pub(crate) fn transition_vmm( @@ -495,135 +356,29 @@ impl InstanceStates { let fake_observed = ObservedPropolisState { vmm_state, - migration_status: if self.instance.migration_id.is_some() { - ObservedMigrationStatus::Failed - } else { - ObservedMigrationStatus::NoMigration - }, + // We don't actually need to populate these, because observing a + // `Destroyed` instance state will fail any in progress migrations anyway. + migration_in: None, + migration_out: None, time: Utc::now(), }; self.apply_propolis_observation(&fake_observed); } - - /// Sets or clears this instance's migration IDs and advances its Propolis - /// generation number. 
- pub(crate) fn set_migration_ids( - &mut self, - ids: &Option, - now: DateTime, - ) { - if let Some(InstanceMigrationSourceParams { - migration_id, - dst_propolis_id, - }) = *ids - { - self.instance.migration_id = Some(migration_id); - self.instance.dst_propolis_id = Some(dst_propolis_id); - let role = if dst_propolis_id == self.propolis_id { - MigrationRole::Target - } else { - MigrationRole::Source - }; - self.migration = Some(MigrationRuntimeState { - migration_id, - state: MigrationState::Pending, - role, - gen: Generation::new(), - time_updated: now, - }) - } else { - self.instance.migration_id = None; - self.instance.dst_propolis_id = None; - self.migration = None; - } - - self.instance.gen = self.instance.gen.next(); - self.instance.time_updated = now; - } - - /// Unconditionally clears the instance's migration IDs and advances its - /// Propolis generation. Not public; used internally to conclude migrations. - fn clear_migration_ids(&mut self, now: DateTime) { - self.instance.migration_id = None; - self.instance.dst_propolis_id = None; - self.instance.gen = self.instance.gen.next(); - self.instance.time_updated = now; - } - - /// Returns true if the migration IDs in this instance are already set as they - /// would be on a successful transition from the migration IDs in - /// `old_runtime` to the ones in `migration_ids`. - pub(crate) fn migration_ids_already_set( - &self, - old_runtime: &InstanceRuntimeState, - migration_ids: &Option, - ) -> bool { - // For the old and new records to match, the new record's Propolis - // generation must immediately succeed the old record's. - // - // This is an equality check to try to avoid the following A-B-A - // problem: - // - // 1. Instance starts on sled 1. - // 2. Parallel sagas start, one to migrate the instance to sled 2 - // and one to migrate the instance to sled 3. - // 3. The "migrate to sled 2" saga completes. - // 4. A new migration starts that migrates the instance back to sled 1. - // 5. The "migrate to sled 3" saga attempts to set its migration - // ID. - // - // A simple less-than check allows the migration to sled 3 to proceed - // even though the most-recently-expressed intent to migrate put the - // instance on sled 1. - if old_runtime.gen.next() != self.instance.gen { - return false; - } - - match (self.instance.migration_id, migration_ids) { - // If the migration ID is already set, and this is a request to set - // IDs, the records match if the relevant IDs match. - (Some(current_migration_id), Some(ids)) => { - let current_dst_id = self.instance.dst_propolis_id.expect( - "migration ID and destination ID must be set together", - ); - - current_migration_id == ids.migration_id - && current_dst_id == ids.dst_propolis_id - } - // If the migration ID is already cleared, and this is a request to - // clear IDs, the records match. 
- (None, None) => { - assert!(self.instance.dst_propolis_id.is_none()); - true - } - _ => false, - } - } } #[cfg(test)] mod test { use super::*; - use crate::params::InstanceMigrationSourceParams; - use chrono::Utc; use omicron_common::api::external::Generation; - use omicron_common::api::internal::nexus::InstanceRuntimeState; use propolis_client::types::InstanceState as Observed; use uuid::Uuid; fn make_instance() -> InstanceStates { let propolis_id = PropolisUuid::new_v4(); let now = Utc::now(); - let instance = InstanceRuntimeState { - propolis_id: Some(propolis_id), - dst_propolis_id: None, - migration_id: None, - gen: Generation::new(), - time_updated: now, - }; let vmm = VmmRuntimeState { state: VmmState::Starting, @@ -631,19 +386,16 @@ mod test { time_updated: now, }; - InstanceStates::new(instance, vmm, propolis_id) + InstanceStates::new(vmm, propolis_id, None) } fn make_migration_source_instance() -> InstanceStates { let mut state = make_instance(); state.vmm.state = VmmState::Migrating; let migration_id = Uuid::new_v4(); - state.instance.migration_id = Some(migration_id); - state.instance.dst_propolis_id = Some(PropolisUuid::new_v4()); - state.migration = Some(MigrationRuntimeState { + state.migration_out = Some(MigrationRuntimeState { migration_id, state: MigrationState::InProgress, - role: MigrationRole::Source, // advance the generation once, since we are starting out in the // `InProgress` state. gen: Generation::new().next(), @@ -654,22 +406,16 @@ mod test { } fn make_migration_target_instance() -> InstanceStates { - let mut state = make_instance(); - state.vmm.state = VmmState::Migrating; - let migration_id = Uuid::new_v4(); - state.instance.migration_id = Some(migration_id); - state.propolis_id = PropolisUuid::new_v4(); - state.instance.dst_propolis_id = Some(state.propolis_id); - state.migration = Some(MigrationRuntimeState { - migration_id, - state: MigrationState::InProgress, - role: MigrationRole::Target, - // advance the generation once, since we are starting out in the - // `InProgress` state. - gen: Generation::new().next(), - time_updated: Utc::now(), - }); - state + let propolis_id = PropolisUuid::new_v4(); + let now = Utc::now(); + + let vmm = VmmRuntimeState { + state: VmmState::Migrating, + gen: Generation::new(), + time_updated: now, + }; + + InstanceStates::new(vmm, propolis_id, Some(Uuid::new_v4())) } fn make_observed_state( @@ -677,7 +423,8 @@ mod test { ) -> ObservedPropolisState { ObservedPropolisState { vmm_state: propolis_state, - migration_status: ObservedMigrationStatus::NoMigration, + migration_in: None, + migration_out: None, time: Utc::now(), } } @@ -689,36 +436,6 @@ mod test { prev: &InstanceStates, next: &InstanceStates, ) { - // The predicate under test below is "if an interesting field changed, - // then the generation number changed." Testing the contrapositive is a - // little nicer because the assertion that trips identifies exactly - // which field changed without updating the generation number. - // - // The else branch tests the converse to make sure the generation number - // does not update unexpectedly. While this won't cause an important - // state update to be dropped, it can interfere with updates from other - // sleds that expect their own attempts to advance the generation number - // to cause new state to be recorded. 
- if prev.instance.gen == next.instance.gen { - assert_eq!(prev.instance.propolis_id, next.instance.propolis_id); - assert_eq!( - prev.instance.dst_propolis_id, - next.instance.dst_propolis_id - ); - assert_eq!(prev.instance.migration_id, next.instance.migration_id); - } else { - assert!( - (prev.instance.propolis_id != next.instance.propolis_id) - || (prev.instance.dst_propolis_id - != next.instance.dst_propolis_id) - || (prev.instance.migration_id - != next.instance.migration_id), - "prev: {:?}, next: {:?}", - prev, - next - ); - } - // Propolis is free to publish no-op VMM state updates (e.g. when an // in-progress migration's state changes but the migration is not yet // complete), so don't test the converse here. @@ -731,60 +448,63 @@ mod test { fn propolis_terminal_states_request_destroy_action() { for state in [Observed::Destroyed, Observed::Failed] { let mut instance_state = make_instance(); - let original_instance_state = instance_state.clone(); let requested_action = instance_state .apply_propolis_observation(&make_observed_state(state.into())); assert!(matches!(requested_action, Some(Action::Destroy))); - assert!( - instance_state.instance.gen - > original_instance_state.instance.gen - ); } } - fn test_termination_fails_in_progress_migration( - mk_instance: impl Fn() -> InstanceStates, - ) { + #[test] + fn source_termination_fails_in_progress_migration() { for state in [Observed::Destroyed, Observed::Failed] { - let mut instance_state = mk_instance(); - let original_migration = instance_state.clone().migration.unwrap(); + let mut instance_state = make_migration_source_instance(); + let original_migration = + instance_state.clone().migration_out.unwrap(); let requested_action = instance_state .apply_propolis_observation(&make_observed_state(state.into())); - let migration = - instance_state.migration.expect("state must have a migration"); + let migration = instance_state + .migration_out + .expect("state must have a migration"); assert_eq!(migration.state, MigrationState::Failed); assert!(migration.gen > original_migration.gen); assert!(matches!(requested_action, Some(Action::Destroy))); } } - #[test] - fn source_termination_fails_in_progress_migration() { - test_termination_fails_in_progress_migration( - make_migration_source_instance, - ) - } - #[test] fn target_termination_fails_in_progress_migration() { - test_termination_fails_in_progress_migration( - make_migration_target_instance, - ) + for state in [Observed::Destroyed, Observed::Failed] { + let mut instance_state = make_migration_target_instance(); + let original_migration = + instance_state.clone().migration_in.unwrap(); + let requested_action = instance_state + .apply_propolis_observation(&make_observed_state(state.into())); + + let migration = instance_state + .migration_in + .expect("state must have a migration"); + assert_eq!(migration.state, MigrationState::Failed); + assert!(migration.gen > original_migration.gen); + assert!(matches!(requested_action, Some(Action::Destroy))); + } } #[test] fn destruction_after_migration_out_does_not_transition() { let mut state = make_migration_source_instance(); - assert!(state.instance.dst_propolis_id.is_some()); - assert_ne!(state.instance.propolis_id, state.instance.dst_propolis_id); + let migration_id = state.migration_out.as_ref().unwrap().migration_id; // After a migration succeeds, the source VM appears to stop but reports // that the migration has succeeded. 
let mut observed = ObservedPropolisState { vmm_state: PropolisInstanceState(Observed::Stopping), - migration_status: ObservedMigrationStatus::Succeeded, + migration_out: Some(ObservedMigrationState { + state: MigrationState::Completed, + id: migration_id, + }), + migration_in: None, time: Utc::now(), }; @@ -794,21 +514,14 @@ mod test { let prev = state.clone(); assert!(state.apply_propolis_observation(&observed).is_none()); assert_state_change_has_gen_change(&prev, &state); - assert!(state.instance.gen > prev.instance.gen); - assert_eq!( - state.instance.dst_propolis_id, - prev.instance.dst_propolis_id - ); - assert_eq!(state.instance.propolis_id, state.instance.dst_propolis_id); - assert!(state.instance.migration_id.is_some()); // The migration state should transition to "completed" let migration = state - .migration + .migration_out .clone() .expect("instance must have a migration state"); let prev_migration = - prev.migration.expect("previous state must have a migration"); + prev.migration_out.expect("previous state must have a migration"); assert_eq!(migration.state, MigrationState::Completed); assert!(migration.gen > prev_migration.gen); let prev_migration = migration; @@ -820,7 +533,6 @@ mod test { observed.vmm_state = PropolisInstanceState(Observed::Stopped); assert!(state.apply_propolis_observation(&observed).is_none()); assert_state_change_has_gen_change(&prev, &state); - assert_eq!(state.instance.gen, prev.instance.gen); // The Stopped state is translated internally to Stopping to prevent // external viewers from perceiving that the instance is stopped before @@ -830,7 +542,7 @@ mod test { // Now that the migration has completed, it should not transition again. let migration = state - .migration + .migration_out .clone() .expect("instance must have a migration state"); assert_eq!(migration.state, MigrationState::Completed); @@ -844,12 +556,19 @@ mod test { Some(Action::Destroy) )); assert_state_change_has_gen_change(&prev, &state); - assert_eq!(state.instance.gen, prev.instance.gen); assert_eq!(state.vmm.state, VmmState::Destroyed); assert!(state.vmm.gen > prev.vmm.gen); let migration = state - .migration + .migration_out + .clone() + .expect("instance must have a migration state"); + assert_eq!(migration.state, MigrationState::Completed); + assert_eq!(migration.gen, prev_migration.gen); + + state.terminate_rudely(false); + let migration = state + .migration_out .clone() .expect("instance must have a migration state"); assert_eq!(migration.state, MigrationState::Completed); @@ -859,12 +578,17 @@ mod test { #[test] fn failure_after_migration_in_does_not_transition() { let mut state = make_migration_target_instance(); + let migration_id = state.migration_in.as_ref().unwrap().migration_id; // Failure to migrate into an instance should mark the VMM as destroyed // but should not change the instance's migration IDs. let observed = ObservedPropolisState { vmm_state: PropolisInstanceState(Observed::Failed), - migration_status: ObservedMigrationStatus::Failed, + migration_in: Some(ObservedMigrationState { + state: MigrationState::Failed, + id: migration_id, + }), + migration_out: None, time: Utc::now(), }; @@ -874,15 +598,14 @@ mod test { Some(Action::Destroy) )); assert_state_change_has_gen_change(&prev, &state); - assert_eq!(state.instance.gen, prev.instance.gen); assert_eq!(state.vmm.state, VmmState::Failed); assert!(state.vmm.gen > prev.vmm.gen); // The migration state should transition. 
let migration = - state.migration.expect("instance must have a migration state"); + state.migration_in.expect("instance must have a migration state"); let prev_migration = - prev.migration.expect("previous state must have a migration"); + prev.migration_in.expect("previous state must have a migration"); assert_eq!(migration.state, MigrationState::Failed); assert!(migration.gen > prev_migration.gen); } @@ -896,192 +619,19 @@ mod test { #[test] fn rude_terminate_of_migration_target_does_not_transition_instance() { let mut state = make_migration_target_instance(); - assert_eq!(state.propolis_role(), PropolisRole::MigrationTarget); let prev = state.clone(); let mark_failed = false; state.terminate_rudely(mark_failed); assert_state_change_has_gen_change(&prev, &state); - assert_eq!(state.instance.gen, prev.instance.gen); // The migration state should transition. let migration = - state.migration.expect("instance must have a migration state"); + state.migration_in.expect("instance must have a migration state"); let prev_migration = - prev.migration.expect("previous state must have a migration"); + prev.migration_in.expect("previous state must have a migration"); assert_eq!(migration.state, MigrationState::Failed); assert!(migration.gen > prev_migration.gen); } - - #[test] - fn migration_out_after_migration_in() { - let mut state = make_migration_target_instance(); - let mut observed = ObservedPropolisState { - vmm_state: PropolisInstanceState(Observed::Running), - migration_status: ObservedMigrationStatus::Succeeded, - time: Utc::now(), - }; - - // The transition into the Running state on the migration target should - // take over for the source, updating the Propolis generation. - let prev = state.clone(); - assert!(state.apply_propolis_observation(&observed).is_none()); - assert_state_change_has_gen_change(&prev, &state); - assert!(state.instance.migration_id.is_none()); - assert!(state.instance.dst_propolis_id.is_none()); - assert!(state.instance.gen > prev.instance.gen); - assert_eq!(state.vmm.state, VmmState::Running); - assert!(state.vmm.gen > prev.vmm.gen); - - // The migration state should transition to completed. - let migration = state - .migration - .clone() - .expect("instance must have a migration state"); - let prev_migration = - prev.migration.expect("previous state must have a migration"); - assert_eq!(migration.state, MigrationState::Completed); - assert!(migration.gen > prev_migration.gen); - - // Pretend Nexus set some new migration IDs. - let migration_id = Uuid::new_v4(); - let prev = state.clone(); - state.set_migration_ids( - &Some(InstanceMigrationSourceParams { - migration_id, - dst_propolis_id: PropolisUuid::new_v4(), - }), - Utc::now(), - ); - assert_state_change_has_gen_change(&prev, &state); - assert!(state.instance.gen > prev.instance.gen); - assert_eq!(state.vmm.gen, prev.vmm.gen); - - // There should be a new, pending migration state. - let migration = state - .migration - .clone() - .expect("instance must have a migration state"); - assert_eq!(migration.state, MigrationState::Pending); - assert_eq!(migration.migration_id, migration_id); - let prev_migration = migration; - - // Mark that the new migration out is in progress. This doesn't change - // anything in the instance runtime state, but does update the VMM state - // generation. 
- let prev = state.clone(); - observed.vmm_state = PropolisInstanceState(Observed::Migrating); - observed.migration_status = ObservedMigrationStatus::InProgress; - assert!(state.apply_propolis_observation(&observed).is_none()); - assert_state_change_has_gen_change(&prev, &state); - assert_eq!( - state.instance.migration_id.unwrap(), - prev.instance.migration_id.unwrap() - ); - assert_eq!( - state.instance.dst_propolis_id.unwrap(), - prev.instance.dst_propolis_id.unwrap() - ); - assert_eq!(state.vmm.state, VmmState::Migrating); - assert!(state.vmm.gen > prev.vmm.gen); - assert_eq!(state.instance.gen, prev.instance.gen); - - // The migration state should transition to in progress. - let migration = state - .migration - .clone() - .expect("instance must have a migration state"); - assert_eq!(migration.state, MigrationState::InProgress); - assert!(migration.gen > prev_migration.gen); - let prev_migration = migration; - - // Propolis will publish that the migration succeeds before changing any - // state. This should transfer control to the target but should not - // touch the migration ID (that is the new target's job). - let prev = state.clone(); - observed.vmm_state = PropolisInstanceState(Observed::Migrating); - observed.migration_status = ObservedMigrationStatus::Succeeded; - assert!(state.apply_propolis_observation(&observed).is_none()); - assert_state_change_has_gen_change(&prev, &state); - assert_eq!(state.vmm.state, VmmState::Migrating); - assert!(state.vmm.gen > prev.vmm.gen); - assert_eq!(state.instance.migration_id, prev.instance.migration_id); - assert_eq!( - state.instance.dst_propolis_id, - prev.instance.dst_propolis_id, - ); - assert_eq!(state.instance.propolis_id, state.instance.dst_propolis_id); - assert!(state.instance.gen > prev.instance.gen); - - // The migration state should transition to completed. - let migration = state - .migration - .clone() - .expect("instance must have a migration state"); - assert_eq!(migration.state, MigrationState::Completed); - assert!(migration.gen > prev_migration.gen); - - // The rest of the destruction sequence is covered by other tests. - } - - #[test] - fn test_migration_ids_already_set() { - let orig_instance = make_instance(); - let mut old_instance = orig_instance.clone(); - let mut new_instance = old_instance.clone(); - - // Advancing the old instance's migration IDs and then asking if the - // new IDs are present should indicate that they are indeed present. - let migration_ids = InstanceMigrationSourceParams { - migration_id: Uuid::new_v4(), - dst_propolis_id: PropolisUuid::new_v4(), - }; - - new_instance.set_migration_ids(&Some(migration_ids), Utc::now()); - assert!(new_instance.migration_ids_already_set( - old_instance.instance(), - &Some(migration_ids) - )); - - // The IDs aren't already set if the new record has an ID that's - // advanced from the old record by more than one generation. - let mut newer_instance = new_instance.clone(); - newer_instance.instance.gen = newer_instance.instance.gen.next(); - assert!(!newer_instance.migration_ids_already_set( - old_instance.instance(), - &Some(migration_ids) - )); - - // They also aren't set if the old generation has somehow equaled or - // surpassed the current generation. - old_instance.instance.gen = old_instance.instance.gen.next(); - assert!(!new_instance.migration_ids_already_set( - old_instance.instance(), - &Some(migration_ids) - )); - - // If the generation numbers are right, but either requested ID is not - // present in the current instance, the requested IDs aren't set. 
- old_instance = orig_instance; - new_instance.instance.migration_id = Some(Uuid::new_v4()); - assert!(!new_instance.migration_ids_already_set( - old_instance.instance(), - &Some(migration_ids) - )); - - new_instance.instance.migration_id = Some(migration_ids.migration_id); - new_instance.instance.dst_propolis_id = Some(PropolisUuid::new_v4()); - assert!(!new_instance.migration_ids_already_set( - old_instance.instance(), - &Some(migration_ids) - )); - - new_instance.instance.migration_id = None; - new_instance.instance.dst_propolis_id = None; - assert!(!new_instance.migration_ids_already_set( - old_instance.instance(), - &Some(migration_ids) - )); - } } diff --git a/sled-agent/src/http_entrypoints.rs b/sled-agent/src/http_entrypoints.rs index 407254419c..820ec746b8 100644 --- a/sled-agent/src/http_entrypoints.rs +++ b/sled-agent/src/http_entrypoints.rs @@ -8,9 +8,9 @@ use super::sled_agent::SledAgent; use crate::bootstrap::params::AddSledRequest; use crate::params::{ BootstoreStatus, CleanupContextUpdate, DiskEnsureBody, InstanceEnsureBody, - InstanceExternalIpBody, InstancePutMigrationIdsBody, InstancePutStateBody, - InstancePutStateResponse, InstanceUnregisterResponse, TimeSync, - VpcFirewallRulesEnsureBody, ZoneBundleId, ZoneBundleMetadata, Zpool, + InstanceExternalIpBody, InstancePutStateBody, InstancePutStateResponse, + InstanceUnregisterResponse, TimeSync, VpcFirewallRulesEnsureBody, + ZoneBundleId, ZoneBundleMetadata, Zpool, }; use crate::sled_agent::Error as SledAgentError; use crate::zone_bundle; @@ -54,7 +54,6 @@ pub fn api() -> SledApiDescription { api.register(disk_put)?; api.register(cockroachdb_init)?; api.register(instance_issue_disk_snapshot_request)?; - api.register(instance_put_migration_ids)?; api.register(instance_put_state)?; api.register(instance_get_state)?; api.register(instance_put_external_ip)?; @@ -496,28 +495,6 @@ async fn instance_get_state( Ok(HttpResponseOk(sa.instance_get_state(instance_id).await?)) } -#[endpoint { - method = PUT, - path = "/instances/{instance_id}/migration-ids", -}] -async fn instance_put_migration_ids( - rqctx: RequestContext, - path_params: Path, - body: TypedBody, -) -> Result, HttpError> { - let sa = rqctx.context(); - let instance_id = path_params.into_inner().instance_id; - let body_args = body.into_inner(); - Ok(HttpResponseOk( - sa.instance_put_migration_ids( - instance_id, - &body_args.old_runtime, - &body_args.migration_params, - ) - .await?, - )) -} - #[endpoint { method = PUT, path = "/instances/{instance_id}/external-ip", diff --git a/sled-agent/src/instance.rs b/sled-agent/src/instance.rs index 7bfe308f94..631f2b83f6 100644 --- a/sled-agent/src/instance.rs +++ b/sled-agent/src/instance.rs @@ -16,9 +16,9 @@ use crate::nexus::NexusClientWithResolver; use crate::params::ZoneBundleMetadata; use crate::params::{InstanceExternalIpBody, ZoneBundleCause}; use crate::params::{ - InstanceHardware, InstanceMetadata, InstanceMigrationSourceParams, - InstanceMigrationTargetParams, InstancePutStateResponse, - InstanceStateRequested, InstanceUnregisterResponse, VpcFirewallRule, + InstanceHardware, InstanceMetadata, InstanceMigrationTargetParams, + InstancePutStateResponse, InstanceStateRequested, + InstanceUnregisterResponse, VpcFirewallRule, }; use crate::profile::*; use crate::zone_bundle::BundleError; @@ -33,7 +33,7 @@ use illumos_utils::running_zone::{RunningZone, ZoneBuilderFactory}; use illumos_utils::svc::wait_for_service; use illumos_utils::zone::PROPOLIS_ZONE_PREFIX; use omicron_common::api::internal::nexus::{ - 
InstanceRuntimeState, SledInstanceState, VmmRuntimeState, + SledInstanceState, VmmRuntimeState, }; use omicron_common::api::internal::shared::{ NetworkInterface, SledIdentifiers, SourceNatConfig,
@@ -228,11 +228,6 @@ enum InstanceRequest { state: crate::params::InstanceStateRequested, tx: oneshot::Sender>, }, - PutMigrationIds { - old_runtime: InstanceRuntimeState, - migration_ids: Option, - tx: oneshot::Sender>, - }, Terminate { mark_failed: bool, tx: oneshot::Sender>,
@@ -384,10 +379,7 @@ impl InstanceRunner { use InstanceMonitorRequest::*; match request { Some(Update { state, tx }) => { - let observed = ObservedPropolisState::new( - self.state.instance(), - &state, - ); + let observed = ObservedPropolisState::new(&state); let reaction = self.observe_state(&observed).await; self.publish_state_to_nexus().await;
@@ -431,15 +423,6 @@ impl InstanceRunner { .map_err(|e| e.into())) .map_err(|_| Error::FailedSendClientClosed) }, - Some(PutMigrationIds{ old_runtime, migration_ids, tx }) => { - tx.send( - self.put_migration_ids( - &old_runtime, - &migration_ids - ).await.map_err(|e| e.into()) - ) - .map_err(|_| Error::FailedSendClientClosed) - }, Some(Terminate { mark_failed, tx }) => { tx.send(Ok(InstanceUnregisterResponse { updated_runtime: Some(self.terminate(mark_failed).await)
@@ -504,9 +487,6 @@ impl InstanceRunner { PutState { tx, .. } => { tx.send(Err(Error::Terminating.into())).map_err(|_| ()) } - PutMigrationIds { tx, .. } => { - tx.send(Err(Error::Terminating.into())).map_err(|_| ()) - } Terminate { tx, .. } => { tx.send(Err(Error::Terminating.into())).map_err(|_| ()) }
@@ -649,7 +629,6 @@ impl InstanceRunner { self.log, "updated state after observing Propolis state change"; "propolis_id" => %self.state.propolis_id(), - "new_instance_state" => ?self.state.instance(), "new_vmm_state" => ?self.state.vmm() );
@@ -711,10 +690,27 @@ impl InstanceRunner { let migrate = match migrate { Some(params) => { - let migration_id = - self.state.instance().migration_id.ok_or_else(|| { - Error::Migration(anyhow!("Missing Migration UUID")) - })?; + let migration_id = self.state + .migration_in() + // TODO(eliza): This is a bit of an unfortunate dance: the + // initial instance-ensure-registered request is what sends + // the migration ID, but it's the subsequent + // instance-ensure-state request (which we're handling here) + // that includes the migration source VMM's UUID and IP + // address. Because the API currently splits the migration + // IDs between the instance-ensure-registered and + // instance-ensure-state requests, we have to stash the + // migration ID in an `Option` and `expect()` it here, + // panicking if we get an instance-ensure-state request with + // a source Propolis ID when the instance wasn't registered + // with a migration in ID. + // + // This is kind of a shame. Eventually, we should consider + // reworking the API so that the ensure-state request + // contains the migration ID, and we don't have to unwrap here.
See: + // https://github.com/oxidecomputer/omicron/issues/6073 + .expect("if we have migration target params, we should also have a migration in") + .migration_id; Some(propolis_client::types::InstanceMigrateInitiateRequest { src_addr: params.src_propolis_addr.to_string(), src_uuid: params.src_propolis_id, @@ -969,9 +965,11 @@ pub struct Instance { #[derive(Debug)] pub(crate) struct InstanceInitialState { pub hardware: InstanceHardware, - pub instance_runtime: InstanceRuntimeState, pub vmm_runtime: VmmRuntimeState, pub propolis_addr: SocketAddr, + /// UUID of the migration in to this VMM, if the VMM is being created as the + /// target of an active migration. + pub migration_id: Option, } impl Instance { @@ -1002,13 +1000,14 @@ impl Instance { info!(log, "initializing new Instance"; "instance_id" => %id, "propolis_id" => %propolis_id, + "migration_id" => ?state.migration_id, "state" => ?state); let InstanceInitialState { hardware, - instance_runtime, vmm_runtime, propolis_addr, + migration_id, } = state; let InstanceManagerServices { @@ -1098,11 +1097,7 @@ impl Instance { dhcp_config, requested_disks: hardware.disks, cloud_init_bytes: hardware.cloud_init_bytes, - state: InstanceStates::new( - instance_runtime, - vmm_runtime, - propolis_id, - ), + state: InstanceStates::new(vmm_runtime, propolis_id, migration_id), running_state: None, nexus_client, storage, @@ -1173,23 +1168,6 @@ impl Instance { Ok(()) } - pub async fn put_migration_ids( - &self, - tx: oneshot::Sender>, - old_runtime: InstanceRuntimeState, - migration_ids: Option, - ) -> Result<(), Error> { - self.tx - .send(InstanceRequest::PutMigrationIds { - old_runtime, - migration_ids, - tx, - }) - .await - .map_err(|_| Error::FailedSendChannelClosed)?; - Ok(()) - } - /// Rudely terminates this instance's Propolis (if it has one) and /// immediately transitions the instance to the Destroyed state. pub async fn terminate( @@ -1376,36 +1354,6 @@ impl InstanceRunner { Ok(self.state.sled_instance_state()) } - async fn put_migration_ids( - &mut self, - old_runtime: &InstanceRuntimeState, - migration_ids: &Option, - ) -> Result { - // Check that the instance's current generation matches the one the - // caller expects to transition from. This helps Nexus ensure that if - // multiple migration sagas launch at Propolis generation N, then only - // one of them will successfully set the instance's migration IDs. - if self.state.instance().gen != old_runtime.gen { - // Allow this transition for idempotency if the instance is - // already in the requested goal state. - if self.state.migration_ids_already_set(old_runtime, migration_ids) - { - return Ok(self.state.sled_instance_state()); - } - - return Err(Error::Transition( - omicron_common::api::external::Error::conflict(format!( - "wrong instance state generation: expected {}, got {}", - self.state.instance().gen, - old_runtime.gen - )), - )); - } - - self.state.set_migration_ids(migration_ids, Utc::now()); - Ok(self.state.sled_instance_state()) - } - async fn setup_propolis_inner(&mut self) -> Result { // Create OPTE ports for the instance. 
We also store the names of all // those ports to notify the metrics task to start collecting statistics @@ -1637,7 +1585,9 @@ mod tests { use omicron_common::api::external::{ ByteCount, Generation, Hostname, InstanceCpuCount, }; - use omicron_common::api::internal::nexus::{InstanceProperties, VmmState}; + use omicron_common::api::internal::nexus::{ + InstanceProperties, InstanceRuntimeState, VmmState, + }; use omicron_common::api::internal::shared::SledIdentifiers; use omicron_common::FileKv; use sled_storage::manager_test_harness::StorageManagerTestHarness; @@ -1819,8 +1769,7 @@ mod tests { let ticket = InstanceTicket::new_without_manager_for_test(id); - let initial_state = - fake_instance_initial_state(propolis_id, propolis_addr); + let initial_state = fake_instance_initial_state(propolis_addr); let (services, rx) = fake_instance_manager_services( log, @@ -1856,7 +1805,6 @@ mod tests { } fn fake_instance_initial_state( - propolis_id: PropolisUuid, propolis_addr: SocketAddr, ) -> InstanceInitialState { let hardware = InstanceHardware { @@ -1886,19 +1834,13 @@ mod tests { InstanceInitialState { hardware, - instance_runtime: InstanceRuntimeState { - propolis_id: Some(propolis_id), - dst_propolis_id: None, - migration_id: None, - gen: Generation::new(), - time_updated: Default::default(), - }, vmm_runtime: VmmRuntimeState { state: VmmState::Starting, gen: Generation::new(), time_updated: Default::default(), }, propolis_addr, + migration_id: None, } } @@ -2283,10 +2225,10 @@ mod tests { let propolis_id = PropolisUuid::from_untyped_uuid(PROPOLIS_ID); let InstanceInitialState { hardware, - instance_runtime, vmm_runtime, propolis_addr, - } = fake_instance_initial_state(propolis_id, propolis_addr); + migration_id: _, + } = fake_instance_initial_state(propolis_addr); let metadata = InstanceMetadata { silo_id: Uuid::new_v4(), @@ -2300,6 +2242,14 @@ mod tests { serial: "fake-serial".into(), }; + let instance_runtime = InstanceRuntimeState { + propolis_id: Some(propolis_id), + dst_propolis_id: None, + migration_id: None, + gen: Generation::new(), + time_updated: Default::default(), + }; + mgr.ensure_registered( instance_id, propolis_id, diff --git a/sled-agent/src/instance_manager.rs b/sled-agent/src/instance_manager.rs index bb9303f5e2..1b2fb204d0 100644 --- a/sled-agent/src/instance_manager.rs +++ b/sled-agent/src/instance_manager.rs @@ -12,8 +12,8 @@ use crate::params::InstanceExternalIpBody; use crate::params::InstanceMetadata; use crate::params::ZoneBundleMetadata; use crate::params::{ - InstanceHardware, InstanceMigrationSourceParams, InstancePutStateResponse, - InstanceStateRequested, InstanceUnregisterResponse, + InstanceHardware, InstancePutStateResponse, InstanceStateRequested, + InstanceUnregisterResponse, }; use crate::vmm_reservoir::VmmReservoirManagerHandle; use crate::zone_bundle::BundleError; @@ -166,7 +166,7 @@ impl InstanceManager { instance_runtime, vmm_runtime, propolis_addr, - sled_identifiers, + sled_identifiers: Box::new(sled_identifiers), metadata, tx, }) @@ -225,26 +225,6 @@ impl InstanceManager { } } - pub async fn put_migration_ids( - &self, - instance_id: InstanceUuid, - old_runtime: &InstanceRuntimeState, - migration_ids: &Option, - ) -> Result { - let (tx, rx) = oneshot::channel(); - self.inner - .tx - .send(InstanceManagerRequest::PutMigrationIds { - instance_id, - old_runtime: old_runtime.clone(), - migration_ids: *migration_ids, - tx, - }) - .await - .map_err(|_| Error::FailedSendInstanceManagerClosed)?; - rx.await? 
- } - pub async fn instance_issue_disk_snapshot_request( &self, instance_id: InstanceUuid, @@ -369,7 +349,12 @@ enum InstanceManagerRequest { instance_runtime: InstanceRuntimeState, vmm_runtime: VmmRuntimeState, propolis_addr: SocketAddr, - sled_identifiers: SledIdentifiers, + // These are boxed because they are, apparently, quite large, and Clippy + // whinges about the overall size of this variant relative to the + // others. Since we will generally send `EnsureRegistered` requests much + // less frequently than most of the others, boxing this seems like a + // reasonable choice... + sled_identifiers: Box, metadata: InstanceMetadata, tx: oneshot::Sender>, }, @@ -382,12 +367,7 @@ enum InstanceManagerRequest { target: InstanceStateRequested, tx: oneshot::Sender>, }, - PutMigrationIds { - instance_id: InstanceUuid, - old_runtime: InstanceRuntimeState, - migration_ids: Option, - tx: oneshot::Sender>, - }, + InstanceIssueDiskSnapshot { instance_id: InstanceUuid, disk_id: Uuid, @@ -505,7 +485,7 @@ impl InstanceManagerRunner { instance_runtime, vmm_runtime, propolis_addr, - sled_identifiers, + *sled_identifiers, metadata ).await).map_err(|_| Error::FailedSendClientClosed) }, @@ -515,9 +495,6 @@ impl InstanceManagerRunner { Some(EnsureState { instance_id, target, tx }) => { self.ensure_state(tx, instance_id, target).await }, - Some(PutMigrationIds { instance_id, old_runtime, migration_ids, tx }) => { - self.put_migration_ids(tx, instance_id, &old_runtime, &migration_ids).await - }, Some(InstanceIssueDiskSnapshot { instance_id, disk_id, snapshot_id, tx }) => { self.instance_issue_disk_snapshot_request(tx, instance_id, disk_id, snapshot_id).await }, @@ -631,7 +608,8 @@ impl InstanceManagerRunner { info!(&self.log, "registering new instance"; "instance_id" => ?instance_id); - let instance_log = self.log.new(o!()); + let instance_log = + self.log.new(o!("instance_id" => format!("{instance_id}"))); let ticket = InstanceTicket::new(instance_id, self.terminate_tx.clone()); @@ -647,9 +625,9 @@ impl InstanceManagerRunner { let state = crate::instance::InstanceInitialState { hardware, - instance_runtime, vmm_runtime, propolis_addr, + migration_id: instance_runtime.migration_id, }; let instance = Instance::new( @@ -729,25 +707,6 @@ impl InstanceManagerRunner { Ok(()) } - /// Idempotently attempts to set the instance's migration IDs to the - /// supplied IDs. - async fn put_migration_ids( - &mut self, - tx: oneshot::Sender>, - instance_id: InstanceUuid, - old_runtime: &InstanceRuntimeState, - migration_ids: &Option, - ) -> Result<(), Error> { - let (_, instance) = self - .instances - .get(&instance_id) - .ok_or_else(|| Error::NoSuchInstance(instance_id))?; - instance - .put_migration_ids(tx, old_runtime.clone(), *migration_ids) - .await?; - Ok(()) - } - async fn instance_issue_disk_snapshot_request( &self, tx: oneshot::Sender>, diff --git a/sled-agent/src/params.rs b/sled-agent/src/params.rs index b7a143cf87..4a7885279c 100644 --- a/sled-agent/src/params.rs +++ b/sled-agent/src/params.rs @@ -210,23 +210,6 @@ pub struct InstanceMigrationSourceParams { pub dst_propolis_id: PropolisUuid, } -/// The body of a request to set or clear the migration identifiers from a -/// sled agent's instance state records. -#[derive(Debug, Serialize, Deserialize, JsonSchema)] -pub struct InstancePutMigrationIdsBody { - /// The last instance runtime state known to this requestor. 
This request - /// will succeed if either (a) the state generation in the sled agent's - /// runtime state matches the generation in this record, or (b) the sled - /// agent's runtime state matches what would result from applying this - /// request to the caller's runtime state. This latter condition provides - /// idempotency. - pub old_runtime: InstanceRuntimeState, - - /// The migration identifiers to set. If `None`, this operation clears the - /// migration IDs. - pub migration_params: Option, -} - #[derive(Clone, Debug, Deserialize, Serialize, JsonSchema, PartialEq)] pub enum DiskType { U2, diff --git a/sled-agent/src/sim/collection.rs b/sled-agent/src/sim/collection.rs index 8af71ac026..ffb7327ce7 100644 --- a/sled-agent/src/sim/collection.rs +++ b/sled-agent/src/sim/collection.rs @@ -422,7 +422,6 @@ mod test { use omicron_common::api::external::Error; use omicron_common::api::external::Generation; use omicron_common::api::internal::nexus::DiskRuntimeState; - use omicron_common::api::internal::nexus::InstanceRuntimeState; use omicron_common::api::internal::nexus::SledInstanceState; use omicron_common::api::internal::nexus::VmmRuntimeState; use omicron_common::api::internal::nexus::VmmState; @@ -433,14 +432,6 @@ mod test { logctx: &LogContext, ) -> (SimObject, Receiver<()>) { let propolis_id = PropolisUuid::new_v4(); - let instance_vmm = InstanceRuntimeState { - propolis_id: Some(propolis_id), - dst_propolis_id: None, - migration_id: None, - gen: Generation::new(), - time_updated: Utc::now(), - }; - let vmm_state = VmmRuntimeState { state: VmmState::Starting, gen: Generation::new(), @@ -448,10 +439,10 @@ mod test { }; let state = SledInstanceState { - instance_state: instance_vmm, vmm_state, propolis_id, - migration_state: None, + migration_in: None, + migration_out: None, }; SimObject::new_simulated_auto(&state, logctx.log.new(o!())) @@ -501,14 +492,8 @@ mod test { assert!(dropped.is_none()); assert!(instance.object.desired().is_none()); let rnext = instance.object.current(); - assert!(rnext.instance_state.gen > rprev.instance_state.gen); assert!(rnext.vmm_state.gen > rprev.vmm_state.gen); - assert!( - rnext.instance_state.time_updated - >= rprev.instance_state.time_updated - ); assert!(rnext.vmm_state.time_updated >= rprev.vmm_state.time_updated); - assert!(rnext.instance_state.propolis_id.is_none()); assert_eq!(rnext.vmm_state.state, VmmState::Destroyed); assert!(rx.try_next().is_err()); @@ -632,7 +617,6 @@ mod test { assert!(rnext.vmm_state.time_updated >= rprev.vmm_state.time_updated); assert_eq!(rprev.vmm_state.state, VmmState::Stopping); assert_eq!(rnext.vmm_state.state, VmmState::Destroyed); - assert!(rnext.instance_state.gen > rprev.instance_state.gen); logctx.cleanup_successful(); } diff --git a/sled-agent/src/sim/http_entrypoints.rs b/sled-agent/src/sim/http_entrypoints.rs index 268e8a9cf1..d042e19814 100644 --- a/sled-agent/src/sim/http_entrypoints.rs +++ b/sled-agent/src/sim/http_entrypoints.rs @@ -4,11 +4,11 @@ //! 
HTTP entrypoint functions for the sled agent's exposed API +use super::collection::PokeMode; use crate::bootstrap::params::AddSledRequest; use crate::params::{ DiskEnsureBody, InstanceEnsureBody, InstanceExternalIpBody, - InstancePutMigrationIdsBody, InstancePutStateBody, - InstancePutStateResponse, InstanceUnregisterResponse, + InstancePutStateBody, InstancePutStateResponse, InstanceUnregisterResponse, VpcFirewallRulesEnsureBody, }; use dropshot::ApiDescription; @@ -45,7 +45,6 @@ pub fn api() -> SledApiDescription { fn register_endpoints( api: &mut SledApiDescription, ) -> Result<(), ApiDescriptionRegisterError> { - api.register(instance_put_migration_ids)?; api.register(instance_put_state)?; api.register(instance_get_state)?; api.register(instance_register)?; @@ -53,6 +52,8 @@ pub fn api() -> SledApiDescription { api.register(instance_put_external_ip)?; api.register(instance_delete_external_ip)?; api.register(instance_poke_post)?; + api.register(instance_poke_single_step_post)?; + api.register(instance_post_sim_migration_source)?; api.register(disk_put)?; api.register(disk_poke_post)?; api.register(update_artifact)?; @@ -157,28 +158,6 @@ async fn instance_get_state( Ok(HttpResponseOk(sa.instance_get_state(instance_id).await?)) } -#[endpoint { - method = PUT, - path = "/instances/{instance_id}/migration-ids", -}] -async fn instance_put_migration_ids( - rqctx: RequestContext>, - path_params: Path, - body: TypedBody, -) -> Result, HttpError> { - let sa = rqctx.context(); - let instance_id = path_params.into_inner().instance_id; - let body_args = body.into_inner(); - Ok(HttpResponseOk( - sa.instance_put_migration_ids( - instance_id, - &body_args.old_runtime, - &body_args.migration_params, - ) - .await?, - )) -} - #[endpoint { method = PUT, path = "/instances/{instance_id}/external-ip", @@ -221,7 +200,37 @@ async fn instance_poke_post( ) -> Result { let sa = rqctx.context(); let instance_id = path_params.into_inner().instance_id; - sa.instance_poke(instance_id).await; + sa.instance_poke(instance_id, PokeMode::Drain).await; + Ok(HttpResponseUpdatedNoContent()) +} + +#[endpoint { + method = POST, + path = "/instances/{instance_id}/poke-single-step", +}] +async fn instance_poke_single_step_post( + rqctx: RequestContext>, + path_params: Path, +) -> Result { + let sa = rqctx.context(); + let instance_id = path_params.into_inner().instance_id; + sa.instance_poke(instance_id, PokeMode::SingleStep).await; + Ok(HttpResponseUpdatedNoContent()) +} + +#[endpoint { + method = POST, + path = "/instances/{instance_id}/sim-migration-source", +}] +async fn instance_post_sim_migration_source( + rqctx: RequestContext>, + path_params: Path, + body: TypedBody, +) -> Result { + let sa = rqctx.context(); + let instance_id = path_params.into_inner().instance_id; + sa.instance_simulate_migration_source(instance_id, body.into_inner()) + .await?; Ok(HttpResponseUpdatedNoContent()) } diff --git a/sled-agent/src/sim/instance.rs b/sled-agent/src/sim/instance.rs index e94b3b4984..8ee0130262 100644 --- a/sled-agent/src/sim/instance.rs +++ b/sled-agent/src/sim/instance.rs @@ -8,16 +8,14 @@ use super::simulatable::Simulatable; use crate::common::instance::{ObservedPropolisState, PublishedVmmState}; use crate::nexus::NexusClient; -use crate::params::{InstanceMigrationSourceParams, InstanceStateRequested}; +use crate::params::InstanceStateRequested; use async_trait::async_trait; use chrono::Utc; use nexus_client; use omicron_common::api::external::Error; use omicron_common::api::external::Generation; use 
omicron_common::api::external::ResourceType; -use omicron_common::api::internal::nexus::{ - InstanceRuntimeState, MigrationRole, SledInstanceState, VmmState, -}; +use omicron_common::api::internal::nexus::{SledInstanceState, VmmState}; use propolis_client::types::{ InstanceMigrateStatusResponse as PropolisMigrateResponse, InstanceMigrationStatus as PropolisMigrationStatus,
@@ -30,6 +28,10 @@ use uuid::Uuid; use crate::common::instance::{Action as InstanceAction, InstanceStates}; +pub use sled_agent_client::{ + SimulateMigrationSource, SimulatedMigrationResult, +}; + #[derive(Clone, Debug)] enum MonitorChange { PropolisState(PropolisInstanceState),
@@ -79,56 +81,67 @@ impl SimInstanceInner { self.queue.push_back(MonitorChange::MigrateStatus(migrate_status)) } - /// Queue a successful simulated migration. - /// - fn queue_successful_migration(&mut self, role: MigrationRole) { + /// Queue a simulated migration out. + fn queue_migration_out( + &mut self, + migration_id: Uuid, + result: SimulatedMigrationResult, + ) { + let migration_update = |state| PropolisMigrateResponse { + migration_in: None, + migration_out: Some(PropolisMigrationStatus { + id: migration_id, + state, + }), + }; // Propolis transitions to the Migrating state once before // actually starting migration. self.queue_propolis_state(PropolisInstanceState::Migrating); - let migration_id = - self.state.instance().migration_id.unwrap_or_else(|| { - panic!( - "should have migration ID set before getting request to - migrate in (current state: {:?})", - self - ) - }); - - match role { - MigrationRole::Source => { - self.queue_migration_update(PropolisMigrateResponse { - migration_in: None, - migration_out: Some(PropolisMigrationStatus { - id: migration_id, - state: propolis_client::types::MigrationState::Sync, - }), - }); - self.queue_migration_update(PropolisMigrateResponse { - migration_in: None, - migration_out: Some(PropolisMigrationStatus { - id: migration_id, - state: propolis_client::types::MigrationState::Finish, - }), - }); + self.queue_migration_update(migration_update( + propolis_client::types::MigrationState::Sync, + )); + match result { + SimulatedMigrationResult::Success => { + self.queue_migration_update(migration_update( + propolis_client::types::MigrationState::Finish, + )); self.queue_graceful_stop(); } - MigrationRole::Target => { - self.queue_migration_update(PropolisMigrateResponse { - migration_in: Some(PropolisMigrationStatus { - id: migration_id, - state: propolis_client::types::MigrationState::Sync, - }), - migration_out: None, - }); - self.queue_migration_update(PropolisMigrateResponse { - migration_in: Some(PropolisMigrationStatus { - id: migration_id, - state: propolis_client::types::MigrationState::Finish, - }), - migration_out: None, - }); + SimulatedMigrationResult::Failure => { + todo!("finish this part when we actually need it...") + } + } + } + + /// Queue a simulated migration in. + fn queue_migration_in( + &mut self, + migration_id: Uuid, + result: SimulatedMigrationResult, + ) { + let migration_update = |state| PropolisMigrateResponse { + migration_in: Some(PropolisMigrationStatus { + id: migration_id, + state, + }), + migration_out: None, + }; + // Propolis transitions to the Migrating state once before + // actually starting migration.
+ self.queue_propolis_state(PropolisInstanceState::Migrating); + self.queue_migration_update(migration_update( + propolis_client::types::MigrationState::Sync, + )); + match result { + SimulatedMigrationResult::Success => { + self.queue_migration_update(migration_update( + propolis_client::types::MigrationState::Finish, + )); self.queue_propolis_state(PropolisInstanceState::Running) } + SimulatedMigrationResult::Failure => { + todo!("finish this part when we actually need it...") + } } } @@ -179,7 +192,20 @@ impl SimInstanceInner { ))); } - self.queue_successful_migration(MigrationRole::Target) + let migration_id = self + .state + .migration_in() + .ok_or_else(|| { + Error::invalid_request( + "can't request migration in for a vmm that wasn't \ + created with a migration ID", + ) + })? + .migration_id; + self.queue_migration_in( + migration_id, + SimulatedMigrationResult::Success, + ); } InstanceStateRequested::Running => { match self.next_resting_state() { @@ -279,7 +305,6 @@ impl SimInstanceInner { } self.state.apply_propolis_observation(&ObservedPropolisState::new( - self.state.instance(), &self.last_response, )) } else { @@ -370,46 +395,6 @@ impl SimInstanceInner { self.destroyed = true; self.state.sled_instance_state() } - - /// Stores a set of migration IDs in the instance's runtime state. - fn put_migration_ids( - &mut self, - old_runtime: &InstanceRuntimeState, - ids: &Option, - ) -> Result { - if self.state.migration_ids_already_set(old_runtime, ids) { - return Ok(self.state.sled_instance_state()); - } - - if self.state.instance().gen != old_runtime.gen { - return Err(Error::invalid_request(format!( - "wrong Propolis ID generation: expected {}, got {}", - self.state.instance().gen, - old_runtime.gen - ))); - } - - self.state.set_migration_ids(ids, Utc::now()); - - // If we set migration IDs and are the migration source, ensure that we - // will perform the correct state transitions to simulate a successful - // migration. - if ids.is_some() { - let role = self - .state - .migration() - .expect( - "we just got a `put_migration_ids` request with `Some` IDs, \ - so we should have a migration" - ) - .role; - if role == MigrationRole::Source { - self.queue_successful_migration(MigrationRole::Source) - } - } - - Ok(self.state.sled_instance_state()) - } } /// A simulation of an Instance created by the external Oxide API. 
@@ -437,13 +422,14 @@ impl SimInstance { self.inner.lock().unwrap().terminate() } - pub async fn put_migration_ids( + pub(crate) fn set_simulated_migration_source( &self, - old_runtime: &InstanceRuntimeState, - ids: &Option, - ) -> Result { - let mut inner = self.inner.lock().unwrap(); - inner.put_migration_ids(old_runtime, ids) + migration: SimulateMigrationSource, + ) { + self.inner + .lock() + .unwrap() + .queue_migration_out(migration.migration_id, migration.result); } } @@ -466,9 +452,9 @@ impl Simulatable for SimInstance { SimInstance { inner: Arc::new(Mutex::new(SimInstanceInner { state: InstanceStates::new( - current.instance_state, current.vmm_state, current.propolis_id, + current.migration_in.map(|m| m.migration_id), ), last_response: InstanceStateMonitorResponse { gen: 1, diff --git a/sled-agent/src/sim/sled_agent.rs b/sled-agent/src/sim/sled_agent.rs index 9acfa24b3d..79d57a42e6 100644 --- a/sled-agent/src/sim/sled_agent.rs +++ b/sled-agent/src/sim/sled_agent.rs @@ -7,14 +7,14 @@ use super::collection::{PokeMode, SimCollection}; use super::config::Config; use super::disk::SimDisk; -use super::instance::SimInstance; +use super::instance::{self, SimInstance}; use super::storage::CrucibleData; use super::storage::Storage; use crate::nexus::NexusClient; use crate::params::{ DiskStateRequested, InstanceExternalIpBody, InstanceHardware, - InstanceMetadata, InstanceMigrationSourceParams, InstancePutStateResponse, - InstanceStateRequested, InstanceUnregisterResponse, + InstanceMetadata, InstancePutStateResponse, InstanceStateRequested, + InstanceUnregisterResponse, }; use crate::sim::simulatable::Simulatable; use crate::updates::UpdateManager; @@ -30,7 +30,7 @@ use omicron_common::api::external::{ ByteCount, DiskState, Error, Generation, ResourceType, }; use omicron_common::api::internal::nexus::{ - DiskRuntimeState, SledInstanceState, + DiskRuntimeState, MigrationRuntimeState, MigrationState, SledInstanceState, }; use omicron_common::api::internal::nexus::{ InstanceRuntimeState, VmmRuntimeState, @@ -368,15 +368,24 @@ impl SledAgent { } } + let migration_in = instance_runtime.migration_id.map(|migration_id| { + MigrationRuntimeState { + migration_id, + state: MigrationState::Pending, + gen: Generation::new(), + time_updated: chrono::Utc::now(), + } + }); + let instance_run_time_state = self .instances .sim_ensure( &instance_id.into_untyped_uuid(), SledInstanceState { - instance_state: instance_runtime, vmm_state: vmm_runtime, propolis_id, - migration_state: None, + migration_in, + migration_out: None, }, None, ) @@ -540,6 +549,24 @@ impl SledAgent { Ok(instance.current()) } + pub async fn instance_simulate_migration_source( + &self, + instance_id: InstanceUuid, + migration: instance::SimulateMigrationSource, + ) -> Result<(), HttpError> { + let instance = self + .instances + .sim_get_cloned_object(&instance_id.into_untyped_uuid()) + .await + .map_err(|_| { + crate::sled_agent::Error::Instance( + crate::instance_manager::Error::NoSuchInstance(instance_id), + ) + })?; + instance.set_simulated_migration_source(migration); + Ok(()) + } + pub async fn set_instance_ensure_state_error(&self, error: Option) { *self.instance_ensure_state_error.lock().await = error; } @@ -563,20 +590,6 @@ impl SledAgent { Ok(()) } - pub async fn instance_put_migration_ids( - self: &Arc, - instance_id: InstanceUuid, - old_runtime: &InstanceRuntimeState, - migration_ids: &Option, - ) -> Result { - let instance = self - .instances - .sim_get_cloned_object(&instance_id.into_untyped_uuid()) - .await?; - - 
instance.put_migration_ids(old_runtime, migration_ids).await - } - /// Idempotently ensures that the given API Disk (described by `api_disk`) /// is attached (or not) as specified. This simulates disk attach and /// detach, similar to instance boot and halt. @@ -601,8 +614,8 @@ impl SledAgent { self.disks.size().await } - pub async fn instance_poke(&self, id: InstanceUuid) { - self.instances.sim_poke(id.into_untyped_uuid(), PokeMode::Drain).await; + pub async fn instance_poke(&self, id: InstanceUuid, mode: PokeMode) { + self.instances.sim_poke(id.into_untyped_uuid(), mode).await; } pub async fn disk_poke(&self, id: Uuid) { diff --git a/sled-agent/src/sled_agent.rs b/sled-agent/src/sled_agent.rs index 8fa18b0a63..f8454a0f7b 100644 --- a/sled-agent/src/sled_agent.rs +++ b/sled-agent/src/sled_agent.rs @@ -18,9 +18,9 @@ use crate::nexus::{ }; use crate::params::{ DiskStateRequested, InstanceExternalIpBody, InstanceHardware, - InstanceMetadata, InstanceMigrationSourceParams, InstancePutStateResponse, - InstanceStateRequested, InstanceUnregisterResponse, OmicronZoneTypeExt, - TimeSync, VpcFirewallRule, ZoneBundleMetadata, Zpool, + InstanceMetadata, InstancePutStateResponse, InstanceStateRequested, + InstanceUnregisterResponse, OmicronZoneTypeExt, TimeSync, VpcFirewallRule, + ZoneBundleMetadata, Zpool, }; use crate::probe_manager::ProbeManager; use crate::services::{self, ServiceManager}; @@ -1011,23 +1011,6 @@ impl SledAgent { .map_err(|e| Error::Instance(e)) } - /// Idempotently ensures that the instance's runtime state contains the - /// supplied migration IDs, provided that the caller continues to meet the - /// conditions needed to change those IDs. See the doc comments for - /// [`crate::params::InstancePutMigrationIdsBody`]. - pub async fn instance_put_migration_ids( - &self, - instance_id: InstanceUuid, - old_runtime: &InstanceRuntimeState, - migration_ids: &Option, - ) -> Result { - self.inner - .instances - .put_migration_ids(instance_id, old_runtime, migration_ids) - .await - .map_err(|e| Error::Instance(e)) - } - /// Idempotently ensures that an instance's OPTE/port state includes the /// specified external IP address. /// diff --git a/smf/nexus/multi-sled/config-partial.toml b/smf/nexus/multi-sled/config-partial.toml index 396e3615b2..c502c20b1b 100644 --- a/smf/nexus/multi-sled/config-partial.toml +++ b/smf/nexus/multi-sled/config-partial.toml @@ -64,6 +64,7 @@ instance_watcher.period_secs = 30 abandoned_vmm_reaper.period_secs = 60 saga_recovery.period_secs = 600 lookup_region_port.period_secs = 60 +instance_updater.period_secs = 30 [default_region_allocation_strategy] # by default, allocate across 3 distinct sleds diff --git a/smf/nexus/single-sled/config-partial.toml b/smf/nexus/single-sled/config-partial.toml index df49476eed..30a0243122 100644 --- a/smf/nexus/single-sled/config-partial.toml +++ b/smf/nexus/single-sled/config-partial.toml @@ -64,6 +64,7 @@ instance_watcher.period_secs = 30 abandoned_vmm_reaper.period_secs = 60 saga_recovery.period_secs = 600 lookup_region_port.period_secs = 60 +instance_updater.period_secs = 30 [default_region_allocation_strategy] # by default, allocate without requirement for distinct sleds.