Skip to content

Commit

Permalink
[nexus] handle sled-agent errors as described in RFD 486 (#6455)
Browse files Browse the repository at this point in the history
[RFD 486] describes a principled design for how to use the Failed state
for instances and VMMs. In particular, it proposes that:

- Instances and VMMs should move to the `Failed` state independently.
- A VMM goes to `Failed` if either:
  + Nexus observes that a VMM it thought was resident on a sled is no
    longer resident there, or,
  + it observes through some other means (sled expungement, sled reboot)
    that all VMMs that were resident on that sled must now be gone.
- Instances go to `Failed` when an `instance-update` saga observes that
  an instance's active VMM has been marked `Failed`. An update saga
  transitioning an instance to failed cleans up the instance's resource
  allocations, similarly to how update sagas handle transitioning an
  instance to `Destroyed` when its active VMM has been destroyed.
- For purposes of the external API:
  + An instance that is itself in the `Failed` state appears externally
    to be `Failed`.
  + If an instance in the "has-VMM" internal state and points to a
    `Failed` VMM, the instance is `Stopping` and then transitions to
    `Failed`.
  + Once the actual instance (not just its active VMM) is `Failed`, it
    can be started or destroyed.

This branch implements the behavior described above.

In particular, I've added code to the `instance-update` saga to handle
instances whose active or migration target VMMs have transitioned to
`Failed`, similarly to how it handles the VMMs transitioning to
`Destroyed`. I've changed the code that detects whether a sled-agent
error indicates that an instance has failed to only do so when the error
is a 404 with the specific error code that indicates that the sled-agent
has forgotten about a VMM that it previously knew about, and changed the
Nexus functions that ask sled-agents to  update an instance's state, and
the `instance_watcher` background task, to mark instances as `Failed`
when they encounter only the appropriate sled-agent error. And, I've
changed the `mark_instance_failed` function in Nexus to instead mark the
VMM record as failed and trigger an instance-update saga (and it's now
named `mark_vmm_failed`), and made similar changes to the
`instance_watcher` background task. Finally, I've also made sure that
`Failed` instances can be destroyed and restarted, and that instances
with `Failed` active VMMs appear to be `Stopping` until the VMM is
cleaned up by an update saga.

In addition to this, it was necessary to (re)introduce a
`VmmState::Creating` variant. The purpose of this variant is to
differentiate between VMMs which have been created in the database but
do not yet exist on a sled, from those which are known to sled-agent and
are actually starting/migrating.

This is necessary because presently, the `instance_watcher` background
task will attempt to perform health checks for `Starting` and
`Migrating` VMMs, and --- if the sled-agent does not actually know about
the VMM yet --- will move it to `Failed`. But, when a VMM is newly
created, the sled-agent won't know about it yet until it has been
registered with the sled-agent, so if the `instance_watcher` task
activates while an `instance-start` or `instance-migrate` saga is in
flight, the `instance-watcher` task may incorrectly move its VMM to
`Failed` if the VMM record has been created but not yet registered. Now,
after adding the `Creating` variant, we can avoid performing health
checks for those VMMs until the sled-agent actually acknowledges a
request to register the VMM.

Besides the behavior implemented here, RFD 486 also proposes that the
`boot_on_fault` field in the instance record be used by Nexus to
determine whether to automatically attempt to restart a `Failed`
instance. This is described in issues #6491and #4872. I'm planning
to implement this in a separate branch, PR #6503.

[RFD 486]: https://rfd.shared.oxide.computer/rfd/0486#_determinations
  • Loading branch information
hawkw authored Sep 2, 2024
1 parent 61e1b38 commit 46eb2b4
Show file tree
Hide file tree
Showing 21 changed files with 963 additions and 429 deletions.
11 changes: 11 additions & 0 deletions common/src/api/internal/nexus.rs
Original file line number Diff line number Diff line change
Expand Up @@ -102,6 +102,17 @@ pub enum VmmState {
Destroyed,
}

impl VmmState {
/// States in which the VMM no longer exists and must be cleaned up.
pub const TERMINAL_STATES: &'static [Self] =
&[Self::Failed, Self::Destroyed];

/// Returns `true` if this VMM is in a terminal state.
pub fn is_terminal(&self) -> bool {
Self::TERMINAL_STATES.contains(self)
}
}

/// The dynamic runtime properties of an individual VMM process.
#[derive(Clone, Debug, Deserialize, Serialize, JsonSchema)]
pub struct VmmRuntimeState {
Expand Down
10 changes: 9 additions & 1 deletion dev-tools/omdb/src/bin/omdb/nexus.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1394,6 +1394,9 @@ fn print_task_details(bgtask: &BackgroundTask, details: &serde_json::Value) {
/// number of instances found with destroyed active VMMs
destroyed_active_vmms: usize,

/// number of instances found with failed active VMMs
failed_active_vmms: usize,

/// number of instances found with terminated active migrations
terminated_active_migrations: usize,

Expand All @@ -1419,6 +1422,7 @@ fn print_task_details(bgtask: &BackgroundTask, details: &serde_json::Value) {
),
Ok(UpdaterStatus {
destroyed_active_vmms,
failed_active_vmms,
terminated_active_migrations,
sagas_started,
sagas_completed,
Expand All @@ -1436,9 +1440,13 @@ fn print_task_details(bgtask: &BackgroundTask, details: &serde_json::Value) {
destroyed_active_vmms + terminated_active_migrations
);
println!(
" instances with destroyed active VMMs: {}",
" instances with Destroyed active VMMs: {}",
destroyed_active_vmms,
);
println!(
" instances with Failed active VMMs: {}",
failed_active_vmms,
);
println!(
" instances with terminated active migrations: {}",
terminated_active_migrations,
Expand Down
6 changes: 4 additions & 2 deletions dev-tools/omdb/tests/successes.out
Original file line number Diff line number Diff line change
Expand Up @@ -524,7 +524,8 @@ task: "instance_updater"
last completed activation: <REDACTED ITERATIONS>, triggered by a periodic timer firing
started at <REDACTED TIMESTAMP> (<REDACTED DURATION>s ago) and ran for <REDACTED DURATION>ms
total instances in need of updates: 0
instances with destroyed active VMMs: 0
instances with Destroyed active VMMs: 0
instances with Failed active VMMs: 0
instances with terminated active migrations: 0
update sagas started: 0
update sagas completed successfully: 0
Expand Down Expand Up @@ -950,7 +951,8 @@ task: "instance_updater"
last completed activation: <REDACTED ITERATIONS>, triggered by a periodic timer firing
started at <REDACTED TIMESTAMP> (<REDACTED DURATION>s ago) and ran for <REDACTED DURATION>ms
total instances in need of updates: 0
instances with destroyed active VMMs: 0
instances with Destroyed active VMMs: 0
instances with Failed active VMMs: 0
instances with terminated active migrations: 0
update sagas started: 0
update sagas completed successfully: 0
Expand Down
3 changes: 2 additions & 1 deletion nexus/db-model/src/schema_versions.rs
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@ use std::collections::BTreeMap;
///
/// This must be updated when you change the database schema. Refer to
/// schema/crdb/README.adoc in the root of this repository for details.
pub const SCHEMA_VERSION: SemverVersion = SemverVersion::new(93, 0, 0);
pub const SCHEMA_VERSION: SemverVersion = SemverVersion::new(94, 0, 0);

/// List of all past database schema versions, in *reverse* order
///
Expand All @@ -29,6 +29,7 @@ static KNOWN_VERSIONS: Lazy<Vec<KnownVersion>> = Lazy::new(|| {
// | leaving the first copy as an example for the next person.
// v
// KnownVersion::new(next_int, "unique-dirname-with-the-sql-files"),
KnownVersion::new(94, "put-back-creating-vmm-state"),
KnownVersion::new(93, "dataset-kinds-zone-and-debug"),
KnownVersion::new(92, "lldp-link-config-nullable"),
KnownVersion::new(91, "add-management-gateway-producer-kind"),
Expand Down
16 changes: 4 additions & 12 deletions nexus/db-model/src/vmm.rs
Original file line number Diff line number Diff line change
Expand Up @@ -60,27 +60,19 @@ pub struct Vmm {
pub runtime: VmmRuntimeState,
}

/// The set of states that a VMM can have when it is created.
pub enum VmmInitialState {
Starting,
Migrating,
}

impl Vmm {
/// Creates a new VMM record.
///
/// The new VMM record will be in [`VmmState::Creating`] until it is
/// registered with a sled-agent.
pub fn new(
id: PropolisUuid,
instance_id: InstanceUuid,
sled_id: SledUuid,
propolis_ip: ipnetwork::IpNetwork,
propolis_port: u16,
initial_state: VmmInitialState,
) -> Self {
let now = Utc::now();
let state = match initial_state {
VmmInitialState::Starting => VmmState::Starting,
VmmInitialState::Migrating => VmmState::Migrating,
};

Self {
id: id.into_untyped_uuid(),
Expand All @@ -91,7 +83,7 @@ impl Vmm {
propolis_ip,
propolis_port: SqlU16(propolis_port),
runtime: VmmRuntimeState {
state,
state: VmmState::Creating,
time_state_updated: now,
gen: Generation::new(),
},
Expand Down
61 changes: 54 additions & 7 deletions nexus/db-model/src/vmm_state.rs
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
// file, You can obtain one at https://mozilla.org/MPL/2.0/.

use super::impl_enum_type;
use omicron_common::api::internal::nexus::VmmState as ApiState;
use serde::Deserialize;
use serde::Serialize;
use std::fmt;
Expand All @@ -16,6 +17,7 @@ impl_enum_type!(
#[diesel(sql_type = VmmStateEnum)]
pub enum VmmState;

Creating => b"creating"
Starting => b"starting"
Running => b"running"
Stopping => b"stopping"
Expand All @@ -30,6 +32,7 @@ impl_enum_type!(
impl VmmState {
pub fn label(&self) -> &'static str {
match self {
VmmState::Creating => "creating",
VmmState::Starting => "starting",
VmmState::Running => "running",
VmmState::Stopping => "stopping",
Expand All @@ -45,7 +48,26 @@ impl VmmState {
/// States in which it is safe to deallocate a VMM's sled resources and mark
/// it as deleted.
pub const DESTROYABLE_STATES: &'static [Self] =
&[Self::Destroyed, Self::SagaUnwound];
&[Self::Destroyed, Self::Failed, Self::SagaUnwound];

pub const TERMINAL_STATES: &'static [Self] =
&[Self::Destroyed, Self::Failed];

/// States in which a VMM record is present in the database but is not
/// resident on a sled, either because it does not yet exist, was produced
/// by an unwound update saga and will never exist, or has already been
/// destroyed.
pub const NONEXISTENT_STATES: &'static [Self] =
&[Self::Creating, Self::SagaUnwound, Self::Destroyed];

pub fn is_terminal(&self) -> bool {
Self::TERMINAL_STATES.contains(self)
}
/// Returns `true` if the instance is in a state in which it exists on a
/// sled.
pub fn exists_on_sled(&self) -> bool {
!Self::NONEXISTENT_STATES.contains(self)
}
}

impl fmt::Display for VmmState {
Expand All @@ -58,7 +80,9 @@ impl From<VmmState> for omicron_common::api::internal::nexus::VmmState {
fn from(value: VmmState) -> Self {
use omicron_common::api::internal::nexus::VmmState as Output;
match value {
VmmState::Starting => Output::Starting,
// The `Creating` state is internal to Nexus; the outside world
// should treat it as equivalent to `Starting`.
VmmState::Creating | VmmState::Starting => Output::Starting,
VmmState::Running => Output::Running,
VmmState::Stopping => Output::Stopping,
VmmState::Stopped => Output::Stopped,
Expand All @@ -74,7 +98,9 @@ impl From<VmmState> for sled_agent_client::types::VmmState {
fn from(value: VmmState) -> Self {
use sled_agent_client::types::VmmState as Output;
match value {
VmmState::Starting => Output::Starting,
// The `Creating` state is internal to Nexus; the outside world
// should treat it as equivalent to `Starting`.
VmmState::Creating | VmmState::Starting => Output::Starting,
VmmState::Running => Output::Running,
VmmState::Stopping => Output::Stopping,
VmmState::Stopped => Output::Stopped,
Expand All @@ -86,9 +112,8 @@ impl From<VmmState> for sled_agent_client::types::VmmState {
}
}

impl From<omicron_common::api::internal::nexus::VmmState> for VmmState {
fn from(value: omicron_common::api::internal::nexus::VmmState) -> Self {
use omicron_common::api::internal::nexus::VmmState as ApiState;
impl From<ApiState> for VmmState {
fn from(value: ApiState) -> Self {
use VmmState as Output;
match value {
ApiState::Starting => Output::Starting,
Expand All @@ -108,7 +133,12 @@ impl From<VmmState> for omicron_common::api::external::InstanceState {
use omicron_common::api::external::InstanceState as Output;

match value {
VmmState::Starting => Output::Starting,
// An instance with a VMM which is in the `Creating` state maps to
// `InstanceState::Starting`, rather than `InstanceState::Creating`.
// If we are still creating the VMM, this is because we are
// attempting to *start* the instance; instances may be created
// without creating a VMM to run them, and then started later.
VmmState::Creating | VmmState::Starting => Output::Starting,
VmmState::Running => Output::Running,
VmmState::Stopping => Output::Stopping,
// `SagaUnwound` should map to `Stopped` so that an `instance_view`
Expand All @@ -129,3 +159,20 @@ impl diesel::query_builder::QueryId for VmmStateEnum {
type QueryId = ();
const HAS_STATIC_QUERY_ID: bool = false;
}

#[cfg(test)]
mod tests {
use super::*;

#[test]
fn test_all_terminal_api_states_are_terminal_db_states() {
for &api_state in ApiState::TERMINAL_STATES {
let db_state = VmmState::from(api_state);
assert!(
db_state.is_terminal(),
"API VmmState::{api_state:?} is considered terminal, but its \
corresponding DB state ({db_state:?}) is not!"
);
}
}
}
29 changes: 22 additions & 7 deletions nexus/db-queries/src/db/datastore/instance.rs
Original file line number Diff line number Diff line change
Expand Up @@ -156,6 +156,13 @@ impl InstanceAndActiveVmm {
(InstanceState::Vmm, Some(VmmState::SagaUnwound)) => {
external::InstanceState::Stopped
}
// - An instance with a "failed" VMM should *not* be counted as
// failed until the VMM is unlinked, because a start saga must be
// able to run "failed" instance. Until then, it will continue to
// appear "stopping".
(InstanceState::Vmm, Some(VmmState::Failed)) => {
external::InstanceState::Stopping
}
// - An instance with no VMM is always "stopped" (as long as it's
// not "starting" etc.)
(InstanceState::NoVmm, _vmm_state) => {
Expand Down Expand Up @@ -360,21 +367,21 @@ impl DataStore {
.collect())
}

/// List all instances with active VMMs in the `Destroyed` state that don't
/// have currently-running instance-updater sagas.
/// List all instances with active VMMs in the provided [`VmmState`] which
/// don't have currently-running instance-updater sagas.
///
/// This is used by the `instance_updater` background task to ensure that
/// update sagas are scheduled for these instances.
pub async fn find_instances_with_destroyed_active_vmms(
pub async fn find_instances_by_active_vmm_state(
&self,
opctx: &OpContext,
vmm_state: VmmState,
) -> ListResultVec<Instance> {
use db::model::VmmState;
use db::schema::instance::dsl;
use db::schema::vmm::dsl as vmm_dsl;

vmm_dsl::vmm
.filter(vmm_dsl::state.eq(VmmState::Destroyed))
.filter(vmm_dsl::state.eq(vmm_state))
// If the VMM record has already been deleted, we don't need to do
// anything about it --- someone already has.
.filter(vmm_dsl::time_deleted.is_null())
Expand Down Expand Up @@ -734,7 +741,7 @@ impl DataStore {
format!(
"cannot set migration ID {migration_id} for instance \
{instance_id} (perhaps another migration ID is \
already present): {error:#}"
already present): {error:#?}"
),
),
})
Expand Down Expand Up @@ -818,7 +825,15 @@ impl DataStore {
vmm_dsl::vmm
.on(vmm_dsl::sled_id
.eq(sled_dsl::id)
.and(vmm_dsl::time_deleted.is_null()))
.and(vmm_dsl::time_deleted.is_null())
// Ignore instances which are in states that are not
// known to exist on a sled. Since this query drives
// instance-watcher health checking, it is not necessary
// to perform health checks for VMMs that don't actually
// exist in real life.
.and(
vmm_dsl::state.ne_all(VmmState::NONEXISTENT_STATES),
))
.inner_join(
instance_dsl::instance
.on(instance_dsl::id
Expand Down
49 changes: 45 additions & 4 deletions nexus/db-queries/src/db/datastore/vmm.rs
Original file line number Diff line number Diff line change
Expand Up @@ -82,11 +82,9 @@ impl DataStore {
opctx: &OpContext,
vmm_id: &PropolisUuid,
) -> UpdateResult<bool> {
let valid_states = vec![DbVmmState::Destroyed, DbVmmState::Failed];

let updated = diesel::update(dsl::vmm)
.filter(dsl::id.eq(vmm_id.into_untyped_uuid()))
.filter(dsl::state.eq_any(valid_states))
.filter(dsl::state.eq_any(DbVmmState::DESTROYABLE_STATES))
.filter(dsl::time_deleted.is_null())
.set(dsl::time_deleted.eq(Utc::now()))
.check_if_exists::<Vmm>(vmm_id.into_untyped_uuid())
Expand Down Expand Up @@ -307,6 +305,49 @@ impl DataStore {
})
}

/// Transitions a VMM to the `SagaUnwound` state.
///
/// # Warning
///
/// This may *only* be called by the saga that created a VMM record, as it
/// unconditionally increments the generation number and advances the VMM to
/// the `SagaUnwound` state.
///
/// This is necessary as it is executed in compensating actions for
/// unwinding saga nodes which cannot easily determine whether other
/// actions, which advance the VMM's generation, have executed before the
/// saga unwound.
pub async fn vmm_mark_saga_unwound(
&self,
opctx: &OpContext,
vmm_id: &PropolisUuid,
) -> Result<bool, Error> {
diesel::update(dsl::vmm)
.filter(dsl::time_deleted.is_null())
.filter(dsl::id.eq(vmm_id.into_untyped_uuid()))
.set((
dsl::state.eq(DbVmmState::SagaUnwound),
dsl::time_state_updated.eq(chrono::Utc::now()),
dsl::state_generation.eq(dsl::state_generation + 1),
))
.check_if_exists::<Vmm>(vmm_id.into_untyped_uuid())
.execute_and_check(&*self.pool_connection_authorized(opctx).await?)
.await
.map(|r| match r.status {
UpdateStatus::Updated => true,
UpdateStatus::NotUpdatedButExists => false,
})
.map_err(|e| {
public_error_from_diesel(
e,
ErrorHandler::NotFoundByLookup(
ResourceType::Vmm,
LookupType::ById(vmm_id.into_untyped_uuid()),
),
)
})
}

/// Forcibly overwrites the Propolis IP/Port in the supplied VMM's record with
/// the supplied Propolis IP.
///
Expand Down Expand Up @@ -357,7 +398,7 @@ impl DataStore {

paginated(dsl::vmm, dsl::id, pagparams)
// In order to be considered "abandoned", a VMM must be:
// - in the `Destroyed` or `SagaUnwound` state
// - in the `Destroyed`, `SagaUnwound`, or `Failed` states
.filter(dsl::state.eq_any(DbVmmState::DESTROYABLE_STATES))
// - not deleted yet
.filter(dsl::time_deleted.is_null())
Expand Down
Loading

0 comments on commit 46eb2b4

Please sign in to comment.