allow failed instances to be restarted, don't go to failed until restartable
hawkw committed Aug 29, 2024
1 parent 1388532 commit aa9eaf5
Showing 2 changed files with 16 additions and 7 deletions.
7 changes: 7 additions & 0 deletions nexus/db-queries/src/db/datastore/instance.rs
@@ -156,6 +156,13 @@ impl InstanceAndActiveVmm {
(InstanceState::Vmm, Some(VmmState::SagaUnwound)) => {
external::InstanceState::Stopped
}
+ // - An instance with a "failed" VMM should *not* be counted as
+ // failed until the VMM is unlinked, because a start saga must be
+ // able to run "failed" instance. Until then, it will continue to
+ // appear "stopping".
+ (InstanceState::Vmm, Some(VmmState::Failed)) => {
+ external::InstanceState::Stopping
+ }
// - An instance with no VMM is always "stopped" (as long as it's
// not "starting" etc.)
(InstanceState::NoVmm, _vmm_state) => {
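The mapping this hunk extends can be sketched as a small standalone function. This is a minimal sketch, not the omicron implementation: `InstanceRecord`, `ExternalState`, and `effective_state` are hypothetical stand-ins, the enums are cut down to the variants that appear in the hunk plus an assumed `Running` case, and the "starting" states mentioned in the last comment are ignored.

```rust
// Hypothetical, simplified stand-ins for the types in the diff; the real
// enums have more variants than are modeled here.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum VmmState {
    Running,
    SagaUnwound,
    Failed,
}

// Assumed shape of an instance record: either no VMM is linked, or one is
// linked and its state decides what is reported externally.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InstanceRecord {
    NoVmm,
    Vmm(VmmState),
}

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExternalState {
    Running,
    Stopping,
    Stopped,
}

// Maps the stored record to the state shown to API clients.
#[allow(dead_code)]
fn effective_state(record: InstanceRecord) -> ExternalState {
    match record {
        // A start saga unwound: the instance looks stopped and can be retried.
        InstanceRecord::Vmm(VmmState::SagaUnwound) => ExternalState::Stopped,
        // The arm added by this commit: a failed VMM that is still linked
        // keeps the instance in "stopping" until the VMM is unlinked.
        InstanceRecord::Vmm(VmmState::Failed) => ExternalState::Stopping,
        // Assumption for the sketch: a running VMM means a running instance.
        InstanceRecord::Vmm(VmmState::Running) => ExternalState::Running,
        // No VMM linked: stopped (the sketch ignores "starting" and similar).
        InstanceRecord::NoVmm => ExternalState::Stopped,
    }
}
```

Under these assumptions, a record with a linked failed VMM reports `Stopping`, and the same record after the VMM is unlinked (`NoVmm`) reports `Stopped`.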
16 changes: 9 additions & 7 deletions nexus/src/app/instance.rs
@@ -1980,7 +1980,7 @@ fn instance_start_allowed(

Ok(InstanceStartDisposition::AlreadyStarted)
}
- InstanceState::Stopped => {
+ s @ InstanceState::Stopped | s @ InstanceState::Failed => {
match vmm.as_ref() {
// If a previous start saga failed and left behind a VMM in the
// SagaUnwound state, allow a new start saga to try to overwrite
@@ -1995,18 +1995,20 @@ fn instance_start_allowed(
Ok(InstanceStartDisposition::Start)
}
// This shouldn't happen: `InstanceAndVmm::effective_state` should
- // only return `Stopped` if there is no active VMM or if the VMM is
- // `SagaUnwound`.
+ // only return `Stopped` or `Failed` if there is no active VMM
+ // or if the VMM is `SagaUnwound`.
Some(vmm) => {
error!(log,
- "instance is stopped but still has an active VMM";
+ "instance is {s:?} but still has an active VMM";
"instance_id" => %instance.id(),
"propolis_id" => %vmm.id,
"propolis_state" => ?vmm.runtime.state);

- Err(Error::internal_error(
- "instance is stopped but still has an active VMM",
- ))
+ Err(Error::InternalError {
+ internal_message: format!(
+ "instance is {s:?} but still has an active VMM"
+ ),
+ })
}
// Ah, it's actually stopped. We can restart it.
None => Ok(InstanceStartDisposition::Start),
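The changed match arm can be illustrated in isolation. The following is a rough sketch under stated assumptions, not the real `instance_start_allowed`: `EffectiveState`, `StartDisposition`, and `start_allowed` are hypothetical names, errors are plain `String`s rather than the crate's `Error` type, and the handling of `Running` and `Stopping` is assumed because those arms are not shown in the hunk.

```rust
// Hypothetical, simplified stand-ins for the types used in the diff.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EffectiveState {
    Running,
    Stopping,
    Stopped,
    Failed,
}

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum VmmState {
    Running,
    SagaUnwound,
}

#[allow(dead_code)]
#[derive(Debug, PartialEq, Eq)]
enum StartDisposition {
    Start,
    AlreadyStarted,
}

// Decides whether a start request may proceed for the given effective
// instance state and (optional) active VMM state.
#[allow(dead_code)]
fn start_allowed(
    state: EffectiveState,
    vmm: Option<VmmState>,
) -> Result<StartDisposition, String> {
    match state {
        // Assumption: a running instance is treated as already started.
        EffectiveState::Running => Ok(StartDisposition::AlreadyStarted),
        // `s @` binds the matched state so the error can name it, mirroring
        // the pattern introduced by the commit.
        s @ EffectiveState::Stopped | s @ EffectiveState::Failed => match vmm {
            // A VMM left behind by an unwound start saga may be overwritten.
            Some(VmmState::SagaUnwound) => Ok(StartDisposition::Start),
            // Any other active VMM contradicts the effective state.
            Some(_) => Err(format!(
                "instance is {s:?} but still has an active VMM"
            )),
            // No VMM at all: the instance can be (re)started.
            None => Ok(StartDisposition::Start),
        },
        // Assumption: a stopping instance cannot be started yet.
        EffectiveState::Stopping => {
            Err(format!("instance cannot be started while {state:?}"))
        }
    }
}
```

In this sketch, a `Failed` instance with no VMM yields `Start`, while a `Failed` instance that still has a running VMM yields an error whose message names the state, which is the reason the commit switches from a fixed string to `format!`.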