[nexus] Consider SagaUnwound instances Failed
Currently, instances whose active VMM is `SagaUnwound` appear externally
as `Stopped`. We chose `Stopped` because start sagas are permitted to run
for instances with `SagaUnwound` active VMMs and, at the time the
`SagaUnwound` VMM state was introduced, `Failed` instances could not be
started. However, #6455 added the ability to restart `Failed` instances,
and #6652 will permit them to be stopped. Therefore, we should recast
instances with `SagaUnwound` active VMMs as `Failed`: they weren't asked
politely to stop; instead, we attempted to start them and something went
wrong, which sounds like `Failed` to me.

This becomes more important in light of #6638: if we are going to attempt
to automatically restart such instances, they should definitely appear
to be `Failed`. The distinction between `Failed` and `Stopped` becomes
that `Failed` means "this thing isn't running, but it's supposed to be;
we may try to fix that for you if permitted to do so", while `Stopped`
means "this thing isn't running and that's fine, because you asked for
it to no longer be running". Thus, this commit changes `SagaUnwound`
VMMs to appear `Failed` externally.
hawkw committed Sep 24, 2024
1 parent eb4d5a5 commit 8b5c967
Showing 1 changed file with 9 additions and 9 deletions.
18 changes: 9 additions & 9 deletions nexus/db-queries/src/db/datastore/instance.rs
@@ -154,20 +154,20 @@ impl InstanceAndActiveVmm {
                 InstanceState::Vmm,
                 Some(VmmState::Stopped | VmmState::Destroyed),
             ) => external::InstanceState::Stopping,
-            // - An instance with a "saga unwound" VMM, on the other hand, can
-            // be treated as "stopped", since --- unlike "destroyed" --- a new
-            // start saga can run at any time by just clearing out the old VMM
-            // ID.
-            (InstanceState::Vmm, Some(VmmState::SagaUnwound)) => {
-                external::InstanceState::Stopped
-            }
             // - An instance with a "failed" VMM should *not* be counted as
             // failed until the VMM is unlinked, because a start saga must be
-            // able to run "failed" instance. Until then, it will continue to
-            // appear "stopping".
+            // able to run for a "failed" instance. Until then, it will
+            // continue to appear "stopping".
             (InstanceState::Vmm, Some(VmmState::Failed)) => {
                 external::InstanceState::Stopping
             }
+            // - An instance with a "saga unwound" VMM, on the other hand, can
+            // be treated as "failed", since --- unlike an instance with a
+            // "failed" active VMM --- a new start saga can run at any time by
+            // just clearing out the old VMM ID.
+            (InstanceState::Vmm, Some(VmmState::SagaUnwound)) => {
+                external::InstanceState::Failed
+            }
             // - An instance with no VMM is always "stopped" (as long as it's
             // not "starting" etc.)
             (InstanceState::NoVmm, _vmm_state) => {
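
For readers without the full file at hand, the post-change mapping in this hunk can be sketched as a standalone function. This is only an illustration, not the actual Nexus code: the enums below are cut-down, hypothetical stand-ins for `InstanceState`, `VmmState`, and `external::InstanceState`, the function name `effective_state` is made up, and the `Running` arm and the `(Vmm, None)` arm are assumptions added so the sketch compiles on its own.

```rust
// Hypothetical, simplified stand-ins for the real Nexus types. The real
// enums have more variants and live in other crates.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum InstanceState {
    NoVmm,
    Vmm,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum VmmState {
    Running,
    Stopped,
    Destroyed,
    Failed,
    SagaUnwound,
}

// Stand-in for `external::InstanceState`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ExternalState {
    Running,
    Stopping,
    Stopped,
    Failed,
}

// Sketch of the externally visible state after this commit.
fn effective_state(
    instance: InstanceState,
    active_vmm: Option<VmmState>,
) -> ExternalState {
    match (instance, active_vmm) {
        // A stopped or destroyed VMM that is still linked: "stopping".
        (
            InstanceState::Vmm,
            Some(VmmState::Stopped | VmmState::Destroyed),
        ) => ExternalState::Stopping,
        // A failed VMM still appears "stopping" until it is unlinked.
        (InstanceState::Vmm, Some(VmmState::Failed)) => {
            ExternalState::Stopping
        }
        // The change in this commit: saga-unwound now reads as "failed"
        // rather than "stopped".
        (InstanceState::Vmm, Some(VmmState::SagaUnwound)) => {
            ExternalState::Failed
        }
        // Assumption for the sketch: a running VMM maps straight through.
        (InstanceState::Vmm, Some(VmmState::Running)) => {
            ExternalState::Running
        }
        // No VMM at all: "stopped".
        (InstanceState::NoVmm, _) => ExternalState::Stopped,
        // Assumption for the sketch: the hunk does not show this case, so
        // it is arbitrarily treated as "stopped" here.
        (InstanceState::Vmm, None) => ExternalState::Stopped,
    }
}
```

Under these assumptions, `effective_state(InstanceState::Vmm, Some(VmmState::SagaUnwound))` yields `ExternalState::Failed`, while a linked `VmmState::Failed` VMM still yields `ExternalState::Stopping`, which is the asymmetry the code comments describe.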