update to track SledFailuresOnly policy
2db6eff added a `SledFailuresOnly`
auto-restart policy in addition to `Never` and `AllFailures`. I
discussed the rationale for that in [this comment][1]. Currently, there
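isn't a mechanism to detect whether an instance is `Failed` because its
individual Propolis VMM crashed or because the whole sled was restarted,
so for now, we assume all failures are instance-level. We still need to
handle the new variant, though: until sled-level failures can be
detected, instances with the `SledFailuresOnly` policy are not
automatically restarted.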

[1]: #6499 (comment)
hawkw committed Sep 9, 2024
1 parent a4b862c commit 741af37
Showing 2 changed files with 14 additions and 7 deletions.
11 changes: 8 additions & 3 deletions nexus/db-model/src/instance.rs
@@ -242,9 +242,14 @@ impl InstanceRuntimeState {
         match policy {
             InstanceAutoRestart::Never => false,
             InstanceAutoRestart::AllFailures => true,
-            // TODO(eliza): future auto-restart policies may
-            // require additional checks here, such as a limited restart
-            // budget...
+            // TODO(eliza): currently, we don't have the ability to determine
+            // whether an instance is failed because the sled it was on has
+            // rebooted, or because the individual Propolis VMM crashed. For
+            // now, we assume all failures are VMM failures rather than sled
+            // failures. In the future, we will need to determine if a failure
+            // was a sled-level or VMM-level failure, and use that here to
+            // determine whether or not the instance is restartable.
+            InstanceAutoRestart::SledFailuresOnly => false,
         }
     }
 }
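The TODO above leaves the sled-versus-VMM distinction for later. Below is a minimal sketch of how that check might eventually look, assuming some future mechanism reports why an instance failed. `FailureCause`, `can_reincarnate`, and `can_reincarnate_today` are hypothetical names, and the enum is a simplified stand-in rather than the real type from `nexus/db-model`. Treating every failure as `FailureCause::Vmm` reproduces what this commit does today.

```rust
// Hypothetical sketch: `FailureCause` and these free functions are
// illustrative stand-ins, not part of this commit or of omicron.

/// Simplified stand-in for the `InstanceAutoRestart` policy enum.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum InstanceAutoRestart {
    Never,
    SledFailuresOnly,
    AllFailures,
}

/// Hypothetical: the reason an instance ended up `Failed`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum FailureCause {
    /// The instance's individual Propolis VMM crashed.
    Vmm,
    /// The sled hosting the instance rebooted.
    Sled,
}

/// Restartability once the cause of a failure can be determined.
pub fn can_reincarnate(policy: InstanceAutoRestart, cause: FailureCause) -> bool {
    match policy {
        InstanceAutoRestart::Never => false,
        InstanceAutoRestart::AllFailures => true,
        // Only sled-level failures qualify under this policy.
        InstanceAutoRestart::SledFailuresOnly => cause == FailureCause::Sled,
    }
}

/// The behavior in this commit: the cause is unknown, so every failure is
/// treated as a VMM failure.
pub fn can_reincarnate_today(policy: InstanceAutoRestart) -> bool {
    can_reincarnate(policy, FailureCause::Vmm)
}

fn main() {
    let policy = InstanceAutoRestart::SledFailuresOnly;
    println!("sled failure: {}", can_reincarnate(policy, FailureCause::Sled));
    println!("VMM failure: {}", can_reincarnate(policy, FailureCause::Vmm));
    println!("today: {}", can_reincarnate_today(policy));
}
```

Keeping the match exhaustive (no `_` arm) means that adding another policy variant later fails to compile until its restart rule is spelled out, which is the same pressure that forced this commit to handle `SledFailuresOnly`.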
10 changes: 6 additions & 4 deletions nexus/src/app/background/tasks/instance_reincarnation.rs
@@ -408,12 +408,14 @@ mod test {
         let mut will_not_reincarnate = std::collections::BTreeSet::new();
         // Some instances which are `Failed`` but don't have policies permitting
         // them to be reincarnated.
-        for _ in 0..3 {
+        for policy in
+            [InstanceAutoRestart::Never, InstanceAutoRestart::SledFailuresOnly]
+        {
             let id = create_instance(
                 &cptestctx,
                 &opctx,
                 &authz_project,
-                InstanceAutoRestart::Never,
+                policy,
                 InstanceState::Failed,
             )
             .await;
@@ -422,12 +424,12 @@ mod test {
 
         // Some instances with policies permitting them to be reincarnated, but
         // which are not `Failed`.
-        for _ in 0..3 {
+        for _ in 0..2 {
             let id = create_instance(
                 &cptestctx,
                 &opctx,
                 &authz_project,
-                InstanceAutoRestart::Never,
+                InstanceAutoRestart::AllFailures,
                 InstanceState::NoVmm,
             )
             .await;
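The test sorts fixture instances into those expected to be reincarnated and those that are not. Here is a database-free sketch of that partitioning, assuming simplified stand-in enums, plain `u32` IDs instead of UUIDs, and a hypothetical `can_reincarnate` predicate that mirrors the match in `instance.rs` plus a `Failed`-state check. The final loop of `Failed` + `AllFailures` instances is an assumption for illustration; that part of the test is not shown in this diff.

```rust
use std::collections::BTreeSet;

// Simplified stand-ins for the real types; IDs are plain integers and
// there is no database.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum InstanceAutoRestart {
    Never,
    SledFailuresOnly,
    AllFailures,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum InstanceState {
    NoVmm,
    Failed,
}

type Fixture = (u32, InstanceAutoRestart, InstanceState);

/// Hypothetical predicate: only `Failed` instances are candidates, and the
/// policy decides among those.
fn can_reincarnate(state: InstanceState, policy: InstanceAutoRestart) -> bool {
    if state != InstanceState::Failed {
        return false;
    }
    match policy {
        InstanceAutoRestart::Never => false,
        InstanceAutoRestart::AllFailures => true,
        // All failures are assumed to be VMM-level for now.
        InstanceAutoRestart::SledFailuresOnly => false,
    }
}

/// Mirrors the fixture loops in the diff, plus one assumed loop.
fn build_fixtures() -> Vec<Fixture> {
    use InstanceAutoRestart::*;
    let mut fixtures: Vec<Fixture> = Vec::new();
    // `Failed`, but with policies that don't permit reincarnation.
    for policy in [Never, SledFailuresOnly] {
        let id = fixtures.len() as u32;
        fixtures.push((id, policy, InstanceState::Failed));
    }
    // Policies permitting reincarnation, but not `Failed`.
    for _ in 0..2 {
        let id = fixtures.len() as u32;
        fixtures.push((id, AllFailures, InstanceState::NoVmm));
    }
    // Assumed (not shown in the diff): `Failed` with `AllFailures`.
    for _ in 0..3 {
        let id = fixtures.len() as u32;
        fixtures.push((id, AllFailures, InstanceState::Failed));
    }
    fixtures
}

/// Split fixtures into IDs expected to be reincarnated and IDs that are not.
fn partition(fixtures: &[Fixture]) -> (BTreeSet<u32>, BTreeSet<u32>) {
    let mut will_reincarnate = BTreeSet::new();
    let mut will_not_reincarnate = BTreeSet::new();
    for &(id, policy, state) in fixtures {
        if can_reincarnate(state, policy) {
            will_reincarnate.insert(id);
        } else {
            will_not_reincarnate.insert(id);
        }
    }
    (will_reincarnate, will_not_reincarnate)
}

fn main() {
    let (will, will_not) = partition(&build_fixtures());
    println!("will reincarnate: {will:?}"); // {4, 5, 6}
    println!("will not reincarnate: {will_not:?}"); // {0, 1, 2, 3}
}
```

Before this commit, the second loop passed `InstanceAutoRestart::Never`, so those `NoVmm` instances were excluded by their policy alone and the "eligible policy, but not `Failed`" case was not really exercised. With `AllFailures`, the state check is the only reason they are skipped.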