Propolis zone cleanup could happen outside of InstanceRunner #5237

Open
gjcolombo opened this issue Mar 9, 2024 · 0 comments
Labels
Sled Agent (Related to the Per-Sled Configuration and Management) · virtualization (Propolis Integration & VM Management)

Every instance managed by a sled agent has a "runner" task and a "monitor" task. All requests to do anything with the instance (change its state, forcibly remove it from the sled agent, etc.) need to execute on the runner task, which handles these requests sequentially. This includes the processing of state change messages from Propolis, which are sent from the monitor task to the runner task and processed here:

Some(Update { state, tx }) => {
    let observed = ObservedPropolisState::new(
        self.state.instance(),
        &state,
    );
    let reaction = self.observe_state(&observed).await;
    self.publish_state_to_nexus().await;
    // NOTE: If we fail to send here, the
    // InstanceMonitorRunner has stopped running for
    // some reason. We'd presumably handle that on the
    // next iteration of the loop.
    if let Err(_) = tx.send(reaction) {
        warn!(self.log, "InstanceRunner failed to send to InstanceMonitorRunner");
    }
},
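
For orientation, here is a minimal, self-contained sketch of that monitor/runner handoff, using illustrative names rather than the actual omicron types: the monitor forwards each observed Propolis state over an mpsc channel and then waits on a oneshot for the runner's reaction, so everything the runner does in response (including any zone teardown) happens before the monitor can report the next state.

// Minimal sketch; names and types here are illustrative, not the actual
// sled agent implementation.
use tokio::sync::{mpsc, oneshot};

enum Reaction {
    Continue,
    Terminate,
}

struct Update {
    state: String,                 // stand-in for the observed Propolis state
    tx: oneshot::Sender<Reaction>, // carries the runner's reaction back
}

// Monitor side: report each observation, then wait for the runner's verdict.
async fn monitor_task(runner_tx: mpsc::Sender<Update>) {
    loop {
        let state = observe_propolis().await;
        let (tx, rx) = oneshot::channel();
        if runner_tx.send(Update { state, tx }).await.is_err() {
            return; // runner task has gone away
        }
        // Blocks until the runner has fully handled the observation,
        // including any zone teardown it decided to do.
        match rx.await {
            Ok(Reaction::Continue) => continue,
            Ok(Reaction::Terminate) | Err(_) => return,
        }
    }
}

// Runner side: one loop, one task; monitor updates and API requests alike
// are handled strictly one at a time.
async fn runner_task(mut rx: mpsc::Receiver<Update>) {
    while let Some(Update { state, tx }) = rx.recv().await {
        let reaction = if state == "Destroyed" {
            // ...zone bundle, zone teardown, deregistration happen here...
            Reaction::Terminate
        } else {
            Reaction::Continue
        };
        let _ = tx.send(reaction);
    }
}

async fn observe_propolis() -> String {
    // Stand-in for the long poll against Propolis's instance-state monitor.
    "Destroyed".to_string()
}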

If Propolis indicated that it shut down, InstanceRunner::observe_state will call InstanceRunner::terminate before returning and allowing new state to be published to Nexus:

// If the zone is now safe to terminate, tear it down and discard the
// instance ticket before returning and publishing the new instance
// state to Nexus. This ensures that the instance is actually gone from
// the sled when Nexus receives the state update saying it's actually
// destroyed.
match action {
    Some(InstanceAction::Destroy) => {
        info!(self.log, "terminating VMM that has exited";
            "instance_id" => %self.id());
        self.terminate().await;
        Reaction::Terminate
    }
    None => Reaction::Continue,
}

Terminating an instance this way collects a zone bundle on the runner task before tearing down the zone and removing the instance from the sled agent's InstanceManager:

// Take a zone bundle whenever this instance stops.
if let Err(e) = self
    .zone_bundler
    .create(
        &running_state.running_zone,
        ZoneBundleCause::TerminatedInstance,
    )
    .await
{
    error!(
        self.log,
        "Failed to take zone bundle for terminated instance";
        "zone_name" => &zname,
        "reason" => ?e,
    );
}

// Ensure that no zone exists. This succeeds even if no zone was ever
// created.
// NOTE: we call `Zones::halt_and_remove_logged` directly instead of
// `RunningZone::stop` in case we're called between creating the
// zone and assigning `running_state`.
warn!(self.log, "Halting and removing zone: {}", zname);
Zones::halt_and_remove_logged(&self.log, &zname).await.unwrap();

// Remove ourselves from the instance manager's map of instances.
self.instance_ticket.deregister();

// See if there are any runtime objects to clean up.
//
// We already removed the zone above but mark it as stopped
running_state.running_zone.stop().await.unwrap();

// Remove any OPTE ports from the port manager.
running_state.running_zone.release_opte_ports();

Creating a zone bundle may be an expensive operation, since it (potentially) has to copy and compress many different log files and command outputs. This causes a couple of problems:

  • Instance stop will appear to take much longer than it actually does, because the relevant state transition doesn't go to Nexus until the zone is destroyed.
  • More importantly, the instance remains in the sled agent's instance table, but the InstanceRunner is completely blocked while all this work is going on. This means that sled agent API calls targeting an instance in this state are highly likely to time out waiting for the runner to respond (see the sketch after this list). If the caller is Nexus and the request is to change the instance's state, this can cause instances to be marked as Failed (due to the error conversion rules described in #3238, "sled agent and Nexus frequently flatten errors into 500 Internal Server Error") even though they would correctly go to Stopped if left alone.
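
To make the second problem concrete, here is a rough, self-contained simulation. The request type, channel shape, and timings are hypothetical stand-ins, not the real sled agent API: while the runner is stuck in cleanup, a queued state-change request never gets dequeued before the caller's timeout elapses.

use std::time::Duration;
use tokio::sync::{mpsc, oneshot};
use tokio::time::{sleep, timeout};

// Hypothetical request type; the real sled agent API is richer than this.
enum Request {
    PutState { reply: oneshot::Sender<&'static str> },
}

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<Request>(8);

    // "Runner" task: pretend it is stuck collecting a zone bundle and tearing
    // down the zone before it can dequeue the next request.
    tokio::spawn(async move {
        sleep(Duration::from_secs(60)).await; // stand-in for terminate().await
        while let Some(Request::PutState { reply }) = rx.recv().await {
            let _ = reply.send("Stopped");
        }
    });

    // "Nexus" caller: a state-change request with a client-side timeout.
    let (reply_tx, reply_rx) = oneshot::channel();
    if tx.send(Request::PutState { reply: reply_tx }).await.is_err() {
        eprintln!("runner task is gone");
        return;
    }
    match timeout(Duration::from_secs(5), reply_rx).await {
        Ok(Ok(state)) => println!("instance is now {state}"),
        _ => println!("request timed out; Nexus may mark the instance Failed"),
    }
}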

The reason I made these operations happen in this order (zone bundle collection -> zone teardown -> deregister instance -> publish to Nexus) was to try to mitigate #3325. I suspect, though, that that issue is not as much of a problem now that every newly-started instance gets a fresh Propolis ID (not the case prior to #4194), such that every incarnation of an instance on a sled will get a distinct zone name. If that's so, then it should be possible to mitigate these problems by changing what happens on VMM shutdown: the runner can remove the instance from the table, publish the new VMM state to Nexus, and then hand the defunct zone off to some other task to be cleaned up.[1]
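
A sketch of what that could look like, again with illustrative stand-ins rather than the actual omicron types: once the terminal state has been published and the instance is out of the table, the defunct zone is handed to a detached task, and the runner is free immediately.

// Illustrative sketch of the proposed shape; hypothetical types, not the
// omicron implementation.
use std::time::Duration;
use tokio::time::sleep;

struct DefunctZone {
    // With a fresh Propolis ID per start, each incarnation's zone name is
    // unique, so background teardown can't collide with a restarted instance.
    name: String,
}

impl DefunctZone {
    async fn collect_bundle(&self) {
        // Stand-in for copying and compressing logs into a zone bundle.
        sleep(Duration::from_secs(30)).await;
    }

    async fn halt_and_remove(&self) {
        // Stand-in for Zones::halt_and_remove_logged.
        sleep(Duration::from_secs(1)).await;
    }
}

fn on_vmm_shutdown(zone: DefunctZone) {
    // 1. Remove the instance from the InstanceManager's table (elided), so new
    //    requests fail fast with "instance gone" instead of queueing here.
    // 2. Publish the terminal VMM state to Nexus (elided), so the stop is
    //    visible as soon as it actually happened.
    // 3. Hand the defunct zone to a detached task; the runner returns at once.
    tokio::spawn(async move {
        zone.collect_bundle().await;
        zone.halt_and_remove().await;
        println!("zone {} cleaned up in the background", zone.name);
    });
}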

Footnotes

  1. Cleaning up the defunct zone outside of the InstanceRunner task allows that task to return immediately and produce a "runner task closed" error message for anyone who happened to have requested something of the runner while it was in the "observe state change" arm of its request handler. If zone cleanup happens on the runner, Nexus's instance state will still update right away, but calls that land in this window are much more likely to time out than to get a clean "instance gone" error.
