Failed instances should be allowed to stop and restart #2825
Comments
One of the things I think will be helpful is to avoid automatically setting instance state to
PR #6503 changed Nexus to attempt to automatically restart instances that are in the `Failed` state. Now that we do this, we should probably change the allowable instance state transitions to permit a user to stop an instance that is `Failed`, as a way to say "stop trying to restart this instance" (as `Stopped` instances are not restarted). This branch changes `Nexus::instance_request_state` and `select_instance_change_action` to permit stopping a `Failed` instance. Fixes #6640. I believe this also fixes #2825, along with #6455 (which allowed restarting `Failed` instances).
#6455 allowed failed instances to be restarted. I'm currently working on allowing them to be stopped as well.
Currently Nexus accepts no attempts to change the state of a Failed instance:
omicron/nexus/src/app/instance.rs
Lines 395 to 416 in ee0aac0
There are plenty of reasons an instance could move to the Failed state (e.g. a failure to start the VM in Propolis, a heartbeat failure like those discussed in #2727, etc.). A VM user needs to be able to stop and attempt to restart a failed instance.
(Note that, on the Propolis end, once an instance has failed, it can't be restarted--the Propolis zone needs to be destroyed and recreated.)
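To make the requested behavior concrete, here is a minimal sketch of the state transitions discussed in this issue. It is not the code in `omicron/nexus/src/app/instance.rs`: the `InstanceState`, `RequestedState`, and `ChangeAction` enums and the `sketch_change_action` and `auto_restart_eligible` functions are hypothetical, heavily simplified stand-ins, and the real Nexus state machine has more states and more inputs. The sketch only assumes what is stated here: a `Failed` instance should accept both a stop and a start request, starting it requires a fresh VMM because the failed Propolis cannot be reused, and `Stopped` instances are not automatically restarted.

```rust
/// Hypothetical, simplified instance states (not Omicron's actual types).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InstanceState {
    Stopped,
    Running,
    Failed,
}

/// Hypothetical states a user may request.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RequestedState {
    Running,
    Stopped,
}

/// Hypothetical actions the control plane could take in response.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChangeAction {
    /// The instance is already in the requested state.
    Nothing,
    /// Create a fresh VMM and start the instance in it.
    StartNewVmm,
    /// Ask the currently running VMM to stop.
    StopVmm,
    /// No usable VMM exists; just record the instance as stopped.
    MarkStopped,
}

/// Sketch of the transition table: unlike the behavior described in this
/// issue, requests against a `Failed` instance are accepted.
pub fn sketch_change_action(
    current: InstanceState,
    requested: RequestedState,
) -> ChangeAction {
    use InstanceState as S;
    use RequestedState as R;
    match (current, requested) {
        (S::Running, R::Running) | (S::Stopped, R::Stopped) => ChangeAction::Nothing,
        (S::Stopped, R::Running) => ChangeAction::StartNewVmm,
        (S::Running, R::Stopped) => ChangeAction::StopVmm,
        // A failed Propolis cannot be restarted, so a restart needs a new VMM.
        (S::Failed, R::Running) => ChangeAction::StartNewVmm,
        // Stopping a failed instance means "stop trying to restart it".
        (S::Failed, R::Stopped) => ChangeAction::MarkStopped,
    }
}

/// Only `Failed` instances are candidates for automatic restart in this
/// sketch; `Stopped` instances are left alone.
pub fn auto_restart_eligible(state: InstanceState) -> bool {
    state == InstanceState::Failed
}
```

Under these assumptions, stopping a `Failed` instance moves it to `Stopped`, which takes it out of the set of instances eligible for automatic restart.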