[nexus] want to allow stopping Failed instances #6640

Closed
hawkw opened this issue Sep 23, 2024 · 1 comment · Fixed by #6652

Comments

hawkw commented Sep 23, 2024

PR #6503 changed Nexus to attempt to automatically restart instances which are in the Failed state. Now that we do this, we should probably change the allowable instance state transitions to permit a user to stop an instance that is Failed, as a way to say "stop trying to restart this instance" (as Stopped instances are not restarted).
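A minimal sketch of what the relaxed transition rule could look like. Everything here is hypothetical and simplified for illustration: the `InstanceState`, `Request`, and `Action` enums and the `check_transition` function are not Nexus's actual types or signatures, and the assumption that a `Failed` instance can simply be recorded as `Stopped` (because it has no VMM to talk to) is mine.

```rust
/// Simplified instance states; the real state machine has more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InstanceState {
    Creating,
    Starting,
    Running,
    Stopping,
    Stopped,
    Failed,
}

/// State changes a user can request.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Request {
    Start,
    Stop,
}

/// What the control plane should do in response (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    /// Already in, or heading to, the requested state.
    NoOp,
    /// Forward the request to the running VMM.
    ForwardToVmm,
    /// Begin starting the instance.
    BeginStart,
    /// Assumed: no VMM exists, so just record the instance as `Stopped`.
    MarkStopped,
}

/// Decide whether a requested state change is allowed from `current`.
pub fn check_transition(
    current: InstanceState,
    request: Request,
) -> Result<Action, &'static str> {
    use InstanceState::*;
    match (current, request) {
        (Stopped, Request::Stop) | (Stopping, Request::Stop) => Ok(Action::NoOp),
        (Starting, Request::Stop) | (Running, Request::Stop) => Ok(Action::ForwardToVmm),
        // The arm this issue asks for: stopping a Failed instance is
        // permitted, and moves it out of the auto-restart path.
        (Failed, Request::Stop) => Ok(Action::MarkStopped),
        (Stopped, Request::Start) | (Failed, Request::Start) => Ok(Action::BeginStart),
        (Starting, Request::Start) | (Running, Request::Start) => Ok(Action::NoOp),
        (Creating, _) => Err("instance is still being created"),
        (Stopping, Request::Start) => Err("instance is still stopping"),
    }
}
```

With a rule shaped like this, `check_transition(InstanceState::Failed, Request::Stop)` succeeds instead of being rejected as an invalid transition.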

This would have slightly different semantics from changing the instance's auto-restart policy using a future instance-reconfigure API. Stopping a Failed instance would mean "stop trying to restart this now; if it is later started and then transitions to Failed again, continue using whatever its auto-restart policy is", while changing the auto-restart policy would mean "don't try to automatically restart this even if it's eventually restarted again".[1]
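The difference between the two can be sketched as follows. This is an illustrative model only: `RunState`, `RestartPolicy`, `Instance`, and `should_auto_restart` are hypothetical names, not the real Nexus data model, and the restart decision is reduced to a single predicate.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RunState {
    Running,
    Stopped,
    Failed,
}

/// Hypothetical stand-in for the instance's auto-restart policy.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RestartPolicy {
    Never,
    BestEffort,
}

#[derive(Debug, Clone, Copy)]
pub struct Instance {
    pub state: RunState,
    pub policy: RestartPolicy,
}

impl Instance {
    /// Only Failed instances are candidates for automatic restart, and only
    /// when the policy permits it. Stopped instances are never restarted.
    pub fn should_auto_restart(&self) -> bool {
        self.state == RunState::Failed && self.policy == RestartPolicy::BestEffort
    }

    /// Stopping changes the state but leaves the policy untouched, so it
    /// only suppresses restarts for the current failure.
    pub fn stop(&mut self) {
        self.state = RunState::Stopped;
    }

    /// The user starts the instance again.
    pub fn start(&mut self) {
        self.state = RunState::Running;
    }

    /// The instance fails (for example, its sled goes away).
    pub fn fail(&mut self) {
        self.state = RunState::Failed;
    }
}
```

In this model, calling `stop()` on a Failed instance makes `should_auto_restart()` false only until the instance is started and fails again, whereas setting `policy` to `RestartPolicy::Never` keeps it false across later failures until the policy is changed back.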

If we do this, we should definitely also make SagaUnwound instances appear as Failed rather than Stopped, as I discussed in #6638 (comment).
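As a rough illustration of that mapping, assuming a simplified split between an internal state and the state shown to users (the `InternalState` and `ExternalState` enums and the `external_state` function are hypothetical, not the actual Nexus definitions):

```rust
/// Simplified internal states; `SagaUnwound` is the one named in this issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InternalState {
    Running,
    Stopped,
    Failed,
    SagaUnwound,
}

/// Simplified user-visible states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExternalState {
    Running,
    Stopped,
    Failed,
}

/// Map the internal state to what the user sees.
pub fn external_state(internal: InternalState) -> ExternalState {
    match internal {
        InternalState::Running => ExternalState::Running,
        InternalState::Stopped => ExternalState::Stopped,
        // Proposed change: report SagaUnwound as Failed rather than Stopped.
        InternalState::Failed | InternalState::SagaUnwound => ExternalState::Failed,
    }
}
```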

Footnotes

  1. Unless, of course, the user changes the auto-restart policy again.

@hawkw hawkw added the nexus Related to nexus label Sep 23, 2024
@hawkw hawkw self-assigned this Sep 23, 2024
hawkw commented Sep 24, 2024

See also #2825.

hawkw added a commit that referenced this issue Sep 24, 2024
PR #6503 changed Nexus to attempt to automatically restart instances
which are in the `Failed` state. Now that we do this, we should probably
change the allowable instance state transitions to permit a user to stop
an instance that is `Failed`, as a way to say "stop trying to restart
this instance" (as `Stopped` instances are not restarted). This branch
changes `Nexus::instance_request_state` and
`select_instance_change_action` to permit stopping a `Failed` instance.

Fixes #6640

I believe this also fixes #2825, along with #6455 (which allowed
restarting `Failed` instances).
hawkw added a commit that referenced this issue Sep 24, 2024
@hawkw hawkw closed this as completed in 0c7fb27 Sep 24, 2024