[nexus] Reincarnate instances with `SagaUnwound` VMMs #6669
When merging this, we should also be sure to merge #6658, since otherwise,
since we print these in OMDB, it breaks the success cases expectorate tests to use unordered hashmaps...
i dont know whats wrong with me
Co-authored-by: Greg Colombo
Well that's extremely spooky, it looks like this worked fine on commit 0b7f72e but then somehow broke on commit 8f89106: https://buildomat.eng.oxide.computer/wg/0/details/01J8T6F6B4TYVZVGS9NVY6RXJ8/m4ivC9CI7YNrcLE1S1dUTosEDmIl3bfax3fd4qNVIe7XiKua/01J8T6G2PKB9TADGAMV5DAR8R8
(also, it occurred to me that we probably want to make unwinding start sagas check if they should immediately kick the reincarnation task...)
Aaaand it passes on my machine:
I bet this is a race between periodic and explicit activations of the reincarnation task. Cool.
When an `instance-start` saga unwinds, any VMM it created transitions to the `SagaUnwound` state. This causes the instance's effective state to appear as `Failed` in the external API. PR #6503 added functionality to Nexus to automatically restart instances that are in the `Failed` state ("instance reincarnation"). However, the current instance-reincarnation task will not automatically restart instances whose instance-start sagas have unwound, because such instances are not actually in the `Failed` state from Nexus' perspective.

This PR implements reincarnation for instances whose `instance-start` sagas have failed. This is done by changing the `instance_reincarnation` background task to query the database for instances which have `SagaUnwound` active VMMs, and then run `instance-start` sagas for them identically to how it runs start sagas for `Failed` instances.

I decided to perform two separate queries to list `Failed` instances and to list instances with `SagaUnwound` VMMs, because the `SagaUnwound` query requires a join with the `vmm` table, and I thought it was a bit nicer to be able to find `Failed` instances without having to do the join, and only do it when looking for `SagaUnwound` ones. Also, having two queries makes it easier to distinguish between `Failed` and `SagaUnwound` instances in logging and the OMDB status output. This ended up being implemented by adding a parameter to the `DataStore::find_reincarnatable_instances` method that indicates which category of instances to select; I had previously considered making the method on the `InstanceReincarnation` struct that finds instances and reincarnates them take the query as a `Fn` taking the datastore and `DataPageParams` and returning an `impl Future` outputting `Result<Vec<Instance>, ...>`, but figuring out generic lifetimes for the pagination stuff was annoying enough that this felt like the simpler choice.
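
For readers who want a concrete picture of the category parameter described above, here is a minimal, self-contained sketch. It is not the actual Nexus code: the types, the `ReincarnationReason` name, the free-function signature, and the in-memory "tables" are all simplified assumptions standing in for the real datastore and its paginated queries. It only illustrates why a lookup on the instance's own state misses instances whose active VMM is `SagaUnwound`, and how a single parameter can select between the two lookups.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins for the database state enums.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum InstanceState { NoVmm, Vmm, Failed }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum VmmState { Running, SagaUnwound }

// State as it would appear in the external API (simplified).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ExternalState { Stopped, Running, Failed }

// Which category of instances to select. The name is an assumption.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ReincarnationReason { Failed, SagaUnwound }

#[derive(Clone, Debug)]
struct Instance {
    id: u32,
    state: InstanceState,
    active_vmm: Option<u32>,
}

// Look up the state of an instance's active VMM, if it has one.
// This plays the role of the join with the `vmm` table.
fn active_vmm_state(i: &Instance, vmms: &HashMap<u32, VmmState>) -> Option<VmmState> {
    i.active_vmm.and_then(|id| vmms.get(&id).copied())
}

// The effective state combines the instance record with its active VMM,
// so a `SagaUnwound` VMM shows up externally as `Failed`.
fn effective_state(i: &Instance, vmms: &HashMap<u32, VmmState>) -> ExternalState {
    match i.state {
        InstanceState::Failed => ExternalState::Failed,
        InstanceState::NoVmm => ExternalState::Stopped,
        InstanceState::Vmm => match active_vmm_state(i, vmms) {
            Some(VmmState::SagaUnwound) => ExternalState::Failed,
            Some(VmmState::Running) => ExternalState::Running,
            None => ExternalState::Stopped,
        },
    }
}

// One entry point, two lookups: only the `SagaUnwound` category needs
// to consult the VMM table.
fn find_reincarnatable_instances(
    reason: ReincarnationReason,
    instances: &[Instance],
    vmms: &HashMap<u32, VmmState>,
) -> Vec<u32> {
    instances
        .iter()
        .filter(|i| match reason {
            ReincarnationReason::Failed => i.state == InstanceState::Failed,
            ReincarnationReason::SagaUnwound => {
                active_vmm_state(i, vmms) == Some(VmmState::SagaUnwound)
            }
        })
        .map(|i| i.id)
        .collect()
}

// Made-up sample data: one instance per interesting case.
fn sample_data() -> (Vec<Instance>, HashMap<u32, VmmState>) {
    let instances = vec![
        Instance { id: 1, state: InstanceState::Failed, active_vmm: None },
        Instance { id: 2, state: InstanceState::Vmm, active_vmm: Some(10) },
        Instance { id: 3, state: InstanceState::Vmm, active_vmm: Some(11) },
    ];
    let vmms = HashMap::from([
        (10, VmmState::SagaUnwound),
        (11, VmmState::Running),
    ]);
    (instances, vmms)
}
```

In this sketch, instance 2 has an effective state of `Failed` but is only returned when the `SagaUnwound` category is requested. Because each call returns exactly one category, the caller can label what it found in logs and status output without re-deriving the reason.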
Fixes #6638