-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[nexus] nicer OMDB status for instance_updater
#6542
Conversation
While I was working on #6503, I became quite taken with the way @jmpesp had done the OMDB status for his region-replacement background tasks --- the approach of defining a struct in `nexus-types` representing the status JSON that both the background task implementation and OMDB could depend on seems much nicer than constructing an untyped JSON message and then deserializing it to a type that only exists in OMDB, because this way, OMDB and the background task implementation are kept in sync automatically. So, I thought it would be nice to also rework the `abandoned_vmm_reaper` task to use this style. This commit does that. While I was making this change, I also changed the status JSON to include a list of all errors that occurred during the task's activation, instead of just the last one, which will probably make it easier to debug any issues that come up. I also added some code for aligning the numeric columns in the OMDB output.
Depends on #6541. Similarly to #6541, this commit refactors the `instance_updater` background task's OMDB status implementation to use a shared Rust struct in `nexus-types` to represent the status JSON object. I also changed the status message to include a full list of errors that occurred, instead of just the most recent one and a count, and I made some tweaks to the output alignment in OMDB.
instances with terminated active migrations: 0 | ||
update sagas started: 0 | ||
update sagas completed successfully: 0 | ||
task explicitly disabled by config! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little surprised to see this here - the Default
policy should have disabled
be false right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not disabled by default, but it is disabled explicitly in the test config file. Since this task is intended as a backstop to ensure update sagas are always run eventually, but it's not actually the primary mechanism that triggers update sagas, we disable it when running Nexus' integration tests, to make sure the other mechanisms for starting update sagas work in the happy path. See:
omicron/nexus/tests/config.test.toml
Lines 127 to 138 in a1534d7
# The purpose of the `instance-updater` background task is to ensure that update | |
# sagas are always *eventually* started for instances whose database state has | |
# changed, even if the update saga was not started by the Nexus replica handling | |
# an update from sled-agent. This is to ensure that updates are performed even | |
# in cases where a Nexus crashes or otherwise disappears between when the | |
# updated VMM and migration state is written to CRDB and when the resulting | |
# update saga actually starts executing. However, we would prefer update sagas | |
# to be executed in a timely manner, so for integration tests, we don't want to | |
# *rely* on the instance-updater background task for running these sagas. | |
# | |
# Therefore, disable the background task during tests. | |
instance_updater.disable = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
2316680
to
245f167
Compare
Depends on #6541.
Similarly to #6541, this commit refactors the
instance_updater
background task's OMDB status implementation to use a shared Rust struct
in
nexus-types
to represent the status JSON object. I also changed thestatus message to include a full list of errors that occurred, instead
of just the most recent one and a count, and I made some tweaks to the
output alignment in OMDB.