-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[nexus] Make most transactions automatically retry #4487
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I focused on the Oximeter integration portions in transaction_retry.rs
. I've a few suggestions and comments, but overall this looks pretty solid. Thanks for wrangling this massive diff!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took a look with a focus on some of the parts I've been in more (rack & network_interface) and looks good overall. A couple of comments but not blockers
|
||
Ok((ips, dns_zones)) | ||
}) | ||
.transaction_async_with_retry( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, the retry aside for a moment, the only caller for nexus_external_addresses
is Nexus::silo_create
which later calls Datastore::silo_create
that does its own transaction. Feels like these queries should just be part of that transaction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you that these could become a single transaction, but I'm not 100% sure they should be -- there's logic to construct DNS updates via DnsVersionUpdateBuilder
in between these two transactions, and by combining them, we're forcing that logic to happen in a transactional context.
Integrates automatic transaction retry into Nexus for most transactions.
Additionally, this PR provides a "RetryHelper" object to help standardize how transaction retry is performed.
Currently, after a short randomized wait (up to an upper bound), we retry unconditionally, emitting each
attempt to Oximeter for further analysis.
Part of https://github.com/oxidecomputer/customer-support/issues/46
Part of #3814