Wait for ZFS snapshot directory + agent fixes #838
Conversation
When `zfs snapshot dataset@name` is called, the ZFS snapshot is not instantaneously mounted at the corresponding `.../.zfs/snapshot/<name>` path (a delay as long as 172 ms was measured for a zpool with file-backed vdevs). There are two fixes that result from this:

1. Use `zfs list -t snapshot ...` to tell if a snapshot exists, not the presence of the mounted snapshot directory.
2. Wait for the ZFS snapshot directory to be mounted before starting a read-only downstairs service that points to it, waiting a maximum of 5 seconds before returning an error.

Also bundled in this commit is a fix for an agent bug: it's possible for a snapshot to have a running read-only downstairs that is in the destroyed state (but not yet deleted), and this prevented deletion of that snapshot, because the deletion code only checked for the presence of a running read-only snapshot, not its state. Fix this by matching on the state and allowing deletion.

Also (!) bundled in this commit is a fix for another agent bug: when getting snapshots for a region, if the region wasn't in state Created, the endpoint would return 500. This is wrong; more states than just Created can be handled. The only state that should return 500 is Failed.

Fixes oxidecomputer#827.
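For fix 2, here is a minimal sketch of the wait, assuming the `<mountpoint>/.zfs/snapshot/<name>` layout described above; the function name, polling interval, and error type are illustrative, not the agent's actual code:

```rust
use std::path::Path;
use std::time::{Duration, Instant};

/// Illustrative only: poll until <mountpoint>/.zfs/snapshot/<name> shows up,
/// giving up after the five second maximum described above.
fn wait_for_snapshot_dir(mountpoint: &Path, snapshot_name: &str) -> Result<(), String> {
    let snapshot_dir = mountpoint.join(".zfs").join("snapshot").join(snapshot_name);

    let deadline = Instant::now() + Duration::from_secs(5);
    while Instant::now() < deadline {
        if snapshot_dir.is_dir() {
            return Ok(());
        }
        std::thread::sleep(Duration::from_millis(10));
    }

    Err(format!("{} not mounted after 5 seconds", snapshot_dir.display()))
}
```

Given the 172 ms delay measured above, the common case should resolve well inside the 5 second ceiling.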
Several endpoints of the Crucible agent do not perform work synchronously with that endpoint's handler: changes are requested and the actual work proceeds asynchronously, but Nexus assumes the work is synchronous. This commit creates three functions in common_storage that properly deal with an agent that does asynchronous work (`delete_crucible_region`, `delete_crucible_snapshot`, and `delete_crucible_running_snapshot`), and calls those.

Part of testing this commit was creating "disk" antagonists in the omicron-stress tool. This uncovered cases where the transactions related to disk creation and deletion failed to complete. One of these is in `regions_hard_delete`: this transaction is now retried until it succeeds, and the `TransactionError::retry_transaction` function can be used to see if CRDB is telling Nexus to retry the transaction. Another is `decrease_crucible_resource_count_and_soft_delete_volume`: this was broken into two steps, the first of which enumerates the read-only resources to delete (because those don't currently change!).

Another fix that's part of this commit, exposed by the disk antagonist: it should only be OK to delete a disk in state `Creating` from within the disk create saga. Otherwise, deleting a disk that is currently in the disk create saga can cause that saga to fail to unwind and remain stuck.

This commit also bundles idempotency-related fixes for the simulated Crucible agent, as these were exposed by the retries performed by the new `delete_crucible_*` functions.

What shook out of this is that `Nexus::volume_delete` was removed, as it was not correct to call during the unwind of the disk and snapshot create sagas: it would conflict with what those sagas were doing as part of their unwind. The volume delete saga is still required during disk and snapshot deletion, but there is enough context during the disk and snapshot create sagas to properly unwind a volume, even in the case of read-only parents. `Nexus::volume_delete` ultimately isn't safe, so it was removed: instead, any future saga should either embed the volume delete saga as a sub-saga, or carry enough context in the outputs of its nodes to properly unwind what was created.

Depends on oxidecomputer/crucible#838, which fixes a few non-idempotency bugs in the actual Crucible agent. Fixes oxidecomputer#3698
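The new `delete_crucible_*` helpers boil down to a request-then-poll loop against the agent. The sketch below is synchronous for brevity and uses a hypothetical `AgentClient` and `State` enum purely to show the pattern; it is not the actual Nexus client API:

```rust
use std::time::Duration;

// Hypothetical stand-ins for the agent client and its resource states,
// used only to illustrate the request-then-poll pattern.
#[derive(Debug, PartialEq)]
enum State {
    Requested,
    Created,
    Tombstoned,
    Destroyed,
    Failed,
}

struct AgentClient;

impl AgentClient {
    fn region_delete(&self, _region_id: &str) -> Result<(), String> {
        Ok(())
    }
    fn region_get_state(&self, _region_id: &str) -> Result<State, String> {
        Ok(State::Destroyed)
    }
}

/// Illustrative only: ask the agent to delete a region, then poll until the
/// agent reports that the asynchronous work actually finished.
fn delete_crucible_region(client: &AgentClient, region_id: &str) -> Result<(), String> {
    // The delete request returns as soon as the work is queued...
    client.region_delete(region_id)?;

    // ...so keep polling until the agent says the region is really gone.
    loop {
        match client.region_get_state(region_id)? {
            State::Destroyed => return Ok(()),
            State::Failed => return Err(format!("region {region_id} failed to delete")),
            // Requested, Created, Tombstoned: the agent hasn't finished yet.
            _ => std::thread::sleep(Duration::from_secs(1)),
        }
    }
}
```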
Some questions for you
);
}

State::Destroyed | State::Failed => {
Which state are we looking at here? The state of a downstairs somewhere?
If the state is Failed, could it later recover (no longer be Failed) and then want this snapshot again?
This is the stored state for a "running snapshot" (aka a read-only downstairs booted at the snapshot's mount dir; I like and don't like the "running snapshot" name, but here we are).
As far as I can tell, there's no path in the state machine to transition out of Failed for any of the resources that the agent can interact with.
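For illustration, the deletion check discussed here amounts to matching on the running snapshot's stored state rather than just its presence. The state names mirror the match arm in the diff snippet above, but the function, and the assumption that Failed is treated like Destroyed, are simplified stand-ins rather than the agent's actual code:

```rust
// Simplified stand-in for the agent's stored running-snapshot state.
enum State {
    Requested,
    Created,
    Destroyed,
    Failed,
}

/// Illustrative only: a running read-only downstairs that is already
/// Destroyed (but not yet deleted) or Failed no longer blocks deleting
/// the snapshot it points at.
fn running_snapshot_blocks_deletion(running_snapshot: Option<&State>) -> bool {
    match running_snapshot {
        // No running snapshot at all: nothing blocks deletion.
        None => false,
        // Still requested or created: the read-only downstairs is (or will
        // be) running, so the snapshot cannot be deleted yet.
        Some(State::Requested) | Some(State::Created) => true,
        // Destroyed but not yet cleaned up, or Failed (terminal, per the
        // discussion above): deletion may proceed.
        Some(State::Destroyed) | Some(State::Failed) => false,
    }
}
```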
.arg("creation")
.arg(snapshot)
.output()?;
Do we need an `if !cmd.status.success() {` check here? Or if we fail, does `cmd_stdout` just end up with nothing in it?
Error check added in dfb1358
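A sketch of the kind of check being asked about, assuming the command being built is something like `zfs get -H -o value creation <snapshot>` (the arguments before `.arg("creation")` aren't shown in the diff, so they are an assumption here), with the error type also illustrative:

```rust
use std::process::Command;

/// Illustrative only: run `zfs get -H -o value creation <snapshot>` and fail
/// loudly on a non-zero exit instead of silently reading empty stdout.
fn snapshot_creation_time(snapshot: &str) -> Result<String, String> {
    let cmd = Command::new("zfs")
        .arg("get")
        .arg("-H")
        .arg("-o")
        .arg("value")
        .arg("creation")
        .arg(snapshot)
        .output()
        .map_err(|e| e.to_string())?;

    if !cmd.status.success() {
        return Err(format!(
            "zfs get creation {} failed: {}",
            snapshot,
            String::from_utf8_lossy(&cmd.stderr),
        ));
    }

    Ok(String::from_utf8_lossy(&cmd.stdout).trim().to_string())
}
```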