-
Notifications
You must be signed in to change notification settings - Fork 160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add set_quick_repair() #893
Conversation
Now that commit() has its own in-memory copy of the database header, there's no need for the special-case swap_primary flag
src/transactions.rs
Outdated
/// Security considerations: Many hard disk drives and SSDs do not actually guarantee that data | ||
/// has been persisted to disk after calling `fsync`. Even with this commit level, an attacker | ||
/// with a high degree of control over the database's workload, including the ability to cause | ||
/// the database process to crash, can cause the database to crash with the god byte primary bit | ||
/// pointing to an invalid commit slot, leaving the database in an invalid, potentially attacker-controlled state. | ||
Paranoid, | ||
/// Like [`Durability::Paranoid`], but also saves the allocator state as part of each commit. | ||
/// This means repair is no longer required when opening the database after a crash. | ||
ParanoidPlus, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest Enclosed
here to convey that this writes more data inline to make the resulting persistent structure self-contained.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking at this!
Yeah, Enclosed
would work, although I think I might still prefer ParanoidPlus
. My feeling is neither of those names really gives the user any hope of guessing the actual semantics, but ParanoidPlus
at least makes it clear that it's even more durable than Paranoid
, whereas Enclosed
could be anywhere on the durability scale.
Oh wow, this ended up being a lot more work than I'd initially thought.
Thanks for the PR! I had been thinking it could work by setting the
recovery bit to false, but this is much better.
I'm traveling without my computer for the next few days, so it'll be a bit
before I can review this in detail.
Sent from my phone
…On Sat, Nov 9, 2024, 10:57 PM Adam Reichold ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In src/transactions.rs
<#893 (comment)>:
> /// Security considerations: Many hard disk drives and SSDs do not actually guarantee that data
/// has been persisted to disk after calling `fsync`. Even with this commit level, an attacker
/// with a high degree of control over the database's workload, including the ability to cause
/// the database process to crash, can cause the database to crash with the god byte primary bit
/// pointing to an invalid commit slot, leaving the database in an invalid, potentially attacker-controlled state.
Paranoid,
+ /// Like [`Durability::Paranoid`], but also saves the allocator state as part of each commit.
+ /// This means repair is no longer required when opening the database after a crash.
+ ParanoidPlus,
I would suggest Enclosed here to convey that this writes more data inline
to make the resulting persistent structure self-contained.
—
Reply to this email directly, view it on GitHub
<#893 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAGNXQDPRVRTNRZAUU2CCMLZ737UXAVCNFSM6AAAAABRPVKIBSVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDIMRVGY3DGNBRGE>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
|
Sounds good! There's no hurry; enjoy your trip! |
Ok, read through most of it now. One question. Do you think this quick repair approach can be used for 1-phase commit transactions too? My thinking is that you need to save the allocator state and use 2-phase commit to get really fast repairs, because you need 2PC to avoid validating all the checksums. However, if you have the allocator state it seems like we can still accelerate a repair under 1-phase commit. It would still validate all the checksums, but then if it found the allocator state it could skip rebuilding the allocator state which requires a second scan of the file. Does that seem like it works to you? If so, then I think we should add a new method like |
Yeah, I like the idea of having this be a setting separate from the durability level. I'm not sure we want to support using it with 1-phase commit, though. It seems like a footgun -- if you call If we want to speed up repair with 1-phase commit, I feel like the better solution would be to verify the checksums and reconstruct the allocator state in a single pass. That would give the same speed benefit -- and it would work for all 1-phase commit users, without requiring them to opt into a new mode or pay the cost of writing out the allocator state with each commit. |
Ah yes, good point, let's not support it for 1-phase commits. Ok, so the naming options that seem appealing to me are:
(2) or (3) seem more descriptive than (1) to me. What do you think is most clear? |
I think my preference at the moment is (3), but (2) also sounds fine! If we go with (3), do you think it might be worth splitting 2-phase commit out into its own flag too? This would let people use |
That sounds good to me! |
Instead of actually making a commit, do_repair() should just return the new roots that would be committed. This makes check_integrity() much faster, since it no longer needs to verify the checksums twice, and it means commit() can always trim the database without worrying about whether it'll change the allocator hash. It also allows check_integrity() to avoid making an unnecessary commit if the database is clean, which is nice with quick-repair. It means a successful check_integrity() won't invalidate the allocator state table, so you can still do a quick-repair afterwards
Refactoring in preparation for quick-repair, which needs to be able to call these separately
This adds a new quick-repair mode, which gives instant recovery after a crash at the cost of slower commits. To make this work, each commit with quick-repair enabled needs to save the allocator state somewhere. We can't use the region headers, because we'd be overwriting them in place; we might crash partway through the overwrite, and then we'd need a full repair. So we instead save the allocator state to a new table in the system tree. Writing to the table is slightly tricky, because it needs to be done without allocating (see below), but other than that it's a perfectly ordinary transactional write with all the usual guarantees. The other requirement to avoid full repair is knowing whether the last transaction used 2-phase commit. For this, we add a new two_phase_commit bit to the god byte, which is always updated atomically along with swapping the primary bit. Old redb versions will ignore the new flag when reading and clear it when writing, which is exactly what we want. This turns out to also fix a longstanding bug where 2-phase commit hasn't been providing any security benefit at all. The checksum forgery attack described in the documentation for 1-phase commit actually works equally well against 2-phase commit! The problem is that even though 2-phase commit guarantees the primary is valid, redb ignores the primary flag when repairing. It always picks whichever commit slot is newer, as long as the checksum is valid. So if you crash partway through a commit, it'll try to recover using the partially-written secondary rather than the fully-written primary, regardless of the commit strategy. The fix for this is exactly the two_phase_commit bit described above. After a crash, we check whether the last transaction used 2-phase commit; if so, we only look at the primary (which is guaranteed to be valid) and ignore the secondary. Quick-repair needs this check anyway for safety, so we get the bug fix for free. To write to the allocator state table without allocating, I've introduced a new insert_inplace() function. It's similar to insert_reserve(), but more general and maybe simpler. To use it, you have to first do an ordinary insert() with your desired key and a value of the appropriate length; then later in the same transaction you can call insert_inplace() to replace the value with a new one. Unlike insert_reserve(), this works with values that don't implement MutInPlaceValue, and it lets you hold multiple reservations simultaneously. insert_inplace() could be safely exposed to users, but I don't think there's any reason to. Since it doesn't give you a mutable reference, there's no benefit over insert() unless you're storing data that cares about its own position in the database. So for now it's private, and I haven't bothered making a new error type for it; it just panics if you don't satisfy the preconditions. The fuzzer is perfect for testing quick-repair, because it can simulate a crash, reopen the database (using quick-repair if possible), and then verify that the resulting allocator state exactly matches what would happen if it ran a full repair. I've modified the fuzzer to generate quick-repair commits in addition to ordinary commits.
Okay, I've updated the code to use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! I had one comment
if self.quick_repair { | ||
self.two_phase_commit = true; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may have missed it and this is handled somewhere else, but I think we should also add
if self.two_phase_commit && self.durability == InternalDurability::None {
self.durability = InternalDurability::Immediate;
}
Otherwise a user could try to set both Durability::None and 2PC, but the security benefit wouldn't be achieved unless the next durable commit is also 2PC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is unsafe even with Durability::Immediate
.
Suppose your application makes some commits that contain attacker-controlled data, and others that don't. You might think you can use 2-phase commit just for the former, but that's not good enough! The attacker's data is present in the database, so they can cause a crash at the moment when it's about to be overwritten by a new (non-attacker-controlled) page, leaving the new page pointer referring to the attacker-controlled data instead. Since we're already assuming they can predict the database layout and the content of the data they don't control, they can forge the checksum to make it appear valid.
If you want the security of 2-phase commit, you need to use it for every commit during which the attacker could cause a crash, which is probably all of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, I agree that 2-phase commit with Durability::None
isn't really a meaningful combination, and I'd be okay with disallowing it. (I'd rather do that than silently upgrade to Durability::Immediate
, because the user presumably cares about commit speed if they're requesting Durability::None
; we should tell them how to get that speed rather than silently doing the wrong thing.)
But my preference is to allow the combination, even though it doesn't do anything, for exactly the reason mentioned above: the right way to use 2-phase commit is to enable it on every single commit, so we may as well make that easy. The same thing applies to quick-repair -- if you're using quick-repair, you probably want it on every commit, so it seems convenient to allow it even on the Durability::None
commits where it has no effect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh ya, you're right! Security is hard, haha
Merged. Thanks! This is awesome |
Yay, thanks for taking the time to review all of this! There's no hurry, but if you decide you're interested in doing the transition we were talking about in #894 to eventually get rid of the old allocator state, let me know. I've got code to do the extra quick-repair commit on shutdown, which I can submit as another PR if you want it. |
Oh ya, it'd be great to get rid of that old code path. Yes, please send that PR |
Here's the PR for #878!
This adds a new durability level above Paranoid that doesn't require repair. For now, it's just called "ParanoidPlus" -- not the most descriptive name, but it at least makes it very clear where it falls on the scale of commit speed vs. recoverability. If you have a better name, let me know 😄
To avoid repair, Durability::ParanoidPlus commits need to save the allocator state somewhere. We can't use the region headers, because we'd be overwriting them in place; we might crash partway through the overwrite, and then we'd need repair. So we instead save the allocator state to a new table in the system tree. Writing to the table is slightly tricky, because it needs to be done without allocating (see below), but other than that it's a perfectly ordinary transactional write with all the usual guarantees.
The other requirement to avoid repair is knowing whether the last transaction used 2-phase commit. For this, we add a new
two_phase_commit
bit to the god byte, which is always updated atomically along with swapping the primary bit. Old redb versions will ignore the new flag when reading and clear it when writing, which is exactly what we want.This turns out to also fix a longstanding bug where Durability::Paranoid hasn't been providing any security benefit at all. The checksum forgery attack described in the Durability::Immediate documentation actually works equally well against Durability::Paranoid! The problem is that even though 2-phase commit guarantees the primary is valid, redb ignores the primary flag when repairing. It always picks whichever commit slot is newer, as long as the checksum is valid. So if you crash partway through a commit, it'll try to recover using the partially-written secondary rather than the fully-written primary, regardless of the durability mode.
The fix for this is exactly the
two_phase_commit
bit described above. After a crash, we check whether the last transaction used 2-phase commit; if so, we only look at the primary (which is guaranteed to be valid) and ignore the secondary. Durability::ParanoidPlus needs this check anyway for safety, so we get the Durability::Paranoid bug fix for free.To write to the allocator state table without allocating, I've introduced a new insert_inplace() function. It's similar to insert_reserve(), but more general and maybe simpler. To use it, you have to first do an ordinary insert() with your desired key and a value of the appropriate length; then later in the same transaction you can call insert_inplace() to replace the value with a new one. Unlike insert_reserve(), this works with values that don't implement MutInPlaceValue, and it lets you hold multiple reservations simultaneously.
insert_inplace() could be safely exposed to users, but I don't think there's any reason to. Since it doesn't give you a mutable reference, there's no benefit over insert() unless you're storing data that cares about its own position in the database. So for now it's private, and I haven't bothered making a new error type for it; it just panics if you don't satisfy the preconditions.
The fuzzer is perfect for testing Durability::ParanoidPlus, because it can simulate a crash, reopen the database (skipping repair if possible), and then verify that the resulting allocator state exactly matches what would happen if it ran a full repair. I've updated the fuzzer to generate Durability::ParanoidPlus commits along with the existing Durability::None and Durability::Immediate.