
Allow block cloning across encrypted datasets. #14705

Closed

Conversation

@pjd (Contributor) commented Apr 2, 2023

Block cloning is now possible between two encrypted datasets that share the same encryption root.
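
For example, with block cloning enabled, a plain copy between two child datasets of the same encryption root is then expected to clone blocks rather than rewrite them. A quick sketch (pool and dataset names are illustrative; cp goes through copy_file_range(2) on FreeBSD and recent Linux coreutils):

zfs create -o encryption=on -o keyformat=passphrase pool/enc   # encryption root
zfs create pool/enc/a
zfs create pool/enc/b
cp /pool/enc/a/bigfile /pool/enc/b/     # cloned, not copied
zpool get bcloneused,bclonesaved pool   # should reflect the shared blocks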

Motivation and Context

Description

How Has This Been Tested?

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

Block cloning is now possible between two encrypted datasets that share
the same encryption root.

Signed-off-by: Pawel Jakub Dawidek <[email protected]>
@mcmilk (Contributor) commented Apr 3, 2023

Can we add an encrypted block cloning test with the test suite?
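
Something like this, perhaps (a rough sketch in the style of the existing ZTS scripts; the clonefile and md5digest helpers exist in the suite, but the exact structure here is illustrative and untested):

. $STF_SUITE/include/libtest.shlib

log_assert "Block cloning works between datasets under one encryption root"

log_must eval "echo 'mypassphrase' | zfs create -o encryption=on -o keyformat=passphrase $TESTPOOL/enc"
log_must zfs create $TESTPOOL/enc/src
log_must zfs create $TESTPOOL/enc/dst

log_must dd if=/dev/urandom of=/$TESTPOOL/enc/src/file bs=128K count=8
log_must sync_pool $TESTPOOL

# clonefile is the ZTS helper that exercises FICLONE/copy_file_range
log_must clonefile -f /$TESTPOOL/enc/src/file /$TESTPOOL/enc/dst/file
log_must sync_pool $TESTPOOL

typeset h1=$(md5digest /$TESTPOOL/enc/src/file)
typeset h2=$(md5digest /$TESTPOOL/enc/dst/file)
[[ "$h1" == "$h2" ]] || log_fail "cloned file differs from source"

log_pass "Encrypted block cloning works"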

@robn (Member) commented Apr 3, 2023

What happens if you clone across datasets, and then use zfs change-key to make one of them into its own encryption root?
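
For concreteness, the sequence I mean is roughly (names illustrative):

zfs create -o encryption=on -o keyformat=passphrase pool/enc   # common encryption root
zfs create pool/enc/a
zfs create pool/enc/b
cp /pool/enc/a/file /pool/enc/b/file                # with this patch: a cross-dataset clone
zfs change-key -o keyformat=passphrase pool/enc/b   # b becomes its own encryption root

Are the blocks now shared by a and b still readable through b, and if so, under which key?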

@Majiir commented Apr 4, 2023

Also, what happens when you have two encryption roots, zfs change-key -i one of them under the other, and then try to clone blocks across those datasets?
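
Spelled out (hypothetical names):

zfs create -o encryption=on -o keyformat=passphrase pool/enc1   # encryption root 1
zfs create -o encryption=on -o keyformat=passphrase pool/enc2   # encryption root 2
zfs rename pool/enc2 pool/enc1/enc2
zfs change-key -i pool/enc1/enc2          # enc2 now inherits enc1's wrapping key
cp /pool/enc1/file /pool/enc1/enc2/file   # same encryption root -- should this clone be allowed?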


I'm no expert on ZFS internals, but to throw out a few ideas:

  1. Can we require that the two datasets use the same master key? They may or may not share an encryption root, but if the blocks were encrypted with the same key, either dataset can read them transparently. This would require either that both datasets have loaded keys (since the wrapping keys may be different) or that there be some sort of unencrypted identifier for each master key.

  2. Alternately, the BRT could indicate the originally encrypting dataset for each block. That would require consulting the BRT on decryption, and keeping keys around for old datasets that have been destroyed, but for which some blocks are still referenced. Such blocks would have to remain in the BRT even when they are only referenced once.

  3. Can we indicate in a block pointer the key (or dataset) that was originally used to encrypt the block it points to?

@behlendorf added the "Status: Code Review Needed (Ready for review and testing)" label Apr 6, 2023
@robn mentioned this pull request Oct 5, 2023
@isopix commented Oct 5, 2023

What happens if you clone across datasets, and then use zfs change-key to make one of them into its own encryption root?

And nobody knows the answer to such an important question?

@robn (Member) commented Oct 5, 2023

And nobody knows the answer to such an important question?

I have a gut feeling about the answer (namely: "it would be bad"), but it's more that these kinds of edge cases need to be studied and understood (and, if there's a problem, solved) before this could go live.

It's not a big deal; it just needs someone with enough skill and time to look into it.

@darkbasic commented Oct 6, 2023

What happens if you clone across datasets, and then use zfs change-key to make one of them into its own encryption root?

I guess that zfs change-key should create a new copy, otherwise bad things might happen.
It should also run as a transaction and revert if there is not enough free space to do so.

@Majiir commented Oct 6, 2023

[UPDATE] This post contains errors. Leaving it as originally written; see corrections further down.

What happens if you clone across datasets, and then use zfs change-key to make one of them into its own encryption root?

I'm not following how zfs change-key could create any problems.


Suppose we start with two datasets A and B with the same encryption root E. The encryption root has a wrapping key W that encrypts a master key M.

A block X is written to dataset A. It is encrypted with a randomly chosen IV and a derived key K. The key K is derived from M and a regularly rotated salt S. The salt (and therefore K) is shared across many blocks. IV and S are stored with the block pointer, so you can decrypt a block as long as you have the block pointer and M.

Now, we clone X to dataset B. That means some node Y now points to X (in addition to whatever in A already pointed to X) and the BRT is tracking the number of references to X. When reading a file from either dataset, the fact that the block was cloned is completely transparent.

When we read a file in B and come across X, we decrypt it like any other block: Read S, IV and MAC from the block pointer, use S and M to derive K, and decrypt with IV, K and MAC. This works because datasets A and B have the same master key M.
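
Schematically (my notation, assuming the default aes-256-gcm suite; the real details are in zio_crypt.c):

K = HKDF(M, S)
(C, MAC) = AES-GCM-encrypt(K, IV, P)

So anyone holding M plus the on-disk (S, IV, MAC) from the block pointer can recover the plaintext P, regardless of which dataset's block tree the pointer lives in.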


(Aside: The docs say the salt and derived key are cached when encrypting blocks, to avoid deriving the key needlessly. Are they also cached when reading blocks?)


Back to our question: What happens when we zfs change-key B to make it its own encryption root?

We wrap M in a new wrapping key W'. And that's it! The master key doesn't change - that would require re-encrypting every block. When we go to read a file in B containing X, decryption still works because we still derive the same K for that block. As far as I know, it is impossible to change M for a dataset.

We need to ensure that our earlier assumption always holds: If we clone a block from A to B, then those datasets must have the same master key (or both be unencrypted). They also must have the same encryption settings, because the block pointer does not store which cipher suite was used.
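
Note that of those preconditions, only the encryption settings and the encryption root are visible from userland; as far as I can tell, nothing exposes the master key (or even an identifier for it):

zfs get encryption,encryptionroot,keystatus pool/enc/a pool/enc/b
# encryption and encryptionroot are real per-dataset properties, but there is
# no property for the master key, so "same master key" cannot currently be
# verified by an administrator.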


With that, cloning across encrypted datasets should "just work". But there may be additional security considerations. (Caveat: I'm an amateur!)

  • We need to be careful when checking the master keys for equality to avoid timing attacks.

  • If an attacker knows that two datasets share a block (which I think can be known just by looking at the pointers on disk) then they also know that the two datasets have the same master key. This could be valuable information if those datasets have different encryption roots and therefore different wrapping keys. (This sounds to me like a non-issue, but again, amateur.)

  • Seeing that two encrypted datasets share data could give an attacker hints about what a dataset contains (based on dataset names). For this reason, it should be easy (e.g. via dataset property) to disable further block cloning into or out of a dataset (not sure which direction matters more). IMO, we should not expect users to carefully control every invocation of cp or mv.

@oromenahar (Contributor) commented Oct 6, 2023

Its not a big deal, just needs someone with enough skill and time to look into it.

I wanna check this stuff, but right now I'm a little bit limited on time. I guess I can do this by the end of this month or the middle of next month. Quite unsure, my calendar is full.

I'm also working on some better key handling. I wanna improve PR #14836 and have some of the work already done. While working on this I'm going to try to understand how ZFS handles encryption, and I wanna check this PR as well, and how it behaves with clones and changing keys on datasets.

@robn (Member) commented Oct 7, 2023

@Majiir thanks for the analysis.

When I originally asked "what if you change encryption root" I didn't know very much about OpenZFS encryption. These days I know a little more, but still not very much.

I think you're right insofar as changing the encryption root only updates the wrapping keys, but I think we still can't clone between arbitrary encrypted datasets.

Suppose we start with two datasets A and B with the same encryption root E. The encryption root has a wrapping key W that encrypts a master key M.

So as far as I understand it, each dataset has its own master key, except for snapshots and clones created from snapshots, which share the master key of their parent (that is, anything that causes dsl_dir_is_clone() to return true). The comments at the top of dsl_crypt.c and zio_crypt.c seem to support that.

Given that, it seems like it should be possible to allow block cloning within a dataset, its snapshots and their clones, because they share a master key.

However, that might not be safe to do in terms of information leakage. The "CONSIDERATIONS FOR DEDUP" commentary at the top of zio_crypt.c explains an alternate scheme for IV and salt generation to ensure the ciphertexts stay the same, but I think that's only necessary to make the dedup lookup possible, which is the only signal that a block has been seen previously. I don't think it's a security consideration - if it were, then why are duplicate blocks safe with snapshots and their clones?

But even then, that comment makes it clear that this still only allows dedup within "clone groups". Which makes sense if they are still reliant on their master key.

So I reckon cloning blocks between a dataset, its snapshots, and clones from those snapshots will work, because the dedup precedent suggests that it's safe to do generally, and we can use the blocks as-is because we don't need the additional signal to reuse the block.

But it won't work between arbitrary datasets, because the master keys are different.

And changing the encryption root is irrelevant!

If so, that's pretty great, since cloning from a snapshot back to its primary dataset is the major use case for cross-dataset cloning.

@mmatuska (Contributor)

I gave this patch a testing shot and it looks good.

root@test:~ # mkdir -p /mnt /mnt2 /mnt3
root@test:~ # zfs create -o encryption=on -o keyformat=passphrase -o mountpoint=none rpool/VAULT
Enter new passphrase:
Re-enter new passphrase:
root@test:~ # zfs create rpool/VAULT/ds1
root@test:~ # zfs create rpool/VAULT/ds2
root@test:~ # mount -t zfs rpool/VAULT/ds1 /mnt
root@test:~ # mount -t zfs rpool/VAULT/ds2 /mnt2
root@test:~ # dd if=/dev/random of=/mnt/testfile bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 4.565675 secs (235177011 bytes/sec)
root@test:~ # zfs snapshot rpool/VAULT/ds1@s1
root@test:~ # openssl sha256 /mnt/testfile
SHA2-256(/mnt/testfile)= a7bc8c2d89f8332fdb89803e34b0ac6b0615fcee5bcaf544aa4451dd56026ea8
root@test:~ # zpool get bcloneused rpool
NAME   PROPERTY    VALUE        SOURCE
rpool  bcloneused  64K          -
root@test:~ # cp /mnt/testfile /mnt/testfile2
root@test:~ # openssl sha256 /mnt/testfile2
SHA2-256(/mnt/testfile2)= a7bc8c2d89f8332fdb89803e34b0ac6b0615fcee5bcaf544aa4451dd56026ea8
root@test:~ # zpool get bcloneused rpool
NAME   PROPERTY    VALUE        SOURCE
rpool  bcloneused  1.00G        -
root@test:~ # cp /mnt/.zfs/snapshot/s1/testfile /mnt/testfile3
root@test:~ # openssl sha256 /mnt/testfile3
SHA2-256(/mnt/testfile3)= a7bc8c2d89f8332fdb89803e34b0ac6b0615fcee5bcaf544aa4451dd56026ea8
root@test:~ # zpool get bcloneused rpool
NAME   PROPERTY    VALUE        SOURCE
rpool  bcloneused  1.00G        -
root@test:~ # zpool get bclonesaved rpool
NAME   PROPERTY     VALUE         SOURCE
rpool  bclonesaved  2.00G         -
root@test:~ # zfs clone -o mountpoint=none rpool/VAULT/ds1@s1 rpool/VAULT/cl1
root@test:~ # mount -t zfs rpool/VAULT/cl1 /mnt3
root@test:~ # cp /mnt/testfile /mnt3/testfile
root@test:~ # openssl sha256 /mnt3/testfile 
SHA2-256(/mnt3/testfile)= a7bc8c2d89f8332fdb89803e34b0ac6b0615fcee5bcaf544aa4451dd56026ea8
root@test:~ # zpool get bclonesaved rpool
NAME   PROPERTY     VALUE         SOURCE
rpool  bclonesaved  3.00G         -
root@test:~ # openssl sha256 /mnt2/testfile 
SHA2-256(/mnt2/testfile)= a7bc8c2d89f8332fdb89803e34b0ac6b0615fcee5bcaf544aa4451dd56026ea8

@Majiir commented Nov 1, 2023

Test results

I'm working on writing up these test cases in a NixOS VM test. I was able to reproduce @mmatuska's results, which are notable because they appear to contradict the statement that different datasets created in the same encryption root have different master keys. With my test setup, I took a stab at answering a few questions from upthread:

What happens if you clone across datasets, and then use zfs change-key to make one of them into its own encryption root?

If the datasets originally share an encryption root, then the clone works. That would imply that the two datasets also have the same master key. After zfs change-key, the second dataset is its own encryption root, but the master key hasn't changed, so the cloned blocks are still readable.

If the datasets do not share an encryption root, then the clone doesn't happen (the blocks are copied normally). This happens even if the datasets have the same master key, so this is a missed opportunity to clone blocks.

Also, what happens when you have two encryption roots, zfs change-key -i one of them under the other, and then try to clone blocks across those datasets?

If the two encryption roots were previously part of the same encryption root, and therefore have the same master key, then inheriting the same parent encryption root again seems to allow blocks to be cloned safely.

If the two datasets began life with two different encryption roots, ... it still works?!?!?!??? allocated and bcloneused don't increase, bclonesaved does, and the checksums of the original and cloned files match. This can't be right - those datasets must have different master keys, right? Reads should fail with decryption errors.

I thought maybe something funny was happening with ZFS ARC, so I rebooted the VM. But the two files still hashed the same.

Finally, I made it fail by rebooting the VM and hashing the cloned file first, before the original blocks were read (and presumably cached, unencrypted). Logs were spammed like this for each block:

machine # [   44.172796] zed[1081]: eid=523 class=data pool='rpool' priority=2 err=5 flags=0x180 bookmark=156:3:0:516

Armed with that knowledge, I repeated the simple block cloning test across two datasets under the same encryption root, this time with a reboot between the cloning and the hashing, and by hashing the cloned target before reading the original file. Sure enough, the reads fail.
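
For anyone reproducing this, the ordering is the whole trick. Roughly (illustrative names):

cp /pool/enc/a/file /pool/enc/b/file   # cross-dataset clone
sync
# reboot, or otherwise drop the ARC, e.g.:
zpool export pool && zpool import -l pool

sha256sum /pool/enc/b/file   # read the CLONE first: fails with EIO if the master keys differ
sha256sum /pool/enc/a/file   # reading the original first would mask the failure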


So, what does this all mean?

  • This PR should be reworked to ensure that the source and target datasets for a block clone have the same master key.
  • It appears @robn is correct that different datasets created under the same encryption root are created with different master keys.
  • Testing is challenging. Blocks that are successfully decrypted when read from one dataset can be cached and can then be successfully read from a file in another dataset, even if reading from that second dataset would fail if the decrypted blocks were not already cached.

Thoughts on master keys

Each dataset having a different master key limits the utility of block cloning with encryption. Some use cases are still supported (like copying from a snapshot) but some attractive use cases are not, like moving or copying a file between different filesystems.

On the other hand, the master keys being different allows a user to zfs change-key a dataset and send it along with the user key somewhere untrusted without fear that they are compromising the master key for their other datasets (at least, as long as those datasets aren't in the same 'clone group' as the one being sent). I can see this being important behavior for some users, and useless for others.

For block cloning to be most useful for encrypted datasets, I think we want two new capabilities:

  • A filesystem property (off by default) that reuses the master key of the parent filesystem when creating child datasets. This would allow an informed user to expand the potential for block cloning; see the sketch after this list.
  • Ability for a user to check which master key is used for a dataset. (Doing this safely may be challenging.) The purpose is twofold: (1) to let a user easily see whether block cloning will work between two datasets, although cp --reflink can do this; and (2) to help users manage the risk of key compromise given the introduction of the new filesystem property described above. (Consider a situation where a previous administrator enabled this property, created some datasets, and then disabled it again. How can the new administrator know whether it's safe to send an innocuous dataset and its new user key to an untrusted host?)
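
As a straw man for the first capability, usage might look like this (the property name is entirely made up, not an existing OpenZFS feature):

zfs create -o encryption=on -o keyformat=passphrase pool/enc
zfs create -o sharemasterkey=on pool/enc/a   # HYPOTHETICAL property
zfs create -o sharemasterkey=on pool/enc/b   # HYPOTHETICAL property
# a and b would share pool/enc's master key, so blocks could be cloned between
# them -- at the cost that sending either dataset (with its wrapping key) to an
# untrusted party exposes the shared master key.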

@oromenahar (Contributor)

I opened a PR with tests and a rebase onto the current master; I don't know how to add commits to this PR, sorry.
Sorry for being a little bit late, but you can check it out here: #15544 if you like.

@mmatuska (Contributor) commented Nov 19, 2023

We cannot simply check for a common encryption root.

Here is a sequence that shows why that check is not sufficient:

zfs create -o mountpoint=none -o encryption=on -o keyformat=passphrase rpool/ENC1
zfs create rpool/ENC1/ds1
zfs create -o mountpoint=none -o encryption=on -o keyformat=passphrase rpool/ENC2
zfs create rpool/ENC2/ds2
zfs change-key -o keyformat=passphrase rpool/ENC2/ds2
zfs rename rpool/ENC2/ds2 rpool/ENC1/ds2
zfs change-key -i rpool/ENC1/ds2

Now the datasets rpool/ENC1/ds1 and rpool/ENC1/ds2 are under the same encryption root but have different master keys.
Just copy_file_range(2) a file from ds1 to ds2, and you get a read error when reading the file from ds2.

The safest approach would be to compare the master keys (i.e. check that the same master key is used); the second safest would be to allow cloning only between a dataset, its snapshots, and clones of those snapshots, since those must share the same key.

@behlendorf (Contributor)

Replaced by #15544, which has been rebased.

@behlendorf closed this Nov 29, 2023