
Overflowing refreservation is bad #15996

Merged: 1 commit into openzfs:master on Apr 29, 2024

Conversation

@rincebrain (Contributor) commented Mar 14, 2024

Motivation and Context

Someone came to me and pointed out that you could pretty readily cause the refreservation calculation to exceed 2^64, given the 2^17 multiplier in it, and produce refreservations wildly less than the actual volsize in cases where it should have failed.

Description

Shuffle the multiplicands around so that we save the 2^17 multiplier for the end, and if our previous value exceeds 2^47, just use UINT64_MAX.

(We could also just fail in that case, I don't have an especially strong opinion, it just seemed like a useful explicit check to add given the problem at hand.)
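
As a rough sketch of the approach (illustrative only, not the verbatim patch; the names nblocks, asize, and tsize mirror the surrounding libzfs code, but the function here is hypothetical):

#include <stdint.h>

#define SPA_OLD_MAXBLOCKSIZE (128ULL << 10) /* 128KiB, i.e. 2^17 */

/*
 * Sketch only: scale an allocated-block total up to an effective
 * volsize without letting the intermediate product wrap past 2^64.
 * Dividing by tsize first leaves the 2^17 multiply for the end, and
 * anything above UINT64_MAX / 2^17 (roughly 2^47) is clamped to
 * UINT64_MAX.
 */
static uint64_t
scale_volsize(uint64_t nblocks, uint64_t asize, uint64_t tsize)
{
        uint64_t volsize = nblocks * asize / tsize;

        if (volsize > UINT64_MAX / SPA_OLD_MAXBLOCKSIZE)
                return (UINT64_MAX);
        return (volsize * SPA_OLD_MAXBLOCKSIZE);
}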

How Has This Been Tested?

$ for i in `seq 1 4`; do truncate -s 64G /testfile${i};done;
$ sudo ./zpool create dumbpool -o ashift=13 raidz2 /testfile{1,2,3,4}

# original
$ for i in 1 2 4 8 16 32 64 128 256; do sudo ./zfs create -V ${i}T dumbpool/vol${i};done;
$ sudo zfs list -o name,volsize,refreservation -s volsize
NAME             VOLSIZE  REFRESERV
dumbpool/vol1         1T      1.10T
dumbpool/vol2         2T      2.20T
dumbpool/vol4         4T      4.39T
dumbpool/vol8         8T      8.79T
dumbpool/vol16       16T      17.6T
dumbpool/vol32       32T      35.2T
dumbpool/vol64       64T      8.26T
dumbpool/vol128     128T      16.5T
dumbpool/vol256     256T      33.0T
dumbpool               -       none

# patch
$ for i in 1 2 4 8 16 32 64 128 256; do sudo ./zfs create -V ${i}T dumbpool/vol${i};done;
cannot create 'dumbpool/vol256': out of space
$ sudo zfs list -o name,volsize,refreservation -s volsize
NAME             VOLSIZE  REFRESERV
dumbpool/vol1         1T      1.10T
dumbpool/vol2         2T      2.20T
dumbpool/vol4         4T      4.39T
dumbpool/vol8         8T      8.79T
dumbpool/vol16       16T      17.6T
dumbpool/vol32       32T      35.2T
dumbpool/vol64       64T      70.3T
dumbpool/vol128     128T       141T
dumbpool               -       none
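
(Note the vol256 failure above: with the intermediate value clamped to UINT64_MAX, the computed refreservation can never be satisfied by the pool, so creation now fails up front with "out of space" instead of quietly under-reserving.)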


@robn (Member) commented Mar 14, 2024

Yeah, I'd probably have returned EOVERFLOW there rather than clamping, but the other part seems fine. Good catch.

@rincebrain (Contributor, Author)

We can't; it's a uint64_t return.

We could trip a VERIFY there and just kill the process, but that seems rude. Given that the existing error handling in that code is "if we don't get anything out of this calculation, just do the naive thing", setting it as big as we could seemed like the least terrible option short of reworking it entirely.
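
(For context, the calculation lives in a libzfs helper that hands the computed size back directly; the signature below is paraphrased, not authoritative, but the uint64_t return is the point, since it leaves no in-band channel for EOVERFLOW.)

/* Paraphrased shape of the helper under discussion (hypothetical signature). */
static uint64_t
volsize_from_vdevs(zpool_handle_t *zhp, uint64_t capacity, uint64_t blocksize);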

@amotin (Member) left a comment

Theoretically, there can be a zvol smaller than 128KB. In such a case nblocks * asize will be less than tsize, which will result in a volsize of zero. If we shifted both tsize and SPA_OLD_MAXBLOCKSIZE right by SPA_MINBLOCKSHIFT (which should not lose precision), that would be impossible.
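
A sketch of that suggestion (again illustrative, not the final diff): since tsize and SPA_OLD_MAXBLOCKSIZE are both multiples of 2^SPA_MINBLOCKSHIFT, shifting each right by SPA_MINBLOCKSHIFT keeps the ratio exact while shrinking both operands, so a small nblocks * asize can no longer truncate to zero:

#include <stdint.h>

#define SPA_MINBLOCKSHIFT    9                 /* 512-byte sectors */
#define SPA_OLD_MAXBLOCKSIZE (128ULL << 10)    /* 2^17 */

/*
 * Sketch only: SPA_OLD_MAXBLOCKSIZE >> 9 is 256 and tsize >> 9 is at
 * most 256 (tsize is assumed to be between 512 bytes and 128KiB), so
 * any nonzero nblocks * asize yields a nonzero result; the deferred
 * multiplier is now 2^8, moving the overflow clamp from roughly 2^47
 * up to roughly 2^56.
 */
static uint64_t
scale_volsize_shifted(uint64_t nblocks, uint64_t asize, uint64_t tsize)
{
        uint64_t volsize = nblocks * asize;

        if (volsize > UINT64_MAX / (SPA_OLD_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT))
                return (UINT64_MAX);
        return (volsize * (SPA_OLD_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT) /
            (tsize >> SPA_MINBLOCKSHIFT));
}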

@rincebrain (Contributor, Author) commented Mar 14, 2024

It seems hard to construct a case where that happens in practice; glancing at it quickly, I end up with a refreservation of a couple of MiB even in cases like that.

I'll implement the change, but just as an observation, it doesn't immediately seem easy to reproduce in practice.

(I would assume this case basically can't come up with parity overhead on parity-based devices, and there's already a bailout for non-parity devices that don't pass through this code. That said, I agree that's a better solution; I was just curious how easily you could make it happen in practice.)

@amotin (Member) commented Mar 14, 2024

> It seems hard to construct a case where that happens in practice; glancing at it quickly, I end up with a refreservation of a couple of MiB even in cases like that.

I haven't tried it with the patch, but I was thinking of something like: zfs create -V 512 -b 512 pool/zzz.

@behlendorf added the "Status: Code Review Needed (Ready for review and testing)" label on Apr 4, 2024
@behlendorf (Contributor)

> I'll implement the change, but just as an observation, it doesn't immediately seem easy to reproduce in practice.

Good find. I believe this is just waiting on the small change mentioned for very small zvols, assuming that's even possible in practice.

@behlendorf added the "Status: Accepted (Ready to integrate: reviewed, tested)" label and removed the "Status: Code Review Needed" label on Apr 25, 2024
@behlendorf (Contributor)

@rincebrain could you rebase this to get a fresh CI run?

Someone came to me and pointed out that you could pretty
readily cause the refreservation calculation to exceed
2**64, given the 2**17 multiplier in it, and produce
refreservations wildly less than the actual volsize in cases where
it should have failed.

Signed-off-by: Rich Ercolani <[email protected]>
@behlendorf merged commit db499e6 into openzfs:master on Apr 29, 2024 (23 of 26 checks passed)
lundman pushed a commit to openzfsonwindows/openzfs referencing this pull request on Sep 4, 2024, carrying the same commit message with the following trailers:

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rich Ercolani <[email protected]>
Closes openzfs#15996