
Overflowing refreservation is bad #15996

Merged: 1 commit into openzfs:master on Apr 29, 2024

Conversation

@rincebrain (Contributor) commented Mar 14, 2024

Motivation and Context

Someone came to me and pointed out that you could pretty readily cause the refreservation calculation to exceed 2^64, given the 2^17 multiplier in it, and produce refreservations wildly less than the actual volsize in cases where it should have failed.

Description

Shuffle the multiplicands around so that we save the 2^17 multiplier for the end, and if our previous value exceeds 2^47, just use UINT64_MAX.

(We could also just fail in that case, I don't have an especially strong opinion, it just seemed like a useful explicit check to add given the problem at hand.)
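
As a rough sketch of the approach (illustrative only, not the verbatim patch; the names nblocks, asize, and tsize mirror the surrounding libzfs code, but the function here is hypothetical):

#include <stdint.h>

#define SPA_OLD_MAXBLOCKSIZE (128ULL << 10) /* 128KiB, i.e. 2^17 */

/*
 * Sketch only: scale an allocated-block total up to an effective
 * volsize without letting the intermediate product wrap past 2^64.
 * Dividing by tsize first leaves the 2^17 multiply for the end, and
 * anything above UINT64_MAX / 2^17 (roughly 2^47) is clamped to
 * UINT64_MAX.
 */
static uint64_t
scale_volsize(uint64_t nblocks, uint64_t asize, uint64_t tsize)
{
        uint64_t volsize = nblocks * asize / tsize;

        if (volsize > UINT64_MAX / SPA_OLD_MAXBLOCKSIZE)
                return (UINT64_MAX);
        return (volsize * SPA_OLD_MAXBLOCKSIZE);
}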

How Has This Been Tested?

$ for i in `seq 1 4`; do truncate -s 64G /testfile${i};done;
$ sudo ./zpool create dumbpool -o ashift=13 raidz2 /testfile{1,2,3,4}

# original
$ for i in 1 2 4 8 16 32 64 128 256; do sudo ./zfs create -V ${i}T dumbpool/vol${i};done;
$ sudo zfs list -o name,volsize,refreservation -s volsize
NAME             VOLSIZE  REFRESERV
dumbpool/vol1         1T      1.10T
dumbpool/vol2         2T      2.20T
dumbpool/vol4         4T      4.39T
dumbpool/vol8         8T      8.79T
dumbpool/vol16       16T      17.6T
dumbpool/vol32       32T      35.2T
dumbpool/vol64       64T      8.26T
dumbpool/vol128     128T      16.5T
dumbpool/vol256     256T      33.0T
dumbpool               -       none

# patch
$ for i in 1 2 4 8 16 32 64 128 256; do sudo ./zfs create -V ${i}T dumbpool/vol${i};done;
cannot create 'dumbpool/vol256': out of space
$ sudo zfs list -o name,volsize,refreservation -s volsize
NAME             VOLSIZE  REFRESERV
dumbpool/vol1         1T      1.10T
dumbpool/vol2         2T      2.20T
dumbpool/vol4         4T      4.39T
dumbpool/vol8         8T      8.79T
dumbpool/vol16       16T      17.6T
dumbpool/vol32       32T      35.2T
dumbpool/vol64       64T      70.3T
dumbpool/vol128     128T       141T
dumbpool               -       none
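
(Note the vol256 failure above: with the intermediate value clamped to UINT64_MAX, the computed refreservation can never be satisfied by the pool, so creation now fails up front with "out of space" instead of quietly under-reserving.)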


@robn (Member) commented Mar 14, 2024

Yeah, I'd probably have returned EOVERFLOW there rather than clamping, but the other part seems fine. Good catch.

@rincebrain (Contributor, Author)

We can't; it's a uint64_t return.

We could trip a VERIFY there and just kill the process, but that seems rude. Given that the existing error handling in that code is "if we don't get anything out of this calculation, just do the naive thing", setting it as big as we could seemed like the least terrible option short of reworking it entirely.
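
(For context, the calculation lives in a libzfs helper that hands the computed size back directly; the signature below is paraphrased, not authoritative, but the uint64_t return is the point, since it leaves no in-band channel for EOVERFLOW.)

/* Paraphrased shape of the helper under discussion (hypothetical signature). */
static uint64_t
volsize_from_vdevs(zpool_handle_t *zhp, uint64_t capacity, uint64_t blocksize);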

@amotin (Member) left a comment

Theoretically, there can be a zvol smaller than 128KB. In such a case nblocks * asize will be less than tsize, which will result in a volsize of zero. If we shifted both tsize and SPA_OLD_MAXBLOCKSIZE right by SPA_MINBLOCKSHIFT (which should not lose precision), that would be impossible.
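
A sketch of that suggestion (again illustrative, not the final diff): since tsize and SPA_OLD_MAXBLOCKSIZE are both multiples of 2^SPA_MINBLOCKSHIFT, shifting each right by SPA_MINBLOCKSHIFT keeps the ratio exact while shrinking both operands, so a small nblocks * asize can no longer truncate to zero:

#include <stdint.h>

#define SPA_MINBLOCKSHIFT    9                 /* 512-byte sectors */
#define SPA_OLD_MAXBLOCKSIZE (128ULL << 10)    /* 2^17 */

/*
 * Sketch only: SPA_OLD_MAXBLOCKSIZE >> 9 is 256 and tsize >> 9 is at
 * most 256 (tsize is assumed to be between 512 bytes and 128KiB), so
 * any nonzero nblocks * asize yields a nonzero result; the deferred
 * multiplier is now 2^8, moving the overflow clamp from roughly 2^47
 * up to roughly 2^56.
 */
static uint64_t
scale_volsize_shifted(uint64_t nblocks, uint64_t asize, uint64_t tsize)
{
        uint64_t volsize = nblocks * asize;

        if (volsize > UINT64_MAX / (SPA_OLD_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT))
                return (UINT64_MAX);
        return (volsize * (SPA_OLD_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT) /
            (tsize >> SPA_MINBLOCKSHIFT));
}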

@rincebrain (Contributor, Author) commented Mar 14, 2024

It seems hard to construct a case where that happens in practice; glancing at it quickly, I end up with a refreservation of a couple of MiB even in cases like that.

I'll implement the change, but just as an observation, it doesn't immediately seem easy to reproduce in practice.

(I would assume this case basically can't come up with parity overhead on parity-based devices, and there's already a bailout for non-parity devices that don't pass through this code. That said, I agree that's a better solution; I was just curious how easily you could make it happen in practice.)

@amotin (Member) commented Mar 14, 2024

> It seems hard to construct a case where that happens in practice; glancing at it quickly, I end up with a refreservation of a couple of MiB even in cases like that.

I haven't tried it with the patch, but I was thinking of something like: zfs create -V 512 -b 512 pool/zzz.

@behlendorf added the "Status: Code Review Needed (Ready for review and testing)" label on Apr 4, 2024
@behlendorf (Contributor)

> I'll implement the change, but just as an observation, it doesn't immediately seem easy to reproduce in practice.

Good find. I believe this is just waiting on the small change mentioned for very small zvols, assuming that's even possible in practice.

@behlendorf added the "Status: Accepted (Ready to integrate: reviewed, tested)" label and removed the "Status: Code Review Needed" label on Apr 25, 2024
@behlendorf (Contributor)

@rincebrain could you rebase this to get a fresh CI run?

Someone came to me and pointed out that you could pretty
readily cause the refreservation calculation to exceed
2**64, given the 2**17 multiplier in it, and produce
refreservations wildly less than the actual volsize in cases where
it should have failed.

Signed-off-by: Rich Ercolani <[email protected]>
@behlendorf merged commit db499e6 into openzfs:master on Apr 29, 2024 (23 of 26 checks passed)
lundman pushed a commit to openzfsonwindows/openzfs referencing this pull request on Sep 4, 2024, carrying the same commit message with the following trailers:

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rich Ercolani <[email protected]>
Closes openzfs#15996