
Criteria for re-enabling block cloning (toggle zfs_bclone_enabled) by default #16189

Closed
johnpyp opened this issue May 11, 2024 · 13 comments · Fixed by #16712
Labels
BRT BRT tracking Type: Feature Feature request or new feature

Comments

@johnpyp

johnpyp commented May 11, 2024

(Apologies if this is covered somewhere else, I didn't see any issues or discussions related to it)

Describe the feature you would like to see added to OpenZFS

Turn on or remove zfs_bclone_enabled, given that most of the large issues have seemingly been resolved.

How will this feature improve OpenZFS?

It will enable block cloning for everyone, of course! :)

Additional context

So, my understanding is that zfs_bclone_enabled was added as a stop-gap measure to guard against the corruption related to block cloning. What are the criteria by which ZFS can either enable this by default or remove it?

Following the many recent PRs related to block cloning, it seems that all(?) of the outstanding corruption issues have been resolved (please correct me if this is wrong).

Given the severity of the issue at the time, I can understand the hesitance to re-enable it by default, so I'm mostly wondering what the threshold is. Does it need "bake time"? Does it need higher profile customers using it? Or, is it actually ready to get re-enabled now?

@johnpyp johnpyp added the Type: Feature Feature request or new feature label May 11, 2024
@rincebrain
Contributor

You're wrong.

See #16025 and #15615 for some more context.

@rrevans
Contributor

rrevans commented May 13, 2024

@rincebrain Hmm - do you have insights on how those PRs are related to BRT? To my knowledge, both of those are trying to improve llseek rather than fixing real bclone bugs?

@rincebrain
Contributor

Unless I've misplaced the thread entirely, they still fall under the heading of "yes, it's not a problem with BRT, but triggering the problem in question with BRT is substantially easier", no?

Yes, the problem is not technically in BRT's code, but I still wouldn't turn it on until that's moot.

(I could have also pointed to #15139, those were just what came to mind first.)

@rrevans
Contributor

rrevans commented May 17, 2024

For #15526, as I recall, coreutils-9.2 was the main factor, give or take? That changed copying behavior so that llseek(SEEK_HOLE/SEEK_DATA) vs. sync ctx races triggered more often. bclone/BRT was a red herring, I believe. To that end, both #16025 and #15615 should be unrelated to BRT and out of scope for enabling bclone again. (But as an aside: llseek still has possible data races that need fixing, even on x86.)

As for collecting bclone blockers proper, I'll nominate both #16012 and #16014. There should probably be a GitHub Project created to collect and track the blocking open FRs/PRs/Issues.

To the stability criteria, that's a question to pose to the leadership group. I'd suggest the project needs to set quantifiable success metrics in conversation with the authors and contributors such as:

  1. Some time period without a major bug reported (defined as data loss or other functionality broken), e.g. 90 days.
  2. Specific stress testing activities to perform e.g.
    a. run ZTS 500 times with bclone enabled by default
    b. run gentoo reproducer workload for 200 hours
  3. Gather some number of reports from users deploying to important workloads for some length of time e.g. 90 days in use for a VM hosting workload, 180 days in use as a file server workload, etc.

@rincebrain
Contributor

#15526 technically isn't coreutils-related at all, because Gentoo wrote their own bespoke copy function in portage, IIRC, so it would reproduce without new coreutils as well.

IIRC, that one was primarily a problem with zn_has_cached_data being racy, which bites you in both the llseek and non-llseek cases, since zpl_clone_range depends on the same check as zfs_holey_common (the backing function for SEEK_HOLE/SEEK_DATA checks).
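To make the class of bug being discussed concrete, here is a minimal sketch (not OpenZFS code) of the llseek(SEEK_HOLE/SEEK_DATA) probing that copy tools perform. A copier that trusts this hole map while dirty cached data has not yet been synced (the zn_has_cached_data race mentioned above) can treat a region as a hole when it actually holds data:

```python
import os
import tempfile

HOLE = 1 << 20  # 1 MiB hole at the start of the file

with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
    f.seek(HOLE)
    f.write(b"tail data")  # 9 bytes of data after the hole

fd = os.open(path, os.O_RDONLY)
try:
    # Where does data first appear at or after offset 0?  Filesystems that
    # track holes skip straight past the 1 MiB hole; filesystems without
    # hole support simply report the whole file as data.
    data_off = os.lseek(fd, 0, os.SEEK_DATA)
    # Where does the next hole begin?  End-of-file always counts as a hole.
    hole_off = os.lseek(fd, data_off, os.SEEK_HOLE)
    print("data at", data_off, "next hole at", hole_off)
finally:
    os.close(fd)
    os.unlink(path)
```

The race window is between the two lseek calls and the reads that follow them: if another thread dirties pages in that window and the hole map isn't revalidated under the right lock, the copier's view goes stale.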

@rincebrain
Contributor

Hey @robn @thesamesam I imagine you both have opinions on this.

@johnpyp
Author

johnpyp commented May 17, 2024

Thanks both for the info; agree on the suggestion for a GitHub project to track the ongoing blockers... I did not realize there were so many (potentially) blocking issues, and will stop turning it back on on my pools 😓

That kind of stabilization criteria sounds good too. For point 3, I'd hope this can be timeboxed appropriately and have explicit buy-in from companies/whoever, so that it doesn't end up stretching out needlessly due to flaky testing, deprioritized work, etc.

@MichaelDietzel

I may have an idea for how I could do some stress testing: a few months ago I implemented a simple Rust tool for block-based deduplication using FIDEDUPERANGE. It is still more or less experimental and needs a lot of polishing, but it works for me and it is pretty fast. I mostly use it to deduplicate VM images on a zvol formatted with XFS, so there are many blocks being deduplicated. My hope was to continue working on it once ZFS block cloning is stable and get rid of the extra XFS layer.

I'd also like to add cross-dataset deduplication and (if supported by ZFS) deduplication between datasets and zvols, or maybe even deduplication between already existing snapshots. So I cannot use FIDEDUPERANGE for this. But what else do I use? Maybe someone can give me a hint where to start. Would I need to modify and compile ZFS myself, or is there some kind of interface that ZFS provides that I could use from my "external" tool? I'd prefer an interface similar to FIDEDUPERANGE that double-checks that the data are really identical, so that I cannot do anything stupid or get it wrong because of a race condition. Should I maybe open a separate issue for this? Thanks!
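For readers unfamiliar with the ioctl being discussed: FIDEDUPERANGE takes a source range plus one or more destination ranges, and the kernel shares blocks only after verifying that the bytes actually match, which is the double-checking safety property asked about. A minimal userspace sketch (in Python, with a hypothetical `dedupe_range` helper; struct layout and ioctl number per linux/fs.h) might look like:

```python
import fcntl
import struct

FIDEDUPERANGE = 0xC0189436  # _IOWR(0x94, 54, struct file_dedupe_range)
FILE_DEDUPE_RANGE_SAME = 0  # status: ranges matched and were deduped

def dedupe_range(src_fd, src_offset, length, dest_fd, dest_offset):
    """Ask the kernel to share one extent between two files; returns bytes deduped."""
    # struct file_dedupe_range header (24 bytes):
    #   u64 src_offset, u64 src_length, u16 dest_count, u16 reserved1, u32 reserved2
    hdr = struct.pack("=QQHHI", src_offset, length, 1, 0, 0)
    # struct file_dedupe_range_info (32 bytes):
    #   s64 dest_fd, u64 dest_offset, u64 bytes_deduped (out),
    #   s32 status (out), u32 reserved
    info = struct.pack("=qQQiI", dest_fd, dest_offset, 0, 0, 0)
    buf = bytearray(hdr + info)
    fcntl.ioctl(src_fd, FIDEDUPERANGE, buf)
    _, _, bytes_deduped, status, _ = struct.unpack_from("=qQQiI", buf, len(hdr))
    if status != FILE_DEDUPE_RANGE_SAME:
        raise OSError(status, "ranges differ or dedupe was refused")
    return bytes_deduped
```

Actually invoking it requires a filesystem that implements the dedupe hook (e.g. XFS or Btrfs today; ZFS support is what the FR discussed in the next comment is about), otherwise the ioctl fails with EOPNOTSUPP.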

@robn
Member

robn commented May 18, 2024

@MichaelDietzel #11065 is the existing FR for FIDEDUPERANGE, and #15393 has a prototype implementation. I suspect that once finished, it would be a small lift to wire up an alternate non-kernel entry point to support different kinds of targets. We need the functionality first though, so those tickets are probably where to focus your efforts initially.

@robn
Member

robn commented May 18, 2024

Since I was asked directly: I personally consider block cloning good/safe enough to be enabled since 2.2.4. A lot of work has been done to stabilise it since 2.2.2 especially; I am not aware of any substantial outstanding issues directly involving block cloning; I have it enabled on the few machines I maintain; I have suggested to certain sysadmins that know and trust me that it's safe-enough to enable; and, were I still in charge of a fleet of 100+ machines (previous life), I would be enabling it there.

However, I recognise that I'm in a position that many others are not, in that I run my systems with a ton of redundancy and I have the skills to understand and fix problems when they come up, so I can afford to be somewhat blasé about it. That said, I've not had to roll up my sleeves in this way on anything related to block cloning.

So I stop short of a full-throated recommendation, but if asked I do say "it's probably fine". If there was a rough show-of-hands consensus vote on whether or not to enable it for 2.2.5 and I was at the table, I would say yes.

(I have no idea if such a vote is useful or wanted, or if I would be invited; I'm just trying to get across where my gut feel is).

I agree that a more specific process for deciding on stability generally would be great, and I'll support anyone who wants to push on getting buy-in and implementing it, but I won't be taking point on that kind of project this year at least - too many other plates in the air at the moment. In the absence of such a thing, I suspect something close to a "rough show-of-hands consensus vote" is about the best we can hope for right now.

@tonyhutter
Contributor

Though it's not a bugfix, we may want to wait for #16208 before re-enabling block cloning.

@behlendorf
Contributor

There should probably be a github Project created to collect and track the blocking open FRs/PRs/Issues.

Apparently GitHub doesn't want to create new projects today so I've created a BRT label instead and applied it.

@rincebrain rincebrain added the BRT BRT tracking label Jun 15, 2024
@elahn

elahn commented Jul 2, 2024

For reference, to turn on block cloning on Linux for all pools with feature@block_cloning enabled, the "tunable" is a zfs kernel module parameter: /sys/module/zfs/parameters/zfs_bclone_enabled

See: Performance and Tuning / Module Parameters
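For example (a sketch of the usual commands; the runtime write takes effect immediately, while the modprobe.d line is the conventional way to persist a module parameter across reboots, assuming no conflicting options file already exists):

```shell
# Read the current value (0 = clone requests fall back to regular copies)
cat /sys/module/zfs/parameters/zfs_bclone_enabled

# Enable block cloning until the zfs module is next reloaded
echo 1 | sudo tee /sys/module/zfs/parameters/zfs_bclone_enabled

# Persist the setting across reboots via a modprobe options file
echo "options zfs zfs_bclone_enabled=1" | sudo tee -a /etc/modprobe.d/zfs.conf
```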

behlendorf pushed a commit that referenced this issue Nov 1, 2024
I think we've done enough experiments.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes #16189 
Closes #16712
8 participants