Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zfs: backport patches for block cloning tunable and default disable; zfsUnstable: 2.2.1-unstable-2023-10-21 -> 2.2.1 #269465

Closed
wants to merge 2 commits into from

Conversation

amarshall
Copy link
Member

Description of changes

Alternative to #269097. Still bumping zfsUnstable to 2.2.1 as I feel that probably still makes sense.

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 23.11 Release Notes (or backporting 23.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Priorities

Add a 👍 reaction to pull requests you find important.

Mic92 and others added 2 commits November 23, 2023 11:53
Due to openzfs/zfs#15533, we do not want to
update to 2.2.1, but we *do* want patches from 2.2.1 to address block
cloning issues (see openzfs/zfs#15529).

The first patch does not apply cleanly due to a trivial conflict in the
context, so vendor the patch.
@@ -25,4 +25,12 @@ callPackage ./generic.nix args {
version = "2.2.0";

sha256 = "sha256-s1sdXSrLu6uSOmjprbUa4cFsE2Vj7JX5i75e4vRnlvg=";

extraPatches = [
./brt-tunable.patch
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we should check this in our zfs nixos test?

% zpool get all | grep feature@block_cloning
zroot  feature@block_cloning          disabled                       local

after having the flag?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pool feature will still be enabled, instead /sys/module/zfs/parameters/zfs_bclone_enabled will be available and 0.

It’s a bit tricky since with #269452 we will have a variant where we expect it to not be present at all (so we should handle that), but also vanilla 2.2.0 won’t have it either—but then we want to fail if it’s not present. From initial glance, makeZfsTest makes it a bit tricky to switch on the version… I guess could switch on the version in the test itself?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can definitely encode version checking inside makeZfsTest if you want, you just have to use package and look at its version.

@eadwu
Copy link
Member

eadwu commented Nov 24, 2023

This release is still exposed to a different issue openzfs/zfs#15533 (comment). Although it does not appear to result in any corrupted data.

@SuperSandro2000
Copy link
Member

I don't think we should that deeply in zfs. Lets better stay on an older version than starting to patch things together.

@amarshall
Copy link
Member Author

amarshall commented Nov 24, 2023

@SuperSandro2000 Generally I agree, but folks who have pools with active v2.2+ pool features cannot trivially downgrade.

@numinit
Copy link
Contributor

numinit commented Nov 25, 2023

We may want to wait for 2.2.2? Block cloning does not appear to be the issue, and there is a partial mitigation for 2.2.0 or prevoius versions in boot.kernelParams = [ "zfs.zfs_dmu_offset_next_sync=0" ];.

@ius
Copy link
Contributor

ius commented Nov 25, 2023

I believe there are two things that need fixing:

  1. A data corruption issue: should be fixed by dnode_is_dirty: check dnode and its data for dirtiness openzfs/zfs#15571
  2. A write error issue in v2.2.1 (only), offending commit is identified here, reverting it fixes it: CKSUM and WRITE errors with 2.2.1 stable, when vdevs are atop LUKS openzfs/zfs#15533 (comment)

As we now know block cloning doesn't appear to be the root cause for the corruption issue (openzfs/zfs#15526 (comment)) - so I think we should be fine with just the patch for (1). The zfsUnstable part of the PR is still affected by (2).

I think we should be able to converge to a single proposal, having the discussion split among two PRs is a bit awkward. Can we close one?

I feel updating to v2.2.1 and cherry picking the fix + revert makes the most sense. Or wait for upstream to release, though it would be nice to have a version of ZFS that doesn't eat data when NixOS 23.11 is released...

For reference, the upstream issues:

@adamcstephens
Copy link
Contributor

adamcstephens commented Nov 26, 2023

I'm hesitant to adopt openzfs/zfs#15571 even though this was patch was applied in zfs_2_1. While this may be the patch that is accepted, I am not the person to make that decision. As this bug has been around for a long time, and only become more common with recent changes, I propose we instead roll back those recent changes.

My recommendation would be to proceed with 2.2.1, which disables block cloning, and follow Gentoo in patching zfs_dmu_offset_next_sync=0 back to the default. Their patch is here: https://gitweb.gentoo.org/repo/gentoo.git/tree/sys-fs/zfs-kmod/files/zfs-kmod-2.2.1-Disable-zfs_dmu_offset_next_sync-tunable-by-default.patch

These two should hopefully be enough to calm some fears while upstream has a chance to fix the underlying bug.

@numinit
Copy link
Contributor

numinit commented Nov 26, 2023

I have been testing with https://github.com/numinit/nixpkgs/tree/zhammer (managed to repro it in NixOS tests) so we have some hard data to base our decision on.

Summary here: openzfs/zfs#15526 (comment)

Based on these data (10 million is still running) I would recommend 2.2.1 with @robn's patch (openzfs/zfs#15571), but I understand the hesitancy since it is not yet upstreamed. Note that it is still possible to repro with zfs_dmu_offset_next_sync=0, and I have done so in NixOS tests.

@adamcstephens
Copy link
Contributor

I think it would be good to get something merged today if possible, if nothing else than to reduce the risk. If others feel comfortable applying openzfs/zfs#15571 then I'll support that.

@amarshall seeing as this is your PR, what is your opinion and how would you like to proceed?

@amarshall
Copy link
Member Author

Closing, see #269097 (comment)

@amarshall amarshall closed this Nov 26, 2023
@amarshall amarshall deleted the zfs-220-no-bclone branch November 26, 2023 21:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants