Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linux: bump minimum kernel version to 4.4 #16434

Closed
wants to merge 46 commits into from
Closed

Conversation

robn
Copy link
Member

@robn robn commented Aug 11, 2024

Motivation and Context

With RHEL7 behind us, we can set a new line and rip out a ton of stuff.

Description

Choosing a reasonable minimum version

RHEL 8 is based on 4.18, which is the oldest kernel in any "mainstream" LTS distribution. However, 4.19 is the oldest "official" LTS from kernel.org, and the last 4.18 from there (4.18.20) does not build on recent compilers. Its important for my testing that I can build OpenZFS locally for any supported kernel, or at least range of kernels. This presents a challenge.

The LF Civil Infrastructure Platform are providing 10-year "Super LTS" maintenance for certain old kernel series. 4.4.302 is the most recent pre-4.18 SLTS kernel, and builds well on recent compilers, so I have selected that as a reasonable base that can cover anything possibly in 4.18.

Musings on supported kernel range

If I'm honest, I'd say 4.4 is really too old, and probably we'd be better to declare a fixed set of operating systems we support rather than a range of version numbers. I say this because currently (before this PR) we say 3.10 to take in RHEL7, even though the last upstream 3.10 (3.10.108) is not easy to build on recent compilers, and using a distribution that shipped a standard kernel of that vintage can't easily build recent OpenZFS (lack of Python 3 for a start).

For this PR, I'm content with a range, but if the brains trust are amenable to a specific support set, I'd be happy to start putting a list together and working out what that means for us with testing and etc.

Remove unused build checks

There's two classes of unused or obsolete build checks that I've gotten rid of:

  • any that set a #define that does not appear anywhere in the codebase. These mostly seem to be things that used to exist, were removed and the corresponding check was not removed.
  • any that simply abort configure if they fail. These all appear to be the ancient ~2.6 baseline checks. They aren't entirely useless as-is, but since they always pass on any modern kernel, its mostly wasted build time.

Remove checks and code for kernel versions we no longer support

The method here was to build on a wide range of kernels (see the testing section below), and compare the check output. For any checks that always succeeded or always failed on all builds between 4.4 and ~now, remove them.

I've kept these as separate logical commits in the PR to assist with review, however I would expect to squash them down before merge.

Retained checks

I have kept the following checks and their corresponding code because they appear to be there to support non-amd64 architectures, which I do not have ready access to:

  • HAVE_KERNEL_FPU_API_HEADER
  • HAVE_KERNEL_FPU
  • HAVE_KERNEL_NEON
  • HAVE_CPU_HAS_FEATURE_GPL_ONLY
  • HAVE_FLUSH_DCACHE_PAGE_GPL_ONLY
  • HAVE_ZERO_PAGE_GPL_ONLY
Untested checks

The following have been retained because I believe they are used in kernels in the supported range, but I don't have those kernels available to confirm:

  • HAVE_BIO_IO_ACCT: 5.7-5.11
  • HAVE_BLK_ALLOC_QUEUE_REQUEST_FN: 5.7-5.8
  • HAVE_BLK_CLEANUP_DISK: 5.16-5.19
Dicerolls
  • HAVE_DECLARE_EVENT_CLASS: removed because DECLARE_EVENT_CLASS has been unavailable to us since 4.4 (changed to a GPL export) and definitely wasn't available in any of my build tests. However, it's a lot of tracing code and I'm not totally convinced that its not used in some cases. (example from just today: kernel linux 6.10.3 build error #16433 has some smells that suggest it might have been used). [robn 2024-08-12: bad example, that turns out to be a custom build. I am still a little uncertain about this change, but less so than before].

  • HAVE_XATTR_GET_DENTRY_INODE_FLAGS: I can't see that this was ever in a released kernel. Our comments say its a variation for an Android kernel, of all things. I have no way of knowing if its still relevant to anything. I suspect not, but its a fairly benign bit of code, so I've kept it for the moment.

Tidying configure output

The last commit just fixes some check output that was missing. Quite boring!

How Has This Been Tested?

I did a configure, build, and simple run consisting of module load, pool create, short fio r/w run, export, import, scrub, export module unload, on the following kernel.org kernels. All passed:

  • 4.4.302
  • 4.9.337
  • 4.14.336
  • 4.19.319
  • 5.4.281
  • 5.10.223
  • 5.15.164
  • 6.1.102
  • 6.2.16
  • 6.3.13
  • 6.4.16
  • 6.5.13
  • 6.6.43
  • 6.7.12
  • 6.8.12
  • 6.9.12
  • 6.10.2

I did a compile, build and module load on the following distributions (note: no actual workload, just the module load):

  • 4.15.0-117-generic (Ubuntu 16.04)
  • 5.4.0-84-generic (Ubuntu 18.04)
  • 5.13.0-30-generic (Ubuntu 20.04)
  • 5.15.0-25-generic (Ubuntu 22.04)
  • 6.8.0-31-generic (Ubuntu 24.04)
  • 4.18.0-348.7.1.el8_5.x86_64 (CentOS 8.4.2105)
  • 4.18.0-553.8.1.el8_10.x86_64 (Rocky 8.6)
  • 5.14.0-427.28.1.el9_4.x86_64 (Rocky 9)

I've also compiled this on FreeBSD 14.1, just to make sure I hadn't damaged anything. Compiled fine.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking changere which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

robn added 30 commits August 10, 2024 21:43
Selected as the most recent CIP SLTS[1] release (and thus still
buildable on modern compilers), before 4.18, RHEL8's base kernel[2].

At time of writing, latest is 4.4.302.

1. https://wiki.linuxfoundation.org/civilinfrastructureplatform/start#kernel_maintainership
2. https://access.redhat.com/articles/3078#RHEL8

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <[email protected]>
All of these set a #define that doesn't appear anywhere in the tree.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <[email protected]>
All these checks abort configure if they fail. They're looking for
things that have existed forever, and so never fail. While they do have
some tiny value in detecting if these ever change, we'll also get that
from the build breaking. Removing them is a bunch of work we don't have
to do on every run.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <[email protected]>
robn added 11 commits August 10, 2024 21:43
@snajpa
Copy link
Contributor

snajpa commented Aug 11, 2024

I only skimmed through it and I have a tiny point to make regarding the zpl_ renames... I would guess that if you revert to more generic versions, it's only a matter of time before it conflicts with something somewhere (again)... [I wouldn't remove the prefixes, but that's me]

@robn
Copy link
Member Author

robn commented Aug 11, 2024

@snajpa the handful of renames are where we now only have one variation, so its just an alias for no good reason. Since we do call kernel functions directly where we haven't needed a compat wrapper, this can easily lead to the question of why this particular thing gets a no-op wrapper. Thus, I remove them to make it clear that this is just a regular kernel call.

If that change again in the future, then we can bring them back; that's no big deal. Having the name or not won't change that.

As far as I can tell, this never made it to a real release. It was
introduced in 6b2553918d8b and removed a couple of weeks later in
fceef393a538. This was all part of the development of what would become
4.5. So I assume this was OpenZFS chasing upstream development at the
time.

    fceef393a538 viro 2015-12-30 switch ->get_link() to delayed_call, kill ->put_link()
    cd3417c8fc95 viro 2015-12-29 kill free_page_put_link()
    0d0def49d05a viro 2015-12-08 teach nfs_get_link() to work in RCU mode
    1a384eaac265 viro 2015-12-08 teach proc_self_get_link()/proc_thread_self_get_link() to work in RCU mode
    6a6c99049635 viro 2015-12-08 teach shmem_get_link() to work in RCU mode
    d3883d4f9344 viro 2015-12-08 teach page_get_link() to work in RCU mode
    6b2553918d8b viro 2015-12-08 replace ->follow_link() with new method that could stay in RCU mode

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <[email protected]>
@robn
Copy link
Member Author

robn commented Aug 14, 2024

We had a chat about this in the call this morning and consensus was that 4.18 should be the minimum, by which we mean, actually kinda 4.19 but 4.18 to bring in EL8.

So I'll soon be closing this and opening a new PR to start out at 4.18. It'll look pretty much like this one, just removing even more stuff. It'll also need another round of testing, so likely a few days.

It was also suggested that we should be a bit more clear about which kernels & distributions we are regularly testing on. I'm going to take a first pass at expressing that somewhere, using the list at top of this thread as a starting point. Separate PR though.

@behlendorf
Copy link
Contributor

@robn thanks for working on this. I'm looking forward to shedding a bunch of this compatibility code.

any that simply abort configure if they fail. These all appear to be the ancient ~2.6 baseline checks. They aren't entirely useless as-is, but since they always pass on any modern kernel, its mostly wasted build time.

The kernel developers aren't shy about taking away long standing interfaces so unless these checks significantly increase the build time I'd prefer to keep them.

HAVE_DECLARE_EVENT_CLASS: removed because DECLARE_EVENT_CLASS has been unavailable to us since 4.4.

We should keep this check to allow local custom builds to use the debug functionality.

HAVE_XATTR_GET_DENTRY_INODE_FLAGS: I can't see that this was ever in a released kernel.

Over the years we've included a handful of changes to handle patches included in various popular distribution kernels. Unless we're sure these are no longer needed let's hold on to that compatibility code.

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Aug 14, 2024
@Harry-Chen
Copy link
Contributor

Just some info on your "retained checks":

  • HAVE_KERNEL_FPU_API_HEADER, HAVE_KERNEL_FPU, HAVE_KERNEL_NEON: all for kSIMD handling, NEON is for aarch64 only
  • HAVE_CPU_HAS_FEATURE_GPL_ONLY: powerpc-specific polyfill
  • HAVE_FLUSH_DCACHE_PAGE_GPL_ONLY: mips-specific polyfill
  • HAVE_ZERO_PAGE_GPL_ONLY: polyfill on some arches

kSIMD is somewhat complicated on X86. The latter three could be removed when zfs compiles and loads without error.

@snajpa
Copy link
Contributor

snajpa commented Aug 18, 2024

On the topic of HAVE_DECLARE_EVENT_CLASS and changing the META file, there's another way how to make OpenZFS compatible with Linux license-wise that we've been using - hopefully this link leads to correct line

IMO this is somewhat more honest way to get the _GPL exports to ZFS code, rather than faking ZFS code's licence - the patch signals a legal opinion that would be tough to beat in most parts of the world (it seems very tough to argue that CDDL and GPL are so different in spirit to make them incompatible as much as to disallow combining the works)

The downside is obviously the need to recompile kernel, but subsequently no changes are needed to the whole OpenZFS codebase, especially when the configure checks are nicely made to be generic enough to start working when license_is_gpl_compatible.

@Harry-Chen
Copy link
Contributor

Harry-Chen commented Aug 23, 2024

As for the HAVE_DECLARE_EVENT_CLASS patch, seems @tonyhutter also got hit by compilation errors gated behind this macro (#16450 (comment)) when using copy-builtin. So the polyfill code might still be used in some supported way, i.e., not when simply changing license in META.

And the error itself it not hard to fix (1 2), though.

@Harry-Chen
Copy link
Contributor

@Harry-Chen yes I agree this problem is not difficult to fix. what's the point of keeping dead code that causes problems?

It is not easy to determine whether it is 'dead' or not. As I say, there may be legitimate use cases. So a poll or something might provide more information?

@robn
Copy link
Member Author

robn commented Aug 24, 2024

Specific issue and partial fix for the DECLARE_EVENT_CLASS thing in #16475.

@robn
Copy link
Member Author

robn commented Aug 24, 2024

@wmmur I don't recall "perceived as an attack". My entire recollection is that you posted about your problem in unrelated PRs, and I asked you not to because you had already opened an issue and anything more was just creating noise.

@robn
Copy link
Member Author

robn commented Aug 26, 2024

Replaced by #16479.

@robn robn closed this Aug 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Code Review Needed Ready for review and testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants