-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
config/kernel: enforce kernel max version, with escape hatch #15986
Conversation
As the reporter for #15930 , I am obligated to note that running OpenZFS on not yet officially supported kernels is exactly how I find bugs to report. As such, I would suggest some sort of ALL CAPS DANGER WILL ROBINSON warning as opposed to an error out condition that I then have to patch around for testing. Having said that, it would be nice to have a 2.2.x proposed branch with back ported patches known to be needed for new kernel support merged in (even if a release isn't ready to be tagged), for those of us who are actively testing new kernel support. |
There is zfs-2.2.4-staging but it's a bit annoying because you will have to change the branch each and every time a new minor version gets released. Would be nice to have a |
I'm confused; that's literally what the |
Yes but I would note that https://github.com/openzfs/zfs/tree/zfs-2.2.4-staging does not yet have #15931 backported. I've been using #15931 in my own 2.2.3-based PPA, and do use it with kernel 6.8.0, but I label that as experimental on purpose. As I understand it, the current OpenZFS workflow is to add commits to the staging branch at the point when it is decided that it is probably time to tag a release. I have no problems with that, but it does mean that someone wanting to use a newer kernel has to keep on top of the PRs (submitted and/or accepted) to figure out what patches might need to be applied to get that additional kernel support working properly. Would it be nice to have some subset of OpenZFS tagged releases follow the kernel release cycle? Sure. But I'm not funding development of this well-honed software project, so I don't get a say in that. |
I would remark also that if people expect this warning to tell them it's a bad idea, they may be burned by expecting the inverse implication, that it is known to work on things this warning doesn't come up from, and then someone cherrypicks a breaking change into Linux LTS or a distro cherrypicks something from the future and people are very surprised indeed. |
@satmandu that's because it still hasn't been merged in master either. Once it lands in master it will get backported. I don't know why it's taking so long. I usually look at commits to backport in my own branch where I add compatibility patches for newer kernels but I've missed that one, I will start looking at open PRs either. |
My apologies for being obtuse. I just meant that in my PPA I would have to add a patch to apply the |
How is this patch supposed to work? I've asked the Arch Linux maintainer to apply it to their dkms but I've seen reports that it doesn't work. |
If If you set
"It doesn't work" needs details, preferably a build log. |
I will say, if we do land this patch and vendor distributions simply add I definitely do want to know about breakages specifically because of changes in newer kernel versions, but I am far less interested in standard OpenZFS operational issues on unsupported configurations. If downstreams are going to actively ignore recommendations, then they need to do more to support their users directly, or assist them when they turn up here. I get that some distributions' whole purpose is to ship bleeding-edge everything, but that must be balanced by at least making it clear that OpenZFS' traditional stability and reliability guarantees may not hold in those situations. Quietly hiding this fact is just dishonest. At the very least, I would like those distributions to inform their users of this by some other mechanism, if showing output from OpenZFS configure is not possible or appropriate. If that's a non-starter, I'm happy to take suggestions on alternate methods. I don't want to be a dick about it, but its already incredibly difficult to support the range of kernels we do. Distributors adding even more combinations without also getting involved in their support and upkeep just seems rude. |
The supported Linux version range today is the fully supported range. Is there value in having this flag only enable a list of versions that are known to be stable enough for bleeding edge testing? As a disincentive to shipping this to unsuspecting users, this flag could also imply debug build and/or emit warnings to kmsg on import and when invoking tools. |
Ok, just wanted to be sure that the check aborts and doesn't simply show a warning.
They didn't provide any, I've just checked myself and it looks like it's working as expected:
They probably filed the report against the wrong package and the one they're using didn't have this patch backported. |
@darkbasic I'm confused. That build does have this patch - they've taken the patch, and then not used the option? I don't even know why you would. |
That log is from my build, which does indeed have the patch. What happened is that a user commented on the AUR about being able to successfully build the dkms against 6.8. The package on the AUR has backported the patch, so I started wondering if this pr is supposed to downright fail or just add a warning. I've then tried it myself and it does indeed the former. What I guess has happened is that the user uses the zfs Arch repository, which contrary to the AUR probably didn't backport this PR. Why he commented on the AUR remains a mystery. |
Ok, I'm sensing confusion about the intent of this change. Maybe that's me confused, or maybe I'm doing fine but haven't explained it properly. So I'll try explaining again, and if we're still no good, I'll let someone point out that I'm confused, and then I'll quietly withdraw into the hedges. We publish a "maximum supported" kernel version in In 2.2.3 we shipped experimental support for 6.8. This was buggy/incomplete. People tried it, noticed problems, reported them. That's great! But, there was also a certain amount of pressure in those requests (and elsewhere) to have it fixed quickly, which I felt was unreasonable. I wondered if maybe the problem was that it wasn't easily discoverable that kernels beyond 6.7 were unsupported/experimental, and thus this patch: a message to let you know and a way to explicitly opt-in to potential carnage. In my opinion, if a vendor distributes OpenZFS for a kernel with higher version than the maximum version listed in META, then that vendor is explicitly opting their users into an experimental/unsupported configuration, and if that breaks (incl. data loss), the responsibility is mostly on the vendor, not on the OpenZFS project itself. This patch doesn't change that theory, but would at least make it very clear that the vendor is making that decision also. (not that I don't want to hear about such breakage; but I don't want an irate end-user blaming us for shipping broken software when we didn't, at least not knowingly). The alternatives to this seem to be either to never put experimental patches anywhere near a release series (even with warnings), or to make experimental builds available ahead of time. Builds are probably ideal but requires time and infrastructure we mostly don't have, so shipping experimental support with warnings on it at least is a straightforward way to get it into people's hands. I dunno, this felt like a light touch :) |
Maybe, except most of the time OpenZFS won't even compile against a new kernel version, due to their perpetual API churn. So this was mostly intended as a gate for when we do ship early support but don't yet know if its complete. Maybe instead META could have
A warning to the kernel log seems reasonable and benign. I like the idea of building with debug, though I wonder if its too heavy-handed so long as failed assertions panic the whole module. I'm pretty sure I don't want that if vendors are going to be quietly opting users into this option; that's very much a "you are definitely testing now" and I don't think that's entirely fair to spring on people unknowingly. On the other hand, maybe a vendor is going to be far less inclined to enable this option if the result is "kernel crashes" vs "mild inconvenience". |
Thanks for explaining. Yes this is exactly the sort of idea. In that approach it is nice and clear where supported ends and experimental starts, and it's also opt-in only.
What do you think about a middle of the road option where assertions are built and executed but print warnings instead? The outcome is then identical for the user - corruption maybe or other unknown problems the assertions are intended to catch - but they get an actionable report to file if desired. |
I'm persuaded. It's not miles away from this PR in function, but it feels a lot more solid.
I... rather like this. I'll have a play with it. |
What I am wondering at this point: As a current example: |
@robn I'm fine with your overall approach. You might want to add another line to give the user a hint that
Also, is this still "Draft" or are you ready for it to be approved? |
Yeah, I have no idea. That's part of why I want this!
Yep, I'd hope they're paying attention. |
Hold off for the moment, I'm going to try a different approach first and see if it feels nicer. |
While I hate that this is necessary, I think it's a good idea. ZFS support is critical to Void Linux, and we hold off bumping our generic |
1385d6e
to
08b6eff
Compare
Alright, here's a rework based on the discussion:
More detailed output: By default, configure will abort when trying to compile against any kernel higher than than
With
For a kernel version higher than
With
This will define
Finally, kernel versions below
Note that
Finally, I have a prototype of "soft assertions", but I want to think about it more, because it's a little more invasive than I would like for this PR, and might actually be a thing we want to make more widely available. So I'm not going to consider it further for this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks quite good
Two notes
- For the hard stop, maybe have configure print how to bump the max version in the local repo for bleeding edge development e.g. "Contributors: See ... for supporting newer versions"
- META or somewhere else should write down guidance on what Linux-Maximum-Experimental should be. Is it meant to be builds, builds and passes tests, fairly stable, or just bumped as soon as the porting effort starts...?
Unfortunately, updating META with new kernel versions is already a maintenance burden, and an additional field will add to that burden. As @rrevans mentioned, the criteria for "experimentally supported but not officially supported" is still undefined. Overall, I don't see a benefit over the simpler
|
08b6eff
to
43d2c39
Compare
Ahh sorry @rrevans, I had missed your comment. I've been using "experimental" through this PR to mean a vague statement of "we are not ready to commit to OpenZFS' traditional guarantees of reliability and stability, but it's probably fine". Maybe in other contexts that would be "beta" or something. Regardless, I'll agree that it's not specific enough to "enforce". But, I would still prefer to keep So, update pushed. I've removed
|
43d2c39
to
cc6b9d7
Compare
cc6b9d7
to
be32257
Compare
@tonyhutter what do you reckon? |
be32257
to
b4f3c4e
Compare
b4f3c4e
to
1048eec
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I like --enable-linux-experimental
and the slightly scary warning. It makes it quite clear what you're opting in to.
AS_VERSION_COMPARE([$kernsrcver], [$ZFS_META_KVER_MIN], [ | ||
AC_MSG_ERROR([ | ||
AX_COMPARE_VERSION([$kernsrcver], [ge], [$ZFS_META_KVER_MIN], [], [ | ||
AC_MSG_ERROR([ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that the pre-4.18 kernel changes have been merged it look like there's only one place left which uses AS_VERSION_COMPARE
. Once this merges we should switch the AS_VERSION_COMPARE
call in config/kernel-blkdev.m4
over to AX_COMPARE_VERSION
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, good spot. Noted.
META lists the maximum kernel version we consider to be fully supported. However, we don't enforce this. Sometimes we ship experimental patches for a newer kernel than we're ready to support or, less often, we compile just fine against a newer kernel. Invariably, something doesn't quite work properly, and it's difficult for users to understand that they're actually running against a kernel that we're not yet ready to support. This commit tries to improve this situation. First, it simply enforces Linux-Maximum, by having configure bail out if you try to compile against a newer version that. Then, it adds the --enable-linux-experimental switch to configure. When supplied, this disables enforcing the maximum version, allowing the user to attempt to build against a kernel with version higher than Linux-Maximum. Finally, if the switch is supplied _and_ configure is run against a higher kernel version, it shows a big warning message when configure finishes, and defines HAVE_LINUX_EXPERIMENTAL for the build. This allows us to add code to modify runtime behaviour as well. Signed-off-by: Rob Norris <[email protected]> Sponsored-by: https://despairlabs.com/sponsor/
Since the person using the kernel may not be the person who built it, show a warning at module load too, in case they aren't aware that it might be weird. Signed-off-by: Rob Norris <[email protected]> Sponsored-by: https://despairlabs.com/sponsor/
1048eec
to
4623240
Compare
Rebased, retested, pushed. |
I like
I would also mention the flag in the build failure as an aid to the user, like:
|
Let people do at least basic effort to research how to bypass this. Your suggestion sounds to me like a warning:
|
😄 I wish it was that exiting! 99% of the time it's just not going to build. Honestly, if you're building ZFS from source with a bleeding edge kernel then you're probably used to playing with fire. And if that isn't the case, the new warning message does make things clear:
|
Generally, I prefer "enable danger mode" over "disable safe mode", because I really want it to be clear that the user is opting into something bad, and all the info they need is in front of them. Similarly, I don't like "press the red button to turn off the safety", because the user hasn't had to make any effort to think about the situation and assess the risks. I won't die on this hill; having the option at all is more important to me than how its named, and I know that in practice the risks in this case are likely low - its more likely that something just doesn't work than does the wrong thing. But as a rule, I do think that when it comes to anything where the cost of a mistake is data loss, we should be very very clear what is happening. So, I will not argue against rewording (and of course I can just be straight up overruled; at the end of the day I'm just Some Guy 🐶) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@robn I hear what you saying, lets just stick with --enable-linux-experimental
then 👍
Since the person using the kernel may not be the person who built it, show a warning at module load too, in case they aren't aware that it might be weird. Reviewed-by: Robert Evans <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: https://despairlabs.com/sponsor/ Closes #15986
Motivation and Context
It's possible for OpenZFS to build correctly against a newer kernel than it is supported for, but then not work correctly. This invariably results in disappointment, confusion and/or anger. See #15930/#15931 for a recent example.
Since it's not feasible for us to match Linux's release frequency, the next best thing seems to be to warn the user that they're entering the Nightmare Realm, so they aren't surprised when the wolves get them.
Description
[EDIT 2024-05-05: this PR has been updated based on discussion here. See this comment below for description of the new version. I'll leave this top comment here for context]
Check the kernel version we're configuring against and bail out if the kernel is too new.
Sometimes however we do actually want to compile against a newer kernel than is supported, usually when testing a pre-release kernel. Add the deliberately-verbose
--disable-supported-linux-version-check
option to disable this check. This is lots to type, and so hopefully can be taken as a very explicit signal that the user knows what they're doing.Finally, if an unsupported kernel is used and the option is used, a big warning message is displayed at the end of the configure run to really try and make the point.
How Has This Been Tested?
Configuring as normal against a kernel in the supported range does what it always has:
Configuring against a kernel version higher than the max supported throws an error:
Overriding allows it to continue:
Should configure succeed after overriding the check (as currently happens on 6.8), it throws a big warning at the end:
Types of changes
Checklist:
Signed-off-by
.