
NAS-130427 / 25.04 / Update Linux kernel to next LTS release v6.12 #200

Merged: 67 commits, Nov 21, 2024


@usaleem-ix (Contributor)

All of our updates have been ported on top of Linux v6.12.0. This includes all the patches from #139 (Update to 6.6) and other PRs that were merged into truenas/linux-6.6.

The following patches were updated to resolve merge conflicts while porting to 6.12:

  • a0d1961 Add initial support for large xattrs
  • 365c6a0 Introduce driver_override for NTB devices (this patch itself wasn't changed, but because struct bus_type ntb_bus is now declared const, it had a merge conflict)
  • a7a8ae6 ahciem: Emulate SES enclosure for AHCI enclosures
  • aa34aa0 Implement native NFSv4 ACLs in NFS server
  • 932f261 nvme: skip optional id ctrl csi for versions less than 2.0.0
  • 1f80fd7 Introduce NVDIMM NTB mirroring driver
  • 9d34be8 Add DACL support to nfsd (v4.1+)
  • 21bc41d fs/cifs - add ZFS ACL support to SMB client

A 25.04 image built with 6.12 is available here. API tests show 6 failures.

This commit adds the TrueNAS build customization required for building
Debian packages for the TrueNAS SCALE kernel.

The original commit ported from v6.6 is
0c5b36a.
@amotin (Collaborator) left a comment:

It seems the "NAS-123903 / NFS41 ACL support for NFS4 protocol" commit, which was previously squashed, got split back up somehow. I don't have a preference, but it makes me wonder whether it is the same version or whether there are any artifacts; it may be worth checking. @bmeagherix

fs/xattr.c (outdated)
if (size > XATTR_SIZE_MAX) {
	if ((size > XATTR_LARGE_SIZE_MAX) ||
	    (IS_LARGE_XATTR(path.dentry->d_inode) == 0)) {
		return -E2BIG;
Collaborator

I have a feeling that path_put(&path) and kvfree(ctx.kvalue) calls are missing here.
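To make the concern concrete, here is a minimal userspace sketch of the usual kernel goto-unwind pattern, assuming the size check runs after both the path lookup and the value copy have succeeded. The `demo_*` helpers and counters are hypothetical stand-ins for the path lookup/`path_put()`, the value copy/`kvfree()`, and the inode's large-xattr check; only the control flow is meant to carry over.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_XATTR_SIZE_MAX 65536u  /* same value as Linux XATTR_SIZE_MAX */

/* Hypothetical resource counters so that leaks are observable. */
static int demo_path_refs;   /* models references held on a struct path */
static int demo_live_bufs;   /* models copied value buffers still allocated */

static void demo_path_get(void) { demo_path_refs++; }
static void demo_path_put(void) { demo_path_refs--; }

static void *demo_value_alloc(size_t n)
{
	void *p = malloc(n ? n : 1);

	if (p)
		demo_live_bufs++;
	return p;
}

static void demo_value_free(void *p)
{
	if (p)
		demo_live_bufs--;
	free(p);
}

/*
 * Every exit taken after a resource is acquired goes through the unwind
 * labels (in reverse order of acquisition), so the -E2BIG early exit can
 * leak neither the path reference nor the copied value.
 */
static int demo_setxattr(size_t size, bool fs_large_xattr)
{
	void *kvalue;
	int error;

	demo_path_get();                  /* stands in for the path lookup */
	kvalue = demo_value_alloc(size);  /* stands in for the value copy */
	if (!kvalue) {
		error = -ENOMEM;
		goto out_put;
	}

	if (size > DEMO_XATTR_SIZE_MAX && !fs_large_xattr) {
		error = -E2BIG;           /* a bare 'return' here would leak */
		goto out_free;
	}

	error = 0;                        /* the real set operation goes here */

out_free:
	demo_value_free(kvalue);
out_put:
	demo_path_put();
	return error;
}
```

The labels unwind in reverse order of acquisition, which is the conventional kernel layout for this kind of error handling.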

Collaborator

While we're here, I wonder what the point was of checking size and then size > XATTR_SIZE_MAX, since a size greater than XATTR_SIZE_MAX is necessarily nonzero. It is not new and is insignificant, though.

@anodos325 (Contributor), Nov 18, 2024

This looks like it got moved out of where it belongs because of upstream refactoring in xattr.c. It should be in do_setxattr; otherwise you're introducing a difference between setxattr and fsetxattr.

Contributor

When I was working on an updated 6.6 kernel for testing purposes, I did something like this:

commit 8a1aae17e6f54be3fda82a76a03f07004d7a730c
Author: Andrew Walker <[email protected]>
Date:   Thu Oct 31 16:24:00 2024 -0400

    Fix do_setxattr

diff --git a/fs/xattr.c b/fs/xattr.c
index ce0d11556bea..30f2a5822d4f 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -669,6 +669,13 @@ int do_setxattr(struct mnt_idmap *idmap, struct dentry *dentry,
                return do_set_acl(idmap, dentry, ctx->kname->name,
                                  ctx->kvalue, ctx->size);

+#ifdef CONFIG_TRUENAS
+       if (ctx->size &&
+           (ctx->size > XATTR_SIZE_MAX) &&
+           (IS_LARGE_XATTR(dentry->d_inode) == 0)) {
+               return -E2BIG;
+       }
+#endif
        return vfs_setxattr(idmap, dentry, ctx->kname->name,
                        ctx->kvalue, ctx->size, ctx->flags);
 }
@@ -677,17 +684,6 @@ static int path_setxattr(const char __user *pathname,
                         const char __user *name, const void __user *value,
                         size_t size, int flags, unsigned int lookup_flags)
 {
-#ifdef CONFIG_TRUENAS
-       if (size) {
-               if (size > XATTR_SIZE_MAX) {
-                       if ((size > XATTR_LARGE_SIZE_MAX) ||
-                           (IS_LARGE_XATTR(d->d_inode) == 0)) {
-                               return -E2BIG;
-                       }
-               }
-       }
-#endif
-
        struct xattr_name kname;
        struct xattr_ctx ctx = {
                .cvalue   = value,
                

Contributor Author

I have updated it as you suggested; can you please take a look?

Contributor

Looking at this again (it's been a few years), we may also want to set an upper bound in setxattr_copy (unrelated to the current questions):

diff --git a/fs/xattr.c b/fs/xattr.c
index c5a8a7c3fb4a..a9b12be7283f 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -646,7 +646,10 @@ int setxattr_copy(const char __user *name, struct xattr_ctx *ctx)

        error = 0;
        if (ctx->size) {
-#ifndef CONFIG_TRUENAS
+#ifdef CONFIG_TRUENAS
+               if (ctx->size > XATTR_LARGE_SIZE_MAX)
+                       return -E2BIG;
+#else
                if (ctx->size > XATTR_SIZE_MAX)
                        return -E2BIG;
 #endif

This would avoid a large memdup_user() for oversized values. Thoughts?

Contributor

If we make the above change, then we can simplify the check in do_setxattr to return -E2BIG only if the filesystem does not support large xattrs and the xattr size is larger than XATTR_SIZE_MAX.
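A minimal userspace sketch of the two-stage check discussed in this thread: the hard cap is enforced before the value is copied from userspace (so an oversized request never reaches memdup_user()), and the do_setxattr() stage only has to ask whether the filesystem accepts values above XATTR_SIZE_MAX. The 2 MiB value for the large cap is an assumption taken from the upper bound quoted in the named-streams commit message, and all `demo_*` names are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_XATTR_SIZE_MAX       (64u * 1024)        /* Linux XATTR_SIZE_MAX */
#define DEMO_XATTR_LARGE_SIZE_MAX (2u * 1024 * 1024)  /* assumed 2 MiB cap */

/* Stage 1, as in setxattr_copy(): hard cap, checked before any copy. */
static int demo_copy_check(size_t size)
{
	return size > DEMO_XATTR_LARGE_SIZE_MAX ? -E2BIG : 0;
}

/*
 * Stage 2, as in do_setxattr(): only asks whether this filesystem accepts
 * values above XATTR_SIZE_MAX. No separate "size != 0" test is needed,
 * because size > DEMO_XATTR_SIZE_MAX already implies a nonzero size.
 */
static int demo_fs_check(size_t size, bool fs_large_xattr)
{
	if (size > DEMO_XATTR_SIZE_MAX && !fs_large_xattr)
		return -E2BIG;
	return 0;
}

/* Both stages in the order the syscall path would run them. */
static int demo_check(size_t size, bool fs_large_xattr)
{
	int error = demo_copy_check(size);

	return error ? error : demo_fs_check(size, fs_large_xattr);
}
```

Keeping the per-filesystem test in the shared do_setxattr() stage means the path-based and fd-based entry points cannot diverge.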

Contributor Author

Thanks, I have updated it accordingly to check for the hard limit of XATTR_LARGE_SIZE_MAX in setxattr_copy.

Collaborator

The indentation is not right again.

Contributor Author

Sorry about that, it should be fixed now.

drivers/ata/ahci.c (outdated, resolved)
fs/nfsd/nfs4acl.c (resolved)
@usaleem-ix (Contributor Author)

but makes me wonder if it is the same version or if there are any artifacts, may worth checking. @bmeagherix

I cherry-picked the last pushed artifacts from the source branch of #142 to break it down and make it easier to port. It's worth taking a look there, @bmeagherix.

@usaleem-ix force-pushed the NAS-130427 branch 3 times, most recently from 9515f5f to ed8fc7f on November 19, 2024 09:12
@bmeagherix (Contributor) left a comment:

Just reviewed the commits for which I was the original author. LGTM.

@ixhamza (Contributor) left a comment:

I guess you can remove Makefile.orig and smb2pdu.h.orig, as they seem to be leftover files from c80fcec.

@usaleem-ix (Contributor Author)

I guess you can remove Makefile.orig and smb2pdu.h.orig, as they seem to be leftover files from c80fcec.

Thanks, I have updated.

@usaleem-ix force-pushed the NAS-130427 branch 2 times, most recently from 776bcde to d5174dc on November 20, 2024 17:09
anodos325 and others added 13 commits November 20, 2024 22:23
Support for alternate data streams over the SMB protocol has
historically been enabled in such a way that Samba writes them as
filesystem extended attributes in the user namespace. FreeBSD has no
practical limit on xattr size, and so clients (often macOS) may write
ones that exceed the 64 KiB limit imposed by the Linux kernel. Since
XATTR_SIZE_MAX is used in many places in the kernel, and not all
filesystems support large xattrs, introduce a new constant
XATTR_LARGE_SIZE_MAX that is used as an alternate value if the
filesystem's sb_flags has SB_LARGEXATTR set. There will be a corresponding
commit in ZFS to set this flag when it is defined and xattrs are
enabled on the ZFS dataset.

This commit also introduces the flag SB_NFSV4ACL, which will be used
to indicate and enable NFSv4-specific behavior in the kernel with regard
to permissions.

These new features / alternate behavior are controlled by the
compile-time kernel config option CONFIG_TRUENAS, which defaults
to n (off). In principle, TrueNAS-specific changes that deviate from
a vanilla Linux kernel can be removed for testing purposes by setting
CONFIG_TRUENAS=n in the relevant build scripts.

Signed-off-by: Andrew Walker <[email protected]>

There are various places in which evaluation of permissions
in the presence of an NFSv4 ACL is more nuanced than what is
typical when evaluating traditional POSIX permissions. For
example, a user may be permitted to delete a file if he
has DELETE permissions on the file or DELETE_CHILD permissions
on the parent directory. Traditional POSIX permissions only
check for MAY_WRITE | MAY_EXEC on the parent directory.

Several new inode permission masks have been added to facilitate these
NFSv4-specific checks, corresponding to different NFSv4 permissions
that grant abilities to make changes to files. For the purpose of
this commit and the goal of providing a rough approximation of
NFSv4 access checks, only write (and not read) access checks have
been implemented. This is done selectively, to provide minimal
compliance with permissions as defined in RFC-5661.

The new permissions-related behavior is only applied when the
superblock flag SB_NFS4ACL is set on the inode. In this case, the onus
of fully implementing the features required to satisfy the ACL behavior
specified in RFC-5661 is delegated to the filesystem's inode
permissions interface (i_op->permission). Where possible, we first
check the conventional POSIX permission before trying the
NFSv4 equivalent. For example, when writing an xattr, we
check for WRITE_DATA before WRITE_NAMED_ATTRS, because in the former
case, with a trivial ACL, we can avoid evaluating the
full ACL and instead merely look at the POSIX mode.
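The delete example above can be sketched as follows. This is a simplified userspace model, not the kernel's permission interface: the `DEMO_MAY_*` bits, `struct demo_inode`, and the helpers are hypothetical, each inode simply records which rights the caller has been granted, and a real NFSv4 evaluator also has to handle DENY entries and ACE ordering.

```c
#include <stdbool.h>

/* Hypothetical permission bits; these are not the kernel's MAY_* values. */
#define DEMO_MAY_WRITE        0x01u
#define DEMO_MAY_EXEC         0x02u
#define DEMO_MAY_DELETE       0x04u  /* NFSv4 DELETE on the file itself */
#define DEMO_MAY_DELETE_CHILD 0x08u  /* NFSv4 DELETE_CHILD on the parent */

/* Rights the caller holds on an inode, as the filesystem's ACL reports. */
struct demo_inode {
	unsigned int granted;
};

static bool demo_allows(const struct demo_inode *ino, unsigned int mask)
{
	return (ino->granted & mask) == mask;
}

/* Traditional POSIX rule: write and search permission on the parent. */
static bool demo_may_delete_posix(const struct demo_inode *dir)
{
	return demo_allows(dir, DEMO_MAY_WRITE | DEMO_MAY_EXEC);
}

/* NFSv4-style rule from the commit message: DELETE on the file, or
 * DELETE_CHILD on the parent directory, is sufficient. */
static bool demo_may_delete_nfs4(const struct demo_inode *dir,
				 const struct demo_inode *file)
{
	return demo_allows(file, DEMO_MAY_DELETE) ||
	       demo_allows(dir, DEMO_MAY_DELETE_CHILD);
}
```

In this model, write and search access on the parent satisfies the POSIX rule but grants neither NFSv4 right, which is why the two rules are kept as separate checks.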

csiostor seems to cause Chelsio T6 firmware to crash.

Jira: NAS-110910

When anything is written to it, it waits for all device probes to
complete before returning. After that, `udevadm settle` used by ZFS
scripts can really provide the system with all disks detected for boot pool import.

Ticket:	NAS-108200

Enable NTB and NTB tools in the TrueNAS config.  In addition, enable
the Intel NTB driver, so that we have at least one NTB driver available.

Added initial support for PLX Non-Transparent Bridge.

Before this change it was impossible to load client modules before
NTB hardware was probed.  This change removes that limitation.  New
NTB transports will get their child devices as they come in.

This fixes interrupt storms on hardware using legacy level-triggered
interrupts: doorbell processing could still be in progress after the
interrupt handler completed, which triggered extra interrupts in a loop.

If a previous successful run is
present, then skip the re-run for its pull request.

Signed-off-by: Umer Saleem <[email protected]>

Previously, TrueNAS used the Debian Linux kernel configuration with
TrueNAS config options added on top of it. Because of that, the
TrueNAS kernel config in 'scripts/package/truenas/tn.config' grew
very large, and TrueNAS-only options became difficult to manage.

Debian Linux kernel configuration for version 6.1.55 has been added
as 'debian_amd64.config' to keep the options from Debian separate from
TrueNAS options. TrueNAS-only config options are stored in
'truenas.config'.

Kernel configuration can now be generated as:

	1) make ARCH=x86_64 defconfig
	2) ./scripts/kconfig/merge_config.sh .config \
			./scripts/package/truenas/debian_amd64.config
	3) ./scripts/kconfig/merge_config.sh .config \
			./scripts/package/truenas/truenas.config
	4) ./scripts/kconfig/merge_config.sh .config \
			./scripts/package/truenas/debug.config

Signed-off-by: Umer Saleem <[email protected]>

Without this, it was impossible to use multiple NTB consumer drivers,
since the kernel just attached the first one to all NTBs.  This replicates
the driver_override device attribute from PCI, and adds a module parameter
to make setting a default easier.

Signed-off-by: Alexander Motin <[email protected]>
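A minimal sketch of the matching rule, modeled on how PCI's driver_override behaves: when an override string is set, only the driver with that exact name may bind; when it is unset, ordinary matching applies (here every client matches, which mirrors the "first driver grabs every NTB" problem described above). The structures and names are hypothetical simplifications, and the module parameter mentioned in the commit is not modeled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified device and client-driver types. */
struct demo_ntb_dev {
	const char *driver_override;   /* NULL when no override is set */
};

struct demo_ntb_client {
	const char *name;
};

/*
 * With an override set, only the client whose name matches exactly may
 * bind; without one, any client matches, so whichever registers first
 * would claim every NTB device.
 */
static bool demo_ntb_match(const struct demo_ntb_dev *dev,
			   const struct demo_ntb_client *drv)
{
	if (dev->driver_override)
		return strcmp(dev->driver_override, drv->name) == 0;
	return true;
}
```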
bmeagherix and others added 21 commits November 20, 2024 22:23
… reuse

Also use ACE4SIZE now, and add a check with respect to remaining in
convert_nfs41xdr_to_nfs40_acl / convert_to_nfs40_ace

The nfs4_xattr_list_nfs4_acl_xdr, nfs4_xattr_get_nfs4_acl_xdr and
nfs4_xattr_set_nfs4_acl_xdr functions are implemented using the existing
client-side support for DACL.

Currently only OWNER@, GROUP@, EVERYONE@ and numeric uids or gids are
supported in the ACEs.
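To illustrate the restriction above, here is a minimal sketch of classifying an ACE principal string. The `demo_*` names are hypothetical, and whether a numeric ID is a uid or a gid is assumed to come from a separate group flag on the ACE rather than from the string itself.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

enum demo_who { DEMO_WHO_OWNER, DEMO_WHO_GROUP, DEMO_WHO_EVERYONE, DEMO_WHO_ID };

/*
 * Accepts only OWNER@, GROUP@, EVERYONE@, or a plain decimal ID; any other
 * principal (e.g. "user@domain") is rejected with -EINVAL.
 */
static int demo_parse_who(const char *who, enum demo_who *kind,
			  unsigned long *id)
{
	char *end;

	if (strcmp(who, "OWNER@") == 0) {
		*kind = DEMO_WHO_OWNER;
		return 0;
	}
	if (strcmp(who, "GROUP@") == 0) {
		*kind = DEMO_WHO_GROUP;
		return 0;
	}
	if (strcmp(who, "EVERYONE@") == 0) {
		*kind = DEMO_WHO_EVERYONE;
		return 0;
	}
	/* Require a leading digit so strtoul() cannot accept "-1" or " 5". */
	if (*who < '0' || *who > '9')
		return -EINVAL;
	errno = 0;
	*id = strtoul(who, &end, 10);
	if (errno != 0 || *end != '\0')
		return -EINVAL;
	*kind = DEMO_WHO_ID;
	return 0;
}
```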

Add an xattr handler with same namespace as ZFS ACL xattr handler
to allow userspace utilities to easily preserve and convert
contents of SMB Security Descriptor DACL into native ZFS ACL
when ingesting data during migration via SMB client (for example
using rsync with the explicit option to preserve the xattr in
question).

This PR also adds a new procfs endpoint:
/proc/fs/cifs/zfsacl_configuration_options that can be used to
control error handling for cases where we can't convert SID
into a Unix ID, and currently also whether we allow setting
NT ACL via xattr writes on remote SMB server (disabled by default).

DCB seems to be generating spam on the console for users with
Chelsio T4. Disable DCB features for Chelsio T4.

Signed-off-by: Umer Saleem <[email protected]>

On several systems we've noticed that when the NTB link goes down, the
Physical Layer User Test Pattern registers we use as additional
scratchpad registers (which is explicitly allowed by the chip specs)
become read-only for about 100us.  I see no explanation for this in
the chip specs, nor for why it was not seen before; it may be a race.
Since we do need these registers, work around it by repeating writes
until we succeed or a 1ms timeout expires.
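The workaround above amounts to a write-and-verify loop with a deadline. Below is a minimal userspace sketch in which the register is simulated and the 1ms timeout is replaced by an attempt budget so the behavior is deterministic; `struct demo_reg` and the helpers are hypothetical, and a kernel version would presumably compare against a time-based deadline and delay briefly between attempts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register model: writes are dropped while it is read-only. */
struct demo_reg {
	uint32_t value;
	int ignored_writes_left;   /* simulates the ~100us read-only window */
};

static void demo_write(struct demo_reg *r, uint32_t v)
{
	if (r->ignored_writes_left > 0) {
		r->ignored_writes_left--;   /* write silently lost */
		return;
	}
	r->value = v;
}

static uint32_t demo_read(const struct demo_reg *r)
{
	return r->value;
}

/*
 * Write, read back, and retry until the value sticks or the attempt
 * budget (standing in for the timeout) runs out. Note: if the register
 * already holds v, the first read-back succeeds even if the write was
 * dropped.
 */
static bool demo_write_verified(struct demo_reg *r, uint32_t v, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		demo_write(r, v);
		if (demo_read(r) == v)
			return true;
	}
	return false;
}
```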

Debian has defaulted to compressing the modules in XZ format. This
introduces some changes that are not desired. The debug info/symbols
for modules are not stripped anymore. As a result, the size of modules
increases tenfold. Previously, the debug symbols for modules and
kernel were packaged separately in a -dbg package. But after this
config update, the -dbg package only contains symbols for the kernel
binary itself. Kernel modules have built-in debug symbols. The size of
the ISO image increases 3-4 times because of this. Revert to the previous
state for this config.

Signed-off-by: Umer Saleem <[email protected]>

This adds support to emulate NVMe functionality for the drives
connected to a tri-mode HBA. Userspace tools can talk to the NVMe
device through passthrough commands, e.g., nvme-cli can identify the
controller using `nvme id-ctrl /dev/sdX`.

SMB Protocol Background:
------------------------
Filesystems presented over the SMB protocol may support alternate data
streams ("named streams") within a file or a directory. This support is
designated by the filesystem attribute FILE_NAMED_STREAMS. Named streams
are not identical to extended attributes (EAs), which may also be
supported by the same SMB server.

A named stream is a place within a file in addition to the main stream
(normal file data) where data is stored. Named streams have different
data than the main stream (and than each other) and may be written and
read independently of each other. Named streams for a file are designated
by appending a ":" colon character to the file name followed by the
name of the alternate data stream. Stream names may be no more than 255
characters in length and are subject to the characteristics and
limitations documented in MS-FSCC Section 2.1.5 Pathname and following.

A list of named streams for a file can be gathered by submitting an
SMB2_QUERY_INFO request for FILE_STREAM_INFORMATION. The expected server
response is documented in MS-FSS Section 2.4.43 FileStreamInformation.

Streams are typically small (individually less than 200 bytes) and are
rarely used outside of macOS SMB clients.

TrueNAS/ZFS background:
-----------------------
Solaris supported a similar feature set through its file-backed xattr
capabilities and APIs. This meant that the kernel SMB server in Solaris
was able to seamlessly provide support for named streams. When ZFS was
ported to FreeBSD and Linux, the extattr and xattr OS APIs were layered
on top of the ZFS file-backed xattrs. As ZFS on Linux saw more use, it
was determined that the performance and lack of atomicity of operations
on file-backed xattrs were insufficient for some application
requirements (especially for Samba shares). This eventually led to the
ZFS dataset configuration parameter for SA-backed xattrs on Linux
(xattr=sa, which is the TrueNAS default). With this configuration,
xattrs up to a certain size are written as SAs, and larger xattrs are
written as files. The practical result is that TrueNAS can support
extended attributes that are much larger than those of a traditional
Linux file server. Unfortunately, because extended attributes cannot
be read or written partially, a 2 MiB upper bound is placed on the size
of a single extended attribute / named stream in TrueNAS.

Samba background:
-----------------
Samba has the ability to present extended attributes as named streams
to SMB clients. This is achieved by prepending a special prefix to
the extended attribute (to differentiate the streams xattrs from normal
xattrs that are presented as EAs over the SMB protocol). Due to
historical design decisions, the Samba module in charge of translating
xattrs into streams appends an extra NUL byte to the xattr on writes
to the local filesystem and strips it off when converting to a stream
for SMB clients.

Implementation details:
-----------------------
This commit adds support for the Linux kernel SMB2/3 client to enumerate
streams on a remote SMB server by including them in the output of
listxattr with the special Samba prefix. Streams may be written to
the remote SMB server via setxattr and read through getxattr. The
Samba-specific behavior for appending / removing an extra byte to
the xattr can be disabled by setting /proc/fs/cifs/stream_samba_compat
to 0.
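A minimal sketch of the Samba-compatible translation described above: the client appends one NUL byte when presenting a remote stream as an xattr and removes it when writing an xattr back as a stream. The helper names are hypothetical, the `samba_compat` flag stands in for /proc/fs/cifs/stream_samba_compat, and stripping only when the last byte actually is NUL is an assumption of this sketch rather than a description of the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stream data -> xattr value: copy, and in compat mode append one NUL.
 * 'out' must have room for len + 1 bytes. Returns the xattr length. */
static size_t demo_stream_to_xattr(const char *data, size_t len, char *out,
				   bool samba_compat)
{
	memcpy(out, data, len);
	if (!samba_compat)
		return len;
	out[len] = '\0';
	return len + 1;
}

/* Xattr value -> stream data: in compat mode, drop a trailing NUL byte.
 * Returns the number of bytes to send as stream data. */
static size_t demo_xattr_to_stream_len(const char *val, size_t len,
				       bool samba_compat)
{
	if (samba_compat && len > 0 && val[len - 1] == '\0')
		return len - 1;
	return len;
}
```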

PCIe ACS (Access Control Services) is the PCIe 2.0+ feature that
allows us to control whether transactions are allowed to be redirected
in various subnodes of a PCIe topology.  For instance, if two
endpoints are below a root port or downstream switch port, the
downstream port may optionally redirect transactions between the
devices, bypassing upstream devices.  The same can happen internally
on multifunction devices.  The transaction may never be visible to the
upstream devices.

One upstream device that we particularly care about is the IOMMU.  If
a redirection occurs in the topology below the IOMMU, then the IOMMU
cannot provide isolation between devices.  This is why the PCIe spec
encourages topologies to include ACS support.  Without it, we have to
assume peer-to-peer DMA within a hierarchy can bypass IOMMU isolation.

Unfortunately, far too many topologies do not support ACS to make this
a steadfast requirement.  Even the latest chipsets from Intel are only
sporadically supporting ACS.  We have trouble getting interconnect
vendors to include the PCIe spec required PCIe capability, let alone
suggested features.

Therefore, we need to add some flexibility.  The pcie_acs_override=
boot option lets users opt-in specific devices or sets of devices to
assume ACS support.  The "downstream" option assumes full ACS support
on root ports and downstream switch ports.  The "multifunction"
option assumes the subset of ACS features available on multifunction
endpoints and upstream switch ports are supported.  The "id:nnnn:nnnn"
option enables ACS support on devices matching the provided vendor
and device IDs, allowing more strategic ACS overrides.  These options
may be combined in any order.  A maximum of 16 id specific overrides
are available.  It's suggested to use the most limited set of options
necessary to avoid completely disabling ACS across the topology.

Note to hardware vendors: we have facilities to permanently quirk
specific devices which enforce isolation but do not provide an ACS
capability.  Please contact me to have your devices added and save
your customers the hassle of this boot option.

Signed-off-by: Alex Williamson <[email protected]>
Committed-by: Umer Saleem <[email protected]>
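A minimal userspace sketch of parsing a pcie_acs_override= option string of the kind described above. The comma separator, the hexadecimal `id:vendor:device` form, and silently ignoring IDs beyond the 16-entry limit are assumptions of this sketch, and all `demo_*` names are hypothetical; it only records the requested overrides and does not model how they are applied to devices.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define DEMO_MAX_ACS_IDS 16   /* "A maximum of 16 id specific overrides" */

struct demo_acs_override {
	bool downstream;
	bool multifunction;
	int nr_ids;
	unsigned int vendor[DEMO_MAX_ACS_IDS];
	unsigned int device[DEMO_MAX_ACS_IDS];
};

/*
 * Parse a writable copy of the option string (strtok() modifies it).
 * Options may appear in any order. Returns 0, or -1 on an unknown token.
 */
static int demo_parse_acs_override(char *opts, struct demo_acs_override *o)
{
	memset(o, 0, sizeof(*o));
	for (char *tok = strtok(opts, ","); tok; tok = strtok(NULL, ",")) {
		unsigned int ven, dev;
		char extra;

		if (strcmp(tok, "downstream") == 0) {
			o->downstream = true;
		} else if (strcmp(tok, "multifunction") == 0) {
			o->multifunction = true;
		} else if (sscanf(tok, "id:%4x:%4x%c", &ven, &dev, &extra) == 2) {
			/* exactly two 4-digit hex fields and nothing after them */
			if (o->nr_ids == DEMO_MAX_ACS_IDS)
				continue;   /* beyond the limit: ignored here */
			o->vendor[o->nr_ids] = ven;
			o->device[o->nr_ids] = dev;
			o->nr_ids++;
		} else {
			return -1;
		}
	}
	return 0;
}
```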

The mpt3sas driver does not create a parent end device for PCIe types
where the SAS address is stored, causing the ses driver to not add
PCIe device types connected to a tri-mode HBA. To address this, the
fallback mechanism reads the SAS address from the VPD 0x83 page. This
change is inspired by commit 9927c68.

Upstream commit 6ebfede changed the
API for SMB2_set_eof(), which introduced a regression in truncation
of SMB alternate data streams.

nfsd supports up to 1024 ACEs. Windows servers support a security
descriptor size of up to 64 KiB, which translates to around 1700
ACEs, but since we don't locally support that size, we'll cap both
at 1024.

Enable the CONFIG_WERROR option, which treats all warnings as errors
during the kernel build.

Signed-off-by: Umer Saleem <[email protected]>

Label err_dma_mask is not being used and generates a warning at build
time. With CONFIG_WERROR enabled, this warning is treated as an error and
breaks the build. This commit removes this label for now.

Signed-off-by: Umer Saleem <[email protected]>
@usaleem-ix merged commit 3b44d5c into truenas/linux-6.12 on Nov 21, 2024
6 checks passed
@bugclerk

Not updating JIRA ticket https://ixsystems.atlassian.net/browse/NAS-130427 target versions as no JIRA version corresponds to this PR

@bugclerk

This PR has been merged and conversations have been locked.
If you would like to discuss more about this issue please use our forums or raise a Jira ticket.

@truenas locked as resolved and limited conversation to collaborators on Nov 21, 2024
@usaleem-ix deleted the NAS-130427 branch on November 21, 2024 16:16