Kernel Panic after a couple of minutes while using APFS inside zvol on mirror-pool #90
Comments
Here is another panic, just minutes after importing the pool:
and one more after import:
I can no longer import the pool without panicking:
import -n panic: https://gist.github.com/JMoVS/20f3247c0dfa05b745cdfec83ac91b79
readonly import:
https://gist.github.com/JMoVS/37427666f7aae402b0a443fea2ada817
Most likely the issue is there, and setting a larger stack might help.
Probably the zvol replay code we fixed today.
import problem fixed with rc2 - now monitoring the situation regarding general use again
import works again, now tried volblocksize 128k with rc2, new panic: https://gist.github.com/JMoVS/aa6483efbf0dbaeb76bcbee7f07467bf
New panic with the custom pkg:
Beware: I am still unable to verify 100% which kext version is really loaded, but after eyeing the strings in the kext, and considering that the caches were rebuilt, I hope it is indeed the right version.
more panic:
Function sizes:
this test had compression enabled whereas the previous ones didn't - don't know if that makes a difference
14:05 https://www.lundman.net/OpenZFSonOsX-2.1.99-Big.Sur-11-arm64.pkg 71ee657ce334422f83310c0411973b2e no new messages appeared here:
In openzfsonosx#90 a user reported panics on an M1 with the message "Invalid kernel stack pointer (probable overflow)." In at least several of these a deep multi-arena allocation was in progress (several vmem_alloc/vmem_xalloc reaching all the way down through vmem_bucket_alloc, xnu_alloc_throttled, and ultimately to osif_malloc). The stack frames above the first vmem_alloc were also fairly large.

This commit sets a dynamically sysctl-tunable threshold (8k default) for remaining stack size as reported by xnu. If we do not have more bytes than that when vmem_alloc() is called, then the actual allocation will be done in a separate worker thread which will start with a nearly empty stack that is much more likely to hold the various frames all the way through our code boundary with the kernel and beyond.

The xnu / mach thread_call API (osfmk/kern/thread_call.h) is used to avoid circular dependencies with taskq, and the mechanism is per-arena, costing a quick stack-depth check per vmem_alloc() but allowing for wildly varying stack depths above the first vmem_alloc() call.

Vmem arenas now have two further kstats: the lowest amount of available stack space seen at a vmem_alloc() into it, and the number of times the allocation work has been done in a thread_call worker.

* some spl_vmem.c functions are given inline hints

  These are small functions with no or very few automatic variables that were good candidates for clang/llvm's inlining heuristics before we switched to building the kext with -finline-hint-functions.

* remove some (unrelated) unused variables which escaped previous commits, eliminating a couple compile-time warnings
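The stack-split decision described in this commit message can be sketched in userland C. This is only an illustration, not the kext code: it uses pthreads and `pthread_join()` in place of the xnu thread_call API, takes the remaining stack as a parameter instead of asking xnu, and the names `split_stack_below`, `arena_stats`, `alloc_worker` and `alloc_maybe_split` are hypothetical. The statistics updates are not thread-safe here.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the sysctl-tunable threshold (8k default). */
static size_t split_stack_below = 8 * 1024;

/* Hypothetical per-arena counters mirroring the two new kstats. */
struct arena_stats {
	size_t lowest_stack_seen;	/* lowest remaining stack at entry */
	unsigned long worker_allocs;	/* allocations done in a worker */
};

struct alloc_req {
	size_t size;
	void *result;
};

/* Runs on a fresh thread, i.e. with a nearly empty stack. */
static void *
alloc_worker(void *arg)
{
	struct alloc_req *req = arg;

	req->result = malloc(req->size);
	return (NULL);
}

/*
 * Assumption: stack_remaining is supplied by the caller; in the kext
 * the value is reported by xnu.
 */
static void *
alloc_maybe_split(struct arena_stats *st, size_t size, size_t stack_remaining)
{
	struct alloc_req req = { .size = size, .result = NULL };
	pthread_t tid;

	if (stack_remaining < st->lowest_stack_seen)
		st->lowest_stack_seen = stack_remaining;

	/* More bytes left than the threshold: allocate in this thread. */
	if (stack_remaining > split_stack_below)
		return (malloc(size));

	/* Otherwise hand the work to a worker and wait for it to finish. */
	if (pthread_create(&tid, NULL, alloc_worker, &req) != 0)
		return (NULL);
	pthread_join(tid, NULL);
	st->worker_allocs++;
	return (req.result);
}
```

With the default threshold, a call made with 16384 bytes of stack remaining allocates directly, while a call made with 4000 bytes remaining goes through the worker and increments `worker_allocs`.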
panic after setting 4000:
Setup change: this time the zvol is inside a ZFS encryption root, with skein checksum, and compression initially set to zstd, changed to lz4 briefly and then back again, but with unencrypted APFS.
stack trace with newest code drop (rc3 + cherry pick)
Using the fp: to show stack sizes we have:
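For readers who want to repeat this calculation on their own panic log, here is a minimal C sketch of the arithmetic, assuming the `fp:` values of the backtrace are listed innermost frame first. The stack grows downward, so each caller's frame pointer is at a higher address, and the difference between adjacent values approximates one function's frame size (which of the two neighbouring functions it belongs to depends on where the compiler places the frame record). The function name `frame_deltas` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * fp[] holds n frame pointer values, innermost frame first.
 * Writes the n-1 distances between adjacent frame records into out[]
 * and returns the total stack span covered, in bytes.
 */
static uint64_t
frame_deltas(const uint64_t *fp, size_t n, uint64_t *out)
{
	uint64_t total = 0;

	for (size_t i = 0; i + 1 < n; i++) {
		out[i] = fp[i + 1] - fp[i];
		total += out[i];
	}
	return (total);
}
```

Sorting the resulting distances shows which frames consume the most stack, and the total indicates how close the thread was to the kernel stack limit.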
Signed-off-by: Jorgen Lundman <[email protected]> Upstream: configure.ac add cmd/os/macos/zsysctl Upstream: configure.ac changes Upstream: Makefile Upstream: Add macOS to headers Attempt to group most of the sweeping changes to headers in there, unless they fit better with an individual commit Signed-off-by: Jorgen Lundman <[email protected]> It appears FreeBSD did the same for zfs_ioctl_register_dataset_nolog() as they use it, so following suit for zfs_ioctl_register_pool() Upstream: macOS default mount is /Volumes Signed-off-by: Jorgen Lundman <[email protected]> Upstream: add IO calls for iokit Is this the best way? We could add ", func, private" to the existing IO, and either send by uio, or by func(private). Signed-off-by: Jorgen Lundman <[email protected]> Upstream: Allow cmd/zfs mount unmount of snapshots "zfs mount dataset@snapshot" as mounting of snapshot has to be done manually from userland in macOS. Add zfs_rollback_os() call to the rollback logic, so platforms can do specific requirements. macOS: need to kick Finder to update. Signed-off-by: Jorgen Lundman <[email protected]> upstream: hack - retry destroy until diskarb goes away A more portable solution is perhaps desired. Signed-off-by: Jorgen Lundman <[email protected]> macOS: Add macOS support Add all files required for the macOS port. Add new cmd/os/ for tools which are only expected to be used on macOS. This has support for all macOS version up to BigSur (11.x) Signed-off-by: Jorgen Lundman <[email protected]> macOS: Additional work macOS: employ advanced lock synchronisation macOS: handle additional lookups to delay waiting for mount macOS: handle rapid snapshot auto mounts Re-implement snapshots to always mount on "lookup()". This handles the deadlock when cwd is changed to the snapshot directory before mount. Then add some logic to attempt to not-mount in some situations, ie listing inside ".zfs/snapshot" directory. 
If a listing there is started, we ignore mount requests until it is complete - by storing the thread ID and pid of the listing process. Any access below ".zfs/snapshot", will clear the ignore, ie, cause the mount to happen. macOS: userland unmount to disable auto_snapshot to avoid triggering a mount. Also make kernel remember 5 pid+tid to ignore. macOS: Do not truncate returned name in case correcting lookups macOS: also don't truncate further down macOS: fix leak in ldi handle_set_wce_iokit The parent device needs to be released if it was retained. macOS: add zvol_os_is_zvol() Or we are unable to create zpools inside zvols. Also cleanup zvolIO.cpp to be cstyle compliant and correcting obvious leaks. macOS: fix zfs_vnop_lookup() and linkid zfs_vnop_lookup() failed to "remember" the name used to lookup in the cache_lookup() success case, making us return the incorrect name in future zfs_vnop_getattr() - most notably in realpath(). linkid logic for Finder was not converting XNU inode to avoid the first 16 inodes. macOS: Return nametoolong when formD is lacking space Originally it was returning "Operation not supported" which isn't quite as useful to the user. Hopefully nothing checks that it must return ENOTSUP. macOS: change vnop_lookup to use cache. To give more room for formD formC to work with, we always allocate MAXPATHLEN, so we might as well use a kmem_cache. macOS: rmdir -p is far too eager macOS: dir link count doesn't count files. To be like upstream: drwxr-xr-x 2 root wheel 2 Jun 16 17:37 . touch a drwxr-xr-x 2 root wheel 3 Jun 16 17:37 . Where 2nd field is "number of directories" (2) and 5th field is "number of files and directories" (3) macOS: move sa_setup() to after zap_lookup() This is the order Linux calls them, so we should minimise differences. macOS: clean up handling of readonly with vfs_mount to follow what upstream does. 
macOS: parentID also needs to be mapped to XNU id macOS: add cmd/os/macos/zsysctl macOS: bring in cmd/os/macos/zsysctl and mount_zfs macOS: Makefile.am for mount_zfs [squash] macOS: squash macOS: strip selinux functions [squash] macOS: move getmntany into libzfs zvol.c change fix zfs.h macOS: run zsysctl if /etc/zfs/zsysctl.conf exists macOS: re-implement most of xattrs We had some difference between how ZOL and macOS behaved when going between xattr=sa and xattr=on datasets (send/recv) and fairly large duplicate code. Take ZOL zpl_xattr for the sa/on logic, change it to take "uio" for the data buffer. Also pass in "cr" as we can. The finderinfo logic stays in the vnop handlers, leaving the imported source very close to ZOL. Everything with xattrs, and decmpfs needs to be tested :) macOS: Add uio type for IOKit iomem support Add another UIO seg type, UIO_FUNCSPACE (UIO_SYSSPACE, UIO_USERSPACE) to handle the IOkit IOMemoryDescriptor type. When zvolIO needs to issue IO on volumes, it will setup a uio with iov_base as "iomem". As dmu_read_dnode_uio() (and write) filters down to zfs_uiomove(), spl-uio will handle the type to call registered IO function "zvolIO_strategy" instead of memcpy/bcopy calls. zvolIO_strategy() will call iomem->writeBytes (readBytes) as required. Model zvol_os.c calls zvol_os_read_zv() (and write) on ZOL sources again to ensure as little divergence as possible. Restore dmu.c to contain no macOS changes macOS: Fix abd leak, kmem_free correct size of abd_t ... for macOS and FreeBSD, and improve macOS abd performance (#56) * Cleanup in macos abd_os.c to fix abd_t leak Fix a leak of abd_t that manifested mostly when using raidzN with at least as many columns as N (e.g. a four-disk raidz2 but not a three-disk raidz2). Sufficiently heavy raidz use would eventually run a system out of memory. 
The leak was introduced as a fix for a panic caused by calculating the wrong size of an abd_t at free time if the abd_t had been made using abd_get_offset_impl, since it carried along the unnecessary tails of large ABDs, leading to a mismatch between abd->abd_size and the original allocation size of the abd_t. This would feed kmem_free a bad size, which produces a heap corruption panic. The fix now carries only the necessary chunk pointers, leading to smaller abd_ts (especially those of abd_get_zeros() ABDs) and a performance gain from the reduction in copying and allocation activity. We now calculate the correct size for the abd_t at free time. This requires passing the number of bytes wanted in a scatter ABD to abd_get_offset_scatter(). Additionally: * Switch abd_cache arena to FIRSTFIT, which empirically improves performance. * Make abd_chunk_cache more performant and debuggable. * Allocate the abd_zero_buf from abd_chunk_cache rather than the heap. * Don't try to reap non-existent qcaches in abd_cache arena. * KM_PUSHPAGE->KM_SLEEP when allocating chunks from their own arena - having fixed the abd leaks, return to using KMF_LITE, but leave a commented example of audit kmem debugging - having made this work, abd_orig_size is no longer needed as a way to track the size originally kmem_zalloc-ed for a scatter abd_t * Update FreeBSD abd_os.c with the fix, and let Linux build * Minimal change to fix FreeBSD's abd_get_offset_scatter() carrying too many chunks for the desired ABD size * A size argument is added to abd_get_offset_scatter() for FreeBSD and macOS, which is unused by Linux Signed-off-by: Jorgen Lundman <[email protected]> Upstream: ASM changes to support macOS Due to some differences in assembler work, macOS will have own copies. It would be desirable to change all assembler files to use asm_linkage.h and the macros inside for better portability. 
Signed-off-by: Jorgen Lundman <[email protected]> Upstream: module/zfs/spa.c Signed-off-by: Jorgen Lundman <[email protected]> Upstream: zfs-tests to support macOS Start to add macOS support to the zfs-tester environment, much more work is required still. Signed-off-by: Jorgen Lundman <[email protected]> Upstream: Changes to dprintf for macOS Prefer to always have the option to turn printfs on, even in RELEASE builds Signed-off-by: Jorgen Lundman <[email protected]> Upstream: macOS currently has own zfs_fsync Hoping to remove it eventually. Signed-off-by: Jorgen Lundman <[email protected]> Upstream: work around different API for sbuf_finish() Signed-off-by: Jorgen Lundman <[email protected]> Upstream: Why is linux even trying to look at etc/launchd Signed-off-by: Jorgen Lundman <[email protected]> Upstream: Missing empty taskq for userland Signed-off-by: Jorgen Lundman <[email protected]> Upstream: Add crypto errata1 for projectquota-less datasets There was a short window of 2.0 releases before rc4 where a crypto dataset would enable projectquota but fail to start it. Add a work-around for that issue. It is expected this commit will be removed in the near future. Datasets with crypto will generate the proper local_mac, and will not be able to be imported with the broken 2.0 version. Fixed dataset should work on other platforms again. Signed-off-by: Jorgen Lundman <[email protected]> Upstream: import -d does not go through os/macos/ sources On macOS we need to prioritise /dev/disk over /dev/rdisk, but the common code makes no adjustment based on os preferred names. Potentially we should call an os/ function to set the priority. Signed-off-by: Jorgen Lundman <[email protected]> Upstream: Test for NULL vd It seems we managed to get a deadman triggered during export? 
: 0xffffff8004ebda40 mach_kernel : _return_from_trap + 0xe0 : 0xffffff7f8942bbbf org.openzfsonosx.zfs : _vdev_deadman + 0x1f : 0xffffff7f8941149a org.openzfsonosx.zfs : _spa_deadman + 0xca : 0xffffff7f896a6246 org.openzfsonosx.zfs : _taskq_thread + 0x4a6 Signed-off-by: Jorgen Lundman <[email protected]> macOS: zdb inode mapping fix Upstream: realpath vdev directory paths This is already the behavior of zpool_find_import_scan, so do the same in make_leaf_vdev and zfs_strcmp_pathname. On macOS, /var is a symlink to private/var so when the user inputs an import path starting with /var, it is eventually converted automatically by zfs to the realpath starting with /private/var. This causes problems later finding vdevs as string comparisons between paths starting with /private/var and paths starting with /var fail, so make sure we are always using the vdev directory's realpath. Note that basenames are preserved so as not to compromise invariant symlinks. Signed-off-by: Jorgen Lundman <[email protected]> Upstream: dirname -> zfs_dirnamelen [squash] Forgot to actually change it to zfs_dirnamelen Signed-off-by: Jorgen Lundman <[email protected]> Upstream: set default macOS invariant disks path InvariantDisk (udev analogue for macOS) does not use /dev/disk in order to avoid subdirectories in /dev. Instead, the default path for the invariant symlinks is /var/run/disk/by-*, a root owned temporary directory. Signed-off-by: Jorgen Lundman <[email protected]> Upstream: stub-out zpool_read_label for APPLE It does not work for macOS platform, we have our own based on the old pre-lio style. Signed-off-by: Jorgen Lundman <[email protected]> Upstream: cppcheck fixes Upstream: fit in with recent man page changes Signed-off-by: Jorgen Lundman <[email protected]> Upstream: use correct libcurl.4.dylib name This fix isn't exactly great either. 
macOS: destroy snapshots squash renamed zpool_disable_volume_os macOS: rename zed zvol symlink script and variables macOS: handle 2-arg pthread_setname_np() By taking it out completely. macOS: Add snapshot and zvol events (uio.h fixes) It turns out that it could not see readv/writev because our macos/sys/uio.h was testing for the _LIBSPL_SYS_UIO_H as set by the top level libspl/include/sys/uio.h and therefor skipped over, if includes came in wrong order. Upstream: libzfs.h abi requires changes macOS: compile fixes after rebase macOS: changes to zfs_file after rebase macOS: compile fixes after rebase Signed-off-by: Jorgen Lundman <[email protected]> Upstream: Make zvol list be non-static Until we can agree on a solution that works for everyone. macOS: rename fallthrough to zfs_fallthrough macOS: Compile fixes for latest rebase macOS: Update arcstat and arc_summary Signed-off-by: Jorgen Lundman <[email protected]> macOS: Correct CPUID features lookup Account for surprise A, B, D, C order of registers. Add fixes to compile on ARM64, but functionally is missing. Signed-off-by: Jorgen Lundman <[email protected]> macOS: Set name after zfs_vnop_create() for NFS NFS would fail with open(..., O_EXCL) if we do not set the name after zfs_vnop_create(). nfsd handles O_EXCL differently, in that it always assumes VA_EXCLUSIVE is set (and will receive EEXIST). nfsd uses atime to store a pseudo-random unique ID, then call VNOP_CREATE(). If it succeeded in creating the file (this nfs client won over any other nfs clients) then the atime ID will match. nfsd will then call vnode_setattr() with the correct atime. If the name is not set by ZFS, it fails before calling vnode_setattr() with call stack: mac_vnode_check_open() vn_getpath_ext_with_mntlen() build_path_with_parent() Also correct fhtovp/vptofh to handle XNU remapped inodes. Remove atime checks for 48bit overflow from pre 64bit days. 
zfs_vnop_create() is also given a vattr struct, we should reply with the attr we handled - this saves XNU from calling fallback setattr(). Clean up zfs_vnop_getattr() to only set va_active for vattrs that was asked for, rather than blindly setting vattrs. Some xnu code checks that va_active == va_enabled, so if we set too many it can force XNU to call fallback. Actually handle atime in setattr()/getattr(), as it lives in zp->z_atime struct. Signed-off-by: Jorgen Lundman <[email protected]> macOS: fix default ADDEDTIME getattr Logic would return 0 date reply for entries without ADDEDTIME (which is only added after moving) Signed-off-by: Jorgen Lundman <[email protected]> macOS: Also set name in other vnop create calls. Unsure if NFS will bug on symlink and link, but we might as well call update. mknod is handled in the call to create. Signed-off-by: Jorgen Lundman <[email protected]> macOS: Add verbose kstat for RAW type The RAW kstat type as used by nodes like: kstat.zfs.misc.dbgmsg kstat.zfs.misc.dbufs can get really large, and there is no way to skip them when issuing a "sysctl -a" or similar request. This can slow down the process considerably, while it holds the locks. The RAW kstat type will now automatically add a "verbose" leaf as well, defaulting to "0" (do not display). To see the RAW information set the verbose value to 1. kstat.zfs.misc.dbufs.verbose: 0 kstat.zfs.misc.dbufs.dbufs: pool objset object level blkid offset dbsize [...] kstat.zfs.misc.dbufs.verbose: 1 Conveniently, this command works: sudo sysctl kstat.zfs.misc.dbgmsg.verbose=1 kstat.zfs.misc.dbgmsg.dbgmsg kstat.zfs.misc.dbgmsg.verbose=0 Signed-off-by: Jorgen Lundman <[email protected]> macOS: change wmsum to be a struct instead of being clever with pointers, and prepare for possible future expansion. Signed-off-by: Jorgen Lundman <[email protected]> macOS: Handle ZFS_MODULE_PARAMS as sysctl, take 2 Modelled on FreeBSD approach, made to work on macOS. 
Attempt to stay close to legacy macOS tunable names but some are now slightly different. Retire the macOS kstat versions, replace with ZFS_MODULE_IMPL. Signed-off-by: Jorgen Lundman <[email protected]> Upstream: Also copy out sysctl for ZFS_MODULE_VIRTUAL Upstream: take out Linux code in zfeature macOS: move ioctl_fd (back) into libzfs_core macOS: fix up clock_gettime for zfs-tester macOS: build fix for monterey macOS: bring in cmd/os/macos/zsysctl and mount_zfs macOS: also include all source files Split deep vmem_alloc()/vmem_xalloc() stacks In openzfsonosx/openzfs#90 a user reported panics on an M1 with the message "Invalid kernel stack pointer (probable overflow)." In at least several of these a deep multi-arena allocation was in progress (several vmem_alloc/vmem_xalloc reaching all the way down through vmem_bucket_alloc, xnu_alloc_throttled, and ultimately to osif_malloc). The stack frames above the first vmem_alloc were also fairly large. This commit sets a dynamically sysctl-tunable threshold (8k default) for remaining stack size as reported by xnu. If we do not have more bytes than that when vmem_alloc() is called, then the actual allocation will be done in a separate worker thread which will start with a nearly empty stack that is much more likely to hold the various frames all the way through our code boundary with the kernel and beyond. The xnu / mach thread_call API (osfmk/kern/thread_call.h) is used to avoid circular dependencies with taskq, and the mechanism is per-arena costing a quick stack-depth check per vmem_alloc() but allowing for wildly varying stack depths above the first vmem_alloc() call. Vmem arenas now have two further kstats: the lowest amount of available stack space seen at a vmem_alloc() into it, and the number of times the allocation work has been done in a thread_call worker. 
* some spl_vmem.c functions are given inline hints These are small functions with no or very few automatic variables that were good candidates for clang/llvm's inlining heuristics before we switched to building the kext with -finline-hint-functions. * remove some (unrelated) unused variables which escaped previous commits, eliminating a couple compile-time warnings. Sub-PAGE_SIZE ABDs instead of linear ABDs Previously, when making an ABD of size less than zfs_abd_chunk_size (4k by default), we would make a linear ABD, which would allocate memory out of the zio caches. The subpage chunks are in multiples of SPA_MINBLOCKSIZE, with each multiple (up to PAGE_SIZE minus SPA_MINBLOCKSIZE) having its own kmem_cache. These kmem_caches are parented to a subpage vmem_cache that takes 128k allocations from the PAGE_SIZE abd_chunk_cache. ABDs whose size falls within SPA_MINBLOCKSIZE bytes of PAGE_SIZE and all larger ABDs are served by the PAGE_SIZE ABD cache. Upstream: fix M1 --enable-debug build failure cannot #pragma diagnostic pop without a matching #pragma diagnostic push use -finline-hint-functions and not HAVE_LARGE_STACKS We appear to have a stack overflow problem. HAVE_LARGE_STACKS is default. It drives the decision about whether (HAVE) or not (!HAVE) to do txg sync context (frequent) and pool initialization (much less frequent) zio work in the same thread as the present __zio_execute, or whether it should be pushed to the head of the line of zios to be serviced asynchronously by another thread. Let's not define HAVE_LARGE_STACKS when building the kext for macOS. Clang's -finline-hint-functions inlines all functions explicitly hinted as inline "static inline foo(...) { }" or equipped with an __attribute__((always_inline)), but does not inline other functions, even if they are static. 
Clang & LLVM's inlining bumps the stack frame size to include automatic variables in the inlined functions, growing the stack even for invocations where the inlined function will not be reached. This has led to large stack frames in recursively called functions, notably dsl_scan_visitbp, which was dealt with by removing the always inline attribute. Globally enabling -finline-hint-functions reduces the number of inlined functions enough to make un-inlining such functions unnecessary, while still inlining obvious wins (e.g. tiny functions frequently called from all over the source tree, such as atomic_add_64_nv()). remove exponential moving average code Its utility for tracking relatively long term (~ seconds) movements of the calculated spl_free value has declined with our switch to "pure" and away from using macOS kernel variables which track momentary VM page demand and consequent changes in ARC. macOS allows for floating point to be used in kernel extensions, but this *may* have a cost on ARM in the form of larger stack frames, which is an acute problem. This code therefore should go away, rather than be put behind a compile-time flag. macOS: fix autoimport script Use absolute path to zsysctl. Ensure that the org.openzfsonosx.zpool-import service is loaded before kickstarting it. macOS: zfs_resume_fs can panic accessing NULL vp macOS: silence zvol_in_zvol and use SET_ERROR() while we are at it. 
macOS: zvol_replay must call zil_open Fix some SPL warnings * large frame size -> IOMalloc/IOFree * loss of precision in int -> short by way of bitmasking * __maybe_unused for the pattern: const retval r __maybe_unused = f(); ASSERT0(f); in non-debug builds * variables used possibly uninitialized macOS: Silence warning in uio A race to thread_call_enter1() could deadlock If multiple concurrent deep-stack allocation requests race to vmem_alloc(), the "winner" of the race could be cancelled by one of the other racers, and so the cancelled "winner" would never see the done flag, and would spend the rest of eternity stuck in a cv_wait() loop, hanging the thread that wanted memory. This commit uses a per-arena busy flag to block the later racers from reaching thread_call_enter1() until the race-winner's in-worker-thread memory allocation is complete. Additionally, the worker does less work updating stats, and only takes a mutex around the cv_signal(). The parent also checks for lost and duplicate cv_signals() error conditions. macOS: zvolRename needs to wait zvolRename needs to wait for IOKit to settle the changes, with a timeout. Rework the code to reuse the wait logic in a function. Remove old delay() hack. Most easily tested with zfs-tester run over; cli_root/zfs_create, zvol/zvol_cli, zvol/zvol_misc which results in testpool/vol.33979-renamed "is busy". macOS: Use vdev_disk_taskq when stack space limited. As we can trigger stack overflow in the IO path (especially with zvol) we detect if available space is below tunable spl_vmem_split_stack_below. In addition to this, remove small kmem_alloc of ldi_buf_t, vdev_buf_t and ldi_iokit_buf_t for each IO by attaching ldi_buf_t into zio_t. (See ZIO_OS_FIELDS) Track lowest stack remaining seen in vdev_disk as kstat It may be useful to know if we are being handed an especially deep stack in vdev_disk_io_start(), rather than just that we have been called with less than the threshold remaining. 
Additionally update variable names for clarity, notably reflecting that spl_stack_split_below is not just for vmem any more.

Issue zvol_read/write async when needed

macOS: Address 3 different ways to compress HFS

Handle 2 xattr holding the compressed data stream, and detect when UF_COMPRESSED is being set, and if file size is zero we return zero. Makes 'tar' retry the compression.

Upstream: test for -finline-hint-functions

macOS: thread_call_allocate fix for older OS

macOS: add support for sharenfs and sharesmb

macOS: implement kcred for zfs_replay

Some calls used by zfs_replay can not handle a NULL kcred and will panic. We use available functions to fetch the kernel cred, but it is somewhat of a hack, as we release it before using. Alternatively, we could hold reference in zfsvfs_setup() before calling zil_replay() and release after, with the hopes that kcred isn't used many other places.

macOS: Fix sysctl macros and missing prototypes

macOS: Add disable_trashes tunable

zfs_disable_trashes

Upstream: wrap in ifdef for SEEK_HOLE

macOS: fix for earlier macOS

also needs:

- AC_MSG_FAILURE([*** clock_gettime is missing in libc and librt])
+ AC_MSG_RESULT([*** clock_gettime is missing in libc and librt])

and remove -finline-hint-functions

macOS: wrap mkostemp/s in OSX10.12 checks

macOS: bzero the tqe in the allocation construction phase

The tqent_next and tqent_prev fields are random, which causes problems with the IS_EMPTY() check, causing an assertion in zio_taskq_dispatch.

macOS: Clean up vmem_alloc_in_worker_thread for boot hang

vmem_alloc_in_worker_thread() would use a local stack "cb" referenced by the thread, possibly after the stack was released.

spl_lowest_alloc_stack_remaining would also trigger async calls when not necessary, each time we got a new low, which was problematic during spl-kmem startup.

macOS: M1 kext must contain x64 binary

Otherwise notarize fails.
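For the "bzero the tqe" fix above, here is a rough illustration of why zeroing at construction time matters. It assumes a trimmed-down entry struct and an emptiness test that treats NULL links as "not queued"; both are hypothetical stand-ins, and the real IS_EMPTY() macro in the SPL may be defined differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down taskq entry; only the list links matter here. */
struct tq_entry {
    struct tq_entry *tqent_next;
    struct tq_entry *tqent_prev;
    void (*tqent_func)(void *);
    void *tqent_arg;
};

/* Assumed stand-in for the emptiness check: an unlinked entry has NULL links. */
static bool
tq_entry_is_unlinked(const struct tq_entry *e)
{
    return (e->tqent_next == NULL && e->tqent_prev == NULL);
}

/*
 * Constructor-style hook: zero the whole entry when the object is first
 * constructed, so the link fields never hold leftover memory contents.
 */
static int
tq_entry_construct(void *buf)
{
    memset(buf, 0, sizeof (struct tq_entry));
    return (0);
}
```

Without the zeroing step, a freshly allocated entry can look as if it were already linked into a list, which is the kind of state an assertion in a dispatch path would trip over.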
macOS: minor cstyle fix

commit before git bisect

Upstream: fixing all the Makefiles for new build system

Upstream: Changes and fixes to common files

macOS: fixes and updates

macOS: replace bcmp / bcopy / bfree

macOS: cstyle fixes

macOS: compile fixes

Upstream: stop Linux from compiling macOS source files.

It doesn't seem to work.

Upstream: continued makefile fixes

Upstream: Linux squats on mount_zfs, so hack around it
Signed-off-by: Jorgen Lundman <[email protected]>

Upstream: configure.ac add cmd/os/macos/zsysctl

Upstream: configure.ac changes

Upstream: Makefile

Upstream: Add macOS to headers

Attempt to group most of the sweeping changes to headers in there, unless they fit better with an individual commit

Signed-off-by: Jorgen Lundman <[email protected]>

It appears FreeBSD did the same for zfs_ioctl_register_dataset_nolog() as they use it, so following suit for zfs_ioctl_register_pool()

Upstream: macOS default mount is /Volumes

Signed-off-by: Jorgen Lundman <[email protected]>

Upstream: add IO calls for iokit

Is this the best way? We could add ", func, private" to the existing IO, and either send by uio, or by func(private).

Signed-off-by: Jorgen Lundman <[email protected]>

Upstream: Allow cmd/zfs mount unmount of snapshots

"zfs mount dataset@snapshot" as mounting of snapshot has to be done manually from userland in macOS.

Add zfs_rollback_os() call to the rollback logic, so platforms can do specific requirements. macOS: need to kick Finder to update.

Signed-off-by: Jorgen Lundman <[email protected]>

upstream: hack - retry destroy until diskarb goes away

A more portable solution is perhaps desired.

Signed-off-by: Jorgen Lundman <[email protected]>

macOS: Add macOS support

Add all files required for the macOS port. Add new cmd/os/ for tools which are only expected to be used on macOS.

This has support for all macOS version up to BigSur (11.x)

Signed-off-by: Jorgen Lundman <[email protected]>

macOS: Additional work

macOS: employ advanced lock synchronisation

macOS: handle additional lookups to delay waiting for mount

macOS: handle rapid snapshot auto mounts

Re-implement snapshots to always mount on "lookup()". This handles the deadlock when cwd is changed to the snapshot directory before mount.

Then add some logic to attempt to not-mount in some situations, ie listing inside ".zfs/snapshot" directory.
If a listing there is started, we ignore mount requests until it is complete - by storing the threadid and pid of the listing process. Any access below ".zfs/snapshot" will clear the ignore, ie, cause the mount to happen.

macOS: userland unmount to disable auto_snapshot to avoid triggering a mount.

Also make kernel remember 5 pid+tid to ignore.

macOS: Do not truncate returned name in case correcting lookups

macOS: also don't truncate further down

macOS: fix leak in ldi handle_set_wce_iokit

The parent device needs to be released if it was retained.

macOS: add zvol_os_is_zvol()

Or we are unable to create zpools inside zvols. Also cleanup zvolIO.cpp to be cstyle compliant and correcting obvious leaks.

macOS: fix zfs_vnop_lookup() and linkid

zfs_vnop_lookup() failed to "remember" the name used to lookup in the cache_lookup() success case, making us return the incorrect name in future zfs_vnop_getattr() - most noticeably in realpath().

linkid logic for Finder was not converting XNU inode to avoid the first 16 inodes.

macOS: Return nametoolong when formD is lacking space

Originally it was returning "Operation not supported" which isn't quite as useful to the user. Hopefully nothing checks that it must return ENOTSUP.

macOS: change vnop_lookup to use cache.

To give more room for formD formC to work with, we always allocate MAXPATHLEN, so we might as well use a kmem_cache.

macOS: rmdir -p is far too eager

macOS: dir link count doesn't count files.

To be like upstream:

drwxr-xr-x 2 root wheel 2 Jun 16 17:37 .
touch a
drwxr-xr-x 2 root wheel 3 Jun 16 17:37 .

Where 2nd field is "number of directories" (2) and 5th field is "number of files and directories" (3)

macOS: move sa_setup() to after zap_lookup()

This is the order Linux calls them, so we should minimise differences.

macOS: clean up handling of readonly with vfs_mount to follow what upstream does.
macOS: parentID also needs to be mapped to XNU id

macOS: add cmd/os/macos/zsysctl

macOS: bring in cmd/os/macos/zsysctl and mount_zfs

macOS: Makefile.am for mount_zfs

[squash]

macOS: squash

macOS: strip selinux functions

[squash]

macOS: move getmntany into libzfs

zvol.c change

fix zfs.h

macOS: run zsysctl if /etc/zfs/zsysctl.conf exists

macOS: re-implement most of xattrs

We had some difference between how ZOL and macOS behaved when going between xattr=sa and xattr=on datasets (send/recv) and fairly large duplicate code.

Take ZOL zpl_xattr for the sa/on logic, change it to take "uio" for the data buffer. Also pass in "cr" as we can. The finderinfo logic stays in the vnop handlers, leaving the imported source very close to ZOL.

Everything with xattrs, and decmpfs needs to be tested :)

macOS: Add uio type for IOKit iomem support

Add another UIO seg type, UIO_FUNCSPACE (UIO_SYSSPACE, UIO_USERSPACE) to handle the IOkit IOMemoryDescriptor type. When zvolIO needs to issue IO on volumes, it will setup a uio with iov_base as "iomem". As dmu_read_dnode_uio() (and write) filters down to zfs_uiomove(), spl-uio will handle the type to call registered IO function "zvolIO_strategy" instead of memcpy/bcopy calls. zvolIO_strategy() will call iomem->writeBytes (readBytes) as required.

Model zvol_os.c calls zvol_os_read_zv() (and write) on ZOL sources again to ensure as little divergence as possible.

Restore dmu.c to contain no macOS changes

macOS: Fix abd leak, kmem_free correct size of abd_t

... for macOS and Freebsd, and improve macOS abd performance (#56)

* Cleanup in macos abd_os.c to fix abd_t leak

Fix a leak of abd_t that manifested mostly when using raidzN with at least as many columns as N (e.g. a four-disk raidz2 but not a three-disk raidz2). Sufficiently heavy raidz use would eventually run a system out of memory.
The leak was introduced as a fix for a panic caused by calculating the wrong size of an abd_t at free time if the abd_t had been made using abd_get_offset_impl, since it carried along the unnecessary tails of large ABDs, leading to a mismatch between abd->abd_size and the original allocation size of the abd_t. This would feed kmem_free a bad size, which produces a heap corruption panic.

The fix now carries only the necessary chunk pointers, leading to smaller abd_ts (especially those of abd_get_zeros() ABDs) and a performance gain from the reduction in copying and allocation activity.

We now calculate the correct size for the abd_t at free time.

This requires passing the number of bytes wanted in a scatter ABD to abd_get_offset_scatter().

Additionally:

* Switch abd_cache arena to FIRSTFIT, which empirically improves performance.
* Make abd_chunk_cache more performant and debuggable.
* Allocate the abd_zero_buf from abd_chunk_cache rather than the heap.
* Don't try to reap non-existent qcaches in abd_cache arena.
* KM_PUSHPAGE->KM_SLEEP when allocating chunks from their own arena
- having fixed the abd leaks, return to using KMF_LITE, but leave a commented example of audit kmem debugging
- having made this work, abd_orig_size is no longer needed as a way to track the size originally kmem_zalloc-ed for a scatter abd_t

* Update FreeBSD abd_os.c with the fix, and let Linux build

* Minimal change to fix FreeBSD's abd_get_offset_scatter() carrying too many chunks for the desired ABD size
* A size argument is added to abd_get_offset_scatter() for FreeBSD and macOS, which is unused by Linux

Signed-off-by: Jorgen Lundman <[email protected]>

Upstream: ASM changes to support macOS

Due to some differences in assembler work, macOS will have own copies. It would be desirable to change all assembler files to use asm_linkage.h and the macros inside for better portability.
Signed-off-by: Jorgen Lundman <[email protected]>

Upstream: module/zfs/spa.c

Signed-off-by: Jorgen Lundman <[email protected]>

Upstream: zfs-tests to support macOS

Start to add macOS support to the zfs-tester environment, much more work is required still.

Signed-off-by: Jorgen Lundman <[email protected]>

Upstream: Changes to dprintf for macOS

Prefer to always have the option to turn printfs on, even in RELEASE builds

Signed-off-by: Jorgen Lundman <[email protected]>

Upstream: macOS currently has own zfs_fsync

Hoping to remove it eventually.

Signed-off-by: Jorgen Lundman <[email protected]>

Upstream: work around different API for sbuf_finish()

Signed-off-by: Jorgen Lundman <[email protected]>

Upstream: Why is linux even trying to look at etc/launchd

Signed-off-by: Jorgen Lundman <[email protected]>

Upstream: Missing empty taskq for userland

Signed-off-by: Jorgen Lundman <[email protected]>

Upstream: Add crypto errata1 for projectquota-less datasets

There was a short window of 2.0 releases before rc4 where a crypto dataset would enable projectquota but fail to start it. Add a work-around for that issue. It is expected this commit will be removed in the near future.

Datasets with crypto will generate the proper local_mac, and will not be able to be imported with the broken 2.0 version. Fixed dataset should work on other platforms again.

Signed-off-by: Jorgen Lundman <[email protected]>

Upstream: import -d does not go through os/macos/ sources

On macOS we need to prioritise /dev/disk over /dev/rdisk, but the common code makes no adjustment based on os preferred names. Potentially we should call an os/ function to set the priority.

Signed-off-by: Jorgen Lundman <[email protected]>

Upstream: Test for NULL vd

It seems we managed to get a deadman triggered during export?
: 0xffffff8004ebda40 mach_kernel : _return_from_trap + 0xe0
: 0xffffff7f8942bbbf org.openzfsonosx.zfs : _vdev_deadman + 0x1f
: 0xffffff7f8941149a org.openzfsonosx.zfs : _spa_deadman + 0xca
: 0xffffff7f896a6246 org.openzfsonosx.zfs : _taskq_thread + 0x4a6

Signed-off-by: Jorgen Lundman <[email protected]>

macOS: zdb inode mapping fix

Upstream: realpath vdev directory paths

This is already the behavior of zpool_find_import_scan, so do the same in make_leaf_vdev and zfs_strcmp_pathname.

On macOS, /var is a symlink to private/var so when the user inputs an import path starting with /var, it is eventually converted automatically by zfs to the realpath starting with /private/var. This causes problems later finding vdevs as string comparisons between paths starting with /private/var and paths starting with /var fail, so make sure we are always using the vdev directory's realpath.

Note that basenames are preserved so as not to compromise invariant symlinks.

Signed-off-by: Jorgen Lundman <[email protected]>

Upstream: dirname -> zfs_dirnamelen

[squash] Forgot to actually change it to zfs_dirnamelen

Signed-off-by: Jorgen Lundman <[email protected]>

Upstream: set default macOS invariant disks path

InvariantDisk (udev analogue for macOS) does not use /dev/disk in order to avoid subdirectories in /dev. Instead, the default path for the invariant symlinks is /var/run/disk/by-*, a root owned temporary directory.

Signed-off-by: Jorgen Lundman <[email protected]>

Upstream: stub-out zpool_read_label for APPLE

It does not work for macOS platform, we have our own based on the old pre-lio style.

Signed-off-by: Jorgen Lundman <[email protected]>

Upstream: cppcheck fixes

Upstream: fit in with recent man page changes

Signed-off-by: Jorgen Lundman <[email protected]>

Upstream: use correct libcurl.4.dylib name

This fix isn't exactly great either.
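As an illustration of the "realpath vdev directory paths" commit above, this is a minimal sketch of resolving only the directory part of a path while keeping the basename as given. The helper name resolve_vdev_dir() is hypothetical and is not the function used in libzfs; only the POSIX calls realpath(), dirname() and basename() are real.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <libgen.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: write "<realpath of directory>/<original basename>"
 * into out. Returns 0 on success, -1 on failure or truncation.
 */
static int
resolve_vdev_dir(const char *path, char *out, size_t outlen)
{
    char dircopy[PATH_MAX], basecopy[PATH_MAX], realdir[PATH_MAX];

    if (strlen(path) >= PATH_MAX)
        return (-1);
    /* dirname() and basename() may modify their argument, so use copies. */
    strcpy(dircopy, path);
    strcpy(basecopy, path);

    /* Resolve symlinks in the directory part only (e.g. /var on macOS). */
    if (realpath(dirname(dircopy), realdir) == NULL)
        return (-1);

    /* Re-attach the basename untouched, so an invariant symlink survives. */
    const char *base = basename(basecopy);
    int n = snprintf(out, outlen, "%s/%s",
        strcmp(realdir, "/") == 0 ? "" : realdir, base);
    return ((n < 0 || (size_t)n >= outlen) ? -1 : 0);
}
```

Because both sides of a later string comparison would go through the same directory resolution, a path typed with /var and one stored with /private/var end up comparing equal, while the final component is never followed.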
macOS: destroy snapshots

squash

renamed zpool_disable_volume_os

macOS: rename zed zvol symlink script and variables

macOS: handle 2-arg pthread_setname_np()

By taking it out completely.

macOS: Add snapshot and zvol events (uio.h fixes)

It turns out that it could not see readv/writev because our macos/sys/uio.h was testing for the _LIBSPL_SYS_UIO_H as set by the top level libspl/include/sys/uio.h and was therefore skipped over if includes came in the wrong order.

Upstream: libzfs.h abi requires changes

macOS: compile fixes after rebase

macOS: changes to zfs_file after rebase

macOS: compile fixes after rebase

Signed-off-by: Jorgen Lundman <[email protected]>

Upstream: Make zvol list be non-static

Until we can agree on a solution that works for everyone.

macOS: rename fallthrough to zfs_fallthrough

macOS: Compile fixes for latest rebase

macOS: Update arcstat and arc_summary

Signed-off-by: Jorgen Lundman <[email protected]>

macOS: Correct CPUID features lookup

Account for surprise A, B, D, C order of registers. Add fixes to compile on ARM64, but the functionality is missing.

Signed-off-by: Jorgen Lundman <[email protected]>

macOS: Set name after zfs_vnop_create() for NFS

NFS would fail with open(..., O_EXCL) if we do not set the name after zfs_vnop_create().

nfsd handles O_EXCL differently, in that it always assumes VA_EXCLUSIVE is set (and will receive EEXIST). nfsd uses atime to store a pseudo-random unique ID, then calls VNOP_CREATE(). If it succeeded in creating the file (this nfs client won over any other nfs clients) then the atime ID will match. nfsd will then call vnode_setattr() with the correct atime.

If the name is not set by ZFS, it fails before calling vnode_setattr() with call stack:

mac_vnode_check_open()
vn_getpath_ext_with_mntlen()
build_path_with_parent()

Also correct fhtovp/vptofh to handle XNU remapped inodes.

Remove atime checks for 48bit overflow from pre 64bit days.
zfs_vnop_create() is also given a vattr struct, we should reply with the attr we handled - this saves XNU from calling fallback setattr().

Clean up zfs_vnop_getattr() to only set va_active for vattrs that were asked for, rather than blindly setting vattrs. Some xnu code checks that va_active == va_enabled, so if we set too many it can force XNU to call fallback.

Actually handle atime in setattr()/getattr(), as it lives in zp->z_atime struct.

Signed-off-by: Jorgen Lundman <[email protected]>

macOS: fix default ADDEDTIME getattr

Logic would return 0 date reply for entries without ADDEDTIME (which is only added after moving)

Signed-off-by: Jorgen Lundman <[email protected]>

macOS: Also set name in other vnop create calls.

Unsure if NFS will bug on symlink and link, but we might as well call update. mknod is handled in the call to create.

Signed-off-by: Jorgen Lundman <[email protected]>

macOS: Add verbose kstat for RAW type

The RAW kstat type as used by nodes like:

kstat.zfs.misc.dbgmsg
kstat.zfs.misc.dbufs

can get really large, and there is no way to skip them when issuing a "sysctl -a" or similar request. This can slow down the process considerably, while it holds the locks.

The RAW kstat type will now automatically add a "verbose" leaf as well, defaulting to "0" (do not display). To see the RAW information set the verbose value to 1.

kstat.zfs.misc.dbufs.verbose: 0

kstat.zfs.misc.dbufs.dbufs: pool objset object level blkid offset dbsize [...]
kstat.zfs.misc.dbufs.verbose: 1

Conveniently, this command works:

sudo sysctl kstat.zfs.misc.dbgmsg.verbose=1 kstat.zfs.misc.dbgmsg.dbgmsg kstat.zfs.misc.dbgmsg.verbose=0

Signed-off-by: Jorgen Lundman <[email protected]>

macOS: change wmsum to be a struct

instead of being clever with pointers, and prepare for possible future expansion.

Signed-off-by: Jorgen Lundman <[email protected]>

macOS: Handle ZFS_MODULE_PARAMS as sysctl, take 2

Modelled on FreeBSD approach, made to work on macOS.
Attempt to stay close to legacy macOS tunable names but some are now slightly different. Retire the macOS kstat versions, replace with ZFS_MODULE_IMPL.

Signed-off-by: Jorgen Lundman <[email protected]>

Upstream: Also copy out sysctl for ZFS_MODULE_VIRTUAL

Upstream: take out Linux code in zfeature

macOS: move ioctl_fd (back) into libzfs_core

macOS: fix up clock_gettime for zfs-tester

macOS: build fix for monterey

macOS: bring in cmd/os/macos/zsysctl and mount_zfs

macOS: also include all source files

Split deep vmem_alloc()/vmem_xalloc() stacks

In openzfsonosx/openzfs#90 a user reported panics on an M1 with the message "Invalid kernel stack pointer (probable overflow)."

In at least several of these a deep multi-arena allocation was in progress (several vmem_alloc/vmem_xalloc reaching all the way down through vmem_bucket_alloc, xnu_alloc_throttled, and ultimately to osif_malloc). The stack frames above the first vmem_alloc were also fairly large.

This commit sets a dynamically sysctl-tunable threshold (8k default) for remaining stack size as reported by xnu. If we do not have more bytes than that when vmem_alloc() is called, then the actual allocation will be done in a separate worker thread which will start with a nearly empty stack that is much more likely to hold the various frames all the way through our code boundary with the kernel and beyond.

The xnu / mach thread_call API (osfmk/kern/thread_call.h) is used to avoid circular dependencies with taskq, and the mechanism is per-arena costing a quick stack-depth check per vmem_alloc() but allowing for wildly varying stack depths above the first vmem_alloc() call.

Vmem arenas now have two further kstats: the lowest amount of available stack space seen at a vmem_alloc() into it, and the number of times the allocation work has been done in a thread_call worker.
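The threshold check and the two per-arena kstats described above can be sketched roughly as follows. This is a simplified, single-threaded illustration with hypothetical names; in the kext the remaining stack is reported by xnu and the counters would need atomic updates, and 8192 is only assumed here as the byte value of the "8k default".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-arena counters mirroring the two kstats described above. */
struct arena_stack_stats {
    size_t lowest_stack_seen;   /* smallest remaining stack observed */
    uint64_t worker_allocs;     /* allocations handed to a worker */
};

static void
arena_stack_stats_init(struct arena_stack_stats *s)
{
    s->lowest_stack_seen = SIZE_MAX;
    s->worker_allocs = 0;
}

/*
 * Decide whether an allocation should be split off to a worker thread.
 * `remaining` is the caller's remaining stack in bytes; `threshold` is
 * the tunable. The worker is used unless we have MORE bytes than the
 * threshold, matching the wording of the commit message.
 */
static bool
should_alloc_in_worker(struct arena_stack_stats *s, size_t remaining,
    size_t threshold)
{
    if (remaining < s->lowest_stack_seen)
        s->lowest_stack_seen = remaining;
    if (remaining > threshold)
        return (false);         /* enough stack: allocate in place */
    s->worker_allocs++;
    return (true);
}
```

The check itself is cheap, which fits the commit's description of a quick stack-depth test on every vmem_alloc() call.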
* some spl_vmem.c functions are given inline hints

These are small functions with no or very few automatic variables that were good candidates for clang/llvm's inlining heuristics before we switched to building the kext with -finline-hint-functions.

* remove some (unrelated) unused variables which escaped previous commits, eliminating a couple compile-time warnings.

Sub-PAGE_SIZE ABDs instead of linear ABDs

Previously, when making an ABD of size less than zfs_abd_chunk_size (4k by default), we would make a linear ABD, which would allocate memory out of the zio caches.

The subpage chunks are in multiples of SPA_MINBLOCKSIZE, with each multiple (up to PAGE_SIZE minus SPA_MINBLOCKSIZE) having its own kmem_cache. These kmem_caches are parented to a subpage vmem_cache that takes 128k allocations from the PAGE_SIZE abd_chunk_cache. ABDs whose size falls within SPA_MINBLOCKSIZE bytes of PAGE_SIZE and all larger ABDs are served by the PAGE_SIZE ABD cache.

Upstream: fix M1 --enable-debug build failure

cannot #pragma diagnostic pop without a matching #pragma diagnostic push

use -finline-hint-functions and not HAVE_LARGE_STACKS

We appear to have a stack overflow problem.

HAVE_LARGE_STACKS is default. It drives the decision about whether (HAVE) or not (!HAVE) to do txg sync context (frequent) and pool initialization (much less frequent) zio work in the same thread as the present __zio_execute, or whether it should be pushed to the head of the line of zios to be serviced asynchronously by another thread. Let's not define HAVE_LARGE_STACKS when building the kext for macOS.

Clang's -finline-hint-functions inlines all functions explicitly hinted as inline "static inline foo(...) { }" or equipped with an __attribute__((always_inline)), but does not inline other functions, even if they are static.
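To make the hinted versus un-hinted distinction concrete, here is a small sketch using GCC/Clang attribute syntax. All function names are hypothetical and unrelated to the ZFS sources; the example only shows how a large automatic buffer can be kept out of a recursive function's frame by keeping its owner out of line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hinted: a tiny helper that -finline-hint-functions may still inline. */
static inline unsigned
add_one(unsigned v)
{
    return (v + 1);
}

/*
 * Rarely used helper with a large automatic buffer. Marking it noinline
 * (a GCC/Clang attribute) keeps the 1 KiB buffer out of the frame of the
 * recursive caller below.
 */
static __attribute__((noinline)) unsigned
rare_path(unsigned depth)
{
    char scratch[1024];

    memset(scratch, (int)(depth & 0xff), sizeof (scratch));
    return ((unsigned char)scratch[depth % sizeof (scratch)]);
}

/* Recursive walker: only the deepest level pays for rare_path()'s frame. */
static unsigned
walk(unsigned depth)
{
    if (depth == 0)
        return (rare_path(depth));
    return (add_one(walk(depth - 1)));
}
```

If rare_path() were inlined into walk(), every recursion level could carry the scratch buffer in its frame whether or not that branch is taken, which is the kind of growth the commit is trying to avoid.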
Clang & LLVM's inlining bumps the stack frame size to include automatic variables in the inlined functions, growing the stack even for invocations where the inlined function will not be reached. This has led to large stack frames in recursively called functions, notably dsl_scan_visitbp, which was dealt with by removing the always inline attribute. Globally enabling -finline-hint-functions reduces the number of inlined functions enough to make un-inlining such functions, while still inlining obvious wins (e.g. tiny frequently called from all over the source tree functions such as atomic_add_64_nv()). remove exponential moving average code Its utility for tracking relatively long term (~ seconds) movements of the calculated spl_free value has declined with our switch to "pure" and away from using macOS kernel variables which tracking momentary VM page demand and consequent changes in ARC. macOS allows for floating point to be used in kernel extensions, but this *may* have a cost on ARM in the form of larger stack frames, which is an acute problem. This code therefore should go away, rather than be put behind a compile-time flag. macOS: fix autoimport script Use absolute path to zsysctl. Ensure that the org.openzfsonosx.zpool-import service is loaded before kickstarting it. macOS: zfs_resume_fs can panic accessing NULL vp macOS: silence zvol_in_zvol and use SET_ERROR() while we are at it. 
macOS: zvol_replay must call zil_open Fix some SPL warnings * large frame size -> IOMalloc/IOFree * loss of precision in int -> short by way of bitmasking * __maybe_unused for the pattern: const retval r __maybe_unused = f(); ASSERT0(f); in non-debug builds * variables used possibly uninitialized macOS: Silence warning in uio A race to thread_call_enter1() could deadlock If multiple concurrent deep-stack allocation requests race to vmem_alloc(), the "winner" of the race could be cancelled by one of the other racers, and so the cancelled "winner" would never see the done flag, and would spend the rest of eternity stuck in a cv_wait() loop, hanging the thread that wanted memory. This commit uses a per-arena busy flag to block the later racers from reaching thread_call_enter1() until the race-winner's in-worker-thread memory allocation is complete. Additionally, the worker does less work updating stats, and only takes a mutex around the cv_signal(). The parent also checks for lost and duplicate cv_signals() error conditions. macOS: zvolRename needs to wait zvolRename needs to wait for IOKit to settle the changes, with a timeout. Rework the code to reuse the wait logic in a function. Remove old delay() hack. Most easily tested with zfs-tester run over; cli_root/zfs_create, zvol/zvol_cli, zvol/zvol_misc which results in testpool/vol.33979-renamed "is busy". macOS: Use vdev_disk_taskq when stack space limited. As we can trigger stack overflow in the IO path (especially with zvol) we detect if available space is below tunable spl_vmem_split_stack_below. In addition to this, remove small kmem_alloc of ldi_buf_t, vdev_buf_t and ldi_iokit_buf_t for each IO by attaching ldi_buf_t into zio_t. (See ZIO_OS_FIELDS) Track lowest stack remaining seen in vdev_disk as kstat It may be useful to know if we are being handed an especially deep stack in vdev_disk_io_start(), rather than just that we have been called with less than the threshold remaining. 
Additionally update variable names for clarity, notably reflecting that spl_stack_split_below is not just for vmem any more. Issue zvol_read/write async when needed macOS: Address 3 different ways to compress HFS Handle 2 xattr holding the compressed data stream, and detect when UF_COMPRESSED is being set, and if file size is zero we return zero. Makes 'tar' retry the compression. Upstream: test for -finline-hint-functions macOS: thread_call_allocate fix for older OS macOS: add support for sharenfs and sharesmb macOS: implement kcred for zfs_replay Some calls used by zfs_replay can not handle a NULL kcred and will panic. We use available functions to fetch the kernel cred, but it is somewhat of a hack, as we release it before using. Alternatively, we could hold reference in zfsvfs_setup() before calling zil_replay() and release after, with the hopes that kcred isn't used many other places. macOS: Fix sysctl macros and missing prototypes macOS: Add disable_trashes tunable zfs_disable_trashes Upstream: wrap in ifdef for SEEK_HOLE macOS: fix for earlier macOS also needs: - AC_MSG_FAILURE([*** clock_gettime is missing in libc and librt]) + AC_MSG_RESULT([*** clock_gettime is missing in libc and librt]) and remove -finline-hint-functions macOS: wrap mkostemp/s in OSX10.12 checks macOS: bzero the tqe in the allocation construction phase the tqent_next and tqent_prev fields are random, which causes problems with the IS_EMPTY() check, causing an assertion in zio_taskq_dispatch macOS: Clean up vmem_alloc_in_worker_thread for boot hang vmem_alloc_in_worker_thread() would use a local stack "cb" referenced by the thread, possibly after the stack was released. spl_lowest_alloc_stack_remaining would also trigger async calls when not neccessary, each time we got a new low, which was problematic during spl-kmem startup. macOS: M1 kext must contain x64 binary Otherwise notarize fails. 
macOS: minor cstyle fix commit before git bisect Upstream: fixing all the Makefiles for new build system Upstream: Changes and fixes to common files macOS: fixes and updates macOS: replace bcmp / bcopy / bfree macOS: cstyle fixes macOS: compile fixes Upstream: stop Linux from compiling macOS source files. It doesn't seem to work. Upstream: continued makefile fixes Upstream: Linux squats on mount_zfs, so hack around it
Signed-off-by: Jorgen Lundman <[email protected]> Upstream: configure.ac add cmd/os/macos/zsysctl Upstream: configure.ac changes Upstream: Makefile Upstream: Add macOS to headers Attempt to group most of the sweeping changes to headers in there, unless they fit better with an individual commit Signed-off-by: Jorgen Lundman <[email protected]> It appears FreeBSD did the same for zfs_ioctl_register_dataset_nolog() as they use it, so following suit for zfs_ioctl_register_pool() Upstream: macOS default mount is /Volumes Signed-off-by: Jorgen Lundman <[email protected]> Upstream: add IO calls for iokit Is this the best way? We could add ", func, private" to the existing IO, and either send by uio, or by func(private). Signed-off-by: Jorgen Lundman <[email protected]> Upstream: Allow cmd/zfs mount unmount of snapshots "zfs mount dataset@snapshot" as mounting of snapshot has to be done manually from userland in macOS. Add zfs_rollback_os() call to the rollback logic, so platforms can do specific requirements. macOS: need to kick Finder to update. Signed-off-by: Jorgen Lundman <[email protected]> upstream: hack - retry destroy until diskarb goes away A more portable solution is perhaps desired. Signed-off-by: Jorgen Lundman <[email protected]> macOS: Add macOS support Add all files required for the macOS port. Add new cmd/os/ for tools which are only expected to be used on macOS. This has support for all macOS version up to BigSur (11.x) Signed-off-by: Jorgen Lundman <[email protected]> macOS: Additional work macOS: employ advanced lock synchronisation macOS: handle additional lookups to delay waiting for mount macOS: handle rapid snapshot auto mounts Re-implement snapshots to always mount on "lookup()". This handles the deadlock when cwd is changed to the snapshot directory before mount. Then add some logic to attempt to not-mount in some situations, ie listing inside ".zfs/snapshot" directory. 
If a listing there is started, we ignore mount requests until it is complete - by storing the theadid and pid of the listing process. Any access below ".zfs/snapshot", will clear the ignore, ie, cause the mount to happen. macOS: userland unmount to disable auto_snapshot to avoid triggering a mount. Also make kernel remember 5 pid+tid to ignore. macOS: Do not truncate returned name in case correcting lookups macOS: also don't truncate further down macOS: fix leak in ldi handle_set_wce_iokit The parent device needs to be released if it was retained. macOS: add zvol_os_is_zvol() Or we are unable to create zpools inside zvols. Also cleanup zvolIO.cpp to be cstyle compliant and correcting obvious leaks. macOS: fix zfs_vnop_lookup() and linkid zfs_vnop_lookup() failed to "remember" the name used to lookup in the cache_lookup() success case, making us return the incorrect name in future zfs_vnop_getattr() - most notacibly in realpath(). linkid logic for Finder was not converting XNU inode to avoid the first 16 inodes. macOS: Return nametoolong when formD is lacking space Originally it was returning "Operation not supported" which isn't quite as useful to the user. Hopefully nothing checks that it must return ENOTSUP. macOS: change vnop_lookup to use cache. To give more room for formD formC to work with, we always allocate MAXPATHLEN, so we might as well use a kmem_cache. macOS: rmdir -p is far too eager macOS: dir link count doesn't count files. To be like upstream: drwxr-xr-x 2 root wheel 2 Jun 16 17:37 . touch a drwxr-xr-x 2 root wheel 3 Jun 16 17:37 . Where 2nd field is "number of directories" (2) and 5th field is "number of files and directories" (3) macOS: move sa_setup() to after zap_lookup() This is the order Linux calls them, so we should minimise differences. macOS: clean up handling of readonly with vfs_mount to follow what upstream does. 
macOS: parentID also needs to be mapped to XNU id macOS: add cmd/os/macos/zsysctl macOS: bring in cmd/os/macos/zsysctl and mount_zfs macOS: Makefile.am for mount_zfs [squash] macOS: squash macOS: strip selinux functions [squash] macOS: move getmntany into libzfs zvol.c change fix zfs.h macOS: run zsysctl if /etc/zfs/zsysctl.conf exists macOS: re-implement most of xattrs We had some difference betweem how ZOL and macOS behaved when going between xattr=sa and xattr=on datasets (send/recv) and fairly large duplicate code. Take ZOL zpl_xattr for the sa/on logic, change it to take "uio" for the data buffer. Also pass in "cr" as we can. The finderinfo logic stays in the vnop handlers, leaving the imported source very close to ZOL. Everything with xattrs, and decmpfs needs to be tested :) macOS: Add uio type for IOKit iomem support Add another UIO seg type, UIO_FUNCSPACE (UIO_SYSSPACE, UIO_USERSPACE) to handle the IOkit IOMemoryDescriptor type. When zvolIO needs to issue IO on volumes, it will setup a uio with iov_base as "iomem". As dmu_read_dnode_uio() (and write) filters down to zfs_uiomove(), spl-uio will handle the type to call registered IO function "zvolIO_strategy" instead of memcpy/bcopy calls. zvolIO_strategy() will call iomem->writeBytes (readBytes) as required. Model zvol_os.c calls zvol_os_read_zv() (and write) on ZOL sources again to ensure as little divergence as possible. Restore dmu.c to contain no macOS changes macOS: Fix abd leak, kmem_free correct size of abd_t ... for macOS and Freebsd, and improve macOS abd performance (#56) * Cleanup in macos abd_os.c to fix abd_t leak Fix a leak of abd_t that manifested mostly when using raidzN with at least as many columns as N (e.g. a four-disk raidz2 but not a three-disk raidz2). Sufficiently heavy raidz use would eventually run a system out of memory. 
The leak was introduced as a fix for a panic caused by calculating the wrong size of an abd_t at free time if the abd_t had been made using abd_get_offset_impl, since it carried along the unnecessary tails of large ABDs, leading to a mismatch between abd->abd_size and the original allocation size of the abd_t. This would feed kmem_free a bad size, which produces a heap corruption panic.
The fix now carries only the necessary chunk pointers, leading to smaller abd_ts (especially those of abd_get_zeros() ABDs) and a performance gain from the reduction in copying and allocation activity.
We now calculate the correct size for the abd_t at free time. This requires passing the number of bytes wanted in a scatter ABD to abd_get_offset_scatter().
Additionally:
* Switch abd_cache arena to FIRSTFIT, which empirically improves performance.
* Make abd_chunk_cache more performant and debuggable.
* Allocate the abd_zero_buf from abd_chunk_cache rather than the heap.
* Don't try to reap non-existent qcaches in abd_cache arena.
* KM_PUSHPAGE->KM_SLEEP when allocating chunks from their own arena
- having fixed the abd leaks, return to using KMF_LITE, but leave a commented example of audit kmem debugging
- having made this work, abd_orig_size is no longer needed as a way to track the size originally kmem_zalloc-ed for a scatter abd_t
* Update FreeBSD abd_os.c with the fix, and let Linux build
* Minimal change to fix FreeBSD's abd_get_offset_scatter() carrying too many chunks for the desired ABD size
* A size argument is added to abd_get_offset_scatter() for FreeBSD and macOS, which is unused by Linux
Signed-off-by: Jorgen Lundman <[email protected]>
Upstream: ASM changes to support macOS
Due to some differences in assembler work, macOS will have its own copies. It would be desirable to change all assembler files to use asm_linkage.h and the macros inside for better portability.
Signed-off-by: Jorgen Lundman <[email protected]>
Upstream: module/zfs/spa.c
Signed-off-by: Jorgen Lundman <[email protected]>
Upstream: zfs-tests to support macOS
Start to add macOS support to the zfs-tester environment, much more work is required still.
Signed-off-by: Jorgen Lundman <[email protected]>
Upstream: Changes to dprintf for macOS
Prefer to always have the option to turn printfs on, even in RELEASE builds
Signed-off-by: Jorgen Lundman <[email protected]>
Upstream: macOS currently has own zfs_fsync
Hoping to remove it eventually.
Signed-off-by: Jorgen Lundman <[email protected]>
Upstream: work around different API for sbuf_finish()
Signed-off-by: Jorgen Lundman <[email protected]>
Upstream: Why is linux even trying to look at etc/launchd
Signed-off-by: Jorgen Lundman <[email protected]>
Upstream: Missing empty taskq for userland
Signed-off-by: Jorgen Lundman <[email protected]>
Upstream: Add crypto errata1 for projectquota-less datasets
There was a short window of 2.0 releases before rc4 where a crypto dataset would enable projectquota but fail to start it. Add a work-around for that issue. It is expected this commit will be removed in the near future. Datasets with crypto will generate the proper local_mac, and will not be able to be imported with the broken 2.0 version. Fixed datasets should work on other platforms again.
Signed-off-by: Jorgen Lundman <[email protected]>
Upstream: import -d does not go through os/macos/ sources
On macOS we need to prioritise /dev/disk over /dev/rdisk, but the common code makes no adjustment based on os preferred names. Potentially we should call an os/ function to set the priority.
Signed-off-by: Jorgen Lundman <[email protected]>
Upstream: Test for NULL vd
It seems we managed to get a deadman triggered during export?
: 0xffffff8004ebda40 mach_kernel : _return_from_trap + 0xe0
: 0xffffff7f8942bbbf org.openzfsonosx.zfs : _vdev_deadman + 0x1f
: 0xffffff7f8941149a org.openzfsonosx.zfs : _spa_deadman + 0xca
: 0xffffff7f896a6246 org.openzfsonosx.zfs : _taskq_thread + 0x4a6
Signed-off-by: Jorgen Lundman <[email protected]>
macOS: zdb inode mapping fix
Upstream: realpath vdev directory paths
This is already the behavior of zpool_find_import_scan, so do the same in make_leaf_vdev and zfs_strcmp_pathname.
On macOS, /var is a symlink to private/var so when the user inputs an import path starting with /var, it is eventually converted automatically by zfs to the realpath starting with /private/var. This causes problems later finding vdevs as string comparisons between paths starting with /private/var and paths starting with /var fail, so make sure we are always using the vdev directory's realpath. Note that basenames are preserved so as not to compromise invariant symlinks.
Signed-off-by: Jorgen Lundman <[email protected]>
Upstream: dirname -> zfs_dirnamelen
[squash] Forgot to actually change it to zfs_dirnamelen
Signed-off-by: Jorgen Lundman <[email protected]>
Upstream: set default macOS invariant disks path
InvariantDisk (udev analogue for macOS) does not use /dev/disk in order to avoid subdirectories in /dev. Instead, the default path for the invariant symlinks is /var/run/disk/by-*, a root owned temporary directory.
Signed-off-by: Jorgen Lundman <[email protected]>
Upstream: stub-out zpool_read_label for APPLE
It does not work for macOS platform, we have our own based on the old pre-lio style.
Signed-off-by: Jorgen Lundman <[email protected]>
Upstream: cppcheck fixes
Upstream: fit in with recent man page changes
Signed-off-by: Jorgen Lundman <[email protected]>
Upstream: use correct libcurl.4.dylib name
This fix isn't exactly great either.
macOS: destroy snapshots squash renamed zpool_disable_volume_os
macOS: rename zed zvol symlink script and variables
macOS: handle 2-arg pthread_setname_np()
By taking it out completely.
macOS: Add snapshot and zvol events (uio.h fixes)
It turns out that it could not see readv/writev because our macos/sys/uio.h was testing for the _LIBSPL_SYS_UIO_H as set by the top level libspl/include/sys/uio.h and was therefore skipped over if includes came in the wrong order.
Upstream: libzfs.h abi requires changes
macOS: compile fixes after rebase
macOS: changes to zfs_file after rebase
macOS: compile fixes after rebase
Signed-off-by: Jorgen Lundman <[email protected]>
Upstream: Make zvol list be non-static
Until we can agree on a solution that works for everyone.
macOS: rename fallthrough to zfs_fallthrough
macOS: Compile fixes for latest rebase
macOS: Update arcstat and arc_summary
Signed-off-by: Jorgen Lundman <[email protected]>
macOS: Correct CPUID features lookup
Account for surprise A, B, D, C order of registers. Add fixes to compile on ARM64, but functionality is missing.
Signed-off-by: Jorgen Lundman <[email protected]>
macOS: Set name after zfs_vnop_create() for NFS
NFS would fail with open(..., O_EXCL) if we do not set the name after zfs_vnop_create(). nfsd handles O_EXCL differently, in that it always assumes VA_EXCLUSIVE is set (and will receive EEXIST). nfsd uses atime to store a pseudo-random unique ID, then calls VNOP_CREATE(). If it succeeded in creating the file (this nfs client won over any other nfs clients) then the atime ID will match. nfsd will then call vnode_setattr() with the correct atime.
If the name is not set by ZFS, it fails before calling vnode_setattr() with call stack:
mac_vnode_check_open()
vn_getpath_ext_with_mntlen()
build_path_with_parent()
Also correct fhtovp/vptofh to handle XNU remapped inodes. Remove atime checks for 48bit overflow from pre 64bit days.
zfs_vnop_create() is also given a vattr struct, we should reply with the attr we handled; this saves XNU from calling fallback setattr(). Clean up zfs_vnop_getattr() to only set va_active for vattrs that were asked for, rather than blindly setting vattrs. Some xnu code checks that va_active == va_enabled, so if we set too many it can force XNU to call fallback. Actually handle atime in setattr()/getattr(), as it lives in zp->z_atime struct.
Signed-off-by: Jorgen Lundman <[email protected]>
macOS: fix default ADDEDTIME getattr
Logic would return 0 date reply for entries without ADDEDTIME (which is only added after moving)
Signed-off-by: Jorgen Lundman <[email protected]>
macOS: Also set name in other vnop create calls.
Unsure if NFS will bug on symlink and link, but we might as well call update. mknod is handled in the call to create.
Signed-off-by: Jorgen Lundman <[email protected]>
macOS: Add verbose kstat for RAW type
The RAW kstat type as used by nodes like:
kstat.zfs.misc.dbgmsg
kstat.zfs.misc.dbufs
can get really large, and there is no way to skip them when issuing a "sysctl -a" or similar request. This can slow down the process considerably, while it holds the locks.
The RAW kstat type will now automatically add a "verbose" leaf as well, defaulting to "0" (do not display). To see the RAW information set the verbose value to 1.
kstat.zfs.misc.dbufs.verbose: 0
kstat.zfs.misc.dbufs.dbufs: pool objset object level blkid offset dbsize [...]
kstat.zfs.misc.dbufs.verbose: 1
Conveniently, this command works:
sudo sysctl kstat.zfs.misc.dbgmsg.verbose=1 kstat.zfs.misc.dbgmsg.dbgmsg kstat.zfs.misc.dbgmsg.verbose=0
Signed-off-by: Jorgen Lundman <[email protected]>
macOS: change wmsum to be a struct
instead of being clever with pointers, and prepare for possible future expansion.
Signed-off-by: Jorgen Lundman <[email protected]>
macOS: Handle ZFS_MODULE_PARAMS as sysctl, take 2
Modelled on FreeBSD approach, made to work on macOS.
Attempt to stay close to legacy macOS tunable names but some are now slightly different. Retire the macOS kstat versions, replace with ZFS_MODULE_IMPL.
Signed-off-by: Jorgen Lundman <[email protected]>
Upstream: Also copy out sysctl for ZFS_MODULE_VIRTUAL
Upstream: take out Linux code in zfeature
macOS: move ioctl_fd (back) into libzfs_core
macOS: fix up clock_gettime for zfs-tester
macOS: build fix for monterey
macOS: bring in cmd/os/macos/zsysctl and mount_zfs
macOS: also include all source files
Split deep vmem_alloc()/vmem_xalloc() stacks
In openzfsonosx/openzfs#90 a user reported panics on an M1 with the message "Invalid kernel stack pointer (probable overflow)." In at least several of these a deep multi-arena allocation was in progress (several vmem_alloc/vmem_xalloc reaching all the way down through vmem_bucket_alloc, xnu_alloc_throttled, and ultimately to osif_malloc). The stack frames above the first vmem_alloc were also fairly large.
This commit sets a dynamically sysctl-tunable threshold (8k default) for remaining stack size as reported by xnu. If we do not have more bytes than that when vmem_alloc() is called, then the actual allocation will be done in a separate worker thread which will start with a nearly empty stack that is much more likely to hold the various frames all the way through our code boundary with the kernel and beyond.
The xnu / mach thread_call API (osfmk/kern/thread_call.h) is used to avoid circular dependencies with taskq, and the mechanism is per-arena costing a quick stack-depth check per vmem_alloc() but allowing for wildly varying stack depths above the first vmem_alloc() call.
Vmem arenas now have two further kstats: the lowest amount of available stack space seen at a vmem_alloc() into it, and the number of times the allocation work has been done in a thread_call worker.
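A minimal sketch of the threshold check described in the stack-splitting commit message, written as plain userland C so it compiles on its own. The names arena_stack_stats_t, arena_stats_init, should_split_stack and split_stack_below are hypothetical, and the remaining-stack value is passed in as a parameter instead of being queried from xnu. The commit itself hands the work to a thread_call worker; this sketch only returns a flag and updates two counters modelled on the two kstats mentioned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-arena counters, modelled on the two kstats in the text. */
typedef struct arena_stack_stats {
	size_t lowest_stack_seen;	/* lowest remaining stack seen at entry */
	uint64_t worker_dispatches;	/* allocations handed to a worker */
} arena_stack_stats_t;

/* The 8k default from the commit message; a sysctl would adjust the real one. */
static size_t split_stack_below = 8 * 1024;

static void
arena_stats_init(arena_stack_stats_t *st)
{
	st->lowest_stack_seen = SIZE_MAX;
	st->worker_dispatches = 0;
}

/*
 * Returns 1 when the allocation should run on a fresh worker stack
 * (we do not have more than the threshold remaining), 0 otherwise.
 * `remaining` stands in for the value the kernel would report.
 */
static int
should_split_stack(arena_stack_stats_t *st, size_t remaining)
{
	assert(st != NULL);

	/* Track the lowest remaining stack ever observed for this arena. */
	if (remaining < st->lowest_stack_seen)
		st->lowest_stack_seen = remaining;

	if (remaining > split_stack_below)
		return (0);

	st->worker_dispatches++;
	return (1);
}
```

With the default threshold, a caller with 16384 bytes left allocates inline, while callers with 8192 or 4096 bytes left are counted as worker dispatches.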
* some spl_vmem.c functions are given inline hints
These are small functions with no or very few automatic variables that were good candidates for clang/llvm's inlining heuristics before we switched to building the kext with -finline-hint-functions.
* remove some (unrelated) unused variables which escaped previous commits, eliminating a couple compile-time warnings.
Sub-PAGE_SIZE ABDs instead of linear ABDs
Previously, when making an ABD of size less than zfs_abd_chunk_size (4k by default), we would make a linear ABD, which would allocate memory out of the zio caches.
The subpage chunks are in multiples of SPA_MINBLOCKSIZE, with each multiple (up to PAGE_SIZE minus SPA_MINBLOCKSIZE) having its own kmem_cache. These kmem_caches are parented to a subpage vmem_cache that takes 128k allocations from the PAGE_SIZE abd_chunk_cache. ABDs whose size falls within SPA_MINBLOCKSIZE bytes of PAGE_SIZE and all larger ABDs are served by the PAGE_SIZE ABD cache.
Upstream: fix M1 --enable-debug build failure
cannot #pragma diagnostic pop without a matching #pragma diagnostic push
use -finline-hint-functions and not HAVE_LARGE_STACKS
We appear to have a stack overflow problem.
HAVE_LARGE_STACKS is default. It drives the decision about whether (HAVE) or not (!HAVE) to do txg sync context (frequent) and pool initialization (much less frequent) zio work in the same thread as the present __zio_execute, or whether it should be pushed to the head of the line of zios to be serviced asynchronously by another thread.
Let's not define HAVE_LARGE_STACKS when building the kext for macOS.
Clang's -finline-hint-functions inlines all functions explicitly hinted as inline "static inline foo(...) { }" or equipped with an __attribute__((always_inline)), but does not inline other functions, even if they are static.
Clang & LLVM's inlining bumps the stack frame size to include automatic variables in the inlined functions, growing the stack even for invocations where the inlined function will not be reached. This has led to large stack frames in recursively called functions, notably dsl_scan_visitbp, which was dealt with by removing the always inline attribute.
Globally enabling -finline-hint-functions reduces the number of inlined functions enough to make un-inlining such functions unnecessary, while still inlining obvious wins (e.g. tiny functions called frequently from all over the source tree, such as atomic_add_64_nv()).
remove exponential moving average code
Its utility for tracking relatively long term (~ seconds) movements of the calculated spl_free value has declined with our switch to "pure" and away from using macOS kernel variables which track momentary VM page demand and consequent changes in ARC.
macOS allows for floating point to be used in kernel extensions, but this *may* have a cost on ARM in the form of larger stack frames, which is an acute problem. This code therefore should go away, rather than be put behind a compile-time flag.
macOS: fix autoimport script
Use absolute path to zsysctl. Ensure that the org.openzfsonosx.zpool-import service is loaded before kickstarting it.
macOS: zfs_resume_fs can panic accessing NULL vp
macOS: silence zvol_in_zvol
and use SET_ERROR() while we are at it.
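The sizing rule in the "Sub-PAGE_SIZE ABDs instead of linear ABDs" message above can be sketched as a small lookup. This is only an illustration under stated assumptions: SKETCH_MINBLOCKSIZE (512) and SKETCH_PAGE_SIZE (4096) are stand-ins for SPA_MINBLOCKSIZE and PAGE_SIZE, subpage_cache_index is a hypothetical helper, and the boundary is read as "anything that rounds up to a full page goes to the PAGE_SIZE cache". The actual abd_os.c code may organise this differently.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins; real values come from platform and ZFS headers. */
#define	SKETCH_MINBLOCKSIZE	512u	/* assumed SPA_MINBLOCKSIZE */
#define	SKETCH_PAGE_SIZE	4096u	/* assumed PAGE_SIZE */

/* One cache per multiple of the minimum block size below a full page. */
#define	SKETCH_SUBPAGE_CACHES \
	((SKETCH_PAGE_SIZE / SKETCH_MINBLOCKSIZE) - 1)

/*
 * Returns the 0-based index of the hypothetical sub-page cache serving an
 * ABD of `size` bytes, or -1 when the PAGE_SIZE cache would serve it.
 */
static int
subpage_cache_index(size_t size)
{
	size_t rounded;

	assert(size > 0);

	/* Round up to the next multiple of the minimum block size. */
	rounded = ((size + SKETCH_MINBLOCKSIZE - 1) / SKETCH_MINBLOCKSIZE) *
	    SKETCH_MINBLOCKSIZE;

	/* Sizes that round up to a page or more use the page-sized cache. */
	if (rounded >= SKETCH_PAGE_SIZE)
		return (-1);

	return ((int)(rounded / SKETCH_MINBLOCKSIZE) - 1);
}
```

Under these assumed constants there are seven sub-page caches (512 through 3584 bytes); a 3584-byte ABD maps to the last of them and a 3585-byte ABD falls through to the page-sized cache.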
macOS: zvol_replay must call zil_open
Fix some SPL warnings
* large frame size -> IOMalloc/IOFree
* loss of precision in int -> short by way of bitmasking
* __maybe_unused for the pattern: const retval r __maybe_unused = f(); ASSERT0(f); in non-debug builds
* variables used possibly uninitialized
macOS: Silence warning in uio
A race to thread_call_enter1() could deadlock
If multiple concurrent deep-stack allocation requests race to vmem_alloc(), the "winner" of the race could be cancelled by one of the other racers, and so the cancelled "winner" would never see the done flag, and would spend the rest of eternity stuck in a cv_wait() loop, hanging the thread that wanted memory.
This commit uses a per-arena busy flag to block the later racers from reaching thread_call_enter1() until the race-winner's in-worker-thread memory allocation is complete.
Additionally, the worker does less work updating stats, and only takes a mutex around the cv_signal(). The parent also checks for lost and duplicate cv_signals() error conditions.
macOS: zvolRename needs to wait
zvolRename needs to wait for IOKit to settle the changes, with a timeout. Rework the code to reuse the wait logic in a function. Remove old delay() hack.
Most easily tested with zfs-tester run over; cli_root/zfs_create, zvol/zvol_cli, zvol/zvol_misc which results in testpool/vol.33979-renamed "is busy".
macOS: Use vdev_disk_taskq when stack space limited.
As we can trigger stack overflow in the IO path (especially with zvol) we detect if available space is below tunable spl_vmem_split_stack_below.
In addition to this, remove small kmem_alloc of ldi_buf_t, vdev_buf_t and ldi_iokit_buf_t for each IO by attaching ldi_buf_t into zio_t. (See ZIO_OS_FIELDS)
Track lowest stack remaining seen in vdev_disk as kstat
It may be useful to know if we are being handed an especially deep stack in vdev_disk_io_start(), rather than just that we have been called with less than the threshold remaining.
Additionally update variable names for clarity, notably reflecting that spl_stack_split_below is not just for vmem any more.
Issue zvol_read/write async when needed
macOS: Address 3 different ways to compress HFS
Handle 2 xattr holding the compressed data stream, and detect when UF_COMPRESSED is being set, and if file size is zero we return zero. Makes 'tar' retry the compression.
Upstream: test for -finline-hint-functions
macOS: thread_call_allocate fix for older OS
macOS: add support for sharenfs and sharesmb
macOS: implement kcred for zfs_replay
Some calls used by zfs_replay can not handle a NULL kcred and will panic. We use available functions to fetch the kernel cred, but it is somewhat of a hack, as we release it before using. Alternatively, we could hold reference in zfsvfs_setup() before calling zil_replay() and release after, with the hopes that kcred isn't used many other places.
macOS: Fix sysctl macros and missing prototypes
macOS: Add disable_trashes tunable
zfs_disable_trashes
Upstream: wrap in ifdef for SEEK_HOLE
macOS: fix for earlier macOS
also needs:
- AC_MSG_FAILURE([*** clock_gettime is missing in libc and librt])
+ AC_MSG_RESULT([*** clock_gettime is missing in libc and librt])
and remove -finline-hint-functions
macOS: wrap mkostemp/s in OSX10.12 checks
macOS: bzero the tqe in the allocation construction phase
the tqent_next and tqent_prev fields are random, which causes problems with the IS_EMPTY() check, causing an assertion in zio_taskq_dispatch
macOS: Clean up vmem_alloc_in_worker_thread for boot hang
vmem_alloc_in_worker_thread() would use a local stack "cb" referenced by the thread, possibly after the stack was released.
spl_lowest_alloc_stack_remaining would also trigger async calls when not necessary, each time we got a new low, which was problematic during spl-kmem startup.
macOS: M1 kext must contain x64 binary
Otherwise notarize fails.
macOS: minor cstyle fix
commit before git bisect
Upstream: fixing all the Makefiles for new build system
Upstream: Changes and fixes to common files
macOS: fixes and updates
macOS: replace bcmp / bcopy / bfree
macOS: cstyle fixes
macOS: compile fixes
Upstream: stop Linux from compiling macOS source files.
It doesn't seem to work.
Upstream: continued makefile fixes
Upstream: Linux squats on mount_zfs, so hack around it
macOS: ZFS_ENTER changes, blake3 tunable
Huh where did zvol_wait go
Fixes for cstyle, make install etc
Remove all _impl_get() functions
apparently we do MODULE_PARAM_VIRTUAL some other way now
Bring back zfs_vdev_raidz_impl_get()
I guess only one vdev_raidz_impl_get
Signed-off-by: Jorgen Lundman <[email protected]>
ABI changes
Signed-off-by: Jorgen Lundman <[email protected]>
Mac: Build without librt
Signed-off-by: Andrew Innes <[email protected]>
Workflow to build OpenZFS on mac
Signed-off-by: Andrew Innes <[email protected]>
spa_activate_os() - just one will be sufficient
Initialize all members of kcf_create_mech_entry()
Silence SPL startup debug
Signed-off-by: Jorgen Lundman <[email protected]>
Add darwin to default.cfg.in
Signed-off-by: Jorgen Lundman <[email protected]>
enum ZFS_PROP and zfs_prop_register must match.
Signed-off-by: Jorgen Lundman <[email protected]>
Attempting to rw_destroy() uninitialized rwlock
Move the rw_init() higher, so all "goto error;" will work.
Make sure kcf_mech_tabs is set to zero.
Definitely unsure what is going on here.
The address for the tabs keeps moving, as in:
class = 2;
printf("kcf_mech_tabs_tab[class].met_tab is %p and class is %d\n", kcf_mech_tabs_tab[class].met_tab, class);
printf("kcf_mech_tabs_tab[ 2 ].met_tab is %p and class is %d\n", kcf_mech_tabs_tab[2].met_tab, class);
kcf_mech_tabs_tab[class].met_tab is 0xffffff7f8b933d90 and class is 2
kcf_mech_tabs_tab[ 2 ].met_tab is 0xffffff7f8b930d90 and class is 2
..................................................^
which is most peculiar, so the memory that the 3 tabs set up is full of garbage, no empty slot is found and it fails to add the ciphers, digests and macs.
If we set a struct to nothing, or to = { 0 }; like:
static kcf_mech_entry_t kcf_digest_mechs_tab[KCF_MAXDIGEST] = { 0 };
The original kcf_mech_tabs_tab is all zero, but when we start to use it and it shifts by 0x30000 it is garbage. This is true even if we set it to:
= {{{0}, 0, 0, {0}}}
But for some bizarre reason, if we set the first element to something not-zero, like:
= { { "NOTUSED", 0, 0, {0}}, { {0}, 0, 0, {0}} }
Then it works, yes the first entry is busy, but [1] and onwards is finally set to zero, and the pointer does not slide by 0x30000. I wish I knew why.
Signed-off-by: Jorgen Lundman <[email protected]>
Scripts no longer needed now everything is built in root
zfs_get_data() can deadlock
Instead of passing ZGET_FLAG_ASYNC from zfs_get_data() let's try calling just vnode_get() instead, as it will not msleep(). If we still hang, we need to #ifdef zfs_get_data().
Signed-off-by: Jorgen Lundman <[email protected]>
Remove most of the warnings.
Signed-off-by: Jorgen Lundman <[email protected]>
Remove AARCH64 from M1 compiles.
Sadly, no standard NEON on M1, at least, not yet.
Signed-off-by: Jorgen Lundman <[email protected]>
fixes
System information
Describe the problem you're observing
Kernel panic after around 15 minutes of making a Time Machine backup to APFS inside a zvol
Describe how to reproduce the problem
Start a Time Machine backup to APFS inside a zvol, then wait
Include any warning/errors/backtraces from the system logs
https://gist.github.com/JMoVS/21e46b2813bf2196fe1876bdccc6f029 (copy pasted version, here the direct file: https://gist.github.com/JMoVS/31ab6dc11516b200799258d5fd6a6905)
https://gist.github.com/JMoVS/1f0231732ea493c816b67ad40a80ad95
https://gist.github.com/JMoVS/33ac87f3c7faa127d6e026d24e96b69d