Describe the problem you're observing
The system is a virtual machine host with block devices allocated as zvols. During guest startup several instances of the following trace are logged and I/O becomes unresponsive. There are also some earlier reports of hung tasks, but I don't know if these are related. The configuration is a new build of our existing environment with both zfs (was 2.1.15) and the kernel (was 5.15.0) updated. I did see #12775, but that thread has been quiet for 18+ months.
Describe how to reproduce the problem
Provision zvols for virtual machine (Xen) block storage and then try to boot the guests. For legacy reasons the zpool is initially created with zfs 0.8.3(?) and then zpool upgraded during the first boot. All zpool features are enabled. We are using a local build of the kernel in order to patch in AUFS support. What is also happening during the VM start is that the dom0 memory is ballooning down, and it seems that the zfs_arc_max (un)tunable can end up being equal to the whole of the dom0's RAM, as the transcript below shows. Is that a problem?
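For context, provisioning roughly looks like the following; the volume name and size are illustrative rather than the exact commands from our tooling, and the resulting zvol is handed to the guest as a phy: disk in its xl configuration:
# zfs create -s -V 20G diskconvm/guest1-disk
# ls -l /dev/zvol/diskconvm/guest1-disk
The effect of the ballooning on zfs_arc_max can be seen in this transcript: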
# cat zfs_arc_max
37038726144
# xl list
Name ID Mem VCPUs State Time(s)
Domain-0 0 64280 6 r----- 472.0
# xl mem-set 0 $((16 * 1024))
# xl list
Name ID Mem VCPUs State Time(s)
Domain-0 0 16384 6 r----- 530.7
# cat zfs_arc_max
17179869184
16 * 1024^3 = 17179869184, i.e. zfs_arc_max now matches the ballooned-down dom0 size.
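As a cross-check (a sketch, assuming the default module parameter and kstat paths), the ARC's own view of its limit (c_max) can be compared against the tunable:
# cat /sys/module/zfs/parameters/zfs_arc_max
# awk '$1 == "c_max" {print $3}' /proc/spl/kstat/zfs/arcstats
If the ballooned value has been applied, both report 17179869184 here.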
Include any warning/errors/backtraces from the system logs
# zpool get all
NAME PROPERTY VALUE SOURCE
diskconvm size 1.74T -
diskconvm capacity 4% -
diskconvm altroot - default
diskconvm health ONLINE -
diskconvm guid 12693268832595388421 -
diskconvm version - default
diskconvm bootfs - default
diskconvm delegation on default
diskconvm autoreplace off default
diskconvm cachefile - default
diskconvm failmode wait default
diskconvm listsnapshots on local
diskconvm autoexpand on local
diskconvm dedupratio 1.00x -
diskconvm free 1.66T -
diskconvm allocated 82.1G -
diskconvm readonly off -
diskconvm ashift 12 local
diskconvm comment - default
diskconvm expandsize - -
diskconvm freeing 0 -
diskconvm fragmentation 0% -
diskconvm leaked 0 -
diskconvm multihost off default
diskconvm checkpoint - -
diskconvm load_guid 16636724333282575354 -
diskconvm autotrim off default
diskconvm compatibility off default
diskconvm bcloneused 0 -
diskconvm bclonesaved 0 -
diskconvm bcloneratio 1.00x -
diskconvm feature@async_destroy enabled local
diskconvm feature@empty_bpobj active local
diskconvm feature@lz4_compress active local
diskconvm feature@multi_vdev_crash_dump enabled local
diskconvm feature@spacemap_histogram active local
diskconvm feature@enabled_txg active local
diskconvm feature@hole_birth active local
diskconvm feature@extensible_dataset active local
diskconvm feature@embedded_data active local
diskconvm feature@bookmarks enabled local
diskconvm feature@filesystem_limits enabled local
diskconvm feature@large_blocks enabled local
diskconvm feature@large_dnode enabled local
diskconvm feature@sha512 enabled local
diskconvm feature@skein enabled local
diskconvm feature@edonr enabled local
diskconvm feature@userobj_accounting active local
diskconvm feature@encryption enabled local
diskconvm feature@project_quota active local
diskconvm feature@device_removal enabled local
diskconvm feature@obsolete_counts enabled local
diskconvm feature@zpool_checkpoint enabled local
diskconvm feature@spacemap_v2 active local
diskconvm feature@allocation_classes enabled local
diskconvm feature@resilver_defer enabled local
diskconvm feature@bookmark_v2 enabled local
diskconvm feature@redaction_bookmarks enabled local
diskconvm feature@redacted_datasets enabled local
diskconvm feature@bookmark_written enabled local
diskconvm feature@log_spacemap active local
diskconvm feature@livelist active local
diskconvm feature@device_rebuild enabled local
diskconvm feature@zstd_compress enabled local
diskconvm feature@draid enabled local
diskconvm feature@zilsaxattr active local
diskconvm feature@head_errlog active local
diskconvm feature@blake3 enabled local
diskconvm feature@block_cloning enabled local
diskconvm feature@vdev_zaps_v2 active local
# zpool status
pool: diskconvm
state: ONLINE
config:
NAME STATE READ WRITE CKSUM
diskconvm ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
disk/by-partid/ata-SAMSUNG_MZ7L31T9HBLT-00A07_S6ESNE0T510485-part6 ONLINE 0 0 0
disk/by-partid/ata-SAMSUNG_MZ7L31T9HBLT-00A07_S6ESNE0T510513-part6 ONLINE 0 0 0
errors: No known data errors
# uname -a
Linux <hostname> 6.8.0-40-generic #40~22.04.1 SMP PREEMPT_DYNAMIC Mon Jul 22 18:19:19 UTC x86_64 x86_64 x86_64 GNU/Linux
# zfs version
zfs-2.2.4-5_g674a6de37a
zfs-kmod-2.2.4-5_g674a6de37a
I don't see how it can be related to lz4 or CPUs getting stuck, but since you mentioned heavy ballooning I have to mention that ZFS is currently quite reluctant to react to the kernel's memory pressure. See zfs_arc_shrinker_limit and #16197. I also wonder, if the system is over-provisioned that much, whether it could cause some swapping on the VM host side, which would look like random CPU freezes to the guests.
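For experimenting, the shrinker limit can be inspected and relaxed at runtime; a sketch, assuming the default module parameter path (a value of 0 removes the per-call reclaim cap):
# cat /sys/module/zfs/parameters/zfs_arc_shrinker_limit
# echo 0 > /sys/module/zfs/parameters/zfs_arc_shrinker_limit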
With some further investigation, it seems that trying to do anything 'clever' with the zfs_arc_{min,max} tunables results in an unhappy system. I'd guess that the common error in LZ4_uncompress_unknownOutputSize could be something to do with the result of a memory allocation not being checked for success, but I've not looked into that further. Leaving those tunables alone, the behaviour seems stable as the dom0 memory balloons. I've patched our zfs build to discourage tinkering:
diff --git a/module/zfs/arc.c b/module/zfs/arc.c
index 195364013..a7fb0c449 100644
--- a/module/zfs/arc.c
+++ b/module/zfs/arc.c
@@ -10662,10 +10662,10 @@ EXPORT_SYMBOL(arc_add_prune_callback);
 EXPORT_SYMBOL(arc_remove_prune_callback);
 
 ZFS_MODULE_PARAM_CALL(zfs_arc, zfs_arc_, min, param_set_arc_min,
-    spl_param_get_u64, ZMOD_RW, "Minimum ARC size in bytes");
+    spl_param_get_u64, ZMOD_RD, "Minimum ARC size in bytes");
 
 ZFS_MODULE_PARAM_CALL(zfs_arc, zfs_arc_, max, param_set_arc_max,
-    spl_param_get_u64, ZMOD_RW, "Maximum ARC size in bytes");
+    spl_param_get_u64, ZMOD_RD, "Maximum ARC size in bytes");
 
 ZFS_MODULE_PARAM(zfs_arc, zfs_arc_, meta_balance, UINT, ZMOD_RW,
     "Balance between metadata and data on ghost hits.");
For the time being I'd say this is a 'user error' problem and the behaviour with defaults is good.
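An alternative to carrying a local patch, if the aim is just to avoid runtime changes, would be to set the limits once at module load time via modprobe options; a sketch, with a purely illustrative value:
# cat /etc/modprobe.d/zfs.conf
options zfs zfs_arc_max=17179869184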