
fstrim on a xfs lv backed by vdo renders my computer useless for a very long time #64

Open
beertje44 opened this issue May 8, 2023 · 9 comments


@beertje44

beertje44 commented May 8, 2023

I've experienced this twice on an AlmaLinux box with a single PV and VG on a Samsung PM9A3 3.8 TB NVMe SSD: when fstrim runs, the system doesn't crash, but the load becomes unworkable (in excess of 180!). It seems to be related to my VDO-backed (compression and deduplication) main XFS LV. Last time I somehow managed to kill fstrim, and everything returned to normal shortly after that.

Apart from the excessive load, dmesg tells me:

```
[79135.790363] INFO: task kworker/11:3:21829 blocked for more than 122 seconds.
[79135.790368]       Tainted: P           OE      6.2.12-1.el9_1.******x86_64 #1
[79135.790371] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[79135.790373] task:kworker/11:3    state:D stack:0     pid:21829 ppid:2      flags:0x00004000
[79135.790380] Workqueue: xfs-inodegc/dm-2 xfs_inodegc_worker [xfs]
[79135.790575] Call Trace:
[79135.790577]  <TASK>
[79135.790581]  __schedule+0x1fb/0x550
[79135.790589]  schedule+0x5d/0xd0
[79135.790595]  schedule_timeout+0x148/0x160
[79135.790602]  ___down_common+0x111/0x170
[79135.790612]  ? down+0x1a/0x60
[79135.790621]  __down_common+0x1e/0xc0
[79135.790647]  down+0x43/0x60
[79135.790659]  xfs_buf_lock+0x2d/0xe0 [xfs]
[79135.790857]  xfs_buf_find_lock+0x45/0xf0 [xfs]
[79135.791039]  xfs_buf_lookup.constprop.0+0xe4/0x170 [xfs]
[79135.791222]  xfs_buf_get_map+0xc1/0x3a0 [xfs]
[79135.791407]  xfs_buf_read_map+0x54/0x2a0 [xfs]
[79135.791593]  ? xfs_read_agf+0x89/0x130 [xfs]
[79135.791822]  xfs_trans_read_buf_map+0x115/0x300 [xfs]
[79135.792068]  ? xfs_read_agf+0x89/0x130 [xfs]
[79135.792253]  xfs_read_agf+0x89/0x130 [xfs]
[79135.792427]  xfs_alloc_read_agf+0x50/0x210 [xfs]
[79135.792602]  xfs_alloc_fix_freelist+0x3dd/0x510 [xfs]
[79135.792801]  ? preempt_count_add+0x70/0xa0
[79135.792809]  ? _raw_spin_lock+0x13/0x40
[79135.792816]  ? _raw_spin_unlock+0x15/0x30
[79135.792823]  ? xfs_inode_to_log_dinode+0x210/0x410 [xfs]
[79135.793039]  ? xfs_efi_item_format+0x72/0xd0 [xfs]
[79135.793228]  xfs_free_extent_fix_freelist+0x61/0xa0 [xfs]
[79135.793409]  __xfs_free_extent+0x72/0x1c0 [xfs]
[79135.793584]  xfs_trans_free_extent+0x45/0x100 [xfs]
[79135.793809]  xfs_extent_free_finish_item+0x69/0xa0 [xfs]
[79135.793998]  xfs_defer_finish_noroll+0x187/0x530 [xfs]
[79135.794220]  xfs_defer_finish+0x11/0x70 [xfs]
[79135.794398]  xfs_itruncate_extents_flags+0xca/0x250 [xfs]
[79135.794608]  xfs_inactive_truncate+0xab/0xe0 [xfs]
[79135.794800]  xfs_inactive+0x154/0x170 [xfs]
[79135.794970]  xfs_inodegc_worker+0xa3/0x170 [xfs]
[79135.795156]  process_one_work+0x1e5/0x3f0
[79135.795165]  ? __pfx_worker_thread+0x10/0x10
[79135.795172]  worker_thread+0x50/0x3a0
[79135.795179]  ? __pfx_worker_thread+0x10/0x10
[79135.795185]  kthread+0xe8/0x110
[79135.795189]  ? __pfx_kthread+0x10/0x10
[79135.795194]  ret_from_fork+0x2c/0x50
[79135.795205]  </TASK>
```

vdo-8.2.0.2-1.el9.x86_64
kmod-kvdo-8.2.1.6-1.el9_1.*****.x86_64

The last one I got from here, so that I can use kernel-ml from ELRepo. I can also see several VDO kernel threads working while the problem occurs.

@tigerblue77

Hello, I can also confirm this behavior on professional hardware (a Dell PowerEdge R720XD) with robust storage (12x 6 TB in hardware RAID 6): fstrim does indeed generate a lot of I/O that slows the system down drastically. So I imagine that on a typical system (with a single disk, even an SSD) this can be quite problematic.

@beertje44
Author

Did some more digging around:

  • I booted the default kernel for AlmaLinux 9.1: 5.14.0-162.23.1.el9_1.x86_64. As soon as I entered time fstrim -v /, the system almost completely locked up. That is: running processes continued to work, but even a new shell would not launch at all. I did, however, manage to reboot cleanly to recover from this.
  • From what I understand, fstrim should not take very long or make the system completely unresponsive through excessive load. But I'm no expert here :) FWIW: the Red Hat manual for VDO does mention enabling the fstrim service as good practice....
  • I also booted into single-user mode and entered time fstrim -v /, with the same result: complete lockup. From the looks of it there was high load on the SSD for a brief amount of time and then not so much (judging by the drive LED, since I was in single-user mode without any tooling available).
  • I also use the SSD as log and cache device for my ZFS storage pool. ZFS managed to trim both devices on demand without any problems or noticeable load.
  • Prometheus logged IOPS in excess of 200k related to discard operations the last time the fstrim service ran.
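For comparison, the on-demand ZFS trim mentioned above can be issued like this (the pool name tank is a placeholder, not the actual pool from this setup):

```shell
# Trim all supporting devices of a ZFS pool on demand
zpool trim tank

# Watch trim progress per device
zpool status -t tank
```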

Currently running on kernel 6.2.12.

@tigerblue77

Two pieces of advice:

  • Run fstrim in the background (add "&" at the end of your command) so the shell prints the PID and you can simply kill <PID>.
  • Don't mix VDO and ZFS.
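The backgrounding approach can be sketched like this (fstrim needs root, and the mount point is just an example):

```shell
# Start fstrim in the background so we keep a PID to kill
# if the system starts to bog down.
fstrim -v / &
FSTRIM_PID=$!
echo "fstrim running as PID ${FSTRIM_PID}"

# If the load becomes unworkable, from another shell:
#   kill "${FSTRIM_PID}"
```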

@beertje44
Author

Two pieces of advice:

Run fstrim in the background (add "&" at the end of your command) so the shell prints the PID and you can simply kill <PID>.

I could have done that, but the last time the fstrim service ran, it took the kernel about 15 minutes to actually force-kill the fstrim process; after that the system returned to its normal state :/

don't mix VDO and ZFS

Yeah, you are probably right on that one. I just couldn't help myself; it looked like a great idea at the time: more space on my root device, since I always seem to have been short on that in the past :D

FWIW: it's just a homelab server, not a big production server continually running at peak performance. And for now I'd rather help solve the issue (if there is one to begin with) than run away from it ;)

@raeburn
Member

raeburn commented May 8, 2023

Yes, discards are a slow area in VDO currently. Right now each block is processed separately, making it about as costly as writing zero blocks to the same locations. If you do use fstrim, it would be best to use it at a time when other load on the system is likely to be light. The fstrim docs even point out that non-queued trim can have a performance impact on other work; in VDO’s case the penalty is a bit more severe, because VDO uses system resources (CPU, I/O bandwidth) rather than handling it all within the disk drive, and it doesn’t use them as efficiently as it probably could.

The “task … blocked” message is expected. It just means the calling thread has been waiting a while as VDO crunches away on a (possibly very large) discard request. If the worker thread sends a discard of 1 GB, for example, then VDO has to process 256k blocks before it can report that the operation is done, each possibly requiring journal updates, ref count updates, etc. There’s some parallelism, but that’s not enough to make 256k operations go quickly.
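The arithmetic behind that 256k figure, assuming VDO's 4 KiB block size:

```shell
# A 1 GiB discard divided by the 4 KiB VDO block size gives the
# number of blocks VDO must process before it can complete the request.
blocks=$(( (1024 * 1024 * 1024) / 4096 ))
echo "${blocks} blocks"   # 262144 blocks, i.e. 256k
```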

We’ve got a design that should improve discard handling, but haven’t scheduled the work yet.

In the meantime, there's a way to control how many of VDO's 2k internal I/Os-in-progress can be used for discards, if you don't mind fiddling with low-level controls. Look at /sys/block/dm-N/vdo/discards_limit, where dm-N is the device-mapper name for the VDO device. The number there (default 1536) means about 3/4 of the pool can be used for discards, which can starve or drastically slow other work. If you write a smaller number into that file, it'll slow the discards themselves a bit, but will reserve more of the pool for non-discard work. I hope that's enough to make your system usable.
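As a sketch (run as root; dm-2 is a placeholder for your VDO device's dm name, and 512 is an arbitrary lower value, not a recommended setting):

```shell
# Find the dm name of the VDO device first, e.g. with:
#   dmsetup info -c

# Inspect the current discard limit (default 1536 out of the 2048 I/O pool)
cat /sys/block/dm-2/vdo/discards_limit

# Lower it to reserve more of the pool for non-discard work
echo 512 > /sys/block/dm-2/vdo/discards_limit
```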

@beertje44
Author

Red Hat had some (a lot of, actually) info on tuning VDO for RHEL 7. Of course that does not apply to the current LVM-based version. But hey, I found those settings in lvm.conf, and with some light googling I managed to put them into an lvchange command. Of course this can't be done on a running system, so you need a boot stick or something similar to pull it off:

```
lvchange --vdosettings 'ack_threads=4 bio_threads=8 cpu_threads=32 hash_zone_threads=4 logical_threads=4 physical_threads=4 max_discard=1024' neo/vpool0
```

I noticed the defaults use very few threads, while I have 32 logical cores in this system.
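Since this can't be done while the LV is in use, the rough workflow from rescue media would be something like the following (a sketch; the VG/LV names neo and vpool0 come from the command above, and I haven't verified every step):

```shell
# From rescue/live media with lvm2 available, as root:
vgchange -an neo      # deactivate the volume group holding the VDO pool

# Apply the new settings while the LV is inactive
lvchange --vdosettings 'cpu_threads=32 max_discard=1024' neo/vpool0

vgchange -ay neo      # reactivate, then reboot into the installed system
```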

fstrim still causes some delays (100% usage on the SSD for a short while), but I can't call them lockups anymore, just slower than normal :) And best of all, it finishes now:

```
# time fstrim -v /
/: 3,3 TiB (3660264259584 bytes) is getrimd
fstrim -v /  0,00s user 2,87s system 0% cpu 9:40,82 total
```

I will look into your suggestion, but it's getting way too late over here right now ;)

@beertje44
Author

I'm not marking this as closed yet; I had to do some guessing and googling ... IMHO this should at least be documented.

@tigerblue77

But hey, I found those settings in lvm.conf, and with some light googling I managed to put them into an lvchange command. Of course this can't be done on a running system, so you need a boot stick or something similar to pull it off:

Have a look at this, where you'll see how I configured mine on Debian with LVM and all the VDO parameters.

@beertje44
Author

Wow, that is very cool! Thank you very much; that is way more than I could ask for.

However, IMHO there are two things:

  • fstrim is the recommended way to keep at least a SSD in good working performance over time, also for VDO.
  • fstrim with default settings performs horribly on a VDO volume.

I really don't mind tweaking settings to work around this, but couldn't those defaults be a bit different? I've been a huge fan of Red Hat for a really long time now (yeah, I'm getting old), and I like what they are doing with VDO too. But if all of this is intended for people with extensive storage knowledge and the will to take the time to do a proper setup by testing, benchmarking, and configuring everything the right way, they should at least say so! Their web pages make it sound like an easy add-on for your setup. In a way it is, but not completely ;)
