fstrim on an xfs lv backed by vdo renders my computer useless for a very long time #64
Hello, I can also confirm this behavior on professional hardware (a Dell PowerEdge R720XD) with robust storage (12x 6TB hardware RAID 6): fstrim generates a lot of I/O that slows the system down drastically while it runs. So I imagine that on a more typical system (with a single disk, even an SSD) this can be quite problematic.
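For anyone who wants to watch the extra I/O for themselves, a simple sketch (this assumes sysstat's iostat is installed; the device name dm-2 and the mount point are only placeholders for your own VDO device and filesystem):

# In one terminal, watch extended device statistics every 2 seconds
iostat -xmd 2 /dev/dm-2

# In another, kick off the trim on the affected mount point
fstrim -v /mnt/data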
Did some more digging around:
Currently running on kernel 6.2.12.
Two pieces of advice:
Could have done that, but the last time the fstrim service ran, it took the kernel about 15 minutes to actually force-kill the fstrim process; after that the system returned to its normal state :/
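For context on the fstrim service mentioned here: on most systemd-based distributions the periodic trim is driven by the util-linux fstrim.timer unit, so it can be checked and, if need be, switched off until the discard slowness is sorted out (a minimal sketch, assuming the standard fstrim.timer/fstrim.service pair):

# See whether the weekly trim timer is enabled and when it last fired
systemctl status fstrim.timer

# Keep it from running again for now (re-enable later with: systemctl enable --now fstrim.timer)
systemctl disable --now fstrim.timer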
Yeah, you are probably right on that one. I just couldn't help myself; it looked like a great idea at the time: more space on my root device, since I always seem to have been short on that in the past :D FWIW: it's just a homelab server, not a big production server continually running at peak performance. And for now I'd rather help solve the issue (if there is one to begin with) than run away from it ;)
Yes, discards are a slow area in VDO currently. Right now each block is processed separately, making it about as costly as writing zero blocks to the same locations. If you do use fstrim, it would be best to use it at a time when other load on the system is likely to be light. The fstrim docs even point out that non-queued trim can have a performance impact on other work; in VDO's case the penalty is a bit more severe, because VDO uses system resources (CPU, I/O bandwidth) rather than handling it all within the disk drive, and it doesn't use them as efficiently as it probably could.
The "task … blocked" message is expected. It just means the calling thread has been waiting a while as VDO crunches away on a (possibly very large) discard request. If the worker thread sends a discard of 1 GB, for example, then VDO has to process 256k blocks before it can report that the operation is done, each possibly requiring journal updates, ref count updates, etc. There's some parallelism, but that's not enough to make 256k operations go quickly.
We've got a design that should improve discard handling, but haven't scheduled the work yet. In the meantime, there's a way to control how many of VDO's 2k internal I/Os-in-progress can be used for discards, if you don't mind fiddling with low-level controls. Look in /sys/block/dm-N/vdo/discards_limit, where dm-N is the device name for the VDO device. The number there (default 1536) means about 3/4 of the pool can be used for discards, which could starve or drastically slow other work. If you write a smaller number into that file, it'll slow the discards themselves a bit, but will reserve more of the pool for non-discard work. I hope that's enough to make your system usable.
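To make the knob described above concrete, a minimal sketch (the device name dm-2 and the value 512 are placeholders; pick your own VDO device and a value that fits your workload):

# Find the dm-N name of the VDO device
lsblk -o NAME,KNAME,TYPE,MOUNTPOINT

# Show the current limit on concurrent discard I/Os (default 1536 out of the ~2k pool)
cat /sys/block/dm-2/vdo/discards_limit

# Reserve more of the pool for non-discard work by lowering the discard share
echo 512 > /sys/block/dm-2/vdo/discards_limit

A value written to sysfs does not survive a reboot, so it would need to be reapplied each time, for example from a udev rule or a small boot-time service.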
Red Hat had quite a lot of info on tuning VDO for RHEL 7. Of course that does not apply to the current LVM version, but hey, I found those settings in lvm.conf and with some light googling around I managed to put them into an lvchange command. Of course this can't be done on a running system, so you need a boot stick or something similar to pull it off:
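Something along these lines (the volume group/pool names and thread counts are placeholders; the --vdosettings option needs a reasonably recent LVM, and lvmvdo(7) documents the exact setting names a given version accepts):

# Deactivate the VDO pool first -- hence the boot stick, since the root LV sits on it
lvchange -an vg/vdopool

# Raise the VDO worker thread counts; the names mirror the allocation/vdo_*_threads
# options in lvm.conf, and the values here are only an example
lvchange --vdosettings 'ack_threads=4 bio_threads=8 cpu_threads=8' vg/vdopool

# Reactivate the pool; the new thread counts should show up in the vdo target line
lvchange -ay vg/vdopool
dmsetup table | grep vdo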
I noticed the defaults were to use very few threads, and I have 32 logical cores in this system. fstrim still causes some delays (100% usage on the SSD for a short while), but I can't call them lockups anymore, just slower than normal :) And best of all, it finishes now:
Will look into your suggestion, but it's getting way too late over here right now ;)
Not marking this as closed yet; I had to do some guessing and googling ... IMHO this should at least be documented.
Have a look at this, where you'll see how I configured mine on Debian with LVM and all the VDO parameters.
Wow, that is very cool! Thank you very much, that is way more than I could ask for. However, IMHO there are two things:
I really don't mind tweaking settings to work around this, but couldn't those defaults be a bit different? I've been a huge fan of Red Hat for a really long time now (yeah, I'm getting old), and here too I like what they are doing with it. But if all of this is intended for people with extensive storage knowledge and the willingness to take the time to do a proper setup by testing, benchmarking and configuring everything the right way, they should at least say so! Their web pages on this make it sound like an easy add-on for your setup. In a way it is, but not completely ;)
I've experienced this twice on an AlmaLinux box with a single PV and VG on a Samsung PM9A3 3.8TB NVMe SSD: when fstrim runs the system doesn't crash, but the load becomes unworkable (in excess of 180!). It seems to be related to my VDO-backed (compression and deduplication) main XFS LV. Last time I somehow managed to kill fstrim and everything returned to normal shortly after that.
Apart from the excessive load, dmesg tells me:
[79135.790363] INFO: task kworker/11:3:21829 blocked for more than 122 seconds.
[79135.790368] Tainted: P OE 6.2.12-1.el9_1.******x86_64 #1
[79135.790371] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[79135.790373] task:kworker/11:3 state:D stack:0 pid:21829 ppid:2 flags:0x00004000
[79135.790380] Workqueue: xfs-inodegc/dm-2 xfs_inodegc_worker [xfs]
[79135.790575] Call Trace:
[79135.790577] <TASK>
[79135.790581] __schedule+0x1fb/0x550
[79135.790589] schedule+0x5d/0xd0
[79135.790595] schedule_timeout+0x148/0x160
[79135.790602] ___down_common+0x111/0x170
[79135.790612] ? down+0x1a/0x60
[79135.790621] __down_common+0x1e/0xc0
[79135.790647] down+0x43/0x60
[79135.790659] xfs_buf_lock+0x2d/0xe0 [xfs]
[79135.790857] xfs_buf_find_lock+0x45/0xf0 [xfs]
[79135.791039] xfs_buf_lookup.constprop.0+0xe4/0x170 [xfs]
[79135.791222] xfs_buf_get_map+0xc1/0x3a0 [xfs]
[79135.791407] xfs_buf_read_map+0x54/0x2a0 [xfs]
[79135.791593] ? xfs_read_agf+0x89/0x130 [xfs]
[79135.791822] xfs_trans_read_buf_map+0x115/0x300 [xfs]
[79135.792068] ? xfs_read_agf+0x89/0x130 [xfs]
[79135.792253] xfs_read_agf+0x89/0x130 [xfs]
[79135.792427] xfs_alloc_read_agf+0x50/0x210 [xfs]
[79135.792602] xfs_alloc_fix_freelist+0x3dd/0x510 [xfs]
[79135.792801] ? preempt_count_add+0x70/0xa0
[79135.792809] ? _raw_spin_lock+0x13/0x40
[79135.792816] ? _raw_spin_unlock+0x15/0x30
[79135.792823] ? xfs_inode_to_log_dinode+0x210/0x410 [xfs]
[79135.793039] ? xfs_efi_item_format+0x72/0xd0 [xfs]
[79135.793228] xfs_free_extent_fix_freelist+0x61/0xa0 [xfs]
[79135.793409] __xfs_free_extent+0x72/0x1c0 [xfs]
[79135.793584] xfs_trans_free_extent+0x45/0x100 [xfs]
[79135.793809] xfs_extent_free_finish_item+0x69/0xa0 [xfs]
[79135.793998] xfs_defer_finish_noroll+0x187/0x530 [xfs]
[79135.794220] xfs_defer_finish+0x11/0x70 [xfs]
[79135.794398] xfs_itruncate_extents_flags+0xca/0x250 [xfs]
[79135.794608] xfs_inactive_truncate+0xab/0xe0 [xfs]
[79135.794800] xfs_inactive+0x154/0x170 [xfs]
[79135.794970] xfs_inodegc_worker+0xa3/0x170 [xfs]
[79135.795156] process_one_work+0x1e5/0x3f0
[79135.795165] ? __pfx_worker_thread+0x10/0x10
[79135.795172] worker_thread+0x50/0x3a0
[79135.795179] ? __pfx_worker_thread+0x10/0x10
[79135.795185] kthread+0xe8/0x110
[79135.795189] ? __pfx_kthread+0x10/0x10
[79135.795194] ret_from_fork+0x2c/0x50
[79135.795205] </TASK>
vdo-8.2.0.2-1.el9.x86_64
kmod-kvdo-8.2.1.6-1.el9_1.*****.x86_64
The last one I got from here so that I can use kernel-ml from ELRepo. I can also see several VDO kernel threads working while the problem is occurring.
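The VDO worker threads are visible as ordinary kernel threads, so they can be spotted like this (the kvdo0:... names are typical examples and will differ per device):

# List kernel threads with vdo in the name; they typically look like kvdo0:cpuQ0, kvdo0:bioQ1, ...
ps -eLo pid,comm | grep -i vdo

# Or watch them live; press H in top to toggle the per-thread view
top -H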