Linux load average increased since v2.2.0 #15453
I have also seen this on an idle RL8.8 system:

```
[root@repo2 ~]# uptime
 12:54:01 up 11 min,  1 user,  load average: 1.00, 0.93, 0.58
[root@repo2 ~]# zpool get autotrim repo2
NAME   PROPERTY  VALUE  SOURCE
repo2  autotrim  on     local
[root@repo2 ~]# ps aux | grep -w D\<
root  1264  0.0  0.0  0  0  ?  D<  12:42  0:00  [vdev_autotrim]
[root@repo2 ~]# cat /etc/redhat-release
Rocky Linux release 8.8 (Green Obsidian)
[root@repo2 ~]# uname -a
Linux repo2 4.18.0-477.27.1.el8_8.x86_64 #1 SMP Wed Sep 20 15:55:39 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
[root@repo2 ~]# cat /sys/module/zfs/version
2.2.0-1
```

In addition, disabling autotrim does not make the `vdev_autotrim` thread go away:

```
[root@repo2 ~]# zpool set autotrim=off repo2
[root@repo2 ~]# zpool get autotrim repo2
NAME   PROPERTY  VALUE  SOURCE
repo2  autotrim  off    default
[root@repo2 ~]# ps aux | grep -w D\<
root  1264  0.0  0.0  0  0  ?  D<  12:42  0:00  [vdev_autotrim]
```

However, rebooting does get rid of that kernel thread and restores the system to a load of less than 1:

```
[root@repo2 ~]# uptime
 13:03:02 up 1 min,  2 users,  load average: 0.15, 0.12, 0.05
[root@repo2 ~]# ps aux | grep -w D\<
root  3276  0.0  0.0  12144  1100  pts/1  S+  13:03  0:00  grep --color=auto -w D<
[root@repo2 ~]# zpool get autotrim repo2
NAME   PROPERTY  VALUE  SOURCE
repo2  autotrim  off    default
```

This then begs the question whether the `vdev_autotrim` thread should keep running at all once autotrim has been turned off.
Note, there are no extra lines in `/proc/spl/kstat/zfs/dbgmsg` when re-enabling autotrim spawns the thread:

```
[root@repo2 ~]# cat /proc/spl/kstat/zfs/dbgmsg
...
1698523656   spa_history.c:294:spa_history_log_sync(): command: zpool set autotrim=on repo2
[root@repo2 ~]# ps aux | grep -w D\<
root  7193  0.0  0.0  0  0  ?  D<  13:07  0:00  [vdev_autotrim]
```
It seems to me that
All three have the same stack:
Getting the same behaviour after a 2.1 -> 2.2 upgrade. OL9, default UEK kernel.
Same here, NixOS 23.11, brand new.
@behlendorf Is there any new insight into this problem? Unfortunately, it doesn't look like jxdking has been active lately. In my own tests I was able to reproduce this behavior on Ubuntu 22.04 (with zfs-2.2.2 installed manually via DKMS) as well as on Ubuntu 23.10 and the pre-release version of 24.04, which both ship with zfs-2.2.x built in. It looks like the load is permanently increased by one for each pool that has autotrim set to on, even if there is no other activity on the drives.
My interpretation of the data is different. The number of threads in uninterruptible sleep is actually the number of TRIM-able underlying storage devices across all zpools. I have two zpools built on three NVMe vdevs; my load is +3, and I can see one autotrim thread in D state for each vdev. In other words, the load will scale with the number of underlying storage devices capable of TRIM, so this is a bigger problem, not a smaller one.
Also just noticed this on my system. I have two pools that are on SSDs with autotrim enabled, and my load will only ever go down to 2.0, matching the others' descriptions of one `vdev_autotrim` thread per vdev. Fully updated Arch Linux with ZFS compiled from the AUR (no archzfs). I might try building with that commit reverted and see if it still does it.
No. The correct description is that you have two TRIM-able vdevs in total assigned to zpools. The number of zpools is not a factor. Each vdev participating in autotrim gets its own uninterruptible thread, and that is the deciding factor driving the load increase. This is why this is a bigger problem than it appears: the bigger and more complex your system is, the worse it gets, because the load scales with the number of vdevs participating in autotrim. E.g.:
QED
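As a quick sanity check of the claim above, the `vdev_autotrim` kernel threads can be counted directly; on an affected system, the count should match both the number of TRIM-capable vdevs and the persistent floor of the load average. A minimal sketch, assuming only that the thread is named `vdev_autotrim` as shown in the `ps` output earlier in this thread:

```shell
# Count vdev_autotrim kernel threads; each one sits in uninterruptible
# sleep (state D) and therefore adds 1.0 to the load-average floor.
# "|| true" keeps the pipeline succeeding when the count is zero,
# since grep -c exits nonzero on no matches.
n=$(ps -e -o comm= | grep -cx vdev_autotrim || true)
echo "vdev_autotrim threads (expected load floor): $n"
```

On a system without the bug (or without ZFS), this should report 0.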
Here is an attempt to fix the issue: #15781. My test environment was torn down months ago in order to work on other projects, so I have no way to test it myself. Please give feedback on whether it fixes the issue. Thanks.
Thank you very much. I did a quick test on my testing environment running Ubuntu 22.04 LTS and can indeed confirm that the high load average is fixed and the uninterruptible threads are gone.
Fixed by #15781 |
System information
Describe the problem you're observing
I recently upgraded my workstation to v2.2.0. Since this upgrade, I noticed my load average has increased by +3 from what it was before. Since Linux includes uninterruptible tasks in the load calculation, I am pretty sure this is the cause, as I have three (new) threads in uninterruptible sleep:
Any ideas as to what changed on the autotrim code path? I have been using this feature for quite a long time.
I am not seeing any abnormal latency issues nor excessive autotrim ops on my two zpools, and performance remains good despite the increased load average.
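To confirm the connection between the load figure and uninterruptible tasks on any given box, the D-state threads can be listed directly. A minimal sketch using plain procps `ps`, nothing ZFS-specific assumed:

```shell
# List tasks in uninterruptible sleep (state starting with D);
# Linux counts these toward the load average together with
# runnable tasks, so each one raises the load floor by 1.0.
ps -eo state=,pid=,comm= | awk '$1 ~ /^D/'
```

On an affected system, this should print one `vdev_autotrim` line per TRIM-capable vdev.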
Describe how to reproduce the problem
Look for uninterruptible ZFS threads contributing to the load average increase.
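To put a number on it, the reported load can be compared against the D-state thread count. A minimal sketch that reads `/proc/loadavg`, so Linux only:

```shell
# Compare the 1-minute load average with the current number of
# D-state threads; on an otherwise idle system with autotrim on,
# the load floor converges toward the D-state count.
read -r load1 rest < /proc/loadavg
dstate=$(ps -eo state= | grep -c '^D' || true)
echo "1-min load: $load1, D-state threads: $dstate"
```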
Include any warning/errors/backtraces from the system logs
N/A