Low zfs list performance #8898
Replies: 9 comments
-
IIRC, this is one of the reasons …
-
Regarding the workaround of using a channel program: as that runs as an atomic operation in the txg sync context, I would expect it to block the txg sync, which would be an even worse outcome, even if it performed an order of magnitude better. I also repeated the test after setting zfs_compressed_arc_enabled=0 and re-importing the pool: ~4 seconds less total runtime, so no real difference from running with compressed ARC. Unless setting the module parameter and exporting/importing the pool is somehow turned into a NOP, repeated decompression of ARC contents doesn't seem to be the bottleneck.
-
I've been seeing the same strange behaviour (I think from the beginning, which was zfs-0.6.5.9).
-
If you run … The reason channel programs can be more efficient is that they can run through all datasets in a single pass, rather than issuing an ioctl per dataset (as the iterator does).
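For illustration, a minimal sketch of that pattern using the `zfs.list.children`/`zfs.list.snapshots` iterators from zfs-program(8) (illustrative only; the starting dataset is passed as `argv[1]`):

```lua
-- Sketch: one channel-program invocation (one call into the kernel)
-- walks datasets and their snapshots with in-kernel iterators,
-- instead of userland issuing an ioctl per dataset.
args = ...
argv = args["argv"]

count = 0
for child in zfs.list.children(argv[1]) do    -- direct child filesystems/volumes
    count = count + 1
    for snap in zfs.list.snapshots(child) do  -- their snapshots, still in-kernel
        count = count + 1
    end
end
return count
```

The entire walk happens inside the single `zfs program` call; nothing crosses back to userland per dataset.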
-
@richardelling I wonder what would happen when using a channel program to replace the `zfs list` iteration. Wouldn't it (as a ZCP runs atomically in one TXG) block the TXG sync and thereby stall everything else accessing the pool (or worse)?
-
@GregorKopka good question. We know that running from userland through the iterator is slow (50 minutes). We don't know what that looks like when the work is all in kernel. Try it :-)
-
Here is a simple program that does something along those lines.
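For reference, a program of that shape, built on the `zfs.list` iterators from zfs-program(8), can look roughly like this (an illustrative sketch, not necessarily the exact listing that was timed; `argv[1]` is the dataset to start from):

```lua
-- Sketch: collect the names of every dataset and snapshot below argv[1]
-- in a single channel-program run and return them as one result.
args = ...
argv = args["argv"]

results = {}

function walk(ds)
    results[#results + 1] = ds               -- the dataset itself
    for snap in zfs.list.snapshots(ds) do    -- its snapshots
        results[#results + 1] = snap
    end
    for child in zfs.list.children(ds) do    -- recurse into child datasets
        walk(child)
    end
end

walk(argv[1])
return results
```

Invoked with something like `zfs program <pool> list_all.lua <pool>` (the script name is made up); returning ~10k names may need the memory limit raised via `-m`, and newer versions also accept `-n` for read-only programs.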
I've tried this out on an idle zpool comprising a single spindle; I export and reimport the pool to flush any caches. It has ~100 filesystems and ~10k snapshots. Results: …
The cache seems to make little difference to the channel program in this case. If I try it on another pool (4 spindles, 100 fs, 5000 snaps), …
-
Whoops, that was an unfair comparison; I should have used …
Similarly, on my 4-spindle pool a …
-
System information
Describe the problem you're observing
Low performance when running `zfs list -o name -H -r -tall $pool`.
According to my tests, having everything already cached in ARC (so it can be served from RAM) is only about one order of magnitude faster than having to read everything directly from HDD in the first place.
Granted, this pool has ~238k snapshots in 621 datasets, but nevertheless... ZFS shouldn't need >5 minutes to list these when all the data needed is already in ARC.
Plus the output of arcstat.py while testing this doesn't make any sense (see below).
Describe how to reproduce the problem
On an otherwise completely idle system
Linux 4.9.95-gentoo #2 SMP Wed Feb 20 11:21:13 -00 2019 x86_64 Intel(R) Xeon(R) CPU E31245 @ 3.30GHz GenuineIntel GNU/Linux
With all zfs/spl parameters at default and non-default pool properties of …
I import the pool and run the command above; the first run works out to ~14.36 ms per dataset listed (~70 datasets/s).
This isn't great but, as the pool is HDD-based and `zfs list` reads through the metadata sequentially (the next read needs data from the current one) and is therefore limited to the IOPS of a single disk, it's somewhat expected. On that first run I see a `zpool iostat 10` output of … while `arcstat.py 10` outputs strange numbers: …
Directly afterwards I repeat the operation, now with the ARC having cached everything.
The repeat works out to ~1.84 ms/dataset (~543 datasets/s): less than 10 times faster than having to read from disk, and with ~238k snapshots plus 621 datasets that still adds up to over 7 minutes of wall time.
On the cached run I see a representative `top` output of … and, according to `zpool iostat`, no IO to the drives (= fully cached). The output of `arcstat.py 10` continues to show very strange numbers: …
Repeating this with the addition of `-s name` reduces the runtime a little, but it's still taking ~1.4 ms/dataset (only ~715 datasets/s), with the same strange numbers in the `arcstat.py` output as on the cached run above.
Include any warning/errors/backtraces from the system logs
Nothing in the logs.