[2.1] spa: make read/write queues configurable #15696
Merged
Backporting #15675 to 2.1.
Note that master has removed `ZTI_BATCH` and added `ZTI_SYNC`. This PR matches 2.1, that is, it accepts `batch` instead of `sync`. Comments and manpages have been updated accordingly.
Motivation and Context
We are finding that as customers get larger and faster machines (hundreds of cores, large NVMe-backed pools) they keep hitting relatively low performance ceilings. Our profiling work almost always finds that they're running into bottlenecks on the SPA IO taskqs. Unfortunately there's often little we can advise at that point, because there are very few ways to change behaviour without patching.
Description
This commit adds two load-time parameters, `zio_taskq_read` and `zio_taskq_write`, that can configure the READ and WRITE IO taskqs directly.
This achieves two goals: it gives operators (and those who support them) a way to tune things without requiring a custom build of OpenZFS, which is often not possible, and it lets us easily try different config variations in a variety of environments to inform the development of better defaults for these kinds of systems.
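As a rough illustration of how an operator might check what is currently in effect, here is a minimal sketch that prints both parameters. It assumes a Linux system where the zfs module exposes its parameters under the usual sysfs directory. The `PARAM_DIR` variable and the `show_taskq_params` helper are hypothetical names used only for this example. The value syntax itself is described in the updated manpage and is deliberately not reproduced here.

```shell
#!/bin/sh
# Assumption: module parameters are readable under the usual Linux sysfs path.
# PARAM_DIR can be overridden, e.g. for testing on a machine without zfs loaded.
PARAM_DIR="${PARAM_DIR:-/sys/module/zfs/parameters}"

# Hypothetical helper: print each taskq parameter, or note that it is missing.
show_taskq_params() {
    for p in zio_taskq_read zio_taskq_write; do
        if [ -r "$PARAM_DIR/$p" ]; then
            printf '%s=%s\n' "$p" "$(cat "$PARAM_DIR/$p")"
        else
            printf '%s: not available\n' "$p"
        fi
    done
}

show_taskq_params
```

Because these are load-time parameters, changing them would mean supplying new values when the module is loaded rather than writing to these files.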
Because tuning the IO taskqs really requires a fairly deep understanding of how IO in ZFS works, and generally isn't needed without a pretty serious workload and an ability to identify bottlenecks, only minimal documentation is provided. It's expected that anyone using this is going to have the source code there as well.
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
How Has This Been Tested?
This PR is a further backport of #15695 and has been compiled and sanity checked only. However, the "original" version of this was developed at a customer site against 2.1 and has seen hours of testing, so I feel pretty confident about it. Still, let's see what the other PR and CI here shake out.