Fast Dedup: Dedup Quota #15889
Conversation
Addressed review feedback and rebased to fix merge conflicts (force-pushed 476200f to 8ab3dc6).
Fixed dedup quota test for Redhat distros (force-pushed 8ab3dc6 to 79a7842).
Rebased after zap shrinking was landed.
Looks good to me, except some cosmetics in tests.
Just a thought, not directly related to this commit (except maybe the tests), but close: wouldn't it make sense to block dedup for blocks below a certain size via some tunable or property? Dedup for blocks with a 4KB physical size is IMHO highly questionable, and for blocks with a 512-byte physical size I'd consider it malicious, unless there is somehow a huge hit rate for them.
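For illustration only, a minimal sketch of the kind of gate being suggested. The tunable name (zfs_dedup_min_psize) and helper are hypothetical and not part of this PR:

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical tunable: blocks with psize below this skip dedup; 0 disables. */
static uint64_t zfs_dedup_min_psize = 4096;

/*
 * Decide whether a block of the given physical size is worth deduplicating.
 * Tiny blocks cost a full DDT entry while saving very little space.
 */
static bool
dedup_block_worthwhile(uint64_t psize)
{
	if (zfs_dedup_min_psize != 0 && psize < zfs_dedup_min_psize)
		return (false);	/* fall back to a normal, non-dedup write */
	return (true);
}
```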
module/zfs/ddt_stats.c
Outdated
ddt_get_ddt_dsize(spa_t *spa)
{
	ddt_object_t ddo_total;

	ddt_get_dedup_object_stats(spa, &ddo_total);

	return (spa->spa_dedup_dsize);
I've got a performance overhead concern here. If quota is set to a specific value, ddt_over_quota() called for each new record will call ddt_get_ddt_dsize() each time, which by calling ddt_get_dedup_object_stats() will recalculate space statistics for all checksums, types and classes. While requests will likely be cached, I am worried by CPU overhead. Do we really need to update it each time if load and sync already update it, and it should not change otherwise? Have I missed explanation why do we need it?
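To make the concern concrete, here is a rough sketch (simplified types and counts, illustrative names, not the PR's actual code) of why an unconditional recalculation per new record is expensive: every call walks every checksum/type/class combination.

```c
#include <stdint.h>
#include <stdbool.h>

enum { N_CHECKSUMS = 4, N_TYPES = 2, N_CLASSES = 3 };

/* Placeholder for looking up the on-disk size of one DDT object. */
static uint64_t
ddt_object_dsize(int cksum, int type, int class)
{
	(void) cksum; (void) type; (void) class;
	return (0);
}

/* Recomputes the total DDT size by iterating every object. */
static uint64_t
ddt_total_dsize(void)
{
	uint64_t total = 0;

	for (int c = 0; c < N_CHECKSUMS; c++)
		for (int t = 0; t < N_TYPES; t++)
			for (int k = 0; k < N_CLASSES; k++)
				total += ddt_object_dsize(c, t, k);
	return (total);
}

/* If this runs for every prospective new entry, so does the full walk above. */
static bool
over_quota(uint64_t quota)
{
	return (quota != 0 && ddt_total_dsize() > quota);
}
```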
Yeah, there isn't a need to call it so often. Pushed a fix.
[Fast dedup stack rebased to master c98295e]
@@ -1032,6 +1139,7 @@ ddt_sync_table(ddt_t *ddt, dmu_tx_t *tx, uint64_t txg)
	memcpy(&ddt->ddt_histogram_cache, ddt->ddt_histogram,
	    sizeof (ddt->ddt_histogram));
	spa->spa_dedup_dspace = ~0ULL;
	spa->spa_dedup_dsize = ~0ULL;
Does spa_dedup_dsize need to be wiped anywhere other than maybe ddt_load()? I think it may create a time window for open context, between spa_dedup_dsize being wiped and reset, during which dedup is disabled for no good reason. I don't think we need to wipe it, only update it to the new up-to-date value during sync.
The performance issue you raised was caused by a past regression in ddt_get_ddt_dsize(). The original mechanism was that spa_dedup_dsize (which is a cached value) was reset after syncing the DDT, and the next reader of ddt_get_ddt_dsize() would check whether it had been reset and perform a recalculation; the cached value was then good until the next txg sync.
I have restored that original behavior, made sure there were no direct consumers of spa_dedup_dsize, and retested to confirm that we only recalculate after the DDT is updated during sync.
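A minimal sketch of that caching pattern, with a file-scope variable and a placeholder recalculation standing in for the real spa/DDT plumbing:

```c
#include <stdint.h>

#define	DSIZE_STALE	(~0ULL)		/* sentinel: cache needs recompute */

static uint64_t cached_ddt_dsize = DSIZE_STALE;

/* Stand-in for the full walk over all DDT objects. */
static uint64_t
recalculate_ddt_dsize(void)
{
	return (0);
}

/* Sync path: the DDT changed, so mark the cached size stale. */
static void
ddt_sync_invalidate_dsize(void)
{
	cached_ddt_dsize = DSIZE_STALE;
}

/* Read path: recompute only on the first read after an invalidation. */
static uint64_t
ddt_get_dsize_cached(void)
{
	if (cached_ddt_dsize == DSIZE_STALE)
		cached_ddt_dsize = recalculate_ddt_dsize();
	return (cached_ddt_dsize);
}
```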
Thank you. It should help with the extra requests. But it seems to me there is another issue, possibly affecting the existing spa_dedup_dspace also. I think by this time the DDT ZAP is already written but not yet synced, which means the dnode does not have the new information on used space yet. So ddt_get_ddt_dsize() may regularly fetch and cache a value that is one TXG old.
Last push just tightens up a number parse in the …
I assume if you set it to … If you have a …
If you set it to …
. $STF_SUITE/include/libtest.shlib

DISK=${DISKS%% *}

default_setup $DISK
I don't think you ever use the default pool in dedup_quota.ksh, so I think you can get rid of setup.ksh and cleanup.ksh.
I believe followup commits add additional tests which depend on these, so there's no harm in adding them now.
I haven't worked much with the dedup code, but I don't see any major issues.
If you can fix the minor setup.ksh and cleanup.ksh stuff, then I think we can start wrapping this up.
Thanks for addressing my previous comments. Looks good.
 * If dedup quota is 0, we translate this into 'none'
 * (unless literal is set). And if it is UINT64_MAX
 * we translate that as 'automatic' (limit to size of
 * the dedicated dedup VDEV. Otherwise, fall throught
Suggested change:
- * the dedicated dedup VDEV. Otherwise, fall throught
+ * the dedicated dedup VDEV. Otherwise, fall through
This adds two new pool properties:
- dedup_table_size, the total size of all DDTs on the pool; and
- dedup_table_quota, the maximum possible size of all DDTs in the pool

When set, quota will be enforced by checking when a new entry is about to be created. If the pool is over its dedup quota, the entry won't be created, and the corresponding write will be converted to a regular non-dedup write. Note that existing entries can be updated (ie their refcounts changed), as that reuses the space rather than requiring more.

dedup_table_quota can be set to 'auto', which will set it based on the size of the devices backing the "dedup" allocation device. This makes it possible to limit the DDTs to the size of a dedup vdev only, such that when the device fills, no new blocks are deduplicated.

Sponsored-by: iXsystems, Inc.
Sponsored-By: Klara Inc.
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Signed-off-by: Don Brady <[email protected]>
Co-authored-by: Don Brady <[email protected]>
Co-authored-by: Rob Wing <[email protected]>
Co-authored-by: Sean Eric Fagan <[email protected]>
Closes openzfs#15889
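A sketch of the 'auto' behavior the commit message describes. The 0/UINT64_MAX encoding matches the code comment quoted earlier in this conversation; the dedup_class_capacity() helper and function names are placeholders, not the PR's actual code:

```c
#include <stdint.h>

#define	DDT_QUOTA_NONE	0ULL		/* property value 'none' */
#define	DDT_QUOTA_AUTO	UINT64_MAX	/* property value 'auto' */

/* Placeholder: total capacity of the devices in the dedup allocation class. */
static uint64_t
dedup_class_capacity(void)
{
	return (0);
}

/* Resolve the stored property value into an effective byte limit (0 = none). */
static uint64_t
effective_ddt_quota(uint64_t prop)
{
	if (prop == DDT_QUOTA_NONE)
		return (0);
	if (prop == DDT_QUOTA_AUTO)
		return (dedup_class_capacity());
	return (prop);
}
```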
Some aspects are clear to me, others unclear.
Motivation and Context
Dedup tables can grow unbounded, which lets them consume an entire dedup vdev and so spill into the main pool, or grow too big to fit in RAM, hurting performance. This change adds options to allow the administrator to set a quota, which when reached will effectively disable dedup for new blocks.
Description
This adds two new pool properties:
- dedup_table_size, the total size of all DDTs on the pool; and
- dedup_table_quota, the maximum possible size of all DDTs in the pool

When set, quota will be enforced by checking when a new entry is about to be created. If the pool is over its dedup quota, the entry won't be created, and the corresponding write will be converted to a regular non-dedup write. Note that existing entries can be updated (ie their refcounts changed), as that reuses the space rather than requiring more.

dedup_table_quota can be set to auto, which will set it based on the size of the devices backing the "dedup" allocation class. This makes it possible to limit the DDTs to the size of a dedup vdev only, such that when the device fills, no new blocks are deduplicated.

This replaces #10169.
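A compressed sketch of the enforcement rule just described (illustrative names and simplified signatures; the real check lives in the DDT lookup path): only brand-new entries are refused when over quota, and a refused entry simply means the write proceeds without dedup.

```c
#include <stdbool.h>
#include <stdint.h>

static bool
ddt_over_quota(uint64_t ddt_dsize, uint64_t quota)
{
	return (quota != 0 && ddt_dsize > quota);
}

/* Returns true if the write may stay on the dedup path. */
static bool
dedup_write_allowed(bool entry_exists, uint64_t ddt_dsize, uint64_t quota)
{
	if (entry_exists)
		return (true);		/* refcount bump reuses existing space */
	if (ddt_over_quota(ddt_dsize, quota))
		return (false);		/* convert to a regular non-dedup write */
	return (true);			/* under quota: a new entry may be created */
}
```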
How Has This Been Tested?
Test added.
Types of changes
Checklist:
All commit messages are properly formatted and contain Signed-off-by.