zio_compress: introduce max_size threshold #9416

gmelikov · 2019-10-05T19:38:07Z

Created brand new PR because of some problems with rebasing old one #9311, + updated description.

Motivation and Context

87.5% minimum compressratio is way to large, on 1TB with recordsize 16M such artifical limitation may lead to up to 125GB of wasted space in worst case (but you must be very, very unlucky).

So i propose to check only for max ashift of pool, it may save you nearly 99% of compressratio on large recordsizes.

And I think if you enable slow compression on dataset you will prefer better compressratio over decompression absence for data nowadays.

So, pros:

better compressratio
better IO if decompress is faster than disks
we're already compressing everything
it may be tuned for not so compressible data via module parameter

Cons:

if you have data, which used to be written uncompressible due to previous code - you will see better compressratio at the price of more CPU load

Description

Now default compression is lz4, which can stop
compression process by itself on incompressible data.
So we can check only for common sector size.

How Has This Been Tested?

Generate 12% (just as needed to get into 87.5% threshold) compressible file with fio:

# cat fio.conf
[workload]
bs=128k
ioengine=libaio
iodepth=1
numjobs=1
direct=1
runtime=10
size=100m
filename=bigfile
rw=read
thread
unified_rw_reporting=1
group_reporting=1
buffer_compress_percentage=12
refill_buffers
buffer_pattern=0xdeadbeef

zfs set compression=lz4 rpool/test

version	recordsize	ashift	written	diff (from original)
0.7.13	1M	12	100M
master_patched	1M	12	88.8M	~11.2%

So there may be up to ~12.5% compression gain for hardly compressible files.

Tested manually in VM, didn't run actual system on this code yet.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (a change to man pages or other documentation)

Checklist:

My code follows the ZFS on Linux code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes.
All new and existing tests passed.
All commit messages are properly formatted and contain Signed-off-by.

codecov · 2019-10-06T04:05:10Z

Codecov Report

Merging #9416 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #9416      +/-   ##
==========================================
+ Coverage   79.15%   79.15%   +<.01%     
==========================================
  Files         416      416              
  Lines      123655   123658       +3     
==========================================
+ Hits        97876    97880       +4     
+ Misses      25779    25778       -1

Flag	Coverage Δ
#kernel	`79.75% <85.71%> (+0.01%)`	⬆️
#user	`66.64% <85.71%> (-0.54%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 685abd5...39d1434. Read the comment docs.

GregorKopka

See inline comments.

Also it should be verified that changing the module parameter won't modify the result for existing data when called from https://github.com/zfsonlinux/zfs/blob/13a4027a7cd68069cb252e94c18ba1e5eb5af1cd/module/zfs/arc.c#L1625 as that would break encryption MAC validation.

Possibly this needs to be a feature tracking the TXG it's activated on to behave differently depending on the age of the data.

man/man5/zfs-module-parameters.5

module/zfs/zio_compress.c

PrivatePuffin · 2019-12-25T20:08:03Z

@gmelikov Thanks for pointing me here.
I can at least confirm that 12% compressible writes still don't get compressed as today on MASTER, so this is not just an issue with 0.7.13.

Good work on this, LGTM, this PR deserves some more love! 👍

edit
To be more clear:
This issue isn't just with extremely large record sizes, with 1M record sizes this can already contribute to significant savings!

module/zfs/zio_compress.c

ahrens · 2020-04-14T15:42:29Z

I would prefer to eliminate the module parameter, and instead "use sector size of this pool as threshold". I.e. always make the threshold 1 sector (spa_max_ashift).

gmelikov · 2020-04-14T17:36:26Z

@ahrens hmm, I agree with you, we don't need to use larger threshold here with lz4 and future zstd.

Will get back to this PR in a week.

codecov-io · 2020-04-18T02:07:28Z

Codecov Report

Merging #9416 into master will decrease coverage by 0.00%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #9416      +/-   ##
==========================================
- Coverage   79.38%   79.38%   -0.01%     
==========================================
  Files         388      388              
  Lines      123392   123396       +4     
==========================================
  Hits        97953    97953              
- Misses      25439    25443       +4

Flag	Coverage Δ
#kernel	`79.83% <87.50%> (-0.10%)`	⬇️
#user	`66.07% <75.00%> (+0.56%)`	⬆️

Impacted Files	Coverage Δ
module/zfs/arc.c	`85.65% <100.00%> (+0.44%)`	⬆️
module/zfs/zio.c	`88.76% <100.00%> (+0.25%)`	⬆️
module/zfs/zio_compress.c	`92.85% <100.00%> (+0.54%)`	⬆️
module/os/linux/spl/spl-zlib.c	`55.35% <0.00%> (-28.58%)`	⬇️
module/os/linux/spl/spl-kmem-cache.c	`75.58% <0.00%> (-8.14%)`	⬇️
module/zfs/dsl_synctask.c	`92.40% <0.00%> (-2.54%)`	⬇️
module/zfs/zil.c	`91.25% <0.00%> (-1.87%)`	⬇️
module/zfs/vdev_indirect.c	`73.83% <0.00%> (-0.84%)`	⬇️
module/os/linux/zfs/vdev_disk.c	`83.27% <0.00%> (-0.73%)`	⬇️
module/zfs/vdev_trim.c	`95.66% <0.00%> (-0.73%)`	⬇️
... and 51 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a7929f3...f7523c3. Read the comment docs.

gmelikov · 2020-04-18T11:22:36Z

Updated,

I've removed module parameter and use max_ashift of pool;
tested interoperability with old code, manually imported pool with new code, created data with 89% effective compression, imported it with old code, scrub/read/write is ok, rewrote it, imported with new code, scrub/read/write ok;
nightly ztest is almost all green, I've got just one ASSERT at ztest.c:6050:ztest_scrub()/home/debian/zfs/cmd/ztest/.libs/ztest(+0x9444)[0x564c51a51444], but looks like it's not related to this patch
I've found only one place in code where we recompress data to checksum it - https://github.com/openzfs/zfs/pull/9416/files#diff-163982413ed6b6861e08633c89a77395R1759 , so looks like encrypted dataset and ARC without compression may trigger a checksum error on pool with 89+% compressed data on old code, @tcaputi could you please help me here and prove it or give some information. I can't reproduce this case, looks like I'm wrong;
I looked through all zio_compress_data() usages and can conclude that if data wasn't compressed - it will be saved as ZIO_COMPRESS_OFF, so if we really need to recompress data without ashift info and check anything - we can just compress it with zero threshold;
Old code (unless of uncompressed ARC and encryption case) doesn't recompress data for checksums, so if that case will be resolved - we can use this patch without feature flag!

tcaputi · 2020-05-07T10:11:52Z

@gmelikov without looking at it too closely i think i might know whats going on. I think there are 2 things as play which are combining to result in the ARC's "recompress before checksumming" to work ok, but I think it is a pretty brittle dependency.

For some quick background, we need to store data in the L2ARC exactly the same as it is on disk in order for the decryption to work. This is why we do the re-compression here in the ARC layer.

When the data gets recompressed, the ARC uses the compression algorithm that it stored in the ARC header. This info comes from the bp when the block first enters the ARC. This patch doesn't change the way compression is done (AFAICT), it just changes the threshold at which we decide the compression is "worth it". However, no matter what that threshold is, if ZFS decides not to compress the block it sets the compression algorithm to ZIO_COMPRESS_OFF. So the encryption code will only recompress it if it was actually compressed. This code does NOT have the same check for whether or not the compression was worth it. It simply relies on the value from the bp, so it ends up doing the right thing in all these cases.

That being said, this discussion is making me wonder if QAT compression support breaks L2ARC with encryption + no compression. I suspect it does, but that is a very niche bug with very minor repercussions in that case (the data simply fails to read from the L2ARC) and checks the main pool instead. Probably a problem for another time.

Anyway, hope that helps.

gmelikov · 2020-05-07T10:33:24Z

@tcaputi Thank you for insight, you're right, if it was compressed - we can use zero threshold here too, so I was wrong here (how glad I am)!

So, this patch is ready to review, I've rebased it.

gmelikov · 2020-05-07T12:59:58Z

module/zfs/arc.c

@@ -1755,7 +1756,7 @@ arc_hdr_authenticate(arc_buf_hdr_t *hdr, spa_t *spa, uint64_t dsobj)
 		abd_take_ownership_of_buf(abd, B_TRUE);

 		csize = zio_compress_data(HDR_GET_COMPRESS(hdr),
-		    hdr->b_l1hdr.b_pabd, tmpbuf, lsize);


Nope, steps to get problem here:

Use code from this PR

get 89+% compressed encrypted block

L2ARC attached to pool

Load pool with pre-PR code

zio_compress_data will not compress this block again here due to old 12.5% threshold inside it

Heck, because it's the problem - I'll add a read-only feature flag only to make it safe for this case. In future zio_compress_data will have threshold argument so we won't get in this probem again.

I see, if you move the pool to a system with older software, it doesn't correctly write to the L2ARC, because even though the block was compressed, the old L2ARC-writing code will not compress it if it doesn't save at least 1/8th of the space. This only applies if the old system has disabled compressed ARC, right?

Introducing a readonly-compatible feature flag is a bit unfortunate. It would mean maintaining the old 1/8th code forever. I wonder how common it is to disable compressed ARC, nowadays? It might be reasonable to say that if you disable compressed ARC, and use L2ARC, and bring a newer pool back to old bits, then some of the data won't be able to "stick" in the L2ARC. I mean, we previously considered (but didn't implement) saying that disabling compressed ARC makes L2ARC not work with compressed blocks at all (i.e. removing the recompress-on-L2ARC-write code entirely).

@ahrens yes, you've got the point. The funny thing is that I can't reproduce problems in such theoretic scenario, l2arc works well under manual tests with data from new code, but with old module and disabled ARC compression.

There is a "best effort" thing near arc_hdr_authenticate function, maybe it's the case?

I've tried to add a feature flag, and code began to transform from beautiful and easy to read info ugly if-if monster. I don't want to do this :(

If it's already a problematic and rare case (compressed l2arc with encryption on old code with pool and data from new code), and we are ready to go with this - I propose to merge it as is without a feature flag, PR is ready for it. I can add this comment somewhere (I'd be glad if someone gives me exact place for this).

adamdmoss · 2020-05-08T22:14:01Z

(as an aside, the comments here talk about a >1MB recordsize a few times - I thought 1MB was the upper limit?)

gmelikov · 2020-05-10T21:07:28Z

@adamdmoss by default it is, but technically 16MB is an upper limit now, you may just change this module parameter and voila https://github.com/openzfs/zfs/wiki/ZFS-on-Linux-Module-Parameters#zfs_max_recordsize

It's good if you have really large block workload with rare read.

tonyhutter

Sorry this fell off our radar - approved

gmelikov · 2024-08-20T12:37:50Z

Don't see CI fails related to changes, previous runs were successfull, rebased up to date.

behlendorf · 2024-08-20T22:24:06Z

@gmelikov the Fedora build failures do look like they're introduced by this PR.

gmelikov · 2024-08-29T18:16:16Z

@behlendorf not sure what was with fedora runner, after rebase I've got 2 green runs.

But I had flaky tests (they've failed only once):

import_devices_missing
redundancy_draid_spare1 (saw in some other PRs too)

Now default compression is lz4, which can stop compression process by itself on incompressible data. If there are additional size checks - we will only make our compressratio worse. New usable compression thresholds are: - less than BPE_PAYLOAD_SIZE (embedded_data feature); - at least one saved sector. Old 12.5% threshold is left to minimize affect on existing user expectations of CPU utilization. If data wasn't compressed - it will be saved as ZIO_COMPRESS_OFF, so if we really need to recompress data without ashift info and check anything - we can just compress it with zero threshold. So, we don't need a new feature flag here! Signed-off-by: George Melikov <[email protected]>

ZLE compressor needs additional bytes to process d_len argument efficiently. Don't use BPE_PAYLOAD_SIZE as d_len with it before we rework zle compressor somehow. Signed-off-by: George Melikov <[email protected]>

On compression we could be more explicit here for cases where we can not recompress the data. Co-authored-by: Alexander Motin <[email protected]> Signed-off-by: George Melikov <[email protected]>

behlendorf

Thanks for updating this. Looks good, the CI failures were unrelated.

ZLE compressor needs additional bytes to process d_len argument efficiently. Don't use BPE_PAYLOAD_SIZE as d_len with it before we rework zle compressor somehow. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: George Melikov <[email protected]> Closes #9416

On compression we could be more explicit here for cases where we can not recompress the data. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Co-authored-by: Alexander Motin <[email protected]> Signed-off-by: George Melikov <[email protected]> Closes #9416

Now default compression is lz4, which can stop compression process by itself on incompressible data. If there are additional size checks - we will only make our compressratio worse. New usable compression thresholds are: - less than BPE_PAYLOAD_SIZE (embedded_data feature); - at least one saved sector. Old 12.5% threshold is left to minimize affect on existing user expectations of CPU utilization. If data wasn't compressed - it will be saved as ZIO_COMPRESS_OFF, so if we really need to recompress data without ashift info and check anything - we can just compress it with zero threshold. So, we don't need a new feature flag here! Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: George Melikov <[email protected]> Closes openzfs#9416

ZLE compressor needs additional bytes to process d_len argument efficiently. Don't use BPE_PAYLOAD_SIZE as d_len with it before we rework zle compressor somehow. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: George Melikov <[email protected]> Closes openzfs#9416

On compression we could be more explicit here for cases where we can not recompress the data. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Co-authored-by: Alexander Motin <[email protected]> Signed-off-by: George Melikov <[email protected]> Closes openzfs#9416

Now default compression is lz4, which can stop compression process by itself on incompressible data. If there are additional size checks - we will only make our compressratio worse. New usable compression thresholds are: - less than BPE_PAYLOAD_SIZE (embedded_data feature); - at least one saved sector. Old 12.5% threshold is left to minimize affect on existing user expectations of CPU utilization. If data wasn't compressed - it will be saved as ZIO_COMPRESS_OFF, so if we really need to recompress data without ashift info and check anything - we can just compress it with zero threshold. So, we don't need a new feature flag here! Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: George Melikov <[email protected]> Closes openzfs#9416

ZLE compressor needs additional bytes to process d_len argument efficiently. Don't use BPE_PAYLOAD_SIZE as d_len with it before we rework zle compressor somehow. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: George Melikov <[email protected]> Closes openzfs#9416

On compression we could be more explicit here for cases where we can not recompress the data. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Co-authored-by: Alexander Motin <[email protected]> Signed-off-by: George Melikov <[email protected]> Closes openzfs#9416

Now default compression is lz4, which can stop compression process by itself on incompressible data. If there are additional size checks - we will only make our compressratio worse. New usable compression thresholds are: - less than BPE_PAYLOAD_SIZE (embedded_data feature); - at least one saved sector. Old 12.5% threshold is left to minimize affect on existing user expectations of CPU utilization. If data wasn't compressed - it will be saved as ZIO_COMPRESS_OFF, so if we really need to recompress data without ashift info and check anything - we can just compress it with zero threshold. So, we don't need a new feature flag here! Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: George Melikov <[email protected]> Closes openzfs#9416

ZLE compressor needs additional bytes to process d_len argument efficiently. Don't use BPE_PAYLOAD_SIZE as d_len with it before we rework zle compressor somehow. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: George Melikov <[email protected]> Closes openzfs#9416

On compression we could be more explicit here for cases where we can not recompress the data. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Co-authored-by: Alexander Motin <[email protected]> Signed-off-by: George Melikov <[email protected]> Closes openzfs#9416

gmelikov mentioned this pull request Oct 5, 2019

WIP: zio_compress: remove compressratio threshold #9311

Closed

12 tasks

gmelikov force-pushed the compress_ng branch from 12c2937 to 6e53663 Compare October 6, 2019 09:38

GregorKopka suggested changes Oct 6, 2019

View reviewed changes

man/man5/zfs-module-parameters.5 Outdated Show resolved Hide resolved

module/zfs/zio_compress.c Outdated Show resolved Hide resolved

module/zfs/zio_compress.c Outdated Show resolved Hide resolved

behlendorf added the Status: Code Review Needed Ready for review and testing label Oct 9, 2019

gmelikov force-pushed the compress_ng branch 2 times, most recently from 10488d9 to 39d1434 Compare October 22, 2019 08:06

gmelikov added the Status: Work in Progress Not yet ready for general review label Nov 20, 2019

gmelikov mentioned this pull request Dec 23, 2019

Introduce ZSTD compression to ZFS #9735

Closed

12 tasks

gmelikov mentioned this pull request Apr 14, 2020

Compression not compressing files #10201

Closed

gmelikov changed the title ~~zio_compress: use min sector size as threshold~~ WIP: zio_compress: use min sector size as threshold Apr 14, 2020

ahrens reviewed Apr 14, 2020

View reviewed changes

module/zfs/zio_compress.c Outdated Show resolved Hide resolved

gmelikov force-pushed the compress_ng branch 2 times, most recently from c147e6d to f7523c3 Compare April 17, 2020 19:41

gmelikov changed the title ~~WIP: zio_compress: use min sector size as threshold~~ WIP: zio_compress: use sector size as threshold Apr 18, 2020

gmelikov force-pushed the compress_ng branch from f7523c3 to e94ca35 Compare May 7, 2020 10:33

gmelikov removed the Status: Work in Progress Not yet ready for general review label May 7, 2020

gmelikov changed the title ~~WIP: zio_compress: use sector size as threshold~~ zio_compress: use sector size as threshold May 7, 2020

gmelikov commented May 7, 2020

View reviewed changes

gmelikov added the Status: Work in Progress Not yet ready for general review label May 7, 2020

gmelikov mentioned this pull request May 18, 2020

Introduce ZSTD compression to ZFS #10278

Closed

17 tasks

tonyhutter approved these changes Aug 19, 2024

View reviewed changes

gmelikov force-pushed the compress_ng branch from e9338e4 to fa6d513 Compare August 19, 2024 17:41

gmelikov force-pushed the compress_ng branch 4 times, most recently from 9b3d7f7 to 5737823 Compare August 29, 2024 10:20

gmelikov and others added 3 commits September 19, 2024 10:40

ZLE compression: don't use BPE_PAYLOAD_SIZE

caa6588

ZLE compressor needs additional bytes to process d_len argument efficiently. Don't use BPE_PAYLOAD_SIZE as d_len with it before we rework zle compressor somehow. Signed-off-by: George Melikov <[email protected]>

arc_hdr_authenticate: make explicit error

c12ed0d

On compression we could be more explicit here for cases where we can not recompress the data. Co-authored-by: Alexander Motin <[email protected]> Signed-off-by: George Melikov <[email protected]>

gmelikov force-pushed the compress_ng branch from 5737823 to c12ed0d Compare September 19, 2024 07:40

amotin approved these changes Sep 19, 2024

View reviewed changes

behlendorf approved these changes Sep 20, 2024

View reviewed changes

behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Sep 20, 2024

behlendorf closed this in 522f262 Sep 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

zio_compress: introduce max_size threshold #9416

zio_compress: introduce max_size threshold #9416

gmelikov commented Oct 5, 2019 •

edited

Loading

codecov bot commented Oct 6, 2019 •

edited

Loading

GregorKopka left a comment

PrivatePuffin commented Dec 25, 2019 •

edited

Loading

ahrens commented Apr 14, 2020

gmelikov commented Apr 14, 2020

codecov-io commented Apr 18, 2020 •

edited

Loading

gmelikov commented Apr 18, 2020

tcaputi commented May 7, 2020

gmelikov commented May 7, 2020

gmelikov May 7, 2020

ahrens May 8, 2020

gmelikov May 10, 2020 •

edited

Loading

adamdmoss commented May 8, 2020

gmelikov commented May 10, 2020

tonyhutter left a comment

gmelikov commented Aug 20, 2024

behlendorf commented Aug 20, 2024

gmelikov commented Aug 29, 2024

behlendorf left a comment

zio_compress: introduce max_size threshold #9416

zio_compress: introduce max_size threshold #9416

Conversation

gmelikov commented Oct 5, 2019 • edited Loading

Motivation and Context

Description

How Has This Been Tested?

Types of changes

Checklist:

codecov bot commented Oct 6, 2019 • edited Loading

Codecov Report

GregorKopka left a comment

Choose a reason for hiding this comment

PrivatePuffin commented Dec 25, 2019 • edited Loading

ahrens commented Apr 14, 2020

gmelikov commented Apr 14, 2020

codecov-io commented Apr 18, 2020 • edited Loading

Codecov Report

gmelikov commented Apr 18, 2020

tcaputi commented May 7, 2020

gmelikov commented May 7, 2020

gmelikov May 7, 2020

Choose a reason for hiding this comment

ahrens May 8, 2020

Choose a reason for hiding this comment

gmelikov May 10, 2020 • edited Loading

Choose a reason for hiding this comment

adamdmoss commented May 8, 2020

gmelikov commented May 10, 2020

tonyhutter left a comment

Choose a reason for hiding this comment

gmelikov commented Aug 20, 2024

behlendorf commented Aug 20, 2024

gmelikov commented Aug 29, 2024

behlendorf left a comment

Choose a reason for hiding this comment

gmelikov commented Oct 5, 2019 •

edited

Loading

codecov bot commented Oct 6, 2019 •

edited

Loading

PrivatePuffin commented Dec 25, 2019 •

edited

Loading

codecov-io commented Apr 18, 2020 •

edited

Loading

gmelikov May 10, 2020 •

edited

Loading