Early stopping #17

Pr0methean · 2023-05-27T23:21:16Z

I suspect that theorems along the lines of the following could be proven mathematically for the algorithm without too much difficulty, and checked with debug_assert! in the implementation:

When an iteration of Zopfli hasn't reduced the file size, subsequent iterations won't do so either.
If a file's uncompressed size is N bytes, the minimum compressed size will be found within cN + d iterations for some small constants c and d (probably c < 10 and d < 10).

It would be helpful to have these applied to limit the number of iterations for small blocks. That would help with fuzz testing, especially given cargo fuzz's bias toward very small Vec<u8>s. When fuzzing, a very large iteration count combined with a very small file can be a corner case that needs to be tested, even though seeing that combination in production would indicate a wrong assumption.
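To make the proposal concrete, here is a minimal sketch of how both limits might be combined in an iteration loop. Everything in it is an assumption for illustration: `run_with_early_stopping`, its parameters, the closure standing in for one optimization pass, and the constants `C` and `D` are hypothetical and are not taken from the Zopfli code base. The conjectured bound cN + d is unproven, so the cap below only shows the shape such a limit could take.

```rust
/// Runs up to `max_iterations` passes, stopping early when either
/// (a) `max_without_improvement` consecutive passes fail to shrink the
///     output, or
/// (b) the conjectured cap of C * N + D passes is reached.
///
/// `compressed_size_at` is a hypothetical stand-in for one optimization
/// pass; it returns the compressed size in bytes after iteration `i`.
/// Returns (best size seen, number of iterations actually run).
fn run_with_early_stopping<F>(
    input_len: usize,
    max_iterations: u64,
    max_without_improvement: u64,
    mut compressed_size_at: F,
) -> (usize, u64)
where
    F: FnMut(u64) -> usize,
{
    // Assumed constants, loosely based on the guess "c < 10 and d < 10".
    // These values are unproven placeholders.
    const C: u64 = 10;
    const D: u64 = 10;
    let cap = (input_len as u64).saturating_mul(C).saturating_add(D);
    let limit = max_iterations.min(cap);

    let mut best = usize::MAX;
    let mut stale = 0u64; // consecutive iterations without improvement
    let mut done = 0u64; // iterations actually executed

    for i in 0..limit {
        let size = compressed_size_at(i);
        done += 1;
        if size < best {
            best = size;
            stale = 0;
        } else {
            stale += 1;
            if stale >= max_without_improvement {
                break;
            }
        }
    }
    (best, done)
}

fn main() {
    // Toy size sequence: improves twice, then plateaus.
    let sizes = [100usize, 90, 85, 85, 85, 85, 85];
    let (best, iterations) = run_with_early_stopping(1_000, u64::MAX, 2, |i| {
        sizes[(i as usize).min(sizes.len() - 1)]
    });
    println!("best = {best} bytes after {iterations} iterations");
}
```

With a 1-byte input, the cap in this sketch works out to 20 iterations no matter how large the requested count is, which is the behaviour that would keep fuzz cases with tiny inputs and huge iteration counts cheap. Setting `max_without_improvement` to 1 corresponds to the first hypothesis as stated; a larger value is the more conservative choice while that hypothesis remains unproven.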


AlexTMjugador · 2023-05-28T17:50:50Z

Your hypotheses sound sensible and useful to me, although I'm not sure right now how to go about formally proving them. I'd need to understand the applied mathematics behind the algorithm better than I currently do to make a definitive statement.

Out of curiosity, do you happen to know of any good resources to learn how exactly Zopfli works? The Zopfli whitepaper can be summarized as "we made this compressor, tested it, and it turned out to work well in practice", which is not very helpful. On the other hand, the books and papers I've found on LZ77 and compression algorithms in general tend to be somewhat old and disconnected from the considerations and refinements made by state-of-the-art implementations like Zopfli.

andrews05 · 2023-05-31T21:46:42Z

While I don't know anything myself, I have noticed libdeflate's code is super well documented - could be helpful to read over it. I think it borrows a lot of concepts from zstd but the near-optimal algorithm likely has some similarities with zopfli.

AlexTMjugador mentioned this issue Jun 24, 2023

Let user specify iterations without improvement #21

Merged

AlexTMjugador added the enhancement New feature or request label Aug 7, 2023
