I suspect that, without too much difficulty, theorems along the following lines could be proven mathematically for the algorithm and checked with `debug_assert!` in the implementation:
- Once an iteration of Zopfli fails to reduce the file size, no subsequent iteration will reduce it either.
- If a file's uncompressed size is N bytes, the minimum compressed size is found within cN + d iterations for some small constants c and d (probably c < 10 and d < 10).
It would be helpful to apply these to limit the number of iterations for small blocks. That would help with fuzz testing, where a very large iteration count combined with a very small file can be a corner case that needs to be tested, even if hitting it in production would indicate a wrong assumption. This matters especially given cargo fuzz's bias toward very small `Vec<u8>`s.
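To make the proposal concrete, here is a minimal sketch of how the two conjectures could bound the iteration loop if they turned out to hold. Nothing here is the crate's actual API: `capped_iterations`, `run_until_stalled`, and the `step` closure (which stands in for one Zopfli iteration and returns the resulting compressed size) are hypothetical names, and the constants `c` and `d` are placeholders for values that would still need to be proven.

```rust
/// Hypothetical helper: limits the requested iteration count to `c * n + d`,
/// the bound from the second conjecture. Uses saturating arithmetic so that
/// fuzzer-supplied extremes cannot overflow.
pub fn capped_iterations(requested: u64, input_len: usize, c: u64, d: u64) -> u64 {
    let bound = c.saturating_mul(input_len as u64).saturating_add(d);
    requested.min(bound)
}

/// Hypothetical driver: calls `step` (a stand-in for one Zopfli iteration,
/// returning the compressed size it achieved) at most `max_iterations` times.
/// It stops as soon as an iteration fails to improve on the best size so far,
/// which is only safe if the first conjecture is true.
/// Returns the best size seen (if any) and the number of iterations run.
pub fn run_until_stalled<F>(max_iterations: u64, mut step: F) -> (Option<usize>, u64)
where
    F: FnMut(u64) -> usize,
{
    let mut best: Option<usize> = None;
    let mut ran = 0;
    for i in 0..max_iterations {
        let size = step(i);
        ran += 1;
        match best {
            // No improvement: under the first conjecture, later iterations
            // would not improve either, so stop here.
            Some(b) if size >= b => break,
            _ => best = Some(size),
        }
    }
    (best, ran)
}
```

A fuzz target could then pass an arbitrary iteration count through `capped_iterations` before compressing, so that a 3-byte input with a requested count of `u64::MAX` runs at most `c * 3 + d` iterations. In a debug build, the early exit could instead be replaced by continuing the loop and using `debug_assert!` to check that no later iteration beats the stalled size, which would test the conjecture rather than assume it.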
Your hypotheses sound sensible and useful to me, although I'm not sure right now how to go about formally proving them. I'd need to understand the applied mathematics behind the algorithm better than I currently do to make a definitive statement.
Out of curiosity, do you happen to know of any good resources to learn how exactly Zopfli works? The Zopfli whitepaper can be summarized as "we made this compressor, tested it, and it turned out to work well in practice", which is not very helpful. On the other hand, the books and papers I've found on LZ77 and compression algorithms in general tend to be somewhat old and disconnected from the considerations and refinements made by state-of-the-art implementations like Zopfli.
While I don't know of any resources myself, I have noticed libdeflate's code is super well documented, so it could be helpful to read over it. I think it borrows a lot of concepts from zstd, but the near-optimal algorithm likely has some similarities with Zopfli.