Fix computation of GZip filter overhead. #5296
Conversation
@@ -0,0 +1,86 @@
#include <climits>

#include <tiledb/tiledb>
Let's add a comment somewhere that explains that this adds a var-sized attribute encoded with the GZIP filter and that we write data that will result in a 0-sized var tile.
> that will result in a 0-sized var tile
On second look this is quite non-apparent. How about I remove this regression test and write a targeted unit test for compressing a zero-sized buffer with the GZip filter?
Let's keep the regression test in place and just add a quick comment.
But you are right, we should probably add a separate unit test as well.
Updated the test's title, hope that's sufficient.
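For illustration, here is a minimal sketch of the kind of array setup the regression test described above relies on, written against the TileDB C++ API. The array and attribute names are made up, not taken from the PR; the real test then writes only empty cell values so the variable-sized data tile ends up 0 bytes.

```cpp
// Hypothetical setup sketch (names are illustrative): a dense array with a
// var-sized string attribute whose data is filtered through GZIP.
#include <string>
#include <tiledb/tiledb>

void create_gzip_var_array(const std::string& uri) {
  tiledb::Context ctx;

  // One small dimension so a single tile holds all cells.
  tiledb::Domain domain(ctx);
  domain.add_dimension(tiledb::Dimension::create<int32_t>(ctx, "d", {{1, 4}}, 4));

  // GZIP filter list applied to the var-sized attribute.
  tiledb::FilterList filters(ctx);
  filters.add_filter(tiledb::Filter(ctx, TILEDB_FILTER_GZIP));

  // std::string attributes are var-sized in the C++ API.
  auto a = tiledb::Attribute::create<std::string>(ctx, "a");
  a.set_filter_list(filters);

  tiledb::ArraySchema schema(ctx, TILEDB_DENSE);
  schema.set_domain(domain);
  schema.add_attribute(a);
  tiledb::Array::create(uri, schema);
}
```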
}

};  // namespace sm
}  // namespace sm
Let's take the opportunity to change this to `tiledb::sm` above instead of having the two separated.
Done.
    out_buf_dec_storage.data(), out_buf_dec_storage.size()};

GZip::decompress(&in_buf_dec, &out_buf_dec);
// Check that
What are we checking here?
The root cause of the issue: that an empty buffer is compressed and decompressed correctly.
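As a standalone illustration of that targeted unit test idea, the sketch below round-trips a zero-sized buffer through zlib's convenience API (not TileDB's internal `GZip` wrapper; the names and assertions are assumptions, not code from this PR). The point is that even empty input produces a non-empty compressed stream, so the output buffer must reserve the fixed overhead.

```cpp
// Hypothetical check using zlib directly, assuming zlib headers are available.
#include <cassert>
#include <vector>
#include <zlib.h>

int main() {
  // Zero-sized input: the edge case the fix targets.
  const Bytef src[1] = {};
  const uLong src_len = 0;

  // compressBound(0) is still positive: the stream header, one (empty)
  // DEFLATE block, and the checksum need room even when n == 0.
  uLongf comp_len = compressBound(src_len);
  assert(comp_len > 0);
  std::vector<Bytef> compressed(comp_len);

  // Compressing the empty buffer succeeds and emits a few bytes.
  int rc = compress2(compressed.data(), &comp_len, src, src_len,
                     Z_DEFAULT_COMPRESSION);
  assert(rc == Z_OK);
  assert(comp_len > 0);

  // Decompressing yields exactly zero bytes back.
  std::vector<Bytef> out(8);
  uLongf out_len = out.size();
  rc = uncompress(out.data(), &out_len, compressed.data(), comp_len);
  assert(rc == Z_OK);
  assert(out_len == 0);
  return 0;
}
```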
SC-48428
The overhead of the DEFLATE algorithm is stated to be $5(\lfloor\frac{n}{16383}\rfloor+1)$, but we computed it as $5\lceil\frac{n}{16383}\rceil$, which produces a wrong result for $n = 0$. This caused incomplete data to be written during compression, which led to failures during decompression.
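Concretely, for empty input ($n = 0$) the correct formula reserves $5(\lfloor 0/16383\rfloor+1) = 5$ bytes of block overhead, while the old computation reserved $5\lceil 0/16383\rceil = 0$ bytes, even though DEFLATE still emits a non-empty stream for empty input.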
This PR updates `GZip::overhead` to use the first definition, and also updates `GZip::compress` to fail when the output buffer is not large enough. Validated by a regression test written by @davisp, which failed before this change and succeeds after it.
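As a rough sketch of what the corrected formula computes (the function name is illustrative, and the real `GZip::overhead` may add further constants for the stream wrapper; this is not the PR's actual code):

```cpp
#include <cstdint>

// Illustrative only: 5 bytes per started 16383-byte block, counting at least
// one block even when nbytes == 0, i.e. 5 * (floor(n / 16383) + 1).
uint64_t gzip_block_overhead(uint64_t nbytes) {
  return 5 * (nbytes / 16383 + 1);
}
```

With this, `gzip_block_overhead(0)` is 5 rather than 0, so even an empty tile gets an output buffer large enough for the compressed stream's block overhead.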
TYPE: BUG
DESC: Fixed GZip compression of empty data.