-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding zstd compression and decompression support for workload corpora #410
Conversation
…orpora Signed-off-by: beaioun <[email protected]>
@@ -344,6 +370,18 @@ def _do_decompress_manually_with_lib(target_directory, filename, compressed_file | |||
compressed_file.close() | |||
|
|||
|
|||
def _do_decompress_zstd(target_directory, filename): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of creating a separate function for this, is it possible to implement this through the _do_decompress_manually
as is done with other decompressors like bz2
? This would remove boilerplate code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I'll be working on that, thanks
@beaioun You are correct, there is no io.compress(). As of now, the only time compressing is used is in the metrics.py and corpus.py, but these call the libraries zlib and bz2 directly. Adding a separate function to io.py to compress like you did doesn't hurt. The issue I wrote should've clarified this but I can create a separate issue on this: it'd be good to include different compression options in corpus.py. I'll explain this further in a separate issue since this is out of scope for this PR! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just need to fix the pylint error in the CI Unittests and we should be good
Thank you @IanHoang for the clarification. I'll take a look at corpus.py and see if we can implement that. |
Thanks @IanHoang, will do |
@beaioun I think it's fine to go ahead and fix the CI unittests here and we can merge this in. We can open a separate issue for corpus.py and create a separate PR for it. This PR has a good amount of changes |
… If not I will fix from this point Signed-off-by: beaioun <[email protected]>
Hey @IanHoang , it's been a few days. I'll go ahead and commit the recent changes. Even though I am still left with this throughput throttled error unfixed, I feel like it has something to do with my PC setup. Let's see how the tests go and I will fix it from there. 4f37b4b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this!
Description
This PR adds support for zstd compression and decompression of workload corpora
Issues Resolved
This PR is aiming to solve issue #385
Testing
REQUEST FOR HELP:
I need help understanding compression in the io module. I have not found where the io.compress function is referenced thus not really sure how the zstd compression will work in this case. But I included compress_zstd as a separate function to be called in actual use cases.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.