support zlib compression #101

Merged
magland merged 1 commit into main from zlib-compression
Dec 3, 2024

Collaborator

magland commented Dec 3, 2024

This adds support for zlib compression when creating datasets, so you can do something like:

ds0 = g.create_dataset(
    f"spike_counts_ds_{ds_factor}",
    data=spike_counts_ds.astype(np.int32),
    chunks=(np.minimum(num_bins_per_chunk, num_ds_bins), num_units),
    compression='zlib'
)

This matters because zlib compression, unlike gzip and blosc, is deterministic for a given compression level. If zlib compression is used, two lindi.tar files that were created in the same way will be byte-for-byte identical. That is relevant when a script generates a file and stores it in a content-addressable database (like kachery), since identical bytes map to the same content address.
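To illustrate why byte-equivalence matters for content-addressed storage, here is a minimal sketch using only the Python standard library. It does not use lindi or kachery: `build_payload` and `content_address` are hypothetical helpers, and SHA-1 is chosen only for illustration. It assumes both runs use the same zlib build and compression level.

```python
import hashlib
import zlib


def build_payload(level: int = 6) -> bytes:
    """Hypothetical stand-in for a script that writes a compressed chunk."""
    raw = bytes(range(256)) * 64  # fixed, reproducible input data
    return zlib.compress(raw, level)


def content_address(payload: bytes) -> str:
    """Hypothetical content address: the hex digest of the stored bytes."""
    return hashlib.sha1(payload).hexdigest()


# Two independent "runs" of the same script.
first = content_address(build_payload())
second = content_address(build_payload())

# Identical bytes give an identical address, so the store sees one object.
print(first == second)
```

If the compressed bytes differed between runs, each run would produce a new address and the store would keep a separate copy of what is logically the same file.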

magland requested a review from rly on December 3, 2024 at 18:49
Contributor

rly commented Dec 3, 2024

Note to self: Python's gzip is not deterministic, contrary to what I had thought, because it adds a timestamp.
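The timestamp effect can be reproduced with the standard library alone. This sketch assumes Python 3.8 or later, where `gzip.compress` accepts an `mtime` argument. Two explicit `mtime` values stand in for two runs at different times. The sample data is made up for illustration.

```python
import gzip
import zlib

data = b"spike counts " * 1000  # made-up sample data

# zlib: the same input and level produce the same bytes.
z1 = zlib.compress(data, 6)
z2 = zlib.compress(data, 6)
print("zlib outputs equal:", z1 == z2)

# gzip: the header carries a 4-byte modification time (MTIME) at offsets 4..7.
# Passing different mtime values simulates compressing at different times.
g1 = gzip.compress(data, compresslevel=6, mtime=1)
g2 = gzip.compress(data, compresslevel=6, mtime=2)
print("gzip outputs equal:", g1 == g2)
print("gzip header mtimes:",
      int.from_bytes(g1[4:8], "little"),
      int.from_bytes(g2[4:8], "little"))

# Both gzip streams still decompress to the same data.
print("same content:", gzip.decompress(g1) == gzip.decompress(g2) == data)
```

When `mtime` is left unset, `gzip.compress` uses the current time, which is why repeated runs of the same script can yield different bytes.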

@codecov-commenter

Codecov Report

Attention: Patch coverage is 11.11111% with 8 lines in your changes missing coverage. Please review.

Project coverage is 81.46%. Comparing base (65491bf) to head (90725f8).
Report is 2 commits behind head on main.

Files with missing lines | Patch % | Lines
...indi/LindiH5pyFile/writers/LindiH5pyGroupWriter.py | 0.00% | 7 Missing ⚠️
lindi/conversion/h5_filters_to_codecs.py | 50.00% | 1 Missing ⚠️

@@            Coverage Diff             @@
##             main     #101      +/-   ##
==========================================
- Coverage   81.68%   81.46%   -0.23%     
==========================================
  Files          30       30              
  Lines        2774     2784      +10     
==========================================
+ Hits         2266     2268       +2     
- Misses        508      516       +8     


magland merged commit 34769cd into main on Dec 3, 2024
6 checks passed
magland deleted the zlib-compression branch on December 3, 2024 at 20:07
3 participants