Compression/conversion of floats (for OME-Zarr) #116

Open
tischi opened this issue Dec 9, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@tischi

tischi commented Dec 9, 2023

@constantinpape

  1. Which compression are you using for OME-Zarr?
  2. Could it be that this compression does not work well for floating-point data?

Could we add an option to addImage for a conversion to uint8?
I guess for this one would need an array conversion_min_max[ 2 ] holding the min and max values, which would then be used for a linear conversion:

min = conversion_min_max[ 0 ]
max = conversion_min_max[ 1 ]
value_uint8 = 255 * ( value - min ) / ( max - min )
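A minimal NumPy sketch of the proposed linear conversion. The formula above does not say what happens to values outside [min, max] or how fractional results are handled, so this sketch assumes they are clipped to [0, 255] and rounded to the nearest integer. `to_uint8` is a hypothetical helper, not part of the existing addImage API.

```python
import numpy as np


def to_uint8(data, conversion_min_max):
    """Linearly map [min, max] to [0, 255] (hypothetical helper).

    Assumptions: results are rounded to the nearest integer and values
    outside [min, max] are clipped instead of wrapping around.
    """
    lo, hi = float(conversion_min_max[0]), float(conversion_min_max[1])
    if hi <= lo:
        raise ValueError("conversion_min_max[1] must be greater than conversion_min_max[0]")
    scaled = 255.0 * (np.asarray(data, dtype=np.float64) - lo) / (hi - lo)
    # Clip before casting: a plain astype(np.uint8) would wrap out-of-range values.
    return np.clip(np.rint(scaled), 0, 255).astype(np.uint8)


values = np.array([-1.0, 0.0, 0.5, 1.0, 2.0], dtype=np.float32)
print(to_uint8(values, [0.0, 1.0]))
```

Note that the result has only 256 intensity levels, so the choice of min and max determines which part of the float range keeps its detail.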

I am asking because I am dealing with a floating-point dataset, and I have a feeling that it is much slower to load than a uint8 dataset of comparable size, chunking and dimensions. Of course the two datasets are not the same, so I am not sure.

Maybe I could first try to "manually" convert the float data to uint8 and see whether that indeed helps.
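One way to sketch such a manual test is to compare how well the same volume compresses as float32 and as uint8. This assumes a synthetic, hypothetical volume in place of the real x-ray data, and uses zlib from the Python standard library as a stand-in for blosc + lz4, so the absolute numbers will differ from what the OME-Zarr files show. It only measures compressed size, not loading speed.

```python
import zlib

import numpy as np


def compressed_size(arr, level=5):
    """Bytes after zlib compression of the array's raw buffer (zlib stands in for blosc/lz4)."""
    return len(zlib.compress(np.ascontiguousarray(arr).tobytes(), level))


# Hypothetical synthetic float32 volume: smooth structure plus noise.
rng = np.random.default_rng(0)
z, y, x = np.meshgrid(np.arange(32), np.arange(32), np.arange(32), indexing="ij")
volume = (np.sin(x / 5.0) + np.cos(y / 7.0) + z / 32.0
          + 0.05 * rng.standard_normal(x.shape)).astype(np.float32)

# Linear conversion to uint8, using the volume's own min/max as the conversion range.
lo, hi = float(volume.min()), float(volume.max())
as_uint8 = np.clip(np.rint(255.0 * (volume - lo) / (hi - lo)), 0, 255).astype(np.uint8)

for name, arr in [("float32", volume), ("uint8", as_uint8)]:
    size = compressed_size(arr)
    print(f"{name}: {arr.nbytes} raw bytes -> {size} compressed bytes ({size / arr.nbytes:.3f})")
```

To check loading speed itself, one would write both versions into OME-Zarr with the same chunking and time reading them in the viewer.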

@tischi tischi added the enhancement New feature or request label Dec 9, 2023
@tischi
Author

tischi commented Dec 9, 2023

I looked a bit into the data:

3255557 bytes in xray.ome.zarr/s0/16/16/16
774407 bytes in em.ome.zarr/s0/10/20/30

The EM data is uint8 (1 byte per voxel).

The x-ray chunk here is ~4.2 times larger in bytes than the EM chunk, which is roughly explained by the fact that it is float (4 bytes per voxel).

The chunk size is 96^3 voxels.

774407 / 96^3 = 0.875 => the compression of the uint8 EM data is not amazing (1.0 would be no compression), is it?

I have no experience with this: is it normal that EM data does not compress well?
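The arithmetic in this comment can be redone in a few lines, assuming the x-ray data is float32 (4 bytes, as stated above) and that each listed file size is one compressed 96^3 chunk. `compression_ratio` is a hypothetical helper for illustration.

```python
CHUNK_VOXELS = 96 ** 3  # 884736 voxels per chunk


def compression_ratio(bytes_on_disk, bytes_per_voxel, voxels=CHUNK_VOXELS):
    """Stored size divided by raw size; 1.0 means no compression."""
    return bytes_on_disk / (voxels * bytes_per_voxel)


em_ratio = compression_ratio(774407, 1)     # uint8 EM chunk
xray_ratio = compression_ratio(3255557, 4)  # float32 x-ray chunk (assumed float32)

print(f"em:   {em_ratio:.3f}")
print(f"xray: {xray_ratio:.3f}")
print(f"xray / em chunk bytes: {3255557 / 774407:.2f}")
```

Under these assumptions, the float chunk is stored at about 92% of its raw size and the EM chunk at about 87.5%, so both compress only a little, and the size difference between the two chunks comes mostly from the 4-byte vs 1-byte data type.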

@constantinpape
Contributor

It should use the default compression of zarr-python, which I think is blosc + lz4. You can check the details in the .zarray file.
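A small sketch of that check, assuming a Zarr v2 layout where each array has a JSON `.zarray` file with `compressor` and `dtype` entries (Zarr v3 stores this in `zarr.json` instead). The array path `xray.ome.zarr/s0` is inferred from the chunk paths in the previous comment, and `read_compressor` is a hypothetical helper.

```python
import json
from pathlib import Path


def read_compressor(array_path):
    """Return the 'compressor' and 'dtype' entries of a Zarr v2 array's .zarray file."""
    meta = json.loads((Path(array_path) / ".zarray").read_text())
    return meta.get("compressor"), meta.get("dtype")
```

For example, `read_compressor("xray.ome.zarr/s0")` would return something like `({"id": "blosc", "cname": "lz4", ...}, "<f4")` if the array uses blosc + lz4 and little-endian float32.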

In general, how well compression works depends on the distribution of intensity values in the data. EM data has a fairly even distribution between min and max (typically [0, 255]), so yes, it is expected that it does not compress well.
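A rough way to see the link between the value distribution and compressibility is the order-0 Shannon entropy of the histogram, in bits per value. Dividing it by 8 gives a rough lower bound on the compression ratio of uint8 data for a coder that ignores spatial context. This is a simplification: lz4 works on repeated byte sequences rather than the histogram, so it is only a guide. The two example distributions below are hypothetical.

```python
import math
from collections import Counter


def entropy_bits_per_value(values):
    """Order-0 Shannon entropy of the value histogram, in bits per value."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


uniform = list(range(256)) * 16         # every uint8 value equally often, like evenly spread EM
skewed = [0] * 3840 + list(range(256))  # mostly background zeros

for name, vals in [("uniform", uniform), ("skewed", skewed)]:
    h = entropy_bits_per_value(vals)
    print(f"{name}: {h:.3f} bits/value, best-case ratio ~{h / 8:.3f}")
```

An evenly spread histogram reaches the full 8 bits per value (ratio ~1.0), which is consistent with the EM chunk staying near 0.875. Data dominated by a few values, such as background, can compress much better.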
