Zarr datasets info lack compression data #186

Open
h-mayorquin opened this issue Apr 26, 2024 · 3 comments
Labels: category: bug (errors in the code or code behavior), priority: low (alternative solution already working and/or relevant to only specific user(s))

Comments


h-mayorquin commented Apr 26, 2024

This snippet, which writes an array with zarr directly and then reads it back:

```python
import zarr  # zarr-python 2.x API
from numcodecs import Blosc

# Create an in-memory Zarr array
data = zarr.zeros((1000, 1000), chunks=(10, 10), dtype='float32')

# Set compression options
compressor = Blosc(cname='zstd', clevel=3, shuffle=Blosc.SHUFFLE)

# Create a DirectoryStore (its second positional argument is normalize_keys, not a mode)
store = zarr.DirectoryStore("./zarr_test.zarr")

# Create a Zarr group (overwriting any previous run) and store the array
group = zarr.group(store, overwrite=True)
group.create_dataset('data', data=data, compressor=compressor)

# Re-open the store read-only and inspect the array
group_reloaded = zarr.open("./zarr_test.zarr", mode='r')
group_reloaded["data"].info
```

produces `.info` output that contains compression data (see the end of the screenshot):

[Screenshot: `.info` output for the array written directly with zarr]

But if I create the data through hdmf-zarr and then re-read it:

```python
import numpy as np
from numcodecs import Blosc, Delta
from pynwb.testing.mock.ecephys import mock_ElectricalSeries
from pynwb.testing.mock.file import mock_NWBFile
from hdmf_zarr import ZarrDataIO
from hdmf_zarr.nwb import NWBZarrIO

filters = [Delta(dtype="i4")]  # optional; not applied below

# Wrap the data so hdmf-zarr writes it with the same Blosc/zstd settings as above
data_with_zarr_data_io = ZarrDataIO(
    data=np.arange(100000000, dtype='i4').reshape(10000, 10000),
    chunks=(1000, 1000),
    compressor=Blosc(cname='zstd', clevel=3, shuffle=Blosc.SHUFFLE),
    # filters=filters,
)

timestamps = np.arange(10000)  # one timestamp per row of data

nwbfile = mock_NWBFile()
electrical_series_name = "ElectricalSeries"
electrical_series = mock_ElectricalSeries(
    name=electrical_series_name,
    data=data_with_zarr_data_io,
    nwbfile=nwbfile,
    timestamps=timestamps,
    rate=None,
)

path = "zarr_test.nwb.zarr"
with NWBZarrIO(path=path, mode="w") as io:
    io.write(nwbfile)
```
```python
from hdmf_zarr.nwb import NWBZarrIO

path = "zarr_test.nwb.zarr"

# Keep the IO open while inspecting the lazily loaded datasets
io = NWBZarrIO(path=path, mode="r")
nwbfile = io.read()

electrical_series_name = "ElectricalSeries"
electrical_series = nwbfile.acquisition[electrical_series_name]
electrical_series.data.info  # .data is the zarr array backing the series
```

then that information is missing from the `.info` output:

[Screenshot: `.info` output for the array written through hdmf-zarr]

I have no idea why this is the case.

h-mayorquin changed the title from "Zarr datasets info created lack compression data" to "Zarr datasets info lack compression data" on Apr 26, 2024
mavaylon1 (Contributor) commented:

@h-mayorquin I'm looking at the screenshots and can't see the issue. The type is present in both images.

h-mayorquin (Author) commented:

Sorry @mavaylon1, I wasn't precise enough when I said "info". What is missing are the "Storage ratio" and "No. bytes stored" rows. Both are present when I save with zarr directly, but not when I save through hdmf-zarr.
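As far as I can tell from zarr 2.x's `Array.info` code, those two rows are only shown when zarr can measure how many bytes the store holds for the array, so one possible (unconfirmed here) explanation is that the store hdmf-zarr opens on read does not report a size. In the meantime, the numbers can be computed by hand for a directory-backed array. The sketch below is a stdlib-only approximation, not zarr's own implementation: it assumes the zarr v2 on-disk format, reads `shape` and `dtype` from the array's `.zarray` file (simple numeric dtypes only), and sums the sizes of every file under the array directory. The helper names are hypothetical.

```python
import json
import math
import os
import re

# Simple numeric zarr v2 dtype strings such as "<f4", "<i4" or "|u1".
# Assumption: structured, datetime and object dtypes are out of scope here.
_SIMPLE_DTYPE = re.compile(r"[<>|]?[biufc](\d+)")


def uncompressed_nbytes(array_dir):
    """Logical size in bytes: prod(shape) * itemsize, read from the .zarray file."""
    with open(os.path.join(array_dir, ".zarray")) as f:
        meta = json.load(f)
    match = _SIMPLE_DTYPE.fullmatch(meta["dtype"])
    if match is None:
        raise ValueError(f"unsupported dtype: {meta['dtype']!r}")
    return math.prod(meta["shape"]) * int(match.group(1))


def stored_nbytes(array_dir):
    """Bytes on disk: every file under the array directory (chunks and metadata)."""
    total = 0
    for root, _dirs, files in os.walk(array_dir):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def storage_ratio(array_dir):
    """Uncompressed size divided by stored size, like the "Storage ratio" row."""
    return uncompressed_nbytes(array_dir) / stored_nbytes(array_dir)
```

Assuming hdmf-zarr places the series data at `acquisition/ElectricalSeries/data` inside the store, `storage_ratio("zarr_test.nwb.zarr/acquisition/ElectricalSeries/data")` should give a number comparable to zarr's own figure, though it may differ slightly depending on which files zarr counts.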

mavaylon1 (Contributor) commented:

I see. Do you want to take this on, since it is related to improving the HTML representation of datasets? Otherwise I could look into it, but not until sometime next week.
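If the fix lands in the HTML representation, one option is to compute the two missing rows directly and render them alongside the others. The sketch below mirrors the row labels and the one-decimal "Storage ratio" format that zarr 2.x's `.info` output appears to use, and, as zarr seems to do, it skips both rows when the stored size is unknown (zero or negative). `storage_items` and `info_items_to_html` are hypothetical helpers, not part of hdmf-zarr.

```python
import html


def storage_items(nbytes, nbytes_stored):
    """Return the "No. bytes stored" and "Storage ratio" rows, or [] if size is unknown."""
    if nbytes_stored <= 0:  # unknown size: omit the rows instead of dividing by zero
        return []
    return [
        ("No. bytes stored", nbytes_stored),
        ("Storage ratio", f"{nbytes / nbytes_stored:.1f}"),
    ]


def info_items_to_html(items):
    """Render (label, value) pairs as a two-column HTML table, escaping both cells."""
    rows = "".join(
        f"<tr><th>{html.escape(str(label))}</th><td>{html.escape(str(value))}</td></tr>"
        for label, value in items
    )
    return f"<table>{rows}</table>"
```

Returning an empty list, rather than a zero or placeholder row, keeps the table from showing misleading numbers when the store cannot report its size.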

oruebel added the category: bug and priority: low labels on May 8, 2024
3 participants