Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Small patch to html repr in #1100 #1201

Merged
merged 4 commits into from
Nov 11, 2024

Conversation

h-mayorquin
Copy link
Contributor

Motivation

There are some edge cases that #1100 did not take into account. I am submitting this patch for discussion, happy to close if a better solution comes.

The problem is that the code here assumes that things that have an IO are hdf5 datasets:

# get info from hdf5 dataset
compressed_size = dataset.id.get_storage_size()
if hasattr(dataset, "nbytes"): # TODO: Remove this after h5py minimal version is larger than 3.0
uncompressed_size = dataset.nbytes
else:
uncompressed_size = dataset.size * dataset.dtype.itemsize
compression_ratio = uncompressed_size / compressed_size if compressed_size != 0 else "undefined"
hdf5_info_dict = {"Chunk shape": dataset.chunks,
"Compression": dataset.compression,
"Compression opts": dataset.compression_opts,
"Compression ratio": compression_ratio}

hdmf/src/hdmf/container.py

Lines 754 to 763 in be602e5

"""Generates HTML for array data"""
read_io = self.get_read_io() # if the Container was read from file, get IO object
if read_io is not None:
repr_html = read_io.generate_dataset_html(array)
else:
array_info_dict = get_basic_array_info(array)
repr_html = generate_array_html_repr(array_info_dict, array, "NumPy array")
return f'<div style="margin-left: {level * 20}px;" class="container-fields">{repr_html}</div>'

But sometimes, this assumption is false. For example, the starting frames of an ImageSeries are a numpy object even after they are written. Maybe there are more such cases?

How to test the behavior?

The following code generates an error when using dev.

from pynwb.testing.mock.file import mock_NWBFile
from pynwb.image import ImageSeries

nwbfile = mock_NWBFile()

series = ImageSeries(name="ImageSeries", description="", external_file=["test"], rate=0.1)
nwbfile.add_acquisition(series)


from pynwb import NWBHDF5IO

nwbfile_path = "./test_nwb.nwb"
with NWBHDF5IO(nwbfile_path, 'w') as io:
    io.write(nwbfile)
    
io = NWBHDF5IO(nwbfile_path, 'r')
nwbfile_read = io.read()
nwbfile_read

AttributeError: 'numpy.ndarray' object has no attribute 'id'

Checklist

  • Did you update CHANGELOG.md with your changes?
  • Does the PR clearly describe the problem and the solution?
  • Have you reviewed our Contributing Guide?
  • Does the PR use "Fix #XXX" notation to tell GitHub to close the relevant issue numbered XXX when the PR is merged?

Copy link

codecov bot commented Nov 6, 2024

Codecov Report

Attention: Patch coverage is 70.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 89.12%. Comparing base (0b65dc6) to head (35915d1).
Report is 1 commits behind head on dev.

Files with missing lines Patch % Lines
src/hdmf/backends/hdf5/h5tools.py 66.66% 1 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##              dev    #1201      +/-   ##
==========================================
- Coverage   89.12%   89.12%   -0.01%     
==========================================
  Files          45       45              
  Lines        9944     9945       +1     
  Branches     2825     2826       +1     
==========================================
  Hits         8863     8863              
  Misses        762      762              
- Partials      319      320       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@stephprince
Copy link
Contributor

@rly this was the case we discussed in person yesterday, do you know if there any other edge cases we should consider when a Container would have a read_io object but the array would not be an hdf5 dataset?

Copy link
Contributor

@stephprince stephprince left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@h-mayorquin could you add a comment to this line indicating that even if the object has a read_io the array may still be a numpy array, so generate_dataset_html will be called even if it is not an HDF5/Zarr dataset.

repr_html = read_io.generate_dataset_html(array)

At some point we may want to add a HDMFIO.is_dataset method, so that generate_dataset_html is not unnecessarily called on a numpy array, but I think a comment is sufficient for now. Other than that, Iooks good to me!

@h-mayorquin
Copy link
Contributor Author

@stephprince
Done
But I think I wrote a more robust solution here #1206.
I branched from this one though, so this can be merged first and then we can discuss the other in more detail.

@stephprince stephprince marked this pull request as ready for review November 11, 2024 17:51
@stephprince
Copy link
Contributor

Sounds good, let's merge this for now and I will take a look at the other solution you proposed later.

Since this is a small patch to #1100 I think it is ok to not include a changelog update.

@stephprince stephprince merged commit ea6504f into hdmf-dev:dev Nov 11, 2024
29 checks passed
@h-mayorquin h-mayorquin deleted the patch_to_html_repr branch November 11, 2024 18:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants