Small patch to html repr in #1100 #1201

h-mayorquin · 2024-11-06T01:05:59Z

Motivation

There are some edge cases that #1100 did not take into account. I am submitting this patch for discussion and am happy to close it if a better solution comes along.

The problem is that the code here assumes that things that have an IO are hdf5 datasets:

hdmf/src/hdmf/backends/hdf5/h5tools.py

Lines 1612 to 1623 in be602e5

    # get info from hdf5 dataset
    compressed_size = dataset.id.get_storage_size()
    if hasattr(dataset, "nbytes"):  # TODO: Remove this after h5py minimal version is larger than 3.0
        uncompressed_size = dataset.nbytes
    else:
        uncompressed_size = dataset.size * dataset.dtype.itemsize
    compression_ratio = uncompressed_size / compressed_size if compressed_size != 0 else "undefined"

    hdf5_info_dict = {"Chunk shape": dataset.chunks,
                      "Compression": dataset.compression,
                      "Compression opts": dataset.compression_opts,
                      "Compression ratio": compression_ratio}

hdmf/src/hdmf/container.py

Lines 754 to 763 in be602e5

    """Generates HTML for array data"""
    read_io = self.get_read_io()  # if the Container was read from file, get IO object
    if read_io is not None:
        repr_html = read_io.generate_dataset_html(array)
    else:
        array_info_dict = get_basic_array_info(array)
        repr_html = generate_array_html_repr(array_info_dict, array, "NumPy array")

    return f'<div style="margin-left: {level * 20}px;" class="container-fields">{repr_html}</div>'

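But this assumption is sometimes false. For example, the starting frames (starting_frame) of an ImageSeries are still a NumPy array after the file is written and read back, so the container has a read IO while the array itself is not an HDF5 dataset. Maybe there are more such cases?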
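
To make the failure mode concrete, here is a minimal sketch of a guard that only reads HDF5 storage details when the object actually looks like an HDF5 dataset. It is not the code merged in this PR: the function name hdf5_storage_info and the SimpleNamespace stand-ins are hypothetical, and the duck-typed check on `id.get_storage_size` is an assumption chosen so that the example runs without h5py or numpy installed.

```python
from types import SimpleNamespace


def hdf5_storage_info(array):
    """Return HDF5 storage details, or None when `array` is not an HDF5-style dataset.

    Hypothetical helper for illustration; not part of hdmf.
    """
    # h5py datasets expose a low-level `id` with `get_storage_size()`;
    # numpy arrays have no `id` attribute, which is what triggers the AttributeError.
    low_level_id = getattr(array, "id", None)
    if low_level_id is None or not hasattr(low_level_id, "get_storage_size"):
        return None

    compressed_size = low_level_id.get_storage_size()
    uncompressed_size = array.size * array.dtype.itemsize
    ratio = uncompressed_size / compressed_size if compressed_size != 0 else "undefined"
    return {
        "Chunk shape": array.chunks,
        "Compression": array.compression,
        "Compression opts": array.compression_opts,
        "Compression ratio": ratio,
    }


# Stand-ins for illustration only (not real h5py or numpy objects).
fake_dataset = SimpleNamespace(
    id=SimpleNamespace(get_storage_size=lambda: 50),
    size=100,
    dtype=SimpleNamespace(itemsize=4),
    chunks=(10,),
    compression="gzip",
    compression_opts=4,
)
fake_ndarray = SimpleNamespace(size=1, dtype=SimpleNamespace(itemsize=8))

dataset_info = hdf5_storage_info(fake_dataset)   # dict with storage details
ndarray_info = hdf5_storage_info(fake_ndarray)   # None, so the caller can fall back
```

When the helper returns None, the caller can fall back to the basic array information path instead of failing on `dataset.id`.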

How to test the behavior?

The following code raises an error on the current dev branch when the file that was read back is displayed in a notebook:

from pynwb.testing.mock.file import mock_NWBFile
from pynwb.image import ImageSeries

nwbfile = mock_NWBFile()

series = ImageSeries(name="ImageSeries", description="", external_file=["test"], rate=0.1)
nwbfile.add_acquisition(series)


from pynwb import NWBHDF5IO

nwbfile_path = "./test_nwb.nwb"
with NWBHDF5IO(nwbfile_path, 'w') as io:
    io.write(nwbfile)
    
io = NWBHDF5IO(nwbfile_path, 'r')
nwbfile_read = io.read()
nwbfile_read

AttributeError: 'numpy.ndarray' object has no attribute 'id'

Checklist

Did you update CHANGELOG.md with your changes?
Does the PR clearly describe the problem and the solution?
Have you reviewed our Contributing Guide?
Does the PR use "Fix #XXX" notation to tell GitHub to close the relevant issue numbered XXX when the PR is merged?

codecov · 2024-11-06T01:07:38Z

Codecov Report

Attention: Patch coverage is 70.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 89.12%. Comparing base (0b65dc6) to head (35915d1).
Report is 1 commits behind head on dev.

Files with missing lines             Patch %   Lines
src/hdmf/backends/hdf5/h5tools.py    66.66%    1 Missing and 2 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##              dev    #1201      +/-   ##
==========================================
- Coverage   89.12%   89.12%   -0.01%     
==========================================
  Files          45       45              
  Lines        9944     9945       +1     
  Branches     2825     2826       +1     
==========================================
  Hits         8863     8863              
  Misses        762      762              
- Partials      319      320       +1

stephprince · 2024-11-06T17:43:30Z

@rly this was the case we discussed in person yesterday. Do you know if there are any other edge cases we should consider where a Container would have a read_io object but the array would not be an hdf5 dataset?

stephprince

@h-mayorquin could you add a comment to this line indicating that even if the object has a read_io, the array may still be a numpy array, so generate_dataset_html will be called even if it is not an HDF5/Zarr dataset?

hdmf/src/hdmf/container.py

Line 758 in be602e5

repr_html = read_io.generate_dataset_html(array)

At some point we may want to add an HDMFIO.is_dataset method, so that generate_dataset_html is not unnecessarily called on a numpy array, but I think a comment is sufficient for now. Other than that, looks good to me!
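
For discussion, a rough sketch of what routing through such an is_dataset check could look like. Everything here is hypothetical: SketchIO, FakeDataset, dataset_types, and route_array_html are illustrative names rather than existing hdmf API, and the returned HTML strings are placeholders.

```python
class FakeDataset:
    """Stand-in for a backend dataset type such as h5py.Dataset (illustration only)."""


class SketchIO:
    """Hypothetical stand-in for an HDMFIO subclass; not the real hdmf class."""

    # Each backend would list its own dataset classes here (assumption).
    dataset_types = (FakeDataset,)

    def is_dataset(self, array):
        # True only for objects that belong to this backend.
        return isinstance(array, self.dataset_types)

    def generate_dataset_html(self, dataset):
        return "<div>backend dataset</div>"


def route_array_html(read_io, array):
    """Use the backend repr only when the array is one of the backend's datasets."""
    if read_io is not None and read_io.is_dataset(array):
        return read_io.generate_dataset_html(array)
    # In-memory arrays (for example numpy arrays or lists) take the generic path.
    return "<div>in-memory array</div>"


io = SketchIO()
html_for_dataset = route_array_html(io, FakeDataset())
html_for_list = route_array_html(io, [0])
html_without_io = route_array_html(None, [0])
```

With a check like this, a container that has a read IO but holds a plain in-memory array would never reach the backend-specific repr.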

h-mayorquin · 2024-11-09T18:06:47Z

@stephprince
Done
But I think I wrote a more robust solution here #1206.
I branched from this one though, so this can be merged first and then we can discuss the other in more detail.

stephprince · 2024-11-11T17:52:49Z

Sounds good, let's merge this for now and I will take a look at the other solution you proposed later.

Since this is a small patch to #1100 I think it is ok to not include a changelog update.

h-mayorquin and others added 2 commits November 5, 2024 18:47

small patch to html repr

abc0efe

[pre-commit.ci] auto fixes from pre-commit.com hooks

1fc5879

for more information, see https://pre-commit.ci

stephprince reviewed Nov 8, 2024

stephprince and others added 2 commits November 8, 2024 14:16

Merge branch 'dev' into patch_to_html_repr

b84008b

comment request

35915d1

h-mayorquin mentioned this pull request Nov 9, 2024

Route array representation for HTML #1206

Merged

stephprince marked this pull request as ready for review November 11, 2024 17:51

stephprince approved these changes Nov 11, 2024

stephprince merged commit ea6504f into hdmf-dev:dev Nov 11, 2024
29 checks passed

h-mayorquin deleted the patch_to_html_repr branch November 11, 2024 18:05
