Add file create data appending #1163

t-b · 2020-01-29T23:35:37Z

Close #990.

rly · 2020-01-30T00:00:09Z

tests/unit/test_file.py

+            nwbfile = writer.read()
+
+            # added one more entry as opened read/write
+            self.assertEqual(len(nwbfile.file_create_date), 2)


Please also test the second round-trip, i.e., close the file and re-open it in read-mode and confirm that the change to file_create_date is still present. I am concerned that the file_create_date dataset is not chunked and therefore cannot grow, or the change is not saved for some reason.

@rly I've pushed something but I need to review that again tomorrow.

@rly You were right. The additional entry does not reach the file.

h5dump -A unittest_file_create_date.nwb | grep -A 10 file_create_date
HDF5 "unittest_file_create_date.nwb" {
GROUP "/" {
   ATTRIBUTE ".specloc" {
      DATATYPE  H5T_REFERENCE { H5T_STD_REF_OBJECT }
      DATASPACE  SCALAR
      DATA {
      (0): GROUP 6512 /specifications
      }
   }
   ATTRIBUTE "namespace" {
      DATATYPE  H5T_STRING {
--
   DATASET "file_create_date" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
   }
   GROUP "general" {
      DATASET "institution" {

Questions:

How can I fix that?

How can I require a newer hdmf version so that the tests pass?

To fix that, the dataset has to be chunked. @ajtritt -- is there a way to chunk only the NWBFile.file_create_date dataset? I am also in favor of blanket chunking all datasets in NWB...

To use changes in a newer hdmf version, the changes must have been released on PyPI. The recent "mode" function addition isn't released yet, but we could do that this week if these issues are pressing.

A new hdmf would be nice!

How do I force the stored dataset to be chunked?

I tried

diff --git a/src/pynwb/io/file.py b/src/pynwb/io/file.py
index 1ddeb310..2ec342d9 100644
--- a/src/pynwb/io/file.py
+++ b/src/pynwb/io/file.py
@@ -3,6 +3,7 @@ from hdmf.build import ObjectMapper
 from .. import register_map
 from ..file import NWBFile, Subject
 from ..core import ScratchData
+from hdmf.backends.hdf5.h5_utils import H5DataIO
 
 
 @register_map(NWBFile)
@@ -156,6 +157,10 @@ class NWBFileMap(ObjectMapper):
         dates = list(map(dateutil_parse, datestr))
         return dates
 
+    @ObjectMapper.object_attr('file_create_date')
+    def file_create_date_obj_attr(self, container, manager):
+        return H5DataIO(container.file_create_date, chunks=True)
+
     @ObjectMapper.constructor_arg('file_name')
     def name(self, builder, manager):
         return builder.name

but that does not work.

Not sure if I have the right solution for you, but a couple of thoughts:

I think it is important to expose this behavior explicitly to the user. While doing this implicitly behind the scenes is convenient, it makes the process opaque.

We should try not to mix front-end and backend functionality, i.e., using the HDF5-specific H5DataIO in the ObjectMapper (or Container) is problematic as this will not translate to other backends.

This issue has also come up with DynamicTable at some point, because we wanted all columns of the table to be chunked so they can be extended. @rly @ajtritt was that issue solved and would that same strategy apply here?

Ultimately, I think the core issue is that we want specific datasets to be written in a resizable fashion (so they can grow). In the case of HDF5 that requires chunking, but for other backends that may or may not be the case. In that vein, I think what we may need is a generic (backend-agnostic) way to provide write hints, which in this case would say "make this dataset resizable". I'm wondering whether we could add I/O hints on the builder for this and, in the object mapper, a way to ask for I/O hints for fields. It would then be up to the backend to decide what to do with those I/O hints.
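To make the write-hint idea concrete, here is a minimal sketch in plain Python. Everything in it is hypothetical: `DatasetBuilderSketch`, the `io_hints` field, the `"resizable"` key and `hdf5_create_kwargs` are names invented for illustration and are not part of hdmf or pynwb. The only assumption about HDF5 is the one stated above, that a dataset must be chunked (with an unlimited maximum size) before it can grow.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DatasetBuilderSketch:
    """Hypothetical stand-in for a builder that carries backend-neutral hints."""
    name: str
    data: List[Any]
    io_hints: Dict[str, Any] = field(default_factory=dict)


def hdf5_create_kwargs(builder: DatasetBuilderSketch) -> Dict[str, Any]:
    """Translate generic hints into keyword arguments an HDF5 backend could use."""
    kwargs: Dict[str, Any] = {}
    if builder.io_hints.get("resizable"):
        # For HDF5, "resizable" means chunked storage plus an unlimited
        # maximum size along the (single) growing axis.
        kwargs["chunks"] = True
        kwargs["maxshape"] = (None,)
    return kwargs


# The front end only states intent; it never mentions chunking.
builder = DatasetBuilderSketch(
    name="file_create_date",
    data=["2020-01-29T23:35:37+00:00"],
    io_hints={"resizable": True},
)
hdf5_kwargs = hdf5_create_kwargs(builder)
print(hdf5_kwargs)
```

A different backend would supply its own translation function and could ignore the hint entirely if its storage can always grow.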

@oruebel I totally agree that an HDF5-specific solution is the wrong thing to do here. But up to now I don't have any solution at all.

I'm starting to work on this again.

@oruebel

I think it is important to expose this behavior explicitly to the user. While doing this implicitly behind the scenes is convenient, it makes the process opaque.

What implicit part are you concerned about? The "making the dataset chunked" or the "adding new entries to file_create_date"? The latter is how nwb-schema says file_create_date should be handled.

Ultimately, I think the core issue is that we want specific datasets to be written in a resizable fashion (so they can grow). In the case of HDF5 that requires chunking, but for other backends that may or may not be the case. In that vein, I think what we may need is a generic (backend-agnostic) way to provide write hints, which in this case would say "make this dataset resizable". I'm wondering whether we could add I/O hints on the builder for this and, in the object mapper, a way to ask for I/O hints for fields. It would then be up to the backend to decide what to do with those I/O hints.

Yes, that would be required. Of course my hack above is a hack and cannot be merged as is, but I first wanted to get something working and then make the solution generalizable. I just saw that hdmf.build.DatasetBuilder has a chunks argument as well.

I seem to not understand how the object mappers work. According to https://pynwb.readthedocs.io/en/stable/overview_software_architecture.html?highlight=architecture#objectmapper I would think that

$ git diff .
diff --git a/src/pynwb/io/file.py b/src/pynwb/io/file.py
index 2c629ab7..a7057941 100644
--- a/src/pynwb/io/file.py
+++ b/src/pynwb/io/file.py
@@ -3,7 +3,7 @@ from hdmf.build import ObjectMapper
 from .. import register_map
 from ..file import NWBFile, Subject
 from ..core import ScratchData
-
+from hdmf.build import DatasetBuilder
 
 @register_map(NWBFile)
 class NWBFileMap(ObjectMapper):
@@ -152,6 +152,10 @@ class NWBFileMap(ObjectMapper):
         date = dateutil_parse(datestr)
         return date
 
+    @ObjectMapper.object_attr('file_create_date')
+    def file_create_date_obj_attr(self, container, manager):
+        return DatasetBuilder('file_create_date', data=container.file_create_date, chunks=True)
+
     @ObjectMapper.constructor_arg('file_create_date')
     def dateconversion_list(self, builder, manager):
         datestr = builder.get('file_create_date').data

should work, but it doesn't. Any hints?

bendichter · 2020-11-03T18:39:07Z

@rly @t-b what's the status of this?

t-b · 2020-11-03T18:45:47Z

We need to find a way to tell pynwb that certain datasets in HDF5 need to be written as chunked by default. Only then are they appendable. I don't know how to do that.

bendichter · 2020-11-03T18:49:32Z

@t-b ah, ok. Sounds like a job for H5DataIO


… load

The file_create_date entry holds according to [1]

    A record of the date the file was created and of subsequent modifications.

But until now we never added additional entries to file_create_date. We now do that when the file is not opened read-only.

[1]: https://nwb-schema.readthedocs.io/en/latest/format.html#nwb-n-file
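The behavior this commit message describes can be sketched as a small standalone function. This is an illustration only, not the code from this PR: the helper name `updated_file_create_date`, its parameters, and the use of the mode string `"r"` to mean read-only are assumptions.

```python
from datetime import datetime, timezone
from typing import List, Optional, Union


def updated_file_create_date(
    stored: Optional[Union[datetime, List[datetime]]],
    mode: str,
    now: Optional[datetime] = None,
) -> List[datetime]:
    """Return the file_create_date list for a file opened in `mode` (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)

    # Normalize whatever was stored into a list.
    if stored is None:
        dates = []
    elif isinstance(stored, datetime):
        dates = [stored]
    else:
        dates = list(stored)

    if not dates:
        # Brand-new file: record the creation time.
        dates.append(now)
    elif mode != "r":
        # Existing file opened for modification: record one more entry.
        dates.append(now)
    return dates


created = datetime(2020, 1, 29, 23, 35, tzinfo=timezone.utc)
reopened = datetime(2020, 1, 30, 0, 19, tzinfo=timezone.utc)

read_only = updated_file_create_date([created], "r", now=reopened)
read_write = updated_file_create_date([created], "r+", now=reopened)
```

With these inputs, `read_only` keeps its single entry while `read_write` gains a second one, which matches the length check in the unit test under review.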

t-b requested a review from rly January 29, 2020 23:35

rly previously approved these changes Jan 30, 2020


rly self-requested a review January 30, 2020 00:00

t-b dismissed rly’s stale review via 9137f2a January 30, 2020 00:19

t-b force-pushed the add-file-create-data-appending branch from e6459da to 9137f2a Compare January 30, 2020 00:19

t-b mentioned this pull request Mar 9, 2021

Issues from pynwb/hdmf AllenInstitute/MIES#874

Closed

t-b added 2 commits April 13, 2021 13:51

src/pynwb/file.py: Rewrite the file_create_date setting code

f71e31a

Using an if/elif chain is easier to understand.

t-b force-pushed the add-file-create-data-appending branch from 3d70398 to 3fb6358 Compare April 13, 2021 11:58


t-b commented Jan 29, 2020

rly Jan 30, 2020

t-b Jan 30, 2020

t-b Jan 30, 2020

rly Jan 30, 2020

t-b Jan 30, 2020

t-b Feb 7, 2020

oruebel Mar 1, 2020

t-b Mar 2, 2020

t-b Apr 13, 2021

bendichter commented Nov 3, 2020

t-b commented Nov 3, 2020

bendichter commented Nov 3, 2020
