Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add read for neurodata_types, e.g., Container, TimeSeries #91

Open
wants to merge 81 commits into
base: add_read
Choose a base branch
from

Conversation

oruebel
Copy link
Contributor

@oruebel oruebel commented Sep 8, 2024

This PR builds on #85. I made this a separate PR to make it a) easier to review and b) because this addresses the specific problem of reading a typed Container (e.g., a TimeSeries) and I wanted to make it easier to see the changes related to reading typed objects. Ultimately, the goal of this PR is to enable us to read any object with an assigned neurodata_type (for which we have a corresponding class) from an NWB file.

How does read look like?

To read a type (whether a whole NWBFile or a single TimeSeries) , we need to know: 1) the full name of the type, 2) the path of the object in the file, and 3) the io object for reading.

Variant 1: User specifies name of type

auto readElectricalSeries = AQNWB::NWB::RegisteredType::create("core::ElectricalSeries", path, io);

Variant 2: Users provides the type class as a template

auto readElectricalSeries =  NWB::RegisteredType::create<AQNWB::NWB::ElectricalSeries>(path, io);

or simply create the instance of the class ourselves:

auto readElectricalSeries = AQNWB::NWB::ElectricalSeries(path, io);

Variant 3: We read the full name of type from the file directly

auto readTS = AQNWB::NWB::RegisteredType::create(dataPath, io);

In this variant we read the full name of the type from the NWB file directly, which is a combination of the attributes namespace and neurodata_type that are stored for each type.

How does this work

I added more details about the design for this on a new page in the developer docs.

Main changes / TODO

  • AddedRegisteredType as common base class for all neurodata_types defined in the NWB (or HDMF) schema, i.e., for both Dataset and Group types
  • Updated all NWB/HDMF type classes to register with the RegisteredType registry
  • Updated Data, VectorData, and ElementIdentifiers to provide the standard constructor required by RegisteredType. I.e., these types now also have the io and their path stored.
  • Implemented class registry in RegisterType to manage all types based on the namespace and type-name
  • Implemented factory methods to create any registered type (e.g., TimeSeries) based on the name (e.g,. "core::TimeSeries"), io object, and path
  • Added unit tests for the RegisteredType class
  • Added developer docs on how to implement a subtype of RegisteredClass (i.e., a new data type class)
  • Updated SpikeEventSeries, e.g., to remove
    /**
    * @brief The neurodataType of the SpikeEventSeries.
    */
    std::string neurodataType = "SpikeEventSeries";
  • ElectrodeTable is not a type in NWB yet, as such we need to write with DynamicTable as the neurodata_type instead. Updated the RegisteredType class to allow manual overwrite of the neurodata_type when the classname is not the same as the neurodata_type.
  • Fix Inconsistent use of paths #103
    • Added mergePaths function in Utils.hpp as a more reliable way to merge HDF5 paths
    • Updated all Container classes to use the new ``mergePaths` method and remove trailing "/" in path definitions
    • Updated all helper functions in Utils.hpp to be static inline to avoid linker problem with multiple definitions of the functions and still be able to inline the functions (i.e. without having to move to a Utils.cpp
  • Added new macro DEFINE_FIELD in RegisteredType to simplify the creation of getter functions that return ReadDataWrapper object for lazy read of a dataset or attribute
    • Updated Doxygen documentation settings to allow expansion of the DEFINE_FIELD macro to create documentation for the generated getter methods also in the docs
  • Added methods to check whether a field actually exists
    • Added ReadDataWrapper.exists
    • Added BaseIO::attributeExists and HDF5IO::attributeExists (we already had BaseIO::objectExists for datasets/groups but not for attributes)
  • Add functionality to allow us to get dynamically defined typed groups (e.g,. to get TimeSeries from /acquisition)
    • Add BaseIO::getGroupObjects and HDF5IO::getGroupObjects to allow us to get a list of all contents of a group
    • Added BaseIO::findTypes to allow us to search for all objects of a given set of types. This function only uses functions defined on BaseIO so it is backend independent and does not need to be overridden in HDF5IO
    • Add example to show how to use findTypes on read
  • Cleanup of naming convention in compliance with Update member variable names to avoid shadowing #102. Replace member variables with m_ naming convention:
    • ElectrodeTable
    • ElectrodeGroup
    • SpikeEventSeries
    • ElectricalSeries
    • Device
    • DynamicTable
    • VectorData
    • Data
    • TimeSeries
    • RegisteredType
    • ReadIO
    • DataBlock, DataBlockGeneric
    • ReadDataWrapper classes
  • Update all Container classes to remove stored data values as attributes and replace with read getter functions to retrieve ReadDataWrapper objects for lazy read of datasets and attributes. These can be created with the new DEFINE_FIELD macro.
    • TimeSeries
    • ElectricalSeries
    • SpikeEventSeries
    • Device
    • NWBFile
    • ElectrodeTable
    • DynamicTable
    • Data
    • VectorData
    • ...
  • Support opening of a file in HDF5IO in read-only mode (probably using SWMR read)
  • Implement read for compound data and references
  • Add unit tests for reading
  • Test that reading subsets of larger arrays works as expected
  • Add not in developer test docs to clarify that tests on PR's are running on a temporary merge commit to test that the PR functions correctly when merged with the target branch (see Add data read #85 (comment))

@oruebel
Copy link
Contributor Author

oruebel commented Sep 8, 2024

@stephprince please take a look. This is an accompanying PR to #85 to implement data read, but this PR specifically focuses on read for typed objects (e.g., a TimeSeries) where we need to construct a corresponding API class in AqNWB, instead of reading a single dataset or attribute. As far as I can tell this PR works, but it looks like there are still some bugs in reading attributes in #85.

@oruebel oruebel marked this pull request as ready for review September 9, 2024 22:43
@oruebel
Copy link
Contributor Author

oruebel commented Sep 9, 2024

@stephprince this PR is good to review.

Comment on lines 172 to 174
// TODO: creating a new I/O makes the read fail.
// std::shared_ptr<BaseIO> readio = createIO("HDF5", path);
// readio->open();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephprince it looks like there is an issue with how HDF5IO opens an existing file. Read works fine when I use the existing HDF5IO object that was used for write, but when I close the file and then open it again reading any data from the file fails. When I tried this in the testBase.cpp it corrupted the file (i.e,. I think it may be opened in "write" mode instead of "read" or "append").

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephprince this may be related to the issue you encountered on Windows

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, I will look into that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am having trouble reproducing this on the main branch.

If I add these lines to the end of testBase.cpp I am able to create a new IO object and read the data back in.

    io->close();

    // test reopening the file to read the data back in
    std::shared_ptr<BaseIO> readio = createIO("HDF5", path);
    readio->open();
    double* tsBuffer2 = new double[numSamples];
    std::unique_ptr<BaseRecordingData> tsetDset2 =
        readio->getDataSet(dataPath + "/timestamps");
    std::unique_ptr<HDF5::HDF5RecordingData> tsH5Dataset2(
        dynamic_cast<HDF5::HDF5RecordingData*>(tsetDset2.release()));
    readH5DataBlock(tsH5Dataset2->getDataSet(), timestampsType, tsBuffer2);
    std::vector<double> tsRead2(tsBuffer2, tsBuffer2 + numSamples);
    delete[] tsBuffer2;
    REQUIRE(tsRead2 == timestamps);

Do you remember if there were any different steps or conditions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for confirming. A simple test is to just uncomment the two lines here where the readio object is being created and then use the readio object instead of the original io object in the code that follows. We'd also need to call io.close first since we don't have a way yet to open as read-only with SWMR-read.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not positive since there were some code changes since these comments, but I think the main issue was that readDataWrapper was being used here instead of readElectricalSeriesData obtained from the read_io object.

auto readElectricalSeriesData = readElectricalSeries->dataLazy();
DataBlock<float> readDataValues = readDataWrapper->values<float>();

This test example now works using the read_io object, but the file is being re-opened in read/write mode. I will add an option to open in read-only mode and demonstrate that here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the main issue was that readDataWrapper was being used here instead of readElectricalSeriesData

Makes sense. Thanks for catching that mistake.

I will add an option to open in read-only mode and demonstrate that here.

For read-only mode we should also allow SWMR read mode.

@oruebel
Copy link
Contributor Author

oruebel commented Sep 22, 2024

@stephprince just FYI. I pushed a few more updates here. The main structural changes are: 1) I created a DEFINE_FIELD macro to make it easier to create standardized getter methods for reading datasets/attributes and 2) I added a mergePaths method to standardize the handling of paths a bit more.

Next, I'll need to:

  • add all the DEFINE_FIELD definitions to allow us to read all the fields
  • add methods to allow us to retrieve the paths for dynamically created Groups (e.g., ElectricalSeries) to make it easier to also get to the objects that are not at fixed locations in the file. As long as a user knows the paths, we can already open these, but we don't yet have a way for the user to find the paths.
  • Once that is done, we'll need to add a lot of tests. I'm sure there is still some debugging ahead to make sure reading all the different types of data and slicing works as expected.

@oruebel
Copy link
Contributor Author

oruebel commented Sep 23, 2024

  • add methods to allow us to retrieve the paths for dynamically created Groups (e.g., ElectricalSeries) to make it easier to also get to the objects that are not at fixed locations in the file. As long as a user knows the paths, we can already open these, but we don't yet have a way for the user to find the paths.

@stephprince I implemented this now as follows:

  • Added BaseIO::getGroupObjects and HDF5IO::getGroupObjects to allow us to get a list of all contents of a group
  • Added BaseIO::findTypes to allow us to search for all objects of a given set of types. This function only uses functions defined on BaseIO so it is backend independent and does not need to be overridden in HDF5IO
  • Added example to show how to use findTypes on read

@oruebel
Copy link
Contributor Author

oruebel commented Sep 23, 2024

@stephprince if it's easier for you to review, I'd be fine with merging this back to #85

@stephprince
Copy link
Collaborator

I pushed a few more updates here. The main structural changes are: 1) I created a DEFINE_FIELD macro to make it easier to create standardized getter methods for reading datasets/attributes and 2) I added a mergePaths method to standardize the handling of paths a bit more.

Added BaseIO::getGroupObjects and HDF5IO::getGroupObjects to allow us to get a list of all contents of a group
Added BaseIO::findTypes to allow us to search for all objects of a given set of types. This function only uses functions defined on BaseIO so it is backend independent and does not need to be overridden in HDF5IO
Added example to show how to use findTypes on read

@oruebel sounds great!

if it's easier for you to review, I'd be fine with merging this back to #85

Can we keep it separate for now? I think I had read most of the docs and examples for this one and I want to keep track of where I left off.

@oruebel
Copy link
Contributor Author

oruebel commented Sep 23, 2024

Can we keep it separate for now? I think I had read most of the docs and examples for this one and I want to keep track of where I left off.

Sounds good. Whichever works best for you is fine with me.

Copy link
Collaborator

@stephprince stephprince left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good and the docs explained things very well, mainly some very small suggestions for clarity.

I still need to look into the issue with HDF5IO opening an existing file.

Also, once the small suggestions are updated, if it's easier to combine the two read PRs I'm good with that.

docs/pages/devdocs/registered_types.dox Outdated Show resolved Hide resolved
docs/pages/devdocs/registered_types.dox Outdated Show resolved Hide resolved
docs/pages/devdocs/registered_types.dox Outdated Show resolved Hide resolved
docs/pages/devdocs/registered_types.dox Show resolved Hide resolved
docs/pages/userdocs/read.dox Outdated Show resolved Hide resolved
docs/pages/devdocs/registered_types.dox Show resolved Hide resolved
docs/pages/userdocs/read.dox Outdated Show resolved Hide resolved
docs/pages/userdocs/read.dox Outdated Show resolved Hide resolved
docs/pages/userdocs/read.dox Outdated Show resolved Hide resolved
docs/pages/userdocs/read.dox Show resolved Hide resolved
@oruebel
Copy link
Contributor Author

oruebel commented Oct 23, 2024

Also, once the small suggestions are updated, if it's easier to combine the two read PRs I'm good with that.

@stephprince I added the changes you suggested and synced the PR with the add_read branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants