Add user docs for data read
oruebel committed Sep 1, 2024
1 parent 970f4c8 commit 26fbae9
Showing 5 changed files with 185 additions and 10 deletions.
4 changes: 4 additions & 0 deletions docs/Doxyfile.in
@@ -14,6 +14,10 @@ EXTRACT_ALL = YES
RECURSIVE = YES
OUTPUT_DIRECTORY = "@DOXYGEN_OUTPUT_DIRECTORY@"

# Also show private members in the docs
EXTRACT_PRIVATE = YES
# HIDE_UNDOC_MEMBERS = YES

# Enable Markdown support
MARKDOWN_SUPPORT = YES

1 change: 1 addition & 0 deletions docs/pages/1_userdocs.dox
@@ -6,4 +6,5 @@
*
* - \subpage user_install_page
* - \subpage hdf5io
* - \subpage read_page
*/
145 changes: 145 additions & 0 deletions docs/pages/userdocs/read.dox
@@ -0,0 +1,145 @@
/**
* \page read_page Reading data
*
* \section read_design_example Example
*
* \subsection read_design_example_create Create an NWB file as usual
*
* \paragraph read_design_example_step_1 Set up mock data for writing
*
* \snippet tests/examples/test_ecephys_data_read.cpp example_read_mockdata_snippet
*
* \paragraph read_design_example_step_1_2 Create the NWBFile and record data
*
* \snippet tests/examples/test_ecephys_data_read.cpp example_read_create_file_snippet
*
* \subsection read_design_example_read_during Read data during recording
*
* \paragraph read_design_example_laxy_read Lazy data access
*
* All data reads are implemented lazily, i.e., AqNWB does not load data into memory
* until we explicitly request it. To access data lazily, datasets and attributes are
* wrapped via \ref AQNWB::ReadDatasetWrapper and \ref AQNWB::ReadAttributeWrapper, respectively.
* The \ref AQNWB::NWB::Container "Container" object that owns the dataset/attribute then
* provides accessor methods that return the corresponding wrapper. Here, we
* access the ``data`` dataset of the \ref AQNWB::NWB::ElectricalSeries "ElectricalSeries".
*
* \snippet tests/examples/test_ecephys_data_read.cpp example_read_get_data_wrapper_snippet
*
* \paragraph read_design_example_load_data Read data into memory
*
* To access the values of a dataset, we can then use the \ref AQNWB::ReadDatasetWrapper::valuesGeneric "valuesGeneric"
* and \ref AQNWB::ReadDatasetWrapper::values "values" methods, which load the data as generic (untyped) or typed
* data, respectively.
*
* \snippet tests/examples/test_ecephys_data_read.cpp example_read_get_datablock_snippet
*
* The data is represented here as a \ref AQNWB::DataBlock "DataBlock", which stores the data as a 1-dimensional
* vector along with the shape of the data. E.g., here we validate the data against the original mock data:
*
* \snippet tests/examples/test_ecephys_data_read.cpp example_read_validate_datablock_snippet
*
* \paragraph read_design_example_boostarray Accessing multi-dimensional data as Boost multi-array
*
* To ease interaction with multi-dimensional data, e.g., the ``(time x channel)`` data of our
* \ref AQNWB::NWB::ElectricalSeries "ElectricalSeries", we can use the
* \ref AQNWB::DataBlock::as_multi_array "DataBlock.as_multi_array" method to construct a
* ``boost::const_multi_array_ref``.
*
* \snippet tests/examples/test_ecephys_data_read.cpp example_read_get_boostarray_snippet
*
* Using a Boost multi-array simplifies access to and interaction with the data as a multi-dimensional array.
* Here we use it to validate the loaded data against the original mock data, as we did above.
*
* \snippet tests/examples/test_ecephys_data_read.cpp example_read_validate_boostarray_snippet
*
* Next we stop the recording and close the file so we can show how we can read from the file
* we just created.
*
* \snippet tests/examples/test_ecephys_data_read.cpp example_read_finish_recording_snippet
*
* \subsection read_design_example_read_posthoc Read data from an existing file
*
* To read from an existing file, we simply need to create the I/O object and then construct the
* Container object we want to read.
*
* \snippet tests/examples/test_ecephys_data_read.cpp example_read_only_snippet
*
* \section read_design_sec Software Design
*
* \subsection read_design_wrapper_container Container
*
* The \ref AQNWB::NWB::Container "Container" class (e.g., \ref AQNWB::NWB::ElectricalSeries "ElectricalSeries"
* or \ref AQNWB::NWB::NWBFile "NWBFile") is responsible for exposing read access to its
* specific datasets and attributes by providing appropriate access functions, which return
* \ref AQNWB::ReadDatasetWrapper "ReadDatasetWrapper" or \ref AQNWB::ReadAttributeWrapper "ReadAttributeWrapper"
* objects for lazily reading from the dataset/attribute.
*
* \subsection read_design_wrapper_propos ReadDatasetWrapper and ReadAttributeWrapper
*
* The \ref AQNWB::ReadDatasetWrapper "ReadDatasetWrapper" stores a shared pointer
* \ref AQNWB::ReadDatasetWrapper::io "ReadDatasetWrapper.io" to the I/O object and the
* \ref AQNWB::ReadDatasetWrapper::dataPath "ReadDatasetWrapper.dataPath" to the dataset.
*
* The \ref AQNWB::ReadDatasetWrapper::valuesGeneric "valuesGeneric" method then allows us
* to read all or parts of the dataset into memory as ``std::any``. This function uses
* the \ref AQNWB::BaseIO::readDataset "readDataset" method of the I/O backend
* (e.g., \ref AQNWB::HDF5::HDF5IO::readDataset "HDF5IO.readDataset") to load the data,
* such that the I/O backend takes care of allocating the memory for the
* appropriate data type.
*
* We can retrieve data directly with the appropriate type by using the templated
* \ref AQNWB::ReadDatasetWrapper::values "values" function instead, which
* uses \ref AQNWB::ReadDatasetWrapper::valuesGeneric "valuesGeneric" and then
* automatically casts the data to a typed \ref AQNWB::DataBlock "DataBlock"
* instead of returning an untyped \ref AQNWB::DataBlockGeneric "DataBlockGeneric".
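*
* The lazy pattern described above can be sketched roughly as follows. This is a
* minimal illustration only: ``MockIO`` and ``LazyDataset`` are hypothetical names
* invented for this sketch and are not the AqNWB classes. The wrapper stores only
* the I/O handle and the dataset path, and nothing is read until ``values`` or
* ``valuesGeneric`` is called.
*
```cpp
#include <any>
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical I/O backend used only for this sketch (not AqNWB's BaseIO).
struct MockIO
{
  int readCount = 0;  // number of times data was actually read

  // Pretend to read a float dataset from disk and return it untyped.
  std::any readDataset(const std::string& path)
  {
    (void)path;  // a real backend would locate the dataset by its path
    ++readCount;
    return std::vector<float> {1.0f, 2.0f, 3.0f};
  }
};

// Hypothetical lazy wrapper: stores only the I/O handle and dataset path.
class LazyDataset
{
public:
  LazyDataset(std::shared_ptr<MockIO> io, std::string path)
      : m_io(std::move(io))
      , m_path(std::move(path))
  {
    assert(m_io != nullptr);
  }

  // Load the data as untyped std::any; the backend allocates the memory.
  std::any valuesGeneric() const { return m_io->readDataset(m_path); }

  // Load the data and cast it to the requested element type.
  // Throws std::bad_any_cast if the stored type does not match.
  template<typename T>
  std::vector<T> values() const
  {
    return std::any_cast<std::vector<T>>(valuesGeneric());
  }

private:
  std::shared_ptr<MockIO> m_io;
  std::string m_path;
};
```
*
* In this sketch, constructing a ``LazyDataset`` leaves ``readCount`` at zero, and
* each call to ``values<float>()`` triggers exactly one read through the backend.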
*
* \note
* \ref AQNWB::ReadAttributeWrapper "ReadAttributeWrapper" works much in the same
* way as \ref AQNWB::ReadDatasetWrapper "ReadDatasetWrapper" but does not support
* slicing, i.e., attributes are always loaded fully into memory since attributes
* are intended for small data only.
*
* \subsection read_design_data_block DataBlockGeneric and DataBlock
*
* \ref AQNWB::ReadDatasetWrapper::valuesGeneric "ReadDatasetWrapper.valuesGeneric"
* returns the data as a \ref AQNWB::DataBlockGeneric "DataBlockGeneric"
* object, which stores the \ref AQNWB::DataBlockGeneric::data "data" as ``std::any``
* along with the \ref AQNWB::DataBlockGeneric::shape "shape" of the data.
*
* To cast the data to the appropriate specific type (e.g., ``float``) we can then create a
* \ref AQNWB::DataBlock "DataBlock" with the appropriate data type via the
* \ref AQNWB::DataBlock::fromGeneric "DataBlock.fromGeneric" method. \ref AQNWB::DataBlock "DataBlock"
* is templated on the specific data type, i.e., we call ``DataBlock<float>::fromGeneric(myGenericDataBlock)``.
* \ref AQNWB::DataBlock "DataBlock" then stores the \ref AQNWB::DataBlock::data "data" as an
* appropriately typed 1-dimensional ``std::vector`` along with the \ref AQNWB::DataBlock::shape "shape"
* of the data.
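*
* The conversion from an untyped to a typed block can be sketched as follows.
* ``GenericBlock`` and ``TypedBlock`` are hypothetical stand-ins written for this
* illustration, not the AqNWB implementation. The sketch assumes that the
* ``std::any`` holds a ``std::vector`` of the element type. Moving out of the
* ``std::any`` is one possible way to avoid copying the values; whether AqNWB
* does the same is not shown here.
*
```cpp
#include <any>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical untyped block: values held as std::any plus the shape.
struct GenericBlock
{
  std::any data;
  std::vector<std::size_t> shape;
};

// Hypothetical typed block, templated on the element type.
template<typename T>
struct TypedBlock
{
  std::vector<T> data;
  std::vector<std::size_t> shape;

  // Build a typed block from an untyped one. Throws std::bad_any_cast if
  // the std::any does not hold a std::vector<T>.
  static TypedBlock<T> fromGeneric(GenericBlock generic)
  {
    assert(generic.data.has_value());
    TypedBlock<T> result;
    // Moving out of the std::any avoids copying the underlying vector.
    result.data = std::any_cast<std::vector<T>>(std::move(generic.data));
    result.shape = std::move(generic.shape);
    return result;
  }
};
```
*
* Requesting the wrong element type, e.g., ``TypedBlock<int>::fromGeneric`` on a
* block that holds ``float`` values, fails with ``std::bad_any_cast`` in this
* sketch rather than silently reinterpreting the data.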
*
* To simplify access to multi-dimensional data, we can then represent the data
* as a ``boost::multi_array``. The \ref AQNWB::DataBlock::as_multi_array "DataBlock.as_multi_array"
* convenience method generates a ``boost::const_multi_array_ref<DTYPE, NDIMS>`` for us.
* Here the ``DTYPE`` template parameter is the same as for the \ref AQNWB::DataBlock "DataBlock"
* (so that we don't have to specify it again), and the ``NDIMS`` template parameter
* is the number of dimensions (which is the same as \ref AQNWB::DataBlock::shape "shape.size()").
* I.e., since C++ is statically typed, we need to know the ``DTYPE`` and the
* ``NDIMS`` at compile time.
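*
* To illustrate why the number of dimensions must be a compile-time constant, here
* is a small sketch of a non-owning view over flat data. It deliberately does not
* use Boost: ``ConstArrayView`` and ``asView`` are hypothetical names that only
* mimic the idea behind ``boost::const_multi_array_ref``, and the sketch assumes
* a row-major (C-order) layout.
*
```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning, read-only view over flat row-major data.
// It only mimics the idea behind boost::const_multi_array_ref and is not
// the Boost class itself.
template<typename T, std::size_t NDIMS>
class ConstArrayView
{
public:
  ConstArrayView(const T* data, const std::array<std::size_t, NDIMS>& shape)
      : m_data(data)
      , m_shape(shape)
  {
  }

  // Element access by a full index, e.g., {t, c} for a 2D view.
  const T& at(const std::array<std::size_t, NDIMS>& index) const
  {
    std::size_t offset = 0;
    for (std::size_t d = 0; d < NDIMS; ++d) {
      assert(index[d] < m_shape[d]);
      offset = offset * m_shape[d] + index[d];  // row-major (C-order)
    }
    return m_data[offset];
  }

private:
  const T* m_data;  // points into the caller's vector; no copy is made
  std::array<std::size_t, NDIMS> m_shape;
};

// Build a view with NDIMS fixed at compile time. The element type T is
// deduced from the vector, so only NDIMS has to be spelled out.
template<std::size_t NDIMS, typename T>
ConstArrayView<T, NDIMS> asView(const std::vector<T>& data,
                                const std::vector<std::size_t>& shape)
{
  assert(shape.size() == NDIMS);
  std::array<std::size_t, NDIMS> dims {};
  for (std::size_t d = 0; d < NDIMS; ++d) {
    dims[d] = shape[d];
  }
  return ConstArrayView<T, NDIMS>(data.data(), dims);
}
```
*
* As with ``as_multi_array<2>()``, a call such as ``asView<2>(flat, shape)`` names
* only the number of dimensions. The view refers to the caller's vector, so the
* vector must outlive the view.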
*
* \note
* The \ref AQNWB::DataBlock::fromGeneric "DataBlock.fromGeneric" and
* \ref AQNWB::DataBlock::as_multi_array "DataBlock.as_multi_array" methods use casting
* and referencing to transform the data without making additional copies
* of the data.
*
* \subsection read_design_wrapper_io I/O
*
* The I/O backend is responsible for implementing the actual
* \ref AQNWB::BaseIO::readDataset "readDataset" and \ref AQNWB::BaseIO::readAttribute "readAttribute"
* methods used for reading data from disk. The methods are also responsible for
* allocating appropriate memory with the respective data type. The functions
* return the data as \ref AQNWB::DataBlockGeneric "DataBlockGeneric", which
* stores the data as untyped ``std::any``. The user can then cast the
* data to the appropriate type via the templated \ref AQNWB::DataBlock "DataBlock" class.
*
*/


12 changes: 8 additions & 4 deletions src/hdf5/HDF5IO.hpp
@@ -312,8 +312,9 @@ class HDF5IO : public BaseIO
* of the readData method.
*
* @tparam T The data type of the dataset or attribute.
* @tparam HDF5TYPE HDF5 Dataset or Attribute type, usually determined from
* dataSource
* @param dataSource The HDF5 data source (dataset or attribute).
* @param predType The HDF5 data type.
* @param numElements The number of elements to read.
* @param memspace The memory dataspace (optional).
* @param dataspace The file dataspace (optional).
@@ -336,11 +337,10 @@ class HDF5IO : public BaseIO
* <H5Cpp.h> in the HDF5IO.hpp header file.
*
* @tparam T The data type of the dataset or attribute.
* @tparam HDF5TYPE HDF5 Dataset or Attribute type, usually determined from
* dataSource
* @param dataSource The HDF5 data source (dataset or attribute).
* @param predType The HDF5 data type.
* @param numElements The number of elements to read.
* @param memspace The memory dataspace (optional).
* @param dataspace The file dataspace (optional).
*
* @return A vector containing the data.
*/
@@ -350,6 +350,8 @@ class HDF5IO : public BaseIO
/**
* @brief Reads a variable-length string from an HDF5 dataset or attribute.
*
* @tparam HDF5TYPE HDF5 Dataset or Attribute type, usually determined from
* dataSource
* @param dataSource The HDF5 data source (dataset or attribute).
* @param numElements The number of elements to read.
* @param memspace The memory dataspace (optional).
@@ -371,6 +373,8 @@ class HDF5IO : public BaseIO
* here, rather than defining default parameters directly, to avoid having to
* include <H5Cpp.h> in the HDF5IO.hpp header file.
*
* @tparam HDF5TYPE HDF5 Dataset or Attribute type, usually determined from
* dataSource
* @param dataSource The HDF5 data source (dataset or attribute).
* @param numElements The number of elements to read.
*
33 changes: 27 additions & 6 deletions tests/examples/test_ecephys_data_read.cpp
@@ -21,6 +21,7 @@ TEST_CASE("ElectricalSeriesReadExample", "[ecephys]")
{
SECTION("ecephys data read example")
{
// [example_read_mockdata_snippet]
// setup mock data for writing
SizeType numSamples = 100;
SizeType numChannels = 2;
@@ -40,7 +41,9 @@ TEST_CASE("ElectricalSeriesReadExample", "[ecephys]")
mockDataTransposed[s][c] = mockData[c][s];
}
}
// [example_read_mockdata_snippet]

// [example_read_create_file_snippet]
// setup io object
std::string path = getTestFilePath("ElectricalSeriesReadExample.h5");
std::shared_ptr<BaseIO> io = createIO("HDF5", path);
@@ -70,18 +73,19 @@ TEST_CASE("ElectricalSeriesReadExample", "[ecephys]")
ch, numSamples, mockData[ch].data(), mockTimestamps.data());
}
io->flush();
// [example_read_create_file_snippet]

// Illustrate reading the ElectricalSeries.data back
std::string electricalSeriesDataPath = electricalSeries->dataPath();
std::string electricalSeriesPath = electricalSeries->getPath();
REQUIRE(electricalSeriesDataPath == (electricalSeriesPath + "/data"));

// [example_read_get_data_wrapper_snippet]
// Get a ReadDatasetWrapper for lazy reading of ElectricalSeries.data
auto readDataWrapper = electricalSeries->dataLazy();
// [example_read_get_data_wrapper_snippet]

// [example_read_get_datablock_snippet]
// Read the full ElectricalSeries.data back
DataBlock<float> dataValues = readDataWrapper->values<float>();
// [example_read_get_datablock_snippet]

// [example_read_validate_datablock_snippet]
// Check that the data we read has the expected size and shape
REQUIRE(dataValues.data.size() == (numSamples * numChannels));
REQUIRE(dataValues.shape[0] == numSamples);
@@ -97,9 +101,14 @@ TEST_CASE("ElectricalSeriesReadExample", "[ecephys]")
REQUIRE_THAT(selectedRange,
Catch::Matchers::Approx(mockDataTransposed[t]).margin(1));
}
// [example_read_validate_datablock_snippet]

// [example_read_get_boostarray_snippet]
// Use the boost multi-array feature to simplify interaction with data
// Create a 2D boost::const_multi_array_ref<float, 2> multidimensional array
auto boostMulitArray = dataValues.as_multi_array<2>();
// [example_read_get_boostarray_snippet]

// [example_read_validate_boostarray_snippet]
// Iterate through all the time steps again, but now using the boost array
for (SizeType t = 0; t < numSamples; t++) {
// Access [t, :], i.e., get a 1D array with the data
@@ -112,11 +121,22 @@ TEST_CASE("ElectricalSeriesReadExample", "[ecephys]")
REQUIRE_THAT(row_t_vector,
Catch::Matchers::Approx(mockDataTransposed[t]).margin(1));
}
// [example_read_validate_boostarray_snippet]

// [example_read_getpath_snippet]
// Reading the ElectricalSeries.data back (during the recording)
std::string electricalSeriesDataPath = electricalSeries->dataPath();
std::string electricalSeriesPath = electricalSeries->getPath();
REQUIRE(electricalSeriesDataPath == (electricalSeriesPath + "/data"));
// [example_read_getpath_snippet]

// [example_read_finish_recording_snippet]
// Stop the recording
io->stopRecording();
io->close();
// [example_read_finish_recording_snippet]

// [example_read_only_snippet]
// Open an I/O for reading
std::shared_ptr<BaseIO> readio = createIO("HDF5", path);

@@ -129,6 +149,7 @@ TEST_CASE("ElectricalSeriesReadExample", "[ecephys]")

// Now we can read the data in the same way we did during write
auto readElectricalSeriesData = electricalSeries->dataLazy();
// [example_read_only_snippet]

// TODO Actually loading the data causes a segfault
// DataBlock<float> readDataValues = readDataWrapper->values<float>();
