
Add electrical series and update recording process #25

Merged
merged 40 commits into main on Jun 11, 2024

Conversation

stephprince (Collaborator)

These changes add the ability to add ElectricalSeries to the NWB file and to "run" a recording by writing data in rows and blocks to the TimeSeries data and timestamps datasets.

Some related changes that came up when adding these features:

  • refactored tests from a single aq-nwb_test.cpp file into multiple test files
  • moved NWBRecordingEngine out of NWBFile and renamed it NWBRecording
  • added a Channel class that will need to be adapted per acquisition system / recording equipment to specify conversion factors, electrode names, etc.

I will add issues for any related topics that came up when adding these.
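The per-channel metadata described above could look roughly like this minimal sketch; the field names and the `getConversion` helper are illustrative assumptions, not the actual aq-nwb `Channel` API:

```cpp
#include <cassert>
#include <cmath>
#include <string>

// Hypothetical sketch of per-channel acquisition metadata. A real Channel
// class would be adapted per recording system.
struct Channel {
  std::string name;        // electrode name, e.g. "CH0"
  int globalIndex = 0;     // index within the full channel list
  float bitVolts = 0.05f;  // scaling from raw integer samples, in microvolts per bit

  // Conversion factor from raw samples to volts (assuming bitVolts is in
  // microvolts per bit).
  float getConversion() const { return bitVolts * 1e-6f; }
};
```

Each acquisition system would fill in these fields for its hardware, and the conversion factor would then be written as the `conversion` attribute of the ElectricalSeries data.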

Resolved review threads: src/Utils.hpp, src/hdf5/HDF5IO.cpp (outdated), src/nwb/NWBFile.cpp (outdated)
ElectrodeGroup(elecPath, io, "description", "unknown", device);
elecGroup.initialize();
}
timeseriesData.clear();
Contributor

In what case would timeseriesData not be empty, and what would happen if it is not? Do we need a check here to verify that timeseriesData is empty?

Collaborator Author

I think timeseriesData would not be empty if a recording is temporarily stopped and then restarted while writing to the same file. I have an open issue to add the ability to have multiple recording periods within the same file.

src/nwb/NWBRecording.cpp (outdated, resolved)
Comment on lines 70 to 83
/**
* @brief Holds integer sample numbers for writing.
*/
std::unique_ptr<int[]> sampleBuffer = nullptr;

/**
* @brief Holds scaled samples for writing.
*/
std::unique_ptr<float[]> scaledBuffer = nullptr;

/**
* @brief Holds integer samples for writing.
*/
std::unique_ptr<int16_t[]> intBuffer = nullptr;
Contributor

Could you explain how these buffers are being used? Also a few specific questions:

  • It seems the buffers are limited in size to MAX_BUFFER_SIZE = 40960;. Could you explain why that specific size and what happens if the caller wants to add more values but the buffer is already full?
  • It looks like there are 3 buffers with specific types of int, float and int16_t. Could you clarify why those specific data types and what each of the buffers is for?
  • Does NWBRecording need to be templated on the data type to support writing on non-int and non-float data? If we don't need to, then I think avoiding templates would be nice, but at the same time, we don't want to restrict the API to just be able to write int and float. As a template this would look something like:
template<typename DataValueType>
class NWBRecording {
  std::unique_ptr<DataValueType[]> dataValueBuffer = nullptr;
};

So when we create an NWBRecording we would then need to say NWB::NWBRecording<float> nwbRecording.

Collaborator Author

The buffers hold data copied from the record thread; the different ones are for timestamps vs. data values.

It seems the buffers are limited in size to MAX_BUFFER_SIZE = 40960;. Could you explain why that specific size and what happens if the caller wants to add more values but the buffer is already full?

I copied the max size from the OpenEphys plugin; I'm not sure if there is an exact reason for that value. I've added a catch to deal with the case where the caller wants to add more values but the buffer is too small.
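One way such a catch could work is to copy and flush the data in buffer-sized chunks rather than failing; this is a minimal sketch, where `writeBlock` stands in for the real dataset-write call and `MAX_BUFFER_SIZE` mirrors the constant mentioned above:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Same fixed capacity quoted in the review thread.
constexpr std::size_t MAX_BUFFER_SIZE = 40960;

// Copy `numSamples` values through a fixed-size staging buffer, invoking
// writeBlock(ptr, n) once per buffer-sized chunk. Illustrative only.
template <typename WriteFn>
void writeInChunks(const float* samples, std::size_t numSamples, WriteFn writeBlock)
{
  std::vector<float> buffer(std::min<std::size_t>(numSamples, MAX_BUFFER_SIZE));
  std::size_t written = 0;
  while (written < numSamples) {
    const std::size_t n = std::min(MAX_BUFFER_SIZE, numSamples - written);
    std::copy(samples + written, samples + written + n, buffer.begin());
    writeBlock(buffer.data(), n);  // flush one chunk to the dataset
    written += n;
  }
}
```

With this approach the fixed buffer size only bounds the copy granularity, not the total amount a caller can write.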

It looks like there are 3 buffers with specific types of int, float and int16_t. Could you clarify why those specific data types and what each of the buffers is for?

I also copied the data types from OpenEphys' implementation. The int16 buffer was for the data; it was mentioned that this was used to keep the files as small as possible. The float buffer was for the timestamps. I removed the int sampleBuffer for now since I was not using it. I think it was used for keeping track of sample numbers written across different channels or data types, so we may need to add something like that back in.

Does NWBRecording need to be templated on the data type to support writing on non-int and non-float data?

I'm not sure; what other data types might we be writing the data blocks/rows with? Currently these are used specifically by writeTimeSeriesData, but if we want a generalized NWBRecording::writeData method then maybe we need to consider other data types.

Contributor

I also copied the data types from OpenEphys' implementation. The int16 buffer was for the data; it was mentioned that this was used to keep the files as small as possible.

I don't think we can assume that all acquisition systems will want to record in int16 (presumably using some offset for conversion to float). I think we will probably need to allow for more than just int16 here.
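One possible shape for this, assuming templating is acceptable, is to parameterize the scaling buffer on the sample type; `SampleScaler` and its methods are illustrative names, not the aq-nwb API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Sketch: a typed staging buffer that converts raw integer samples using a
// per-channel conversion factor. SampleType could be int16_t, float, etc.
template <typename SampleType>
class SampleScaler {
public:
  explicit SampleScaler(std::size_t capacity)
      : buffer_(std::make_unique<SampleType[]>(capacity)), capacity_(capacity) {}

  // Scale up to `capacity` raw samples into the typed buffer; returns the
  // number of samples actually converted.
  std::size_t scale(const int* raw, std::size_t n, float conversion)
  {
    if (n > capacity_) n = capacity_;
    for (std::size_t i = 0; i < n; ++i)
      buffer_[i] = static_cast<SampleType>(raw[i] * conversion);
    return n;
  }

  const SampleType* data() const { return buffer_.get(); }

private:
  std::unique_ptr<SampleType[]> buffer_;
  std::size_t capacity_;
};
```

An acquisition system that wants small files could instantiate `SampleScaler<int16_t>`, while one that needs full precision could use `SampleScaler<float>` without changing the recording code.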

* @param experimentNumber The experiment number.
* @param recordingNumber The recording number.
*/
Status openFiles(const std::string& rootFolder,
Contributor

The functions are called openFiles and closeFiles, but it seems like right now these open and return exactly one file. Is the logic in the functions going to change to open multiple files? If so, should std::unique_ptr<NWBFile> nwbfile; be a vector of unique_ptr objects instead?

Collaborator Author

I updated the names to reflect that we are only working with single files for now. Is there an example of when we would want to have multiple files open within the same NWBRecording at once?

Contributor

From an acquisition system perspective, you want to avoid dependencies between different data streams when possible, e.g., to avoid other streams being corrupted if one fails and to avoid performance issues due to competition between streams. One possible approach would be for the user to route each data stream to a separate file.
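A rough sketch of what routing each stream to its own file could look like; `StreamFile` stands in for `NWBFile`, and all names here are illustrative assumptions rather than the real API:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Stand-in for NWBFile: just records the path it was opened with.
struct StreamFile {
  std::string path;
};

// Sketch: lazily open one file per named data stream, so a failure or
// backlog in one stream cannot corrupt or stall the others.
class StreamRouter {
public:
  StreamFile& fileFor(const std::string& stream, const std::string& rootFolder)
  {
    auto it = files_.find(stream);
    if (it == files_.end()) {
      auto f = std::make_unique<StreamFile>();
      f->path = rootFolder + "/" + stream + ".nwb";
      it = files_.emplace(stream, std::move(f)).first;
    }
    return *it->second;
  }

  std::size_t numFiles() const { return files_.size(); }

private:
  std::map<std::string, std::unique_ptr<StreamFile>> files_;
};
```

Under this design, openFiles/closeFiles would iterate over the per-stream handles, which is one argument for keeping the plural names.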

Comment on lines 135 to 150
electricalSeries->data =
createRecordingData(BaseDataType::I16,
SizeArray {0, channelGroup.size()},
SizeArray {CHUNK_XSIZE},
electricalSeries->getPath() + "/data");
io->createDataAttributes(electricalSeries->getPath(),
channelGroup[0].getConversion(),
-1.0f,
"volts");

electricalSeries->timestamps =
createRecordingData(BaseDataType::F64,
SizeArray {0},
SizeArray {CHUNK_XSIZE},
electricalSeries->getPath() + "/timestamps");
io->createTimestampsAttributes(electricalSeries->getPath());
Contributor

Why is the logic for creating the data and timestamps datasets, and all the attributes associated with them, outside of the TimeSeries? It seems that with this, a user could initialize a TimeSeries, but unless they manually create the additional fields for data etc. themselves, the NWB file would not be valid. It seems that TimeSeries (or the corresponding subtypes) should provide the logic for setting up any fields they require. E.g., should this be part of the initialize method? And if not, should there be another function, e.g., TimeSeries.initializeRecording, to set up the datasets for recording?

Collaborator Author

Good point, I think having all of the data and timestamp creation logic in the initialize method makes sense. I had meant to move this logic inside the TimeSeries once it was working but missed it. Right now startRecording handles the general dataset setup process, and I've updated it so that fields required by the different data types (e.g. data and timestamps in TimeSeries) are all set up within their respective initialize methods.
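The updated flow described here might be sketched as follows, using a fake IO object that only records which datasets were requested; the classes are illustrative stubs, not the real aq-nwb types:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for the IO layer: records each dataset-creation request instead
// of touching HDF5.
struct FakeIO {
  std::vector<std::string> datasetsCreated;
  void createRecordingData(const std::string& path)
  {
    datasetsCreated.push_back(path);
  }
};

// Illustrative TimeSeries stub: initialize() creates its own required
// datasets, so a freshly initialized TimeSeries is already structurally
// valid NWB without extra caller-side setup.
struct TimeSeries {
  std::string path;
  FakeIO* io;

  void initialize()
  {
    io->createRecordingData(path + "/data");
    io->createRecordingData(path + "/timestamps");
  }
};
```

Subtypes such as ElectricalSeries would override or extend initialize to add their own required fields (e.g. electrodes), keeping startRecording free of type-specific setup.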

Comment on lines +25 to +27
io->createCommonNWBAttributes(path, "core", neurodataType, description);
io->createAttribute(comments, path, "comments");
}
Contributor

Is the plan to add all the other fields that TimeSeries has, e.g., conversion, continuity, sync, etc., later? If so, could you create an issue for that (if we don't have one yet)? See https://nwb-schema.readthedocs.io/en/latest/format.html#timeseries

Collaborator Author

Yes, I can make an issue to add the other fields. Within TimeSeries.data I have the conversion, resolution, and unit fields, but need to add continuity and offset. The other ones we are missing for TimeSeries would be control, control_description, and sync.

I was wondering whether we want to save sync information. The schema suggests it is for archival purposes after timestamp data is calculated, but maybe there is a case where we want to save it during acquisition.

This also brings up another question: do we ever want to use starting_time and rate when the data have a constant sampling rate? Some of the test files I am creating raise best-practice violations in nwbinspector because the time series have a constant sampling rate, but since the true rate will likely only be known after data acquisition, I was thinking we should stick with timestamps.
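For reference, a tiny sketch of what generating explicit timestamps from a nominal rate looks like; with a perfectly regular clock this stores the same information as starting_time plus rate, just sample by sample (the function name is illustrative):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Generate explicit per-sample timestamps from a start time and a nominal
// sampling rate (Hz). During acquisition the actual timestamps may drift
// from this ideal grid, which is the argument for storing them explicitly.
std::vector<double> makeTimestamps(double startTime, double rate, std::size_t n)
{
  std::vector<double> ts(n);
  for (std::size_t i = 0; i < n; ++i)
    ts[i] = startTime + static_cast<double>(i) / rate;
  return ts;
}
```

Storing timestamps keeps the file honest about clock drift at the cost of one float64 per sample; starting_time + rate is cheaper but assumes the regularity is known up front.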

@oruebel (Contributor) left a comment

Thanks for the PR! I have not looked at all the code in detail yet, but I added a first pass of questions and comments. Overall, I think the approach looks reasonable. I think the main part that is not clear to me at first glance is 1) how the initialization process works (i.e., the question of ElectricalSeries not creating all the datasets it needs) and 2) how a user can define what recordings they want to do (currently this seems hard-coded to a single ElectricalRecording). I understand that the answer for some of this may be that this is going to change later and that we need to make separate issues, but it would be helpful to understand what needs to be separate issues and how we would want to address them.

if (description != "")
createAttribute(description, path, "description");
return Status::Success;
}

Status BaseIO::createDataAttributes(const std::string& path,
Collaborator Author

move createDataAttributes and createTimestampsAttributes to TimeSeries initialization

@oruebel (Contributor) left a comment

Approving to help with the next round of updates

@stephprince stephprince merged commit 615f993 into main Jun 11, 2024
3 of 5 checks passed