
Add electrical series and update recording process #25

Merged
merged 40 commits into main on Jun 11, 2024

Conversation

stephprince (Collaborator)

These changes add the ability to add ElectricalSeries to the NWB file and to "run" a recording by writing data in rows and blocks to the TimeSeries data and timestamps datasets.

Some related changes that came up when adding these features:

  • refactored tests from a single aq-nwb_test.cpp file into multiple test files
  • moved NWBRecordingEngine out of NWBFile and renamed it NWBRecording
  • added a Channel class that will need to be adapted per acquisition system / recording equipment to specify conversion factors, electrode names, etc.

I will add issues for any related topics that came up when adding these.
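The per-channel metadata described above could look roughly like this minimal sketch; the field names and the `getConversion` helper are illustrative assumptions, not the actual aq-nwb `Channel` API:

```cpp
#include <cassert>
#include <cmath>
#include <string>

// Hypothetical sketch of per-channel acquisition metadata. A real Channel
// class would be adapted per recording system.
struct Channel {
  std::string name;        // electrode name, e.g. "CH0"
  int globalIndex = 0;     // index within the full channel list
  float bitVolts = 0.05f;  // scaling from raw integer samples, in microvolts per bit

  // Conversion factor from raw samples to volts (assuming bitVolts is in
  // microvolts per bit).
  float getConversion() const { return bitVolts * 1e-6f; }
};
```

Each acquisition system would fill in these fields for its hardware, and the conversion factor would then be written as the `conversion` attribute of the ElectricalSeries data.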

Resolved review threads: src/Utils.hpp, src/hdf5/HDF5IO.cpp (outdated), src/nwb/NWBFile.cpp (outdated)
ElectrodeGroup(elecPath, io, "description", "unknown", device);
elecGroup.initialize();
}
timeseriesData.clear();
Contributor

In what case would timeseriesData not be empty, and what would happen if it is not? Do we need a check here to verify that timeseriesData is empty?

Collaborator Author

I think timeseriesData would not be empty if a recording is temporarily stopped and then restarted while writing to the same file. I have an open issue to add the ability to have multiple recording periods within the same file.

src/nwb/NWBRecording.cpp (outdated, resolved)
Comment on lines 70 to 83
/**
* @brief Holds integer sample numbers for writing.
*/
std::unique_ptr<int[]> sampleBuffer = nullptr;

/**
* @brief Holds scaled samples for writing.
*/
std::unique_ptr<float[]> scaledBuffer = nullptr;

/**
* @brief Holds integer samples for writing.
*/
std::unique_ptr<int16_t[]> intBuffer = nullptr;
Contributor

Could you explain how these buffers are being used? Also a few specific questions:

  • It seems the buffers are limited in size to MAX_BUFFER_SIZE = 40960;. Could you explain why that specific size and what happens if the caller wants to add more values but the buffer is already full?
  • It looks like there are 3 buffers with specific types of int, float and int16_t. Could you clarify why those specific data types and what each of the buffers is for?
  • Does NWBRecording need to be templated on the data type to support writing on non-int and non-float data? If we don't need to, then I think avoiding templates would be nice, but at the same time, we don't want to restrict the API to just be able to write int and float. As a template this would look something like:
template<typename DataValueType>
class NWBRecording {
  std::unique_ptr<DataValueType[]> dataValueBuffer = nullptr;
};

So when we create an NWBRecording we would then need to say NWB::NWBRecording<float> nwbRecording.

Collaborator Author

The buffers hold data copied from the record thread; the different ones are for timestamps vs. data values.

It seems the buffers are limited in size to MAX_BUFFER_SIZE = 40960;. Could you explain why that specific size and what happens if the caller wants to add more values but the buffer is already full?

I copied the max size from the OpenEphys plugin; I'm not sure if there is an exact reason for that value. I've added a catch to deal with the case where the caller wants to add more values but the buffer is too small.
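One way such a catch could work is to copy and flush the data in buffer-sized chunks rather than failing; this is a minimal sketch, where `writeBlock` stands in for the real dataset-write call and `MAX_BUFFER_SIZE` mirrors the constant mentioned above:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Same fixed capacity quoted in the review thread.
constexpr std::size_t MAX_BUFFER_SIZE = 40960;

// Copy `numSamples` values through a fixed-size staging buffer, invoking
// writeBlock(ptr, n) once per buffer-sized chunk. Illustrative only.
template <typename WriteFn>
void writeInChunks(const float* samples, std::size_t numSamples, WriteFn writeBlock)
{
  std::vector<float> buffer(std::min<std::size_t>(numSamples, MAX_BUFFER_SIZE));
  std::size_t written = 0;
  while (written < numSamples) {
    const std::size_t n = std::min(MAX_BUFFER_SIZE, numSamples - written);
    std::copy(samples + written, samples + written + n, buffer.begin());
    writeBlock(buffer.data(), n);  // flush one chunk to the dataset
    written += n;
  }
}
```

With this approach the fixed buffer size only bounds the copy granularity, not the total amount a caller can write.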

It looks like there are 3 buffers with specific types of int, float and int16_t. Could you clarify why those specific data types and what each of the buffers is for?

I also copied the data types from OpenEphys' implementation. The int16 buffer was for the data; it was mentioned that this was used to keep the files as small as possible. The float buffer was for the timestamps. I removed the int sampleBuffer for now since I was not using it. I think it was used for keeping track of sample numbers written across different channels or data types, so we may need to add something like that back in.

Does NWBRecording need to be templated on the data type to support writing on non-int and non-float data?

I'm not sure; what other data types might we be writing the data blocks/rows with? Currently these are used specifically by writeTimeSeriesData, but if we want a generalized NWBRecording::writeData method then maybe we need to consider other data types.

Contributor

I also copied the data types from OpenEphys' implementation. The int16 buffer was for the data; it was mentioned that this was used to keep the files as small as possible.

I don't think we can assume that all acquisition systems will want to record in int16 (presumably using some offset for conversion to float). I think we will probably need to allow for more than just int16 here.
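One possible shape for this, assuming templating is acceptable, is to parameterize the scaling buffer on the sample type; `SampleScaler` and its methods are illustrative names, not the aq-nwb API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Sketch: a typed staging buffer that converts raw integer samples using a
// per-channel conversion factor. SampleType could be int16_t, float, etc.
template <typename SampleType>
class SampleScaler {
public:
  explicit SampleScaler(std::size_t capacity)
      : buffer_(std::make_unique<SampleType[]>(capacity)), capacity_(capacity) {}

  // Scale up to `capacity` raw samples into the typed buffer; returns the
  // number of samples actually converted.
  std::size_t scale(const int* raw, std::size_t n, float conversion)
  {
    if (n > capacity_) n = capacity_;
    for (std::size_t i = 0; i < n; ++i)
      buffer_[i] = static_cast<SampleType>(raw[i] * conversion);
    return n;
  }

  const SampleType* data() const { return buffer_.get(); }

private:
  std::unique_ptr<SampleType[]> buffer_;
  std::size_t capacity_;
};
```

An acquisition system that wants small files could instantiate `SampleScaler<int16_t>`, while one that needs full precision could use `SampleScaler<float>` without changing the recording code.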

* @param experimentNumber The experiment number.
* @param recordingNumber The recording number.
*/
Status openFiles(const std::string& rootFolder,
Contributor

The functions are called openFiles and closeFiles, but it seems like right now these open and return exactly one file. Is the logic in the functions going to change to open multiple files? If so, should std::unique_ptr<NWBFile> nwbfile; be a vector of unique_ptr objects instead?

Collaborator Author

I updated the names to reflect that we are only working with single files for now. Is there an example of when we would want to have multiple files open within the same NWBRecording at once?

Contributor

From an acquisition system perspective, you want to avoid dependencies between different data streams when possible, e.g., to avoid other streams being corrupted if one fails and to avoid performance issues due to competition between streams. One possible approach would be for the user to route each data stream to a separate file.
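A rough sketch of what routing each stream to its own file could look like; `StreamFile` stands in for `NWBFile`, and all names here are illustrative assumptions rather than the real API:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Stand-in for NWBFile: just records the path it was opened with.
struct StreamFile {
  std::string path;
};

// Sketch: lazily open one file per named data stream, so a failure or
// backlog in one stream cannot corrupt or stall the others.
class StreamRouter {
public:
  StreamFile& fileFor(const std::string& stream, const std::string& rootFolder)
  {
    auto it = files_.find(stream);
    if (it == files_.end()) {
      auto f = std::make_unique<StreamFile>();
      f->path = rootFolder + "/" + stream + ".nwb";
      it = files_.emplace(stream, std::move(f)).first;
    }
    return *it->second;
  }

  std::size_t numFiles() const { return files_.size(); }

private:
  std::map<std::string, std::unique_ptr<StreamFile>> files_;
};
```

Under this design, openFiles/closeFiles would iterate over the per-stream handles, which is one argument for keeping the plural names.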

Comment on lines 135 to 150
electricalSeries->data =
createRecordingData(BaseDataType::I16,
SizeArray {0, channelGroup.size()},
SizeArray {CHUNK_XSIZE},
electricalSeries->getPath() + "/data");
io->createDataAttributes(electricalSeries->getPath(),
channelGroup[0].getConversion(),
-1.0f,
"volts");

electricalSeries->timestamps =
createRecordingData(BaseDataType::F64,
SizeArray {0},
SizeArray {CHUNK_XSIZE},
electricalSeries->getPath() + "/timestamps");
io->createTimestampsAttributes(electricalSeries->getPath());
Contributor

Why is the logic for creating the data and timestamps datasets, and all the attributes associated with them, outside of the TimeSeries? It seems that with this, a user could initialize a TimeSeries, but unless they manually create the additional fields for data etc. themselves, the NWB file would not be valid. It seems that TimeSeries (or the corresponding subtypes) should provide the logic for setting up any fields they require. E.g., should this be part of the initialize method? And if not, should there be another function, e.g., TimeSeries.initializeRecording, to set up the datasets for recording?

Collaborator Author

Good point, I think having all of the data and timestamp creation logic in the initialize method makes sense. I had meant to move this logic inside the TimeSeries once it was working but missed it. Right now startRecording handles the general dataset setup process, and I've updated it so that fields required by the different data types (e.g. data and timestamps in TimeSeries) are all set up within their respective initialize methods.
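The updated flow described here might be sketched as follows, using a fake IO object that only records which datasets were requested; the classes are illustrative stubs, not the real aq-nwb types:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for the IO layer: records each dataset-creation request instead
// of touching HDF5.
struct FakeIO {
  std::vector<std::string> datasetsCreated;
  void createRecordingData(const std::string& path)
  {
    datasetsCreated.push_back(path);
  }
};

// Illustrative TimeSeries stub: initialize() creates its own required
// datasets, so a freshly initialized TimeSeries is already structurally
// valid NWB without extra caller-side setup.
struct TimeSeries {
  std::string path;
  FakeIO* io;

  void initialize()
  {
    io->createRecordingData(path + "/data");
    io->createRecordingData(path + "/timestamps");
  }
};
```

Subtypes such as ElectricalSeries would override or extend initialize to add their own required fields (e.g. electrodes), keeping startRecording free of type-specific setup.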

Comment on lines +25 to +27
io->createCommonNWBAttributes(path, "core", neurodataType, description);
io->createAttribute(comments, path, "comments");
}
Contributor

Is the plan to add all the other fields that TimeSeries has, e.g., conversion, continuity, sync, etc., later? If so, could you create an issue for that (if we don't have one yet)? See https://nwb-schema.readthedocs.io/en/latest/format.html#timeseries

Collaborator Author

Yes, I can make an issue to add the other fields. Within TimeSeries.data I have the conversion, resolution, and unit fields, but need to add continuity and offset. The other ones we are missing for TimeSeries would be control, control_description, and sync.

I was wondering whether we want to save sync information. The schema suggests it is for archival purposes after timestamp data is calculated, but maybe there is a case where we want to save it during acquisition.

This also brings up another question: do we ever want to use starting_time and rate when the data have a constant sampling rate? Some of the test files I am creating raise best-practice violations in nwbinspector because the time series have a constant sampling rate, but since the true rate will likely only be known after data acquisition, I was thinking we should stick with timestamps.
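For reference, a tiny sketch of what generating explicit timestamps from a nominal rate looks like; with a perfectly regular clock this stores the same information as starting_time plus rate, just sample by sample (the function name is illustrative):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Generate explicit per-sample timestamps from a start time and a nominal
// sampling rate (Hz). During acquisition the actual timestamps may drift
// from this ideal grid, which is the argument for storing them explicitly.
std::vector<double> makeTimestamps(double startTime, double rate, std::size_t n)
{
  std::vector<double> ts(n);
  for (std::size_t i = 0; i < n; ++i)
    ts[i] = startTime + static_cast<double>(i) / rate;
  return ts;
}
```

Storing timestamps keeps the file honest about clock drift at the cost of one float64 per sample; starting_time + rate is cheaper but assumes the regularity is known up front.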

@oruebel (Contributor) left a comment

Thanks for the PR! I have not looked at all the code in detail yet, but I added a first pass of questions and comments. Overall, I think the approach looks reasonable. I think the main part that is not clear to me at first glance is 1) how the initialization process works (i.e., the question of ElectricalSeries not creating all the datasets it needs) and 2) how a user can define what recordings they want to do (currently this seems hard-coded to a single ElectricalRecording). I understand that the answer for some of this may be that this is going to change later and that we need to make separate issues, but it would be helpful to understand what needs to be separate issues and how we would want to address them.

if (description != "")
createAttribute(description, path, "description");
return Status::Success;
}

Status BaseIO::createDataAttributes(const std::string& path,
Collaborator Author

move createDataAttributes and createTimestampsAttributes to TimeSeries initialization

@oruebel (Contributor) left a comment

Approving to help with the next round of updates

@stephprince stephprince merged commit 615f993 into main Jun 11, 2024
3 of 5 checks passed