-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add electrical series and update recording process #25
Conversation
ElectrodeGroup(elecPath, io, "description", "unknown", device); | ||
elecGroup.initialize(); | ||
} | ||
timeseriesData.clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what case would timeseriesData
not be empty, and what would happen if it is not empty? Do we need an error-check here to check that timeseriesData
is empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the situation when timeseriesData would not be empty is if a recording is temporarily stopped and then restarted but writing to the same file? I have an open issue to add the ability to have multiple recording periods within the same file.
src/nwb/NWBRecording.hpp
Outdated
/** | ||
* @brief Holds integer sample numbers for writing. | ||
*/ | ||
std::unique_ptr<int[]> sampleBuffer = nullptr; | ||
|
||
/** | ||
* @brief Holds scaled samples for writing. | ||
*/ | ||
std::unique_ptr<float[]> scaledBuffer = nullptr; | ||
|
||
/** | ||
* @brief Holds integer samples for writing. | ||
*/ | ||
std::unique_ptr<int16_t[]> intBuffer = nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain how these buffers are being used. Also a few specific questions:
- It seems the buffers are limited in size to
MAX_BUFFER_SIZE = 40960;
. Could you explain why that specific size and what happens if the caller wants to add more values but the buffer is already full? - It looks like there are 3 buffers with specific types of
int
,float
andint16_t
. Could you clarify why those specific data types and what each of the buffers is for? - Does
NWBRecording
need to be templated on the data type to support writing on non-int and non-float data? If we don't need to, then I think avoiding templates would be nice, but at the same time, we don't want to restrict the API to just be able to write int and float. As a template this would look something like:
template<typename DataValueType>
class NWBRecording
std::unique_ptr<DataValueType[]> dataValueBugger = nullprt;
So when we create an NWBRecording
we would then need to say NWB::NWBRecording<float> nwbRecording
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The buffers are being used to hold data copied from the record thread, the different ones are for timestamps vs. data values.
It seems the buffers are limited in size to MAX_BUFFER_SIZE = 40960;. Could you explain why that specific size and what happens if the caller wants to add more values but the buffer is already full?
The max size I copied from the OpenEphys plugin, I'm not sure if there is an exact reason for it. I've added a catch to deal with the case where the caller wants to add more values but the buffer is too small.
It looks like there are 3 buffers with specific types of int, float and int16_t. Could you clarify why those specific data types and what each of the buffers is for?
I also copied the data types from OpenEphys' implementation. The int16
buffer was for the data, it was mentioned that was used to keep the files as small as possible. The float
buffer was for the timestamps. I removed the int
sampleBuffer for now since I was not using it. I think it was used for keeping track of sample numbers written across different channels or data types so we may need to add something like that back in.
Does NWBRecording need to be templated on the data type to support writing on non-int and non-float data?
I'm not sure, what other data types might we be writing the data blocks/rows with? Currently these are specifically used for writeTimeSeriesData
but if we want to have a generalized NWBRecording::writeData
method then maybe we need to consider other data types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also copied the data types from OpenEphys' implementation. The
int16
buffer was for the data, it was mentioned that was used to keep the files as small as possible.
I don't think that we can assume that all acquisition system will want to record in int16
(presumably using some offset for conversoin to float). I think we will probably need to allow for more than just int16
here.
src/nwb/NWBRecording.hpp
Outdated
* @param experimentNumber The experiment number. | ||
* @param recordingNumber The recording number. | ||
*/ | ||
Status openFiles(const std::string& rootFolder, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The functions are called openFiles
and closeFiles
but it seems like right now these open and return exactly one file. Is the logic in the functions going to change to open multiple files. If so, should std::unique_ptr<NWBFile> nwbfile;
be a vector of unique_ptr objects instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the names to reflect that we are only working with single files for now. Is there an example of when we would want to have multiple files open within the same NWBRecording at once?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From an acquisition system perspective, you want to avoid dependencies between different data streams when possible, e.g., to avoid other data streams being corrupted if one fails and to avoid performance issues due to competition between data streams. One possible way would be for the user to route each datastream to a separate file
src/nwb/NWBFile.cpp
Outdated
electricalSeries->data = | ||
createRecordingData(BaseDataType::I16, | ||
SizeArray {0, channelGroup.size()}, | ||
SizeArray {CHUNK_XSIZE}, | ||
electricalSeries->getPath() + "/data"); | ||
io->createDataAttributes(electricalSeries->getPath(), | ||
channelGroup[0].getConversion(), | ||
-1.0f, | ||
"volts"); | ||
|
||
electricalSeries->timestamps = | ||
createRecordingData(BaseDataType::F64, | ||
SizeArray {0}, | ||
SizeArray {CHUNK_XSIZE}, | ||
electricalSeries->getPath() + "/timestamps"); | ||
io->createTimestampsAttributes(electricalSeries->getPath()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the logic for creating the data
and timetamps
and all the attributes that are associated with them outside of the TimeSeries
? It seems that with this, a user could initalize a TimeSeries
but unless they manually create the additional fields for data
etc, themselves, the NWB file would not be valid? It seems that TimeSeries
(or the corresponding subtypes) would need to provide the logic for setting up any fields that they require. E.g., should this be part of the initialize
method? And if not, should there be another function, e.g., TimeSeries.initalizeRecording
to setup the datasets for recording?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I think having all of the data
and timestamp
creation logic in the initialize
method makes sense. I had meant to go back and move this logic inside the TimeSeries once it was working but had missed it. Right now startRecording
is handling the general dataset setup process and I've updated it so that fields required by the different datatypes (e.g. data
and timestamps
in TimeSeries) should all be setup within their respective initialize
methods.
io->createCommonNWBAttributes(path, "core", neurodataType, description); | ||
io->createAttribute(comments, path, "comments"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the plan to add all the other fields that TimeSeries
has, e.g., conversion, continuity, sync etc. later? If so, could you create an issue for that (if we don't have one yet) https://nwb-schema.readthedocs.io/en/latest/format.html#timeseries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I can make an issue to add the other fields. Within TimeSeries.data
I have the conversion, resolution, and unit fields, but need to add continuity and offset. The other ones we are missing for TimeSeries
would be control, control_description, and sync.
I was wondering if we want to save sync information? The schema suggests it is for archival purposes after timestamp data is calculated, but maybe there is a case where we want to save it during acquisition.
This also brings up another question about if we ever want to use starting_time
and rate
if the data have a constant sampling rate? Some of the test files I am creating raise best practice violations in nwbinspector since the time series have a constant sampling rate, but since this will likely only be really known after data acquisition I was thinking we want to stick with timestamps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! I have not looked at all the code in detail yet, but I added a first pass of questions and comments. Overall, I think the approach looks reasonable. I think the main part that is not clear to me at first glance is 1) how the initialization process works (i.e., the question of ElectricalSeries
not creating all the datasets it needs) and 2) how a user can define what recordings they want to do (currently this seems hard-coded to a single ElectricalRecording). I understand that the answer for some of this may be that this is going to change later and that we need to make separate issues, but it would be helpful to understand what needs to be separate issues and how we would want to address them.
if (description != "") | ||
createAttribute(description, path, "description"); | ||
return Status::Success; | ||
} | ||
|
||
Status BaseIO::createDataAttributes(const std::string& path, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move createDataAttributes
and createTimestampsAttributes
to TimeSeries initialization
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving to help with the next round of updates
These changes add the ability to add
ElectricalSeries
to the NWB file and to "run" a recording by writing data in rows and blocks to theTimeSeries
data and timestamps datasets.Some related changes that came up when adding these features:
aq-nwb_test.cpp
file to multiple test filesNWBRecordingEngine
fromNWBFile
and renamed asNWBRecording
Channel
that will need to be adapted for acquisition system / recording equipment to specify conversion factors, electrode names, etc.I will add issues for any related topics that came up when adding these.