[Feature]: Streaming DANDI:000541 takes a long time #1889
Labels
category: enhancement
improvements of code or code behavior
priority: low
alternative solution already working and/or relevant to only specific user(s)
What would you like to see added to PyNWB?
From @dysprague: Looping through all files in dandiset 000541 and extracting the NeuroPAL images takes ~33 minutes. There are 21 files, each on the order of 2 GB. This is a lot slower than for the other dandisets that also have NeuroPAL images (e.g., 000714, 000692, and 000776). The problem occurs when streaming with both PyNWB and MatNWB.
On my computer and connection, it is actually faster to download a file and open it than to stream it.
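To put the reported numbers in perspective, here is a rough back-of-the-envelope calculation. It assumes the ~33 minutes are spread evenly across the 21 files and treats "~2 GB" as exactly 2e9 bytes; both are simplifications, not measurements.

```python
# Figures quoted in the report above.
total_minutes = 33
n_files = 21
# Assumption: "~2 GB" is taken as exactly 2e9 bytes per file.
file_size_bytes = 2e9

# Assumption: streaming time is split evenly across the files.
seconds_per_file = total_minutes * 60 / n_files

# Download rate at which fetching a whole file takes as long as streaming it.
break_even_bytes_per_s = file_size_bytes / seconds_per_file

print(f"streaming: ~{seconds_per_file:.0f} s per file")
print(f"break-even download rate: ~{break_even_bytes_per_s / 1e6:.0f} MB/s")
```

Under these assumptions, any connection that sustains more than the break-even rate would make a full download faster than streaming, which is consistent with the observation above.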
I suspect it has to do with the fact that this dandiset has one set of 960 PlaneSegmentation tables for the "CalciumSeriesSegmentation" ImageSegmentation group, another set of 960 for the "CalciumSeriesSegmentationdNMF" ImageSegmentation group, and another set of 960 for the "NeuronIDs/ImageSegmentation" group. Each table represents the segmentation at a particular time point. That is a lot of groups.
Is your feature request related to a problem?
No response
What solution would you like?
Provide a recommendation for how to reorganize this data for more efficient streaming. I can do this, but I need to look more closely into what changes across the tables and ImageSegmentation groups. It may be possible to combine all of this into one or two PlaneSegmentation tables with a column for the time sample.
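A minimal sketch of the proposed layout, in plain Python rather than PyNWB: each set of 960 per-time-point tables (three sets, so 2,880 tables per file) is collapsed into one columnar table with an added `time_sample` column. The function name `combine_tables` and the column names `roi_id` and `voxel` are hypothetical and only illustrate the idea; the sketch also assumes every row has the same columns, which still needs to be checked against the actual files.

```python
from typing import Dict, List

Row = Dict[str, object]


def combine_tables(tables: List[List[Row]]) -> Dict[str, list]:
    """Merge per-time-point tables into one columnar table.

    tables[i] holds the rows (one per ROI) for time sample i.
    Assumes all rows share the same set of columns.
    """
    combined: Dict[str, list] = {"time_sample": []}
    for sample_index, rows in enumerate(tables):
        for row in rows:
            # Record which time sample this row came from.
            combined["time_sample"].append(sample_index)
            for name, value in row.items():
                combined.setdefault(name, []).append(value)
    return combined


# Hypothetical data: two time samples, with two ROIs and one ROI respectively.
per_sample = [
    [{"roi_id": 0, "voxel": (1, 2, 3)}, {"roi_id": 1, "voxel": (4, 5, 6)}],
    [{"roi_id": 0, "voxel": (1, 2, 4)}],
]
combined = combine_tables(per_sample)
print(combined["time_sample"])
print(combined["roi_id"])
```

With this layout a reader would touch one table per set instead of 960, and rows for a given time point can be selected by filtering on `time_sample`. In PyNWB the extra column could presumably be added to a PlaneSegmentation table as a custom column, but that mapping is untested here.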
Do you have any interest in helping implement the feature?
Yes.