[Feature]: Streaming DANDI:000541 takes a long time #1889

rly · 2024-04-12T20:43:23Z

What would you like to see added to PyNWB?

From @dysprague: Looping through all files in dandiset 000541 and extracting the NeuroPAL images takes ~33 minutes. There are 21 files, each on the order of 2 GB. This is much slower than for the other dandisets that also have NeuroPAL images (e.g., 000714, 000692, and 000776). The problem occurs when streaming with both PyNWB and MatNWB.

On my computer and connection, it is actually faster to download a file and open it locally than to stream it.

I suspect it has to do with the fact that this dandiset has one set of 960 PlaneSegmentation tables for the "CalciumSeriesSegmentation" ImageSegmentation group, another set of 960 for the "CalciumSeriesSegmentationdNMF" ImageSegmentation group, and another set of 960 for the "NeuronIDs/ImageSegmentation" group. Each table represents the segmentation at a particular time point. That is a lot of groups (3 × 960 = 2,880 tables).

Is your feature request related to a problem?

No response

What solution would you like?

Provide a recommendation for how to reorganize this data for more efficient streaming. I can do this, but I need to look more closely at what changes across tables and ImageSegmentation groups. It may be possible to combine everything into one or two PlaneSegmentation tables with a column for the time sample.
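As a rough illustration of the "one table with a time-sample column" idea, here is a minimal sketch in plain Python. It does not use the PyNWB API, and it assumes that the per-time-point tables share the same columns, which still needs to be checked against the actual files. The names `combine_tables`, `rows_at`, `time_index`, `roi_id`, and `voxel` are hypothetical and chosen only for this example.

```python
def combine_tables(tables_by_time):
    """Flatten {time_index: [row, ...]} into a single list of rows.

    Each output row carries a 'time_index' column recording which
    per-time-point table it came from (hypothetical column name).
    """
    combined = []
    for time_index in sorted(tables_by_time):
        for row in tables_by_time[time_index]:
            combined.append({"time_index": time_index, **row})
    return combined


def rows_at(combined, time_index):
    """Recover the rows that belonged to one time point."""
    return [row for row in combined if row["time_index"] == time_index]


# Hypothetical stand-in for the per-time-point segmentation tables:
# two time points, two ROIs each, with made-up voxel coordinates.
per_time_point = {
    0: [{"roi_id": 0, "voxel": (10, 12, 3)}, {"roi_id": 1, "voxel": (40, 8, 5)}],
    1: [{"roi_id": 0, "voxel": (11, 12, 3)}, {"roi_id": 1, "voxel": (41, 9, 5)}],
}

combined = combine_tables(per_time_point)
second_time_point = rows_at(combined, 1)
```

With this layout, a reader opens one table and filters on the time column instead of opening one group per time point. Whether that actually speeds up streaming for this dandiset would need to be measured.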

Do you have any interest in helping implement the feature?

Yes.

Code of Conduct

I agree to follow this project's Code of Conduct
Have you checked the Contributing document?
Have you ensured this change was not already requested?


rly added the "category: enhancement" (improvements of code or code behavior) and "priority: low" (alternative solution already working and/or relevant to only specific user(s)) labels Apr 12, 2024

rly self-assigned this Apr 12, 2024

rly modified the milestones: Future, 2.8.0 Apr 12, 2024

rly mentioned this issue Apr 30, 2024

[Bug]: Oddly long load times for no discernible reason. NeurodataWithoutBorders/matnwb#567 (Open, 2 tasks)

stephprince modified the milestones: 2.8.0, 2.9.0 Jul 23, 2024

rly modified the milestones: 2.9.0, Next Major Release - 3.0 Sep 5, 2024
