[Feature]: Detect when an NWB file has no domain-specific neurodata types #481

Open
2 tasks done
bendichter opened this issue Aug 2, 2024 · 1 comment
Labels
category: enhancement improvements of code or code behavior priority: low alternative solution already working and/or relevant to only specific user(s)

Comments

@bendichter
Contributor

What would you like to see added to the NWBInspector?

Occasionally, we get NWB file submissions on DANDI that store all of their data in generic TimeSeries objects. These datasets generally contain domain-specific data that should use more specific types, e.g. SpatialSeries or ElectricalSeries, but use the generic type instead. As a result, the files lack the associated metadata and can't be correctly categorized by DANDI.

See e.g. dandi/helpdesk#159

Is there a way of detecting and flagging this?
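As a rough illustration of what such a check might look like, here is a minimal sketch that operates only on a list of neurodata type names collected from one file. How those names are gathered is left open (for example, from each object's `neurodata_type` in PyNWB, which is an assumption here), and the function name and the `IGNORED_TYPES` set are hypothetical, not part of NWBInspector.

```python
# Hypothetical sketch: flag a file whose data objects are all generic TimeSeries.
# The type names are assumed to have been collected from the file beforehand.

# Structural types that say nothing about the kind of data stored
# (illustrative assumption; a real check would need a vetted list).
IGNORED_TYPES = frozenset({"NWBFile", "Subject", "ProcessingModule"})

# Types considered "generic" for the purpose of this check.
GENERIC_TYPES = frozenset({"TimeSeries"})


def has_only_generic_types(type_names):
    """Return True if every non-structural object uses a generic type."""
    data_types = [name for name in type_names if name not in IGNORED_TYPES]
    if not data_types:
        # Nothing to judge; do not flag an empty file here.
        return False
    return all(name in GENERIC_TYPES for name in data_types)


if __name__ == "__main__":
    print(has_only_generic_types(["NWBFile", "Subject", "TimeSeries", "TimeSeries"]))
    print(has_only_generic_types(["NWBFile", "TimeSeries", "ElectricalSeries"]))
```

This compares exact type names only, so a subclass such as SpatialSeries counts as domain-specific simply because its name is not in `GENERIC_TYPES`.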

(request from @satra)

Do you have any interest in helping implement the feature?

No.

Code of Conduct

@bendichter bendichter added the category: enhancement improvements of code or code behavior label Aug 2, 2024
@CodyCBakerPhD
Contributor

The basic principle of asserting a certain complexity/diversity of file contents is appealing, but determining good thresholds for reliably detecting it might be tricky.

Another consideration is avoiding false positives on 'split' file strategies such as Spyglass, where multiple NWB files sharing the same session IDs / start times constitute an 'NWB folder' (a viewing strategy facilitated by Neurosift), and where each file may contain only a small number of objects.
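One possible way to account for split files, sketched under assumptions: pool the neurodata type names of all files that share a session key (e.g. session ID plus start time) and flag only sessions whose pooled types are all generic. The function name, the input shape, and the choice of key are hypothetical.

```python
from collections import defaultdict

# Types considered "generic" for this sketch (illustrative assumption).
GENERIC_TYPES = frozenset({"TimeSeries"})


def flag_generic_sessions(files):
    """Return the sorted session keys whose pooled data types are all generic.

    `files` is an iterable of (session_key, type_names) pairs, one per file.
    Files with the same session_key are treated as parts of one 'NWB folder'.
    """
    types_by_session = defaultdict(set)
    for session_key, type_names in files:
        types_by_session[session_key].update(type_names)

    # A session is flagged only if it has data types and none is domain-specific.
    return sorted(
        key
        for key, types in types_by_session.items()
        if types and types <= GENERIC_TYPES
    )


if __name__ == "__main__":
    example = [
        ("session-A", ["TimeSeries"]),        # split file 1 of session A
        ("session-A", ["ElectricalSeries"]),  # split file 2 of session A
        ("session-B", ["TimeSeries"]),        # lone file with generic data only
    ]
    print(flag_generic_sessions(example))
```

With this grouping, a file that holds only TimeSeries objects is not flagged when a sibling file from the same session supplies a domain-specific type.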

It would help if we could find other similar examples on DANDI so that we can craft specific assertions common across the datasets.

Otherwise, like Satra said on Slack, this seems more appropriate for curation by the scientific core, combined with outreach to help submitters improve their files.

@stephprince stephprince added the priority: low alternative solution already working and/or relevant to only specific user(s) label Nov 6, 2024

3 participants