
Proposed Workflow Changes to NWB GUIDE #58

Closed
garrettmflynn opened this issue Mar 22, 2023 · 10 comments

@garrettmflynn (Member) commented Mar 22, 2023

As discussed in our first meeting for NWB GUIDE, we can't actually determine what metadata is missing for a conversion without accessing the source files. After completing a test conversion on minimal data (#52), I realized this includes subject metadata, so filling out these details would work best after specifying source data, not before, as currently outlined in the Figma and the Catalyst Neuro Slack channel.

Additionally, we'd discussed expanding the scope of a configuration over time:

Eventually we will have a concept of project -> experiment -> session where an “experiment” can be a different rig configuration within the same project and in this case each experiment would have its own NWBConverter

The Proposal

Consequently, it seems like we'd want to break up the workflow into three uncoupled steps:

  1. Conversion Design: Create and validate a conversion pipeline using data from a model subject/session while only specifying the required metadata
  2. Run Conversion: Apply a validated conversion pipeline across multiple sessions/subjects in an experiment, while specifying a parent project (if known)
  3. NWB File Management: Group the results of (2) into a project and manage experiments as necessary

For (3), we would want to track source and output NWB files and, as such, could additionally manage other proposed aspects of NWB GUIDE (e.g. DANDI publication) independently from the conversion steps.

Discussion

In the short term, this updated workflow would let users continue converting single subjects as the validation step of Step 1, while allowing us to seamlessly add Steps 2 and 3 when necessary. And since these steps are designed to be uncoupled, we'll be able to develop and critique them independently.

These suggestions differ significantly from both roadmaps outlined on Google Docs and Slack, and they may be uncalibrated given my lack of familiarity with the usual conversion process, so let me know if you have any feedback.

Once this proposal makes sense, I can update the Figma accordingly.

garrettmflynn self-assigned this Mar 22, 2023
@CodyCBakerPhD (Collaborator) commented Mar 22, 2023

Overall, this sounds like a good adjustment.

Some things to consider in a revised roadmap...

Iterating over multiple sessions

A dataset can consist of multiple experiment types

Each experiment is defined by a separate NWBConverter (choice of interfaces, structure of the source data, and differing presence of metadata)

Each experiment can have one or more sessions with the same subject, and each session may not use every data interface associated with the converter (the data_interface_classes in the NWBConverter are all 'optional' in a sense), which gives rise to different metadata availability on a per-session basis

This is kind of why extending from single-session conversion to iteration over an experiment is going to be challenging to design. We've always solved it in code using a series of if statements and pre-compiled session-wise mappings (especially for things like subject info), but it's also really important for us to solve, so yes, whatever changes we can make to the workflow to make it easier to design, we should. I think the approach you outlined here would be a good idea.
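
To ground the "one NWBConverter per experiment" idea, here's a minimal sketch using neuroconv's data_interface_classes pattern; the converter name and interface choices are only examples, not a prescription for how GUIDE should do it:

from neuroconv import NWBConverter
from neuroconv.datainterfaces import PhySortingInterface, SpikeGLXRecordingInterface

class RigAConverter(NWBConverter):
    # One converter per experiment / rig configuration; the interfaces listed
    # here define which source data a session of this experiment *can* supply,
    # and (per the discussion above) any given session may only use a subset.
    data_interface_classes = dict(
        recording=SpikeGLXRecordingInterface,
        sorting=PhySortingInterface,
    )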

@CodyCBakerPhD (Collaborator)

File management

For NWB output, I think we can always follow a DANDI-like structure even if the user doesn't immediately want to upload. Explore https://dandiarchive.org/dandiset/000003/0.210812.1448/files?location= and others to familiarize yourself with what that means in terms of folder structure (and we can also maybe just use that for default naming of the NWB files too).
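
Purely as an illustration (the helper name is made up, and real DANDI filenames can carry extra entities such as modality suffixes), a DANDI-style default output path could be derived from the subject and session IDs roughly like this:

from pathlib import Path

def dandi_style_path(output_dir, subject_id, session_id):
    # DANDI-like layout: sub-<subject_id>/sub-<subject_id>_ses-<session_id>.nwb
    stem = f"sub-{subject_id}_ses-{session_id}"
    return Path(output_dir) / f"sub-{subject_id}" / f"{stem}.nwb"

# dandi_style_path("converted", "mouse-001", "2023-03-22")
# -> converted/sub-mouse-001/sub-mouse-001_ses-2023-03-22.nwb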

@bendichter (Collaborator)

@garrettmflynn While we are talking about this, we are working on a feature that seems relevant here. It doesn't really have a name yet; let's call it "path expansion." Users would be able to provide generic paths for source data such as:

source_data = dict(
    phy=dict(file_path="my_data/{subject_id}/{session_id}"),
    spikeglx=dict(folder_path="other_directory/{subject_id}/{session_id}"),
)

We will look through the files and folders within the "my_data" directory and find all matches. The strings use f-string syntax to specify the form of the IDs, which are parsed using the parse library. This is sort of like a simplified regex language. We will unpack all matches, combine matches across interfaces, and auto-populate subsequent forms with the metadata we are able to extract from the path (to start, just subject_id and session_id), placing it in the corresponding place in the metadata for that session. This approach will be valuable for conversions that have many sessions, where it wouldn't be convenient to manually enter paths for each one.
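
To make the matching concrete, here's a small sketch of what the parse library does with one of these templates (the example path is made up):

import parse

# Same template style as the source_data example above
template = "my_data/{subject_id}/{session_id}"

result = parse.parse(template, "my_data/mouse-001/2023-03-22")
print(result["subject_id"])  # mouse-001
print(result["session_id"])  # 2023-03-22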

See a draft PR for this feature here. The syntax looks like this right now. We still need to work out cross-platform support and testing on real data directories.

Once it is ready, this step should also go before the subjects step, since it will potentially give us a list of subject_ids we can use to pre-populate the subject metadata page.

At this stage, it might make sense to create a Figma page for this step.

@bendichter (Collaborator)

I don't really understand your proposal. Can you mock it up in Figma?

@garrettmflynn (Member, Author)

Sounds good. I'll mock up the proposal with all of the above suggestions.

@garrettmflynn (Member, Author) commented Mar 23, 2023

@bendichter Just updated the Figma with an "Updated Flow" section to show what this proposal could look like in practice. Many of the components are still rough—but I think it'll get the workflow changes across sufficiently.

There are quite a few modals (popups with information and forms), so it might be more useful to view using Present mode and following the Updated Flow.

@CodyCBakerPhD (Collaborator)

@garrettmflynn Still open? Or has this all been implemented now?

@garrettmflynn (Member, Author)

I haven't been basing any design decisions specifically on this proposal, though some things may have made their way through.

So I wouldn't say it's closed, but there might be a better way to categorize/tag this.

@CodyCBakerPhD (Collaborator)

there might be a better way to categorize/tag this

Yeah, I'd say it would be more helpful to open a new issue with an up-to-date summary based on what we have right now. From what I read of this conversation it all seems implemented to me, but if I'm missing something, it's hard to see just from combing through the evolving discussion.

@garrettmflynn (Member, Author)

Closed since #308 and #173 are practical ways to implement a good chunk of the workflow proposed here. The rest is a bit unrealistic based on what I've learned over the last few months.
