
Proposed Workflow Changes to NWB GUIDE #58

Closed
garrettmflynn opened this issue Mar 22, 2023 · 10 comments

@garrettmflynn (Member) commented Mar 22, 2023

As discussed in our first meeting for NWB GUIDE, we can't actually determine what metadata is missing for a conversion without accessing the source files. After completing a test conversion on minimal data (#52), I realized this includes subject metadata, so filling out these details would work best after specifying source data, not before, as currently outlined in the Figma and the Catalyst Neuro Slack channel.

Additionally, we'd discussed expanding the scope of a configuration over time:

Eventually we will have a concept of project -> experiment -> session where an “experiment” can be a different rig configuration within the same project and in this case each experiment would have its own NWBConverter

The Proposal

Consequently, it seems like we'd want to break up the workflow into three uncoupled steps:

  1. Conversion Design: Create and validate a conversion pipeline using data from a model subject/session while only specifying the required metadata
  2. Run Conversion: Apply a validated conversion pipeline across multiple sessions/subjects in an experiment, while specifying a parent project (if known)
  3. NWB File Management: Group the results of (2) into a project and manage experiments as necessary

For (3), we would want to track source and output NWB files and, as such, could additionally manage other proposed aspects of NWB GUIDE (e.g. DANDI publication) independently from the conversion steps.

Discussion

In the short term, this updated workflow would let users continue converting single subjects as the validation step of Step 1, while allowing us to seamlessly add Steps 2 and 3 when necessary. And since these steps are designed to be uncoupled, we'll be able to develop and critique them independently.

These suggestions differ significantly from both roadmaps outlined on Google Docs and Slack, and they may be uncalibrated given my lack of familiarity with the usual conversion process, so let me know if you have any feedback.

Once this proposal makes sense, I can update the Figma accordingly.

garrettmflynn self-assigned this Mar 22, 2023
@CodyCBakerPhD (Collaborator) commented Mar 22, 2023

Overall, this sounds like a good adjustment.

Some things to consider in a revised roadmap...

Iterating over multiple sessions

A dataset can consist of multiple experiment types

Each experiment is defined by a separate NWBConverter (choice of interfaces, structure of the source data, and differing presence of metadata)

Each experiment can have one or more sessions with the same subject, and each session may not use every data interface associated with the converter (the data_interface_classes in the NWBConverter are all 'optional' in a sense), which gives rise to different metadata availability on a per-session basis

This is kind of why extending from single-session conversion to iteration over an experiment is going to be challenging to design. We've always solved it in code using a series of if statements and pre-compiled session-wise mappings (especially for things like subject info), but it's also really important for us to solve, so yes, whatever changes we can make to the workflow to make it easier to design, we should. I think the approach you outlined here would be a good idea.
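
To ground the "one NWBConverter per experiment" idea, here's a minimal sketch using neuroconv's data_interface_classes pattern; the converter name and interface choices are only examples, not a prescription for how GUIDE should do it:

from neuroconv import NWBConverter
from neuroconv.datainterfaces import PhySortingInterface, SpikeGLXRecordingInterface

class RigAConverter(NWBConverter):
    # One converter per experiment / rig configuration; the interfaces listed
    # here define which source data a session of this experiment *can* supply,
    # and (per the discussion above) any given session may only use a subset.
    data_interface_classes = dict(
        recording=SpikeGLXRecordingInterface,
        sorting=PhySortingInterface,
    )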

@CodyCBakerPhD (Collaborator)

File management

For NWB output, I think we can always follow a DANDI-like structure even if the user doesn't immediately want to upload. Explore https://dandiarchive.org/dandiset/000003/0.210812.1448/files?location= and others to familiarize yourself with what that means in terms of folder structure (and we can also maybe just use that for default naming of the NWB files too).
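
Purely as an illustration (the helper name is made up, and real DANDI filenames can carry extra entities such as modality suffixes), a DANDI-style default output path could be derived from the subject and session IDs roughly like this:

from pathlib import Path

def dandi_style_path(output_dir, subject_id, session_id):
    # DANDI-like layout: sub-<subject_id>/sub-<subject_id>_ses-<session_id>.nwb
    stem = f"sub-{subject_id}_ses-{session_id}"
    return Path(output_dir) / f"sub-{subject_id}" / f"{stem}.nwb"

# dandi_style_path("converted", "mouse-001", "2023-03-22")
# -> converted/sub-mouse-001/sub-mouse-001_ses-2023-03-22.nwb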

@bendichter (Collaborator)

@garrettmflynn While we are talking about this, we are working on a feature that seems relevant here. It doesn't really have a name yet; let's call it "path expansion." Users would be able to provide generic paths for source data such as:

source_data = dict(
    phy=dict(file_path="my_data/{subject_id}/{session_id}"),
    spikeglx=dict(folder_path="other_directory/{subject_id}/{session_id}"),
)

We will look through the files and folders within the "my_data" directory and find all matches. The strings use f-string syntax to specify the form of the IDs, which are parsed using the parse library. This is sort of like a simplified regex language. We will unpack all matches, combine matches across interfaces, and auto-populate subsequent forms with the metadata we are able to extract from the path (to start, just subject_id and session_id), placing it in the corresponding place in the metadata for that session. This approach will be valuable for conversions that have many sessions, where it wouldn't be convenient to manually enter paths for each one.
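
To make the matching concrete, here's a small sketch of what the parse library does with one of these templates (the example path is made up):

import parse

# Same template style as the source_data example above
template = "my_data/{subject_id}/{session_id}"

result = parse.parse(template, "my_data/mouse-001/2023-03-22")
print(result["subject_id"])  # mouse-001
print(result["session_id"])  # 2023-03-22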

See a draft PR for this feature here. The syntax looks like this right now. We still need to work out cross-platform support and testing on real data directories.

Once it is ready, this step should also go before the subjects step, since it will potentially give us a list of subject_ids we can use to pre-populate the subject metadata page.

At this stage, it might make sense to create a Figma page for this step.

@bendichter (Collaborator)

I don't really understand your proposal. Can you mock it up in Figma?

@garrettmflynn (Member, Author)

Sounds good. I'll mock up the proposal with all of the above suggestions.

@garrettmflynn (Member, Author) commented Mar 23, 2023

@bendichter Just updated the Figma with an "Updated Flow" section to show what this proposal could look like in practice. Many of the components are still rough—but I think it'll get the workflow changes across sufficiently.

There are quite a few modals (popups with information and forms), so it might be more useful to view using Present mode and following the Updated Flow.

@CodyCBakerPhD (Collaborator)

@garrettmflynn Still open? Or has this all been implemented now?

@garrettmflynn (Member, Author)

I haven't been basing any design decisions specifically on this proposal, though some things may have made their way through.

So I wouldn't say it's closed, but there might be a better way to categorize/tag this.

@CodyCBakerPhD (Collaborator)

there might be a better way to categorize/tag this

Yeah, I'd say it would be more helpful to open a new issue with an up-to-date summary based on what we have right now. From what I read of this conversation it all seems implemented to me, but if I'm missing something, it's hard to see just from combing through the evolving discussion.

@garrettmflynn (Member, Author)

Closed since #308 and #173 are practical ways to implement a good chunk of the workflow proposed here. The rest is a bit unrealistic based on what I've learned over the last few months.
