BEP044: Within-stimuli conditions #153

Open · adelavega opened this issue Feb 15, 2019 · 45 comments
@adelavega

The stim_file column in event files allows users to specify which stimulus file is associated with an event onset:

> stim_file | OPTIONAL. Represents the location of the stimulus file (image, video, sound etc.) presented at the given onset time. ...

However, what this does not allow for is the specification of sub-conditions that occur during a long-running stimulus.

For example, in ds001545 a video file is presented which spans the entirety of the run. However, within each run/video there are 6 distinct conditions.

For example:

onset  duration  trial_type         stim_file
6      90        Intact A           cond1_run-01.mp4
105    90        Scramble Fix C     cond1_run-01.mp4
204    90        Scramble Rnd B V1  cond1_run-01.mp4
303    90        Scramble Fix C     cond1_run-01.mp4
402    90        Intact A           cond1_run-01.mp4
501    90        Scramble Rnd B V2  cond1_run-01.mp4

IMO, the above example is invalid, as the stim_file only has a single onset.
The following is an events file that has all the necessary information (note that I'm having to guess the onset of the stim_file; it could actually be 0).

onset  duration  trial_type         stim_file
6      540       n/a                cond1_run-01.mp4
6      90        Intact A           n/a
105    90        Scramble Fix C     n/a
204    90        Scramble Rnd B V1  n/a
303    90        Scramble Fix C     n/a
402    90        Intact A           n/a
501    90        Scramble Rnd B V2  n/a

However, this is ambiguous as the conditions are only implied to occur during stimulus presentation due to the duration of the first row.

@tyarkoni suggests adding optional but strongly encouraged stim_onset and stim_offset columns. These would denote onsets within a stimulus.
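
For illustration, a minimal sketch of how the first example might look with those columns (the values assume the clip plays continuously from its own t = 0, so the event at run time 105 starts at stimulus time 99):

```
onset  duration  trial_type         stim_file         stim_onset  stim_offset
6      90        Intact A           cond1_run-01.mp4  0           90
105    90        Scramble Fix C     cond1_run-01.mp4  99          189
204    90        Scramble Rnd B V1  cond1_run-01.mp4  198         288
```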

@yarikoptic (Collaborator)

I would have made it

onset  duration  trial_type         stim_file
6      540       Movie starts       cond1_run-01.mp4
6      90        Intact A           cond1_run-01.mp4
105    90        Scramble Fix C     cond1_run-01.mp4
204    90        Scramble Rnd B V1  cond1_run-01.mp4
303    90        Scramble Fix C     cond1_run-01.mp4
402    90        Intact A           cond1_run-01.mp4
501    90        Scramble Rnd B V2  cond1_run-01.mp4

stim_onset/stim_offset could be added, I guess, but they would carry redundant information that could be computed (and validated not to exceed the stimulus duration) from the "Movie starts" row for that stimulus and the corresponding onset and duration. And we all know what happens when there is redundancy ;)

As for the hierarchical description of events, isn't there https://bids-specification.readthedocs.io/en/latest/99-appendices/03-hed.html? (I've never used it myself, though.)

@tyarkoni

I'm not crazy about either of the solutions proposed above because, while both are compliant with the current spec, neither eliminates the fundamental ambiguity here, which is that you don't know which part of the clip is being presented. It's also somewhat problematic from a BIDS-StatsModel standpoint, because it will force almost all users to drop a Filter transformation into their model just to weed out the first row, since nobody is going to want that in their model.

The benefit of having optional stim_onset and stim_offset columns is that they would eliminate the ambiguity in question without making most model specifications more complex. What I don't like about this proposal is that the extra columns are essentially metadata—there's virtually no situation in which they would be treated like other non-mandatory columns (i.e., as containing design-relevant information).

The more I think about this, the more I lean towards maybe keeping the current approach and not codifying this at all in the _events.tsv files. Maybe the solution is to require a supplementary metadata file for the stimulus files that contains the onsets. I.e., cond1_run-01.mp4 would have to have a cond1_run-01.json file that has fields PresentationOnset and PresentationOffset. But even that isn't sufficient, because presentation onset/offset can vary not just by stimulus, but also by event...
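
For concreteness, a hypothetical sketch of such a sidecar (the PresentationOnset/PresentationOffset field names are the ones proposed above; the values are invented, in seconds on the stimulus timeline):

```json
{
  "PresentationOnset": 0.0,
  "PresentationOffset": 540.0
}
```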

Should we just say this is in the 20% (really more like 1%) and not worry about it?

@yarikoptic (Collaborator)

BTW, would they actually need to filter them out? Why wouldn't you want to model that entire "super" condition as well? If there are different movie cuts, you might want them explicitly in the model, even if only to absorb the transition (if it is visible) between different stimuli. If there is only one big one for the entire run, well, it will largely be your constant. If there is design imbalance and the stimulus files have subtle unique features (differently trimmed, color scheme, audio volume level), having them modeled might save us from one more possible retraction.

The only problem I see is if all the trials follow each other in such a way that the model becomes degenerate when the whole stim_file condition is present too.
So, overall, it might be design-specific.

The only con is that those stimulus onsets and durations may actually be of interest to other tools, not just the linear model, so those tools would need to recompute them as well. But that shouldn't be too hard.

As for extra unused metadata, I would say the more the merrier. My main concern is the fear of it being redundant, thus requiring "manual" recomputation if I find that, e.g., I need to fix an onset. Then I will forget, and the stimulus onset value will no longer be valid.

@adelavega (Author)

I would agree this probably falls into the 1%, as the majority of experiments don't have sub-conditions within a stimulus. In 90% of cases, the mention of a stimulus indicates a complete presentation, so this is such a rare situation that it's probably not worth putting in the spec itself.

I still think it might be worth clarifying that including a stimulus in stim_file does not necessarily indicate that the stimulus is played from the beginning (which is what I thought on first read).

@satra (Collaborator) commented Feb 16, 2019

I'm not sure this is 1%. In many standard experiments, there are sub-conditions. For example, in experiments that involve showing faces/objects, there are often sub-categories: emotions, types of objects, types of faces (human faces/animal faces). In fact, the modified Hariri task is a perfect example of this, and it gets used by emotion/mood researchers a lot.

I don't think we should reinvent ontologies of stimuli (e.g., paradigms: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3682219/, audio: https://research.google.com/audioset/ontology/index.html, images: https://bioportal.bioontology.org/ontologies/BIM), but rather provide a way for stimulus properties to be encoded appropriately.

> The more I think about this, the more I lean towards maybe keeping the current approach and not codifying this at all in the _events.tsv files. Maybe the solution is to require a supplementary metadata file for the stimulus files that contains the onsets. I.e., cond1_run-01.mp4 would have to have a cond1_run-01.json file that has fields PresentationOnset and PresentationOffset.

I like the idea of a JSON going alongside a stimulus file, but this JSON should be able to reflect timed objects inside it.

@satra (Collaborator) commented Feb 16, 2019

Just to follow up:

  1. In the case of the Hariri task, trial_type can represent the most dominant trial type (faces or objects, for example, or neutral/angry/etc. for mood researchers). Stimulus properties could then represent not only details like cropping/full frame, colorspace, etc., but also ontological objects, e.g. that this is an image/video of a face.

  2. In the case of a movie, events.tsv could simply say "I showed this clip for 240 s". The stimulus event should have a JSON file that can encode many different types of extracted events within the clip.

  3. Another option is to allow multiple events files, with any model having to refer to a specific events file (maybe we allow composition of events files).

@tyarkoni

@satra by "sub-conditions" here we're not talking about hierarchical organization, we're talking about a temporal subset of a single file. Codifying hierarchical structures is IMO not in scope, but in any case presents no particular challenge from an events.tsv perspective, because you can just put the filename for each event in the stim_file column, and the analyst is welcome to do whatever they want with that. The case we're talking about is where you have, say, an 8-minute movie file identified as the stim_file, but the presentation starts halfway through that clip. In such cases the analyst needs to have some way to know that the onset of the presentation isn't synced with the onset of the event. But this seems like an edge case (indeed, I'm pretty sure this is the first BIDS dataset we've run into where it's an issue), so the proposal is to just let it be.

@tyarkoni

> 1. then stimulus properties could somehow represent not only details like cropping/full frame, colorspace, etc.

I think this is analogous to the movie example, but I still think it's an edge case. Situations where researchers dynamically crop images are likely to be pretty rare; in most cases, the cropping will have been done in advance, and what's in the stimuli/ folder will be what was presented to the subject.

I think a reasonable way to update the spec is to strongly encourage users to provide files in stimuli/ that are as close as possible to the ones participants actually experienced. That means temporally or spatially cropping movies and images if needed. But I agree with @adelavega that we should also explicitly say that there is no actual guarantee that the contents of stimuli map perfectly onto what participants experienced.

@satra (Collaborator) commented Feb 16, 2019

@tyarkoni - sorry, I misunderstood the within-stimuli conditions, so please ignore the ontological variations (although see the last paragraph below).

For movies, I'm thinking of things like commercial clips that are shown, and I'm sure that certain clips cannot be shared.

For movies, as an example, are you saying I can extract faces, then specific emotions on those faces, and then encode both face and face+emotion in the events file, as a kind of redundant stimulus list? All possible events go in trial_type, and then the analyst figures out which trials are of interest? For many of our tasks, that would work pretty well.

@tyarkoni

> For movies, I'm thinking of things like commercial clips that are shown, and I'm sure that certain clips cannot be shared.

I don't know that we can do anything about this, short of asking people to provide a description of where/how to obtain stimuli that can't be publicly shared. I don't think it's worth trying to codify this—there's too much variability in what that procurement process could look like.

> For movies, as an example, are you saying I can extract faces, then specific emotions on those faces, and then encode both face and face+emotion in the events file, as a kind of redundant stimulus list?

Sure, you can create arbitrary columns in events.tsv that code anything you like. Aside from stim_file, you could add columns for face_id, face_gender, face_age, face_emotion_rater1, face_emotion_rater2, face_emotion_avg, and anything else you like. The expectation is that you then put descriptions of columns in the data dictionary in the JSON sidecar, though I believe this is non-mandatory right now.
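
As a hedged illustration, a sidecar data dictionary for a couple of those hypothetical columns might look like the following (Description and Levels are standard BIDS sidecar fields; the column names and level values here are just the invented examples above):

```json
{
  "face_id": {
    "Description": "Identifier of the face stimulus shown during the event."
  },
  "face_gender": {
    "Description": "Perceived gender of the face.",
    "Levels": {
      "F": "female",
      "M": "male"
    }
  },
  "face_emotion_avg": {
    "Description": "Emotion rating for the face, averaged across raters."
  }
}
```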

@adelavega reopened this Feb 16, 2019
@Remi-Gau (Collaborator)

This is an old one.

I wonder if HED tags can help with an issue like this. @VisLab, do you have an opinion on this?

@VisLab (Member) commented Sep 14, 2023

As it turns out the HED Working Group has been discussing this very issue and some of our members will weigh in shortly with a concrete proposal --- @neuromechanist @dorahermes @tpatpa @dungscout96 @monique2208 @makeig

@dorahermes (Member)

Yes, I agree that HED tags can be useful here and can probably tackle this issue. Working through an example suggests this may be a relatively larger contribution, with some added machine-readable files in the /stimuli/ folder. While starting to work on a visual-images-and-movie example with @neuromechanist, it seemed there would be a need for community input, review, and further examples (e.g., auditory, motor, electrical stimulation) as well. This seems to approach the scope of a potential BEP. Should we open a separate GitHub issue to discuss whether to open a BEP, or continue here?

@neuromechanist could share a preliminary Google Doc (not a BEP yet, just the examples we were working through) if that would help give an idea.

Tagging some people who previously contributed to this discussion for input: @adelavega @tyarkoni @yarikoptic @satra @Remi-Gau

@Remi-Gau (Collaborator)

If we are talking about a BEP to help organize stimuli, then there is overlap with #751.

@neuromechanist (Member) commented Sep 16, 2023

Reading this issue and #751 resonates closely with the challenges we are exploring while including image and movie annotations in a couple of massive datasets we are working on.
@dorahermes and @tpatpa are working on the annotation of the Natural Scenes Dataset, and @smakeig, @dungscout96, and I are working toward the Healthy Brain Network's movie annotation.

In both projects, we see the need for top-level annotation files that would be used in the downstream *_events.tsv.

In this Google Doc, we are exploring the possibility of a file such as stimuli/stimuli.tsv to hold a list of the stimulus files and possible annotations (stimuli/stimuli.tsv is very similar to stims.tsv discussed in #751).

A sample stimuli.tsv file would look like this:

stim_file     type         NSD_id  COCO_id  first_COCO_description                                     HED
nsd02951.png  still_image  2951    262145   "an open market full of people and piles of vegetables."  "((Item-count, High), Ingestible-object), (Background-view, ((Human, Body, Agent-trait/Adult), Outdoors, Furnishing, Natural-feature/Sky, Urban, Man-made-object))"

If the stimulus file has a time-varying context (such as a movie), a separate *_stimulus.tsv will hold the annotations. The structure of *_stimulus.tsv would be very similar to *_events.tsv, with onset and duration fields, etc.
In any case, including the stim_file name in the *_events.tsv's stim_file column links the task events (*_events.tsv) to the stimulus annotations (stimuli.tsv and *_stimulus.tsv).
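
For illustration, a hypothetical *_stimulus.tsv for a movie might look like the sketch below (the shot_number and face_present columns are invented for this example; note that the onsets are on the stimulus's own timeline, not the run's, which is what makes the file reusable across runs and datasets):

```
onset  duration  shot_number  face_present
0.0    7.25      1            no
7.25   3.54      2            yes
10.79  5.21      3            yes
```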

We believe this method will make the annotation of stimulus files more reusable; researchers can reuse the stimulus files and select the stimuli.tsv rows (and *_stimulus.tsv files) of their choice for their new studies.
Also, reusing the dataset with alternate annotations for the same stimulus files would be as straightforward as adding a column to *_stimulus.tsv or replacing the whole file with a new one.

We appreciate your thoughts and comments, both on the Google Doc and here. Our use cases are limited to a couple of visual and audiovisual stimuli; many other stimulation types may require other arrangements. We would also appreciate examples of other stimulus types, if possible.

@dorahermes (Member)

@bids-standard/maintainers would be great to hear your thoughts on whether this is worthy of a small BEP, thank you!

@Remi-Gau (Collaborator)

Maybe not a BEP but several small orthogonal pull requests?

I can try to bring it up at the next maintainers meeting.

@neuromechanist (Member) commented Dec 20, 2023

Following hed-standard/hed-python#810, it seems that expanding the _events.tsv files with what was called sub-conditions in the first post of this issue is a remodeler issue. Nevertheless, the remodeler would require rules and guidelines for remodeling the _events.tsv with the contents of the stimuli/ directory.

As described in the HED issue above, and also in the GDoc we are drafting for this issue, there could be two variations of the problem:

  1. A column-only extension for still stimuli, so that only specific columns (and annotations) would be added to the _events.tsv.
  2. A row extension, with the possibility of column extension, in which the contents of a specific stimulus file's annotations will be merged with the contents of the _events.tsv.

A working example for the second case, which is the main focus of this issue, is the following scenario:
in the CMI Healthy Brain Network project, subjects watch the movie The Present during fMRI and EEG sessions, among other tasks (see a sample of the EEG-BIDS dataset).

The events for the Present movie are limited to the start and stop of the video:

onset    duration  sample  value        event_code
0.000    0.002     0       9999         9999
2.034    0.002     1017    video_start  84
205.098  0.002     102549  video_stop   104

However, a movie clearly contains far more events, and researchers will want to provide their own annotations depending on their application. As a straightforward example, we identified the shot-transition events and quantified the log luminance ratio (LLR) of each transition. The file is included in the dataset as stimuli/the_present_stimulus-LogLumRatio.tsv:

onset    duration  shot_number  LLR
0        n/a       video_start  video_start
0        7.25      1            n/a
7.25     3.542     2            -1.557820733
10.792   5.208     3            0.3358234903
16       5         4            -0.03306866929
21       4.208     5            -0.2070276568
...      ...       ...          ...
165.25   6.667     55           -0.2270603551
171.917  31.292    56           0.1188704433
203.208  n/a       video_stop   video_stop

To merge the _stimulus.tsv into the _events.tsv after the initial import into EEGLAB (i.e., to remodel the events table), I have made a function that:

  1. gets the EEG structure, the _stimulus.tsv, and the names of the columns for extension,
  2. finds the common event names (here, video_start and video_stop) between the value column and the mentioned columns for extension,
  3. compares/corrects the timelines of the common events,
  4. merges the events of the _stimulus.tsv, and
  5. recreates the EEG.event structure.

This implementation is far from perfect, but it could serve as a working example of the implications of this mechanism for large and very large datasets. The Healthy Brain Network project spans over 7000 subjects with EEG and fMRI, and this mechanism will help researchers dynamically use event annotations based on their use case.
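
The original function targets EEGLAB's EEG structure in MATLAB; as a hedged illustration of the same remodeling steps, a rough pandas sketch (file and column names taken from the tables above, anchor-event handling simplified to a single anchor) might look like:

```python
import pandas as pd

def merge_stimulus_events(events_tsv, stimulus_tsv, anchor="video_start"):
    """Shift stimulus annotations onto the run timeline and merge them
    into the events table, using a shared anchor event."""
    events = pd.read_csv(events_tsv, sep="\t")
    stim = pd.read_csv(stimulus_tsv, sep="\t")

    # Locate the anchor event on both timelines (steps 2-3 above).
    run_anchor = float(events.loc[events["value"] == anchor, "onset"].iloc[0])
    stim_anchor = float(stim.loc[stim["shot_number"] == anchor, "onset"].iloc[0])

    # Keep only the annotation rows (numeric shot_number) and re-reference
    # their onsets to the run's clock.
    is_annot = pd.to_numeric(stim["shot_number"], errors="coerce").notna()
    annot = stim.loc[is_annot].copy()
    annot["onset"] = annot["onset"].astype(float) + (run_anchor - stim_anchor)

    # Merge and sort (step 4); rebuilding EEG.event (step 5) stays in EEGLAB.
    merged = pd.concat([events, annot], ignore_index=True).sort_values("onset")
    return merged.reset_index(drop=True)
```

With the tables above, merge_stimulus_events("sub-01_task-present_events.tsv", "stimuli/the_present_stimulus-LogLumRatio.tsv") would interleave the 56 shot rows with the original video_start/video_stop events (the events filename here is hypothetical).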

@adelavega (Author)

I haven't had time to look at the entire proposal in detail, but overall the concept of annotating stimuli separately from the _events.tsv file seems reasonable, as it allows for the inclusion of detailed stimulus annotations without fundamentally changing the way _events.tsv works.

@neuromechanist (Member) commented Apr 12, 2024

Following the 4/12 conversations with @Remi-Gau, @adelavega, @yarikoptic, @arnodelorme, and @dungscout96, there is quite a lot of enthusiasm for providing structure for the stimuli/ directory.

@yarikoptic and I jotted notes in the Google Doc to turn the suggestions into a (directory-less) BIDS naming structure, which also follows the ideas in #751.

Based on the Google Doc example, here is a draft suggestion:

stim-present_???.mp4|mkv|jpg|png
stim-present_???.json 
[stim-present_annot-loglum_events.tsv]
[stim-present_annot-loglum_events.json]
…
stimuli.tsv
stimuli.json
  • The stim- prefix distinguishes these files from sub- files, indicating that the stimulus files are independent of/separate from the subjects. The ??? suffix follows the common principles rule but needs to be decided.
  • The events.tsv file accommodates annotating time-varying stimulus files (that is, within-stimuli conditions).
  • The annot- entity provides the opportunity to have different annotations (and events.tsv files) for a single stimulus file.
  • Similar to participants.tsv, the stimuli.tsv contains a list of the stimulus files, with optional columns.
  • Similar to participant_id, a stim_id points to unique stimulus files. It is up to the user/tools to decide which annotations should be used for the respective stim_id.

TODO:

  • PR to add annot and stim entities.
  • Decide on the ??? suffix. (media?!)
  • Create an example stimuli/ directory with the suggested structure.
  • PR to suggest the stimuli/ directory structure (potentially as a continuation of the stimuli BEP #751, a new BEP, or an ENH).

CC @VisLab, @dorahermes, and @monique2208 for comment.

@adelavega (Author) commented Apr 12, 2024

Looks good, but I'm concerned that mandating that stimuli have a specific name would make this backwards-incompatible with existing datasets (which name stimulus files whatever they want and just refer to them in the _events.tsv files).

It's a minor concern, but it just seems slightly out of scope to mandate a new way of naming stimulus files. Would this be required even if you do not have annotations?

@adelavega (Author)

Seems like there was discussion regarding the top-level stim- prefix here: #751

@VisLab (Member) commented Apr 12, 2024

> Looks good, but I'm concerned that mandating that stimuli have a specific name would make this backwards-incompatible with existing datasets (which name stimulus files whatever they want and just refer to them in the _events.tsv files)

Not sure the proposal has to be backwards incompatible:

Now: events.tsv with stim_file column value xxx/yyy.zzz implies a file in ./stimuli/xxx/yyy.zzz.

Potential proposal: the above stays the same... but...

In the ./stimuli/stimuli.tsv file, the row for this file has the first-column value ./stimuli/xxx/yyy.zzz, and other columns can appear as defined in the ./stimuli/stimuli.json file.

Suppose the stimulus file is a movie with annotations; then in the ./stimuli/xxx directory there can be a yyy_arbitrarystuff_annot.tsv and a yyy_arbitrarystuff_annot.json that are interpreted as annotations for yyy.zzz. (Multiple raters may be available.)

The directory structure within the ./stimuli folder can be arbitrary as it is now.
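
Under that proposal, a hypothetical layout (directory and file names invented, following the yyy_arbitrarystuff_annot pattern above with raters as the arbitrary part) could be:

```
stimuli/
├── stimuli.tsv
├── stimuli.json
└── xxx/
    ├── yyy.zzz
    ├── yyy_rater1_annot.tsv
    ├── yyy_rater1_annot.json
    └── yyy_rater2_annot.tsv
```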

@neuromechanist (Member)

Current contenders for the stimuli modality suffix include:

  1. _stimulus (example: stim-the-present_stimulus.mp4)
  2. _media (example: stim-the-present_media.mp4)
  3. _stream (example: stim-the-present_stream.mp4)

Feel free to let me know if you have any other suggestions and which one you prefer, so I can update the list.

@adelavega (Author)

_stimulus seems oddly redundant with the stim- prefix; otherwise, I slightly prefer _media, but I have no strong opinions.

@yarikoptic (Collaborator)

In the spirit of the future BIDS 2.0 with e.g.

@neuromechanist (Member)

OK, sounds great. It seems the proposed stim and annot entities have good support. I'll make a pull request for them.

The suffix may need more consideration. Currently, _media seems to have the most appeal.

Just a note that there is already a _stim suffix for individual stimulus files, defined under the physio data type. But I believe these two use cases have little relation to each other.

@neuromechanist (Member) commented Apr 19, 2024

Also, should we convert this issue to a BEP? Converting to a BEP would hopefully make the enhancements more visible and maintainable (although it will also require more work).

Talking to @yarikoptic and @dorahermes, they both seem to support a BEP for this issue.

@neuromechanist (Member) commented May 6, 2024

Added PR #1814 to add stimulus and annotation entities and the stim_id column.

The next steps would require input on:

  • the suffix (_media)
  • multi-track stimuli (choosing one of the currently available part-, chunk-, or split- entities)
  • the contents of stimuli.tsv
  • a decision on whether this should be a series of PRs or a consolidated BEP

@monique2208 (Contributor)

It would be great to have this formalized! We have a large number of datasets where we present the same short movie as a localizer. Having one general annotation file that could apply to all of these datasets would really help with the analysis; it would remove a lot of redundancy in the event files, and I think it would provide something interesting to share on its own.

@yarikoptic (Collaborator)

> [ ] suffix: (_media)

Our suffixes so far can correspond to a number of things, but are most typically quite specific to the "data modality", so here we might want to be more specific too, e.g. have _audio, _video, _audiovideo (or _audio+video in some future BIDS where + would be allowed there), even though the modality could most often be discerned from the extension (but not necessarily).

@effigies (Collaborator) commented Aug 6, 2024

+1 for _audio, _video and _audiovideo. It would make it easy to set permitted extensions for audio and video separately, and then just take the intersection for audiovideo.

@neuromechanist (Member)

@yarikoptic, @dorahermes and I will meet on Tuesday 8/13 at 10 am PT to discuss the progress and the next steps. Please reach out to me if you want to join the conversation and I'll share the meeting details.

@neuromechanist (Member) commented Aug 6, 2024

> [ ] suffix: (_media)
>
> Our suffixes so far can correspond to a number of things, but are most typically quite specific to the "data modality", so here we might want to be more specific too, e.g. have _audio, _video, _audiovideo.

We should probably include _image too. Agreed that with separate entities, checking the file extensions is much easier.

@neuromechanist (Member)

@yarikoptic, @dorahermes, @TheChymera, and I joined the meeting. We agreed that the broad scope of the changes (adding a prefix, a couple of entities, and suffixes) and their usability across several fields (EEG, fMRI, ...) justify requesting a BEP.

@bids-maintenance, could you help raise this issue and elevate it to a BEP?

A couple of other discussion points during the meeting were: 1) adopting _part for multi-part stimulus files, 2) the stimulus type and how it should be documented in the stimuli.tsv, 3) file suffixes, 4) whether to allow JSON-only files when the stimulus files are not present (for example, to describe the device and conditions under which the stimulus files were presented), and 5) resolving the concluded comments and reviews in the Google Doc.

The main discussion is on this Google Doc. The next meeting will be on August 27th at 10 a.m. PT.

@yarikoptic (Collaborator)

FTR: requesting BEP044 for this effort:

@neuromechanist (Member)

The second meeting, with @yarikoptic, @dorahermes, @VisLab, and me, was held on 8/27, with discussions on removing _part from the proposed specs, the range of suffixes to consider, and mechanisms to allow a stimulus ID and annotations without the original files being present.

@neuromechanist (Member)

We are officially BEP044. Congratulations to everyone for their hard work and persistence on this issue and topic 🎉.

@bids-maintenance, @adelavega, do you mind updating the issue name to reflect the BEP number? Thanks a lot

@adelavega changed the title from "Within-stimuli conditions" to "BEP044: Within-stimuli conditions" on Sep 4, 2024
@neuromechanist (Member)

We have made significant progress in the biweekly meetings. Starting next week, September 17th, the meetings will be held biweekly on Tuesdays at 9 am ET / 3 pm CET, to ensure more suitable timing for everyone. Please let me know if you would like to join.

@neuromechanist (Member)

We discussed provisioning part- for multi-part stimulus files that share the same stimulus ID. Possible examples are longer movies that are divided into parts/sections for computational efficiency (or to test potential effects of each section). We will update the current examples for this BEP and also add the stimulus and annotations from the Forrest Gump movie study to the example set. @yarikoptic, @dorahermes, and I were present for this meeting on October 15th.

@dorahermes (Member)

> We discussed provisioning part- for multi-part stimulus files that share the same stimulus ID.

It would be useful to have some discussion on the use of part- for this use case. The part- entity is currently defined as "This entity is used to indicate which component of the complex representation of the MRI signal is represented in voxel data." Is it possible to add to this definition, or is this a hard conflict? @effigies @Remi-Gau @adelavega

@yarikoptic (Collaborator)

IMHO it would be OK to relax the semantics here to make it not MRI-specific and reuse part similarly in other modalities. It is just that the validation specification might become trickier, since for movies the label wouldn't come from a controlled vocabulary as in the MRI case.

@effigies (Collaborator)

There is the existing split-<index> entity:

> Definition: In the case of long data recordings that exceed a file size of 2Gb, .fif files are conventionally split into multiple parts. Each of these files has an internal pointer to the next file. This is important when renaming these split recordings to the BIDS convention.
>
> Instead of a simple renaming, files should be read in and saved under their new names with dedicated tools like MNE-Python, which will ensure that not only the filenames, but also the internal file pointers, will be updated.
>
> It is RECOMMENDED that .fif files with multiple parts use the split- entity to indicate each part. If there are multiple parts of a recording and the optional scans.tsv is provided, all files MUST be listed separately in scans.tsv and the entries for the acq_time column in scans.tsv MUST all be identical, as described in Scans file.

This seems to fit the use-case, although the definition should be made less hyper-specific.


As to part, there's no technical reason it couldn't be used differently. Historically the SG has been reluctant to allow overlap of path components, e.g., BEP001's flip angle was changed from fa-<index> to flip-<index> to avoid potential confusion with the BEP016 suffix _FA (fractional anisotropy), as it was proposed at the time.

@neuromechanist (Member)

We considered available terms with similar meanings, including chunk, split, and part, with part being the most inclusive term for the different use cases. One thought was that split, as used in its original entity definition, conveys that the files are contiguous (not overlapping, yet immediately following each other).

We should, however, allow for overlapping or any other configuration that researchers may need (such as selected book chapters, or positive/negative valence) under the same stimulus ID. In the Forrest Gump movie example, there is a few-second overlap between each of the eight parts and the next/previous part, so the stimulus and the annotations are not splits per se, but rather parts.

I'd also like to ask whether we should consider introducing a new entity instead of part, to avoid confusion with existing BIDS terms.

Today, @dorahermes, @VisLab, and I reviewed the examples in bids-standard/bids-examples#433 with the most recent changes, and also added HED annotations to a couple of events.json files. We also discussed the stim-<label>.json sidecar and how annotations can be included for images and time-varying stimuli (like videos). It seems the document is on track to be finalized in the next couple of meetings.

@neuromechanist (Member) commented Nov 12, 2024

@dorahermes, @VisLab and I discussed the remaining work for this BEP:

  • Add Forrest Gump Examples
  • Resolve BEP document comments. Send the BEP to collaborators for review and final comments
  • Final edits to the Examples and convert the Examples Draft PR to a PR
  • Assemble a list of the changes/additions to be made on the Specifications.
  • Make the BEP PR
  • Add an example with proper HED annotations (@VisLab)
  • Add an example with annotation of 1k images from NSD (@dorahermes)

We do not plan to hold any more biweekly meetings. Thanks to all who contributed to and supported this effort 🙌🏼.
