-
Notifications
You must be signed in to change notification settings - Fork 168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] Define channel column for events and Delimiter field for column descriptions #1483
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Perhaps we could extend the example here? --> https://bids-specification.readthedocs.io/en/latest/common-principles.html#tabular-files
I would be up for an added example in https://bids-specification.readthedocs.io/en/latest/modality-specific-files/task-events.html, but I think it would be cluttering in common principles. @christinerogers Would you be able to put together a small example of how |
52e80eb
to
3c6b978
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #1483 +/- ##
=======================================
Coverage 87.83% 87.83%
=======================================
Files 16 16
Lines 1356 1356
=======================================
Hits 1191 1191
Misses 165 165 ☔ View full report in Codecov by Sentry. |
Sure @effigies - @Andesha and i will get back to you on that Also, point taken about cluttering common principles, though how about a simple "See also" link, @effigies ? Just so the pointer is there to help the next person embedding a data structure to find supported formats. Would help avoid divergence across modalities and major headaches for the bids-validator and other software platforms at the end of the day. |
If this is still awaiting an example, here is a link to an HED-SCORE example undergoing review of how channel would be used in an EEG dataset in the BIDS examples repository. |
Thanks for that. It looks like that has both comma separation and dashes with unclear meaning. Would this be considered valid under the current proposal and everybody agrees What tools are generating and consuming these columns at this point? |
Thanks for noting. This is actually something that should be corrected in this example. This example is an EEG set with a bipolar montage, so each channel is a difference between 2 channels, hence the channel name is typically
The artifact and seizure examples were translated to BIDS from the TUH EEG corpus . This is a large dataset where EEG data were annotated by experts. I use custom code to process the channel column. Perhaps @christinerogers can comment on other tools that are generating and consuming these. |
agreed, I would make this REQUIRED, because otherwise it may lead to confusion (i.e., when documenting that an event is related to channels that are nowhere to be found in the data). |
That sounds pretty painful to validate, especially trying to write as a declarative rule instead of custom code. We need to take each entry in a channel column, check to see if there's a specified delimiter (falling back to comma) in the |
I will add an example within the next days/weeks and merge this. We have agreed on this at the BIDS derivatives meeting Copenhagen 2023. |
3c6b978
to
df07a15
Compare
looping @dorahermes and @Andesha in here --- Dora does this work for you as well? |
This is adding "Delimiter" as a valid sidecar field for any column in any TSV. What is the obligation of the validator for a column that declares a "Delimiter"? If I have channels that include I think at best this should be a warning, because otherwise the complexity of writing a validation rule in schema that covers all edge cases is going to be painful. |
Or we make the However, the majority of people in #1146 was in favor of a (in my opinion only "seemingly") simpler solution that is currently proposed in this PR.
you mean that IF
THEN
???
yes, but currently defined columns other than
I think if you wanted to specify a channel |
I would not be opposed to requiring DELIMITER if column is to be a list.
You are right in that it will be clearer and it is better to avoid coupling
requirements from multiple files.
…On Thu, Nov 2, 2023, 2:20 PM Stefan Appelhoff ***@***.***> wrote:
If I have channels that include F,1 and I specify a channel column with no
delimiter because I do not use lists, I think the proposed behavior above
would cause us to parse this as two channels (F and 1) which do not exist
in channels.tsv.
Or we make the Delimiter metadata REQUIRED. If it's not declared, then
channels cells are not parsed as arrays.
However, the majority of people in #1146
<#1146> was in
favor of a (in my opinion only "seemingly") simpler solution that is
currently proposed in this PR.
------------------------------
I think at best this should be a warning, because otherwise the complexity
of writing a validation rule in schema that covers all edge cases is going
to be painful.
you mean that IF
1. channels is a column in channels.tsv
2. cells in the channels column contain ,
3. no Delimiter is declared in channels.json under the channels key
THEN
1. we warn
???
This is adding "Delimiter" as a valid sidecar field for any column in any
TSV.
yes, but defined columns other than channels do not declare that their
cells are ,-separated arrays by default ... so how I read it, for all non-
channels-columns, you would HAVE TO declare Delimiter to parse cells as
arrays.
—
Reply to this email directly, view it on GitHub
<#1483 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAJCJOQA2EFUKAFTQ3F66RTYCQFCVAVCNFSM6AAAAAAXS6CDWGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTOOJRGU2TGOJVGY>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
when talking about validation, what exactly do you want to validate?
is that it? If you expected an error instead of a warning, we need to tune the language to include something like.
Was there something else you wanted to validate that I am not thinking of right now? I tagged some folks for reviews. |
Yes, and the check you describe is the one I'm thinking of. The wording would indicate to me that an error needs to be raised if the check fails. I should clarify though: I don't want to validate this at all because it's a pain that requires aggregating data from a minimum of three files (events.tsv, events.json, channels.tsv), handling If XEG users/developers are happy to handle this without the validator producing a warning or error, then that works for me. The size of the audience for this whose use wouldn't be mediated by a tool seems like it's probably small? |
yes.
given that we do not have said language in the text currently, the situation is slightly ambiguous. We are neither requiring, nor recommending that each listed channel be in I am fine not validating it, which probably effectively means that we make it OPTIONAL whether or not listed channels are in I would, however, be in favor of adding REQUIRED language to the spec, but putting off validating to a future timepoint (potentially never), or non-bids-validator tools. Please react with a thumbs-up here if you agree, and then I'll add the language. Or comment otherwise. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made a minor comment but should not block merging.
Oh wow, I really thought I'd read the language that they have to line up. My brain is failing me... I will leave it to your judgment whether to add the language, but I would treat a MUST as a commitment to eventually either validate it or relax the rule. If we don't add the MUST then I think it can only ever at most be a warning for backwards compatibility's sake, but the difficulty of the validation might justify limiting to a warning. |
Thanks, I am fine with a warning then. It seems reasonable to me. Waiting some days with this for potential further reviews -- if none are coming in, I will merge it, as this has been discussed at length. |
Could you wait until Sunday to merge. I'd like to look at more carefully
but won't have access to a computer until then. Thx
…On Tue, Nov 7, 2023, 7:13 AM Stefan Appelhoff ***@***.***> wrote:
Thanks, I am fine with a warning then. It seems reasonable to me. Waiting
some days with this for potential further reviews -- if none are coming in,
I will merge it, as this has been discussed at length.
—
Reply to this email directly, view it on GitHub
<#1483 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAJCJOS3IAQ6YXP6B5ABGRLYDJFY7AVCNFSM6AAAAAAXS6CDWGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTOOJYHAZTONBRG4>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
@@ -10,6 +10,11 @@ TaskEvents: | |||
response_time: optional | |||
HED: optional | |||
stim_file: optional | |||
channel: | |||
level: optional | |||
description_addendum: | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't completely understand why this is in the task.yaml file. Is task.yaml where it determines if tabular data applies to a particular modality?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
task.yaml
is an arbitrary name. It could have been called events.yaml
, but the rule schema.rules.tabular_data.task.TaskEvents
defines what are valid columns in an _events.tsv
file. The filename construction (and hence what modalities apply) are defined in the filename rules:
bids-specification/src/schema/rules/files/raw/task.yaml
Lines 2 to 19 in 426de56
events: | |
suffixes: | |
- events | |
extensions: | |
- .tsv | |
- .json | |
datatypes: | |
- beh | |
- eeg | |
- ieeg | |
- meg | |
- nirs | |
entities: | |
subject: required | |
session: optional | |
task: required | |
acquisition: optional | |
run: optional |
bids-specification/src/schema/rules/files/raw/task.yaml
Lines 2 to 19 in 426de56
events: | |
suffixes: | |
- events | |
extensions: | |
- .tsv | |
- .json | |
datatypes: | |
- beh | |
- eeg | |
- ieeg | |
- meg | |
- nirs | |
entities: | |
subject: required | |
session: optional | |
task: required | |
acquisition: optional | |
run: optional |
@@ -574,7 +574,7 @@ Delimiter: | |||
name: Delimiter | |||
display_name: Delimiter | |||
description: | | |||
If a column may be interpreted as a list of values, the character that | |||
If rows in a column may be interpreted as a lists of values, the character that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be better as:
"If a column entry may be interpreted as..."
Also, does the Delimiter only apply to "channel" column or to any column as the above implies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delimiter will from now on apply to any column.
onset duration trial_type response_time stim_file channel artifact | ||
1.23 0.65 start 1.435 images/red_square.jpg n/a n/a | ||
5.65 0.65 stop 1.739 images/blue_square.jpg n/a n/a | ||
12.1 2.35 n/a n/a n/a F,1|F,2|Cz sweat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "sweat" was "musc" in a later example and the "artifact" column was called "annot". I think the same example should be continued along.
Thus, the channels related to the event in the third row of the example | ||
are called `F,1`, `F,2`, and `Cz`. | ||
|
||
1. The example contains a column called `artifact`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the two examples that relate to this should have this column either called "artifact" or "annot".
@VisLab Some of your comments are marked "Outdated". I'm not sure if you're using a link to a specific commit, or just need to refresh. Can you check that you're reviewing the latest version? |
Yeah, I see the outdated, but I think I am looking at the latest version.
I tried to resolve what I think was fixed. I think that your updates now
make it clear that the entries are only treated as lists if the Delimiter
field is specified.
My concerns are:
1. Does Delimiter only apply to the "channel" column. (Some places not
clear.)
2. Where is the right place to enforce modality constraints for using
channel column as a special column.
3. The examples with the annot or artifact columns should use rectified.
Thanks!
…On Sat, Nov 11, 2023 at 12:17 PM Chris Markiewicz ***@***.***> wrote:
@VisLab <https://github.com/VisLab> Some of your comments are marked
"Outdated". I'm not sure if you're using a link to a specific commit, or
just need to refresh. Can you check that you're reviewing the latest
version?
—
Reply to this email directly, view it on GitHub
<#1483 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAJCJOURYQR64DF3677YE3LYD66KRAVCNFSM6AAAAAAXS6CDWGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMBWHA4DEOBWGU>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
BTW --- I think you should go ahead and merge this PR. If any clarifications are needed, they can be added in an additional PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review @VisLab, I am merging this now.
.tsv
cells #1446 for extensive discussion.Closes #1446.