[ENH] BEP031 - New columns to participants.tsv file #816
src/03-modality-agnostic-files.md
and `handedness`. We RECOMMEND to make use of these columns, and
in case that you do use them, we RECOMMEND to use the following values
for them:
When different from `homo sapiens`, `participants.tsv` SHOULD include a `species`
Shouldn't this be MUST, given that all of BIDS assumes humans at this point?
Suggested change:
- When different from `homo sapiens`, `participants.tsv` SHOULD include a `species`
+ When different from `homo sapiens`, `participants.tsv` MUST include a `species`
I don't know how we would validate a `MUST` here. Also, there are rodent datasets that may not have this column in this form at this point, so we would be breaking backwards compatibility if we could validate. What about:
The RECOMMENDED `species` column MUST be a binomial species name from the
[NCBI Taxonomy](https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi).
For backwards compatibility, if `species` is absent, the participant is assumed to be
`homo sapiens`.
Also, REQUIRE-ing a species name from NCBI Taxonomy feels like it's going to be difficult to validate, as we will need to either query the database or maintain a list of accepted names, updating the validator as new use cases arise... Is there a validation plan?
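The backwards-compatible fallback proposed above can be sketched as follows. This is a minimal illustration, not validator code: the function name `species_by_participant` is hypothetical, and treating `n/a` like a missing value is an assumption of the sketch rather than something stated in the proposal.

```python
import csv
import io

# Fallback proposed in the comment above for datasets without a `species` column.
DEFAULT_SPECIES = "homo sapiens"


def species_by_participant(tsv_text):
    """Map participant_id -> species, falling back to the default when absent."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    result = {}
    for row in reader:
        value = (row.get("species") or "").strip()
        # Assumption of this sketch: "n/a" is handled like a missing value.
        if value in ("", "n/a"):
            value = DEFAULT_SPECIES
        result[row["participant_id"]] = value
    return result


# A legacy human dataset without the column, and a dataset that uses it.
human_only = "participant_id\tage\nsub-01\t34\n"
mixed = "participant_id\tspecies\nsub-01\tmus musculus\nsub-02\tn/a\n"

print(species_by_participant(human_only))
print(species_by_participant(mixed))
```

With this reading, existing datasets that never declared a species keep their current meaning, which is the point of the fallback.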
Thank you @effigies for the suggestion. I think assuming `homo sapiens` if the column is omitted is a strong incentive without breaking backward compatibility. I would be in favor of that.
However, I had not thought about the validation. Querying the database seems like the best option to avoid maintaining an up-to-date list in the validator, but it may be difficult to implement. Are there similar requirements elsewhere in the spec? Would the alternative of “SHOULD” or “strongly RECOMMENDED” be advisable?
Also, thinking about it, I think I should add examples other than `homo sapiens`, like `mus musculus` and `rattus norvegicus`, in the description.
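One lightweight alternative to querying the database or maintaining a name list is a purely syntactic check that a value looks like a two-word binomial name. The sketch below is an assumption-laden illustration, not a validation plan from this thread: the pattern and the helper `looks_binomial` are hypothetical, and passing the check says nothing about whether the name actually exists in NCBI Taxonomy.

```python
import re

# Assumption of this sketch: a binomial name is two space-separated alphabetic
# words, with only the first allowed to be capitalised (e.g. "Mus musculus").
BINOMIAL = re.compile(r"[A-Za-z]+ [a-z]+")


def looks_binomial(name):
    """Return True if `name` has the shape of a binomial species name."""
    return BINOMIAL.fullmatch(name.strip()) is not None


for candidate in ["homo sapiens", "mus musculus", "rattus norvegicus", "mouse"]:
    print(candidate, looks_binomial(candidate))
```

A check like this would catch common names such as `mouse`, but it would wrongly reject real names containing hyphens, so it could only ever back a warning, not a hard error.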
@effigies - regarding validation, I will raise the same issue here as in the other post. I'm not sure we actually validate values, for example for sex or anything that could have levels or enumerations.
> I don't know how we would validate a MUST
While one could at least detect presence, I agree that, in keeping with the current perspective of `participants.tsv` being a recommended file, we can keep things recommended instead of required.
Species does get a little complicated, especially for animals, as you start going into species + genotype notions. Here is our generic "participant at a timepoint" model in DANDI: https://github.com/dandi/dandischema/blob/master/dandischema/models.py#L642 (technically all of those properties could come into play, with some being more important for animal studies).
src/03-modality-agnostic-files.md
- `strain_rrid`: research resource identifier ([RRID](https://scicrunch.org/resources/Organisms/search))
  of the strain of the species

- `diagnosis`: string value describing the diagnosis of the participant.
I don't know if this has to be a string value. In many datasets on OpenNeuro, diagnosis/dx is present and can be an enumerated type. Also, this is one place where one can have multiple designations depending on the study. We should allow for some notion of that, or simply remove diagnosis from this file.
The aim of this PR is to add new columns to describe animal properties. In that context, I agree that diagnosis may be out of scope.
For context, I added the columns following this discussion #779 (comment), #779 (comment) and #779 (comment) because we also introduced pathology in `samples.tsv`.
Following your suggestion, I removed the diagnosis column in 7146144 as being out of scope for this PR (animal properties).
src/03-modality-agnostic-files.md
@@ -197,6 +201,15 @@ for them:
- for "ambidextrous", use one of these values: `ambidextrous`, `a`, `A`,
  `AMBIDEXTROUS`, `Ambidextrous`

- `strain`: string value indicating the strain of the species
Examples for each of these would be useful.
Just to clarify, do you mean an example directly in the description, like above for handedness with `ambidextrous`, `a`, `A`, etc., or an example of `participants.tsv` for an animal?
I did not change the example of `participants.tsv` below, which is an example for a human. I thought having complete examples for both human and animal would maybe be too much.
perhaps a few examples from: https://www.jax.org/jax-mice-and-services/find-and-order-jax-mice/most-popular-jax-mice-strains
Examples added in b451671.
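For readers following the thread, an animal `participants.tsv` using the proposed columns might look like the output of the sketch below. The rows are illustrative assumptions and are not necessarily the examples added in b451671: the strain name and RRID are meant to correspond to a popular JAX mouse strain and should be checked against the RRID portal before reuse, and the helper `to_tsv` is hypothetical.

```python
import csv
import io

COLUMNS = ["participant_id", "species", "strain", "strain_rrid"]

# Illustrative values only (assumptions of this sketch, not taken from the PR).
rows = [
    {
        "participant_id": "sub-01",
        "species": "mus musculus",
        "strain": "C57BL/6J",
        "strain_rrid": "RRID:IMSR_JAX:000664",
    },
    {
        "participant_id": "sub-02",
        "species": "mus musculus",
        "strain": "C57BL/6J",
        "strain_rrid": "RRID:IMSR_JAX:000664",
    },
]


def to_tsv(records):
    """Serialise the records as tab-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


print(to_tsv(rows))
```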
I have a more general and perhaps controversial comment, that I'd like to hear opinions on.
Up until now BIDS has been mostly about humans and their brain+behavior data. I assume that >80% of users will continue to come to BIDS from this angle.
Adding coverage for animal ephys and microscopy into BIDS is great, and I fully support it. However, is it a good idea to introduce very specific concepts (such as "strain") that are not used in 80% of BIDS use-cases* into a very general section that discusses things like age, sex, and handedness (which are ubiquitous)? I feel like this would be better discussed in a dedicated, separate section of the spec (animal ephys / microscopy respectively), as it avoids clutter and potential confusion / annoyance with the spec.
* (this is my assumption/prediction, feel free to challenge).
src/03-modality-agnostic-files.md
[NCBI Taxonomy](https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi).

Commonly used *optional* columns in `participants.tsv` files are `age`, `sex`,
`handedness`, `strain`, `strain_rrid` and `diagnosis`. We RECOMMEND to make use
Could `group` be added to this list, as it's used in the example below?
Good question. I'm not sure we should, because `group` is also used below, in the `participants.json` example, to illustrate how to describe an additional column that is not part of the optional ones. From my understanding, `group` may have different meanings depending on the study, and it would be hard to define it with a general example.
Thanks @sappelhoff for your feedback! The main goal of this PR is to promote […] We foresee that the BIDS specification for microscopy will be adopted by many users; some have already started converting their datasets or working on compatible acquisition workflows, but I would not be able to quantify such adoption at this point. I'm not against moving those fields into the dedicated microscopy and animal ephys sections if we want to keep the general section more general. The main drawback is duplication in the specification.
Hi @effigies and @sappelhoff, I can move them to the microscopy part if it makes more sense to have them per modality rather than in the “general” participants file. In that case, this PR would only add the `species` column to participants.
tagging @Remi-Gau to chime in instead of me :-)
Tempted to keep those
Agreed! If we only put them in the modality-specific part, we will have to repeat it for several future modalities and it's going to get more complex to handle the specs...
Thank you everyone for your feedback!
Hi @bids-standard/maintainers, I also wanted to mention that it may have an impact on another open PR (#827), as it deals with TSV columns.
I am still +1 for merge. Any seconds?
It looks good to me. I can handle adding the new columns to the TSV columns PR.
Since the last meaningful commit was in August, I think we can merge. @sappelhoff are there any blocks I'm forgetting?
Nope, I think this can go in 👍
* Add template.
* Add first column.
* Add columns.
* More terms.
* Fill name field in all files.
* More work. Note that there are three very different uses of "name" columns, and two of them are equally common, so I chose not to specify any of them as the "canonical" definition.
* Add remaining definitions.
* Add macro to render column tables.
* Fix YAML file.
* Consolidate suffixes file.
* Remove old individual files.
* Move columns file.
* Fix things up a bit.
* Add columns I missed for modality-agnostic TSV files.
* Support n/a for duration.
* Apply suggestions from code review. Co-authored-by: Chris Markiewicz. Co-authored-by: Stefan Appelhoff.
* Code formatting in stim_file definition.
* Allow numbers and strings for value.
* Update src/schema/objects/columns.yaml. Co-authored-by: Stefan Appelhoff.
* Allow n/a for "z" column. Addresses https://github.com/bids-standard/bids-specification/pull/827/files#r723280787.
* Describe meanings of x, y, and z columns. Addresses https://github.com/bids-standard/bids-specification/pull/827/files#r723283314.
* Allow n/a for status column. Addresses https://github.com/bids-standard/bids-specification/pull/827/files#r723269382.
* Add participant_id to participants.tsv table and append info for other IDs.
* Split type definitions into channels and electrodes versions.
* Update definitions for group based on file type.
* Split reference column definition.
* Clean up name_channels definition.
* Draft new columns from #816
* Add new columns to table.
* Remove list items.
* Update src/04-modality-specific-files/04-intracranial-electroencephalography.md. Co-authored-by: Stefan Appelhoff.
* Apply suggestions from code review. Co-authored-by: Chris Markiewicz.
* Use two underscores to delineate multiply-defined columns.
* Remove text that is now in table.
* Update src/schema/objects/columns.yaml. Co-authored-by: Chris Markiewicz.
* Add sections to README on columns file and on reused terms.
* Add EDF info to acq_time definition.
* Remove hardcoded tables.
* Remove unused links.
Co-authored-by: Chris Markiewicz
Co-authored-by: Stefan Appelhoff
Dear BIDS community,
Context
As part of the development of the Microscopy BEP (BEP031 @mariehbourget, @jcohenadad) and Animal Ephys BEP (BEP032 @SylvainTakerkart, @JuliaSprenger), the “sample” entity was introduced along with a `samples.tsv` file describing properties of the samples in PR #812.
To better describe animal participants, new properties of the subjects also need to be included in `participants.tsv`.
Contribution
The purpose of this PR is to add a new recommended column (`species`) and optional columns (`strain`, `strain_rrid` and `diagnosis`) to the `participants.tsv` file.
See issue #779 for related discussions on this topic.