[SCHEMA] Add TSV column files #827

Merged on Nov 9, 2021
45 commits
84bfd06
Add template.
tsalo Apr 10, 2021
066e2a2
Add first column.
tsalo Apr 10, 2021
10816d7
Add columns.
tsalo Apr 10, 2021
19901cd
More terms.
tsalo Apr 13, 2021
2a48ef6
Fill name field in all files.
tsalo Jul 9, 2021
2451b9a
More work.
tsalo Jul 9, 2021
de00cea
Merge branch 'master' into tsv-columns-schema
tsalo Aug 31, 2021
54458cd
Add remaining definitions.
tsalo Aug 31, 2021
b5754e7
Add macro to render column tables.
tsalo Aug 31, 2021
42d6e08
Fix YAML file.
tsalo Aug 31, 2021
034c6a8
Consolidate suffixes file.
tsalo Sep 20, 2021
b0b2c3f
Merge branch 'master' into tsv-columns-schema
tsalo Oct 5, 2021
4988676
Remove old individual files.
tsalo Oct 5, 2021
26e8ae9
Move columns file.
tsalo Oct 5, 2021
4615684
Fix things up a bit.
tsalo Oct 5, 2021
27168f5
Add columns I missed for modality-agnostic TSV files.
tsalo Oct 5, 2021
de9511a
Support n/a for duration.
tsalo Oct 6, 2021
59aea34
Apply suggestions from code review
tsalo Oct 6, 2021
a1148c0
Code formatting in stim_file definition.
tsalo Oct 6, 2021
02568cf
Allow numbers and strings for value.
tsalo Oct 6, 2021
c3f2558
Update src/schema/objects/columns.yaml
tsalo Oct 6, 2021
dc227d6
Allow n/a for "z" column.
tsalo Oct 6, 2021
d831c6c
Describe meanings of x, y, and z columns.
tsalo Oct 6, 2021
3955ed4
Allow n/a for status column.
tsalo Oct 6, 2021
9b4e8b9
Merge branch 'tsv-columns-schema' of https://github.com/tsalo/bids-sp…
tsalo Oct 6, 2021
0cb74ee
Add participant_id to participants.tsv table and append info for othe…
tsalo Oct 6, 2021
8d39711
Split type definitions into channels and electrodes versions.
tsalo Oct 6, 2021
e7db250
Update definitions for group based on file type.
tsalo Oct 6, 2021
3218e8e
Split reference column definition.
tsalo Oct 6, 2021
3918690
Clean up name_channels definition.
tsalo Oct 11, 2021
552b4ae
Draft new columns from #816
tsalo Oct 13, 2021
84241d4
Add new columns to table.
tsalo Oct 13, 2021
bd8c933
Merge branch 'master' into tsv-columns-schema
tsalo Oct 13, 2021
9161079
Remove list items.
tsalo Oct 13, 2021
a7356eb
Update src/04-modality-specific-files/04-intracranial-electroencephal…
tsalo Oct 26, 2021
fd54609
Apply suggestions from code review
tsalo Oct 26, 2021
efa6fa1
Use two underscores to delineate multiply-defined columns.
tsalo Oct 26, 2021
0a4880f
Remove text that is now in table.
tsalo Oct 26, 2021
48656fc
Update src/schema/objects/columns.yaml
tsalo Oct 26, 2021
7ff360a
Merge branch 'tsv-columns-schema' of https://github.com/tsalo/bids-sp…
tsalo Oct 26, 2021
de5b5f6
Add sections to README on columns file and on reused terms.
tsalo Oct 26, 2021
ba84f44
Merge branch 'master' into tsv-columns-schema
tsalo Oct 26, 2021
4ba9c25
Add EDF info to acq_time definition.
tsalo Oct 26, 2021
b510b7d
Remove hardcoded tables.
tsalo Nov 9, 2021
d562986
Remove unused links.
tsalo Nov 9, 2021
109 changes: 35 additions & 74 deletions src/03-modality-agnostic-files.md
@@ -179,40 +179,21 @@ For backwards compatibility, if `species` is absent, the participant is assumed
`homo sapiens`.

Commonly used *optional* columns in `participants.tsv` files are `age`, `sex`,
`handedness`, `strain` and `strain_rrid`. We RECOMMEND to make use
`handedness`, `strain`, and `strain_rrid`. We RECOMMEND to make use
of these columns, and in case that you do use them, we RECOMMEND to use the
following values for them:

- `age`: numeric value in years (float or integer value)

- `sex`: string value indicating phenotypical sex, one of "male", "female",
"other"

- for "male", use one of these values: `male`, `m`, `M`, `MALE`, `Male`

- for "female", use one of these values: `female`, `f`, `F`, `FEMALE`,
`Female`

- for "other", use one of these values: `other`, `o`, `O`, `OTHER`,
`Other`

- `handedness`: string value indicating one of "left", "right",
"ambidextrous"

- for "left", use one of these values: `left`, `l`, `L`, `LEFT`, `Left`

- for "right", use one of these values: `right`, `r`, `R`, `RIGHT`,
`Right`

- for "ambidextrous", use one of these values: `ambidextrous`, `a`, `A`,
`AMBIDEXTROUS`, `Ambidextrous`

- `strain`: for species different from `homo sapiens`, string value indicating
the strain of the species, for example: `C57BL/6J`.

- `strain_rrid`: for species different from `homo sapiens`, research resource identifier
([RRID](https://scicrunch.org/resources/Organisms/search)) of the strain of the species,
for example: `RRID:IMSR_JAX:000664`.
{{ MACROS___make_columns_table(
{
"participant_id": ("REQUIRED", "There MUST be exactly one row for each participant."),
"species": "RECOMMENDED",
"age": "RECOMMENDED",
"sex": "RECOMMENDED",
"handedness": "RECOMMENDED",
"strain": "RECOMMENDED",
"strain_rrid": "RECOMMENDED",
}
) }}
Comment on lines +186 to +196
Member

Is it accurate to say that this macro represents a BIDS rule for the `participants.tsv` file? If so, should this be described in the schema somewhere?

Member Author

Yes, the tables do represent rules, but those rules are too complicated to be translated directly from the tables. For more information on the types of mechanisms we need to support in the schema to accurately codify the specification's rules, please see #620.
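
To make the exchange above concrete, here is a minimal Python sketch of how a macro of this kind could turn the dictionaries passed in the diff into a Markdown table. It is not the PR's actual implementation: the `COLUMN_DEFINITIONS` lookup, its descriptions, and the rendering details are assumptions for illustration. The only parts taken from the diff are the call shape (a schema key mapped to a requirement level, or to a `(level, extra text)` tuple) and the double-underscore keys such as `name__channels`.

```python
# Hypothetical stand-in for definitions loaded from src/schema/objects/columns.yaml.
# The field names and descriptions here are placeholders, not the real schema content.
COLUMN_DEFINITIONS = {
    "name__channels": {"name": "name", "description": "Channel name."},
    "type__channels": {"name": "type", "description": "Type of channel."},
    "units": {"name": "units", "description": "Physical unit of the value."},
}


def make_columns_table(column_info):
    """Render a Markdown table from {schema_key: level or (level, extra_text)}."""
    rows = [
        "| **Column name** | **Requirement level** | **Description** |",
        "| --- | --- | --- |",
    ]
    for key, info in column_info.items():
        # A tuple carries extra text that is appended to the schema description.
        level, extra = info if isinstance(info, tuple) else (info, "")
        definition = COLUMN_DEFINITIONS[key]
        description = " ".join(
            part for part in (definition["description"], extra) if part
        )
        # The displayed name comes from the definition, so "name__channels"
        # is shown to readers as plain "name".
        rows.append(f"| {definition['name']} | {level} | {description} |")
    return "\n".join(rows)


print(make_columns_table({"name__channels": "REQUIRED", "units": "REQUIRED"}))
```

Note that the sketch only renders text. It does not enforce rules such as "exactly one row for each participant", which is the gap the reply above points to.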


Throughout BIDS you can indicate missing values with `n/a` (for "not
available").
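
As an illustration of the `n/a` convention in the context lines above, a reader of a BIDS TSV file might map `n/a` cells to a missing-value marker. The sketch below uses only the Python standard library; the helper name `read_tsv` and the sample rows are hypothetical.

```python
import csv
import io


def read_tsv(text):
    """Parse tab-separated text into dicts, mapping "n/a" cells to None."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        {key: (None if value == "n/a" else value) for key, value in row.items()}
        for row in reader
    ]


# Made-up participants.tsv content, for illustration only.
sample = "participant_id\tage\tsex\nsub-01\t34\tF\nsub-02\tn/a\tM\n"
for row in read_tsv(sample):
    print(row)
```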
@@ -279,32 +260,17 @@ samples.json

The purpose of this file is to describe properties of samples, indicated by the `sample` entity.
This file is REQUIRED if `sample-<label>` is present in any file name within the dataset.
If this file exists, it MUST contain the three following columns:

- `sample_id`: MUST consist of `sample-<label>` values identifying one row
for each sample

- `participant_id`: MUST consist of `sub-<label>`

- `sample_type`: MUST consist of sample type values, either `cell line`, `in vitro differentiated cells`,
`primary cell`, `cell-free sample`, `cloning host`, `tissue`, `whole organisms`, `organoid` or
`technical sample` from [ENCODE Biosample Type](https://www.encodeproject.org/profiles/biosample_type)

Other optional columns MAY be used to describe the samples.
Each sample MUST be described by one and only one row.

Commonly used *optional* columns in `samples.tsv` files are `pathology` and
`derived_from`. We RECOMMEND to make use of these columns, and in case that
you do use them, we RECOMMEND to use the following values for them:

- `pathology`: string value describing the pathology of the sample or type of control.
When different from `healthy`, pathology SHOULD be specified in `samples.tsv`.
The pathology MAY instead be specified in [Sessions files](./03-modality-agnostic-files.md#sessions-file)
in case it changes over time.

- `derived_from`: `sample-<label>` key/value pair from which a sample is derived from,
for example a slice of tissue (`sample-02`) derived from a block of tissue (`sample-01`),
as illustrated in the example below.
{{ MACROS___make_columns_table(
{
"sample_id": ("REQUIRED", "The combination of `sample_id` and `participant_id` MUST be unique."),
"participant_id": ("REQUIRED", "The combination of `sample_id` and `participant_id` MUST be unique."),
"sample_type": "REQUIRED",
"pathology": "RECOMMENDED",
"derived_from": "RECOMMENDED",
}
) }}

`samples.tsv` example:

@@ -429,25 +395,12 @@ Some recordings consist of multiple parts, that span several files,
for example through `echo-`, `part-`, or `split-` entities.
Such recordings MUST be documented with one row per file.

Relative paths to files should be used under a compulsory `filename` header.

If acquisition time is included it should be listed under the `acq_time` header.
Acquisition time refers to when the first data point in each run was acquired.
Furthermore, if this header is provided, the acquisition times of all files that
belong to a recording MUST be identical.

Datetime should be expressed as described in [Units](./02-common-principles.md#units).

For anonymization purposes all dates within one subject should be shifted by a
randomly chosen (but consistent across all recordings) number of days.
This way relative timing would be preserved, but chances of identifying a
person based on the date and time of their scan would be decreased.
Dates that are shifted for anonymization purposes SHOULD be set to the year 1925
or earlier to clearly distinguish them from unmodified data.
Note that some data formats do not support arbitrary recording dates.
For example, the [EDF](https://www.edfplus.info/)
data format can only contain recording dates after 1985.
Shifting dates is RECOMMENDED, but not required.
{{ MACROS___make_columns_table(
{
"filename": ("REQUIRED", "There MUST be exactly one row for each file."),
"acq_time": "OPTIONAL",
}
) }}

Additional fields can include external behavioral measures relevant to the
scan.
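
The date-shifting guidance in the prose removed by this hunk (one consistent shift per subject, shifted dates in 1925 or earlier) can be sketched as follows. This is an illustrative sketch, not a prescribed procedure: the function name, the `seed` parameter, and the 0 to 3650 day range of the extra random offset are assumptions.

```python
import random
from datetime import date, datetime, timedelta


def shift_acq_times(acq_times, seed=None):
    """Shift all datetimes of one subject by the same whole number of days.

    The shift is large enough that every result falls in 1925 or earlier.
    """
    rng = random.Random(seed)
    latest = max(acq_times)
    # Smallest number of days that moves the latest date to 1925-12-31 or before.
    min_days = max((latest.date() - date(1925, 12, 31)).days, 0)
    # Hypothetical extra random offset so the shift is not predictable.
    offset = timedelta(days=min_days + rng.randint(0, 3650))
    return [t - offset for t in acq_times]


original = [datetime(2021, 3, 1, 10, 0), datetime(2021, 3, 8, 9, 30)]
shifted = shift_acq_times(original, seed=0)
# Relative timing between recordings is unchanged by the shift.
print(shifted[1] - shifted[0] == original[1] - original[0])
```

Because the offset is a whole number of days applied to every recording, the time of day and the intervals between scans are preserved. Formats with restricted date ranges, such as the EDF case mentioned in the removed text, would need separate handling.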
@@ -485,6 +438,14 @@ These files MUST include a `session_id` column and describe each session by one
Column names in `sessions.tsv` files MUST be different from group level participant key column names in the
[`participants.tsv` file](./03-modality-agnostic-files.md#participants-file).

{{ MACROS___make_columns_table(
{
"session_id": ("REQUIRED", "There MUST be exactly one row for each session."),
"acq_time": "OPTIONAL",
"pathology": "RECOMMENDED",
}
) }}

`_sessions.tsv` example:

```Text
[...]
```
34 changes: 19 additions & 15 deletions src/04-modality-specific-files/02-magnetoencephalography.md
@@ -221,24 +221,28 @@ The columns of the Channels description table stored in `*_channels.tsv` are:

MUST be present **in this specific order**:

| **Column name** | **Requirement level** | **Description** |
| --------------- | --------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| name | REQUIRED | Channel name (for example, MRT012, MEG023). |
| type | REQUIRED | Type of channel; MUST use the channel types listed below. Note that the type MUST be in upper-case. |
| units | REQUIRED | Physical unit of the value represented in this channel, for example, `V` for Volt, or `fT/cm` for femto Tesla per centimeter (see [Units](../02-common-principles.md#units)). |
{{ MACROS___make_columns_table(
{
"name__channels": "REQUIRED",
"type__channels": "REQUIRED",
"units": "REQUIRED",
}
) }}

SHOULD be present:

| **Column name** | **Requirement level** | **Description** |
| ------------------ | --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| description | OPTIONAL | Brief free-text description of the channel, or other information of interest. See examples below. |
| sampling_frequency | OPTIONAL | Sampling rate of the channel in Hz. |
| low_cutoff | OPTIONAL | Frequencies used for the high-pass filter applied to the channel in Hz. If no high-pass filter applied, use `n/a`. |
| high_cutoff | OPTIONAL | Frequencies used for the low-pass filter applied to the channel in Hz. If no low-pass filter applied, use `n/a`. Note that hardware anti-aliasing in A/D conversion of all MEG/EEG electronics applies a low-pass filter; specify its frequency here if applicable. |
| notch | OPTIONAL | Frequencies used for the notch filter applied to the channel, in Hz. If no notch filter applied, use `n/a`. |
| software_filters | OPTIONAL | List of temporal and/or spatial software filters applied (for example, "SSS", `"SpatialCompensation"`). Note that parameters should be defined in the general MEG sidecar .json file. Indicate `n/a` in the absence of software filters applied. |
| status | OPTIONAL | Data quality observed on the channel `(good/bad)`. A channel is considered `bad` if its data quality is compromised by excessive noise. Description of noise type SHOULD be provided in `[status_description]`. |
| status_description | OPTIONAL | Freeform text description of noise or artifact affecting data quality on the channel. It is meant to explain why the channel was declared bad in `[status]`. |
{{ MACROS___make_columns_table(
{
"description": "OPTIONAL",
"sampling_frequency": "OPTIONAL",
"low_cutoff": "OPTIONAL",
"high_cutoff": "OPTIONAL",
"notch": "OPTIONAL",
"software_filters": "OPTIONAL",
"status": "OPTIONAL",
"status_description": "OPTIONAL",
}
) }}
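
The "in this specific order" requirement for the `name`, `type`, and `units` columns could be checked along these lines. The sketch assumes the requirement means those three columns lead the header row, which is an interpretation rather than something this diff states. The function name is hypothetical.

```python
REQUIRED_ORDER = ["name", "type", "units"]


def has_required_leading_columns(header_line, required=REQUIRED_ORDER):
    """Return True if the TSV header starts with the required columns, in order."""
    columns = header_line.rstrip("\n").split("\t")
    return columns[: len(required)] == list(required)


print(has_required_leading_columns("name\ttype\tunits\tdescription\n"))
print(has_required_leading_columns("type\tname\tunits\n"))
```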

Example:

60 changes: 34 additions & 26 deletions src/04-modality-specific-files/03-electroencephalography.md
@@ -216,24 +216,28 @@ The columns of the Channels description table stored in `*_channels.tsv` are:

MUST be present **in this specific order**:

| **Column name** | **Requirement level** | **Description** |
| --------------- | --------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| name | REQUIRED | Channel name (for example, FC1, Cz) |
| type | REQUIRED | Type of channel; MUST use the channel types listed below. Note that the type MUST be in upper-case. |
| units | REQUIRED | Physical unit of the value represented in this channel, for example, `V` for Volt, or `fT/cm` for femto Tesla per centimeter (see [Units](../02-common-principles.md#units)). |
{{ MACROS___make_columns_table(
{
"name__channels": "REQUIRED",
"type__channels": "REQUIRED",
"units": "REQUIRED",
}
) }}

SHOULD be present:

| **Column name** | **Requirement level** | **Description** |
| ------------------ | --------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| description | OPTIONAL | Free-form text description of the channel, or other information of interest. See examples below. |
| sampling_frequency | OPTIONAL | Sampling rate of the channel in Hz. |
| reference | OPTIONAL | Name of the reference electrode(s) (not needed when it is common to all channels, in that case it can be specified in `*_eeg.json` as `EEGReference`). |
| low_cutoff | OPTIONAL | Frequencies used for the high-pass filter applied to the channel in Hz. If no high-pass filter applied, use `n/a`. |
| high_cutoff | OPTIONAL | Frequencies used for the low-pass filter applied to the channel in Hz. If no low-pass filter applied, use `n/a`. Note that hardware anti-aliasing in A/D conversion of all EEG electronics applies a low-pass filter; specify its frequency here if applicable. |
| notch | OPTIONAL | Frequencies used for the notch filter applied to the channel, in Hz. If no notch filter applied, use `n/a`. |
| status | OPTIONAL | Data quality observed on the channel (`good`, `bad`). A channel is considered `bad` if its data quality is compromised by excessive noise. Description of noise type SHOULD be provided in `status_description`. |
| status_description | OPTIONAL | Free-form text description of noise or artifact affecting data quality on the channel. It is meant to explain why the channel was declared bad in `status`. |
{{ MACROS___make_columns_table(
{
"description": "OPTIONAL",
"sampling_frequency": "OPTIONAL",
"reference__eeg": "OPTIONAL",
"low_cutoff": "OPTIONAL",
"high_cutoff": "OPTIONAL",
"notch": "OPTIONAL",
"status": "OPTIONAL",
"status_description": "OPTIONAL",
}
) }}

Restricted keyword list for field `type` in alphabetic order (shared with the
MEG and iEEG modality; however, only the types that are common in EEG data are listed here).
@@ -288,20 +292,24 @@ file MUST be specified as well**. The order of the required columns in the

MUST be present **in this specific order**:

| **Column name** | **Requirement level** | **Description** |
| --------------- | --------------------- | ----------------------------------- |
| name | REQUIRED | Name of the electrode. |
| x | REQUIRED | Recorded position along the x-axis. |
| y | REQUIRED | Recorded position along the y-axis. |
| z | REQUIRED | Recorded position along the z-axis. |
{{ MACROS___make_columns_table(
{
"name__electrodes": "REQUIRED",
"x": "REQUIRED",
"y": "REQUIRED",
"z": "REQUIRED",
}
) }}

SHOULD be present:

| **Column name** | **Requirement level** | **Description** |
| --------------- | --------------------- | ---------------------------------------------------------------------- |
| type | RECOMMENDED | Type of the electrode (for example, cup, ring, clip-on, wire, needle). |
| material | RECOMMENDED | Material of the electrode (for example, Tin, Ag/AgCl, Gold). |
| impedance | RECOMMENDED | Impedance of the electrode, units MUST be in `kOhm`. |
{{ MACROS___make_columns_table(
{
"type__electrodes": "RECOMMENDED",
"material": "RECOMMENDED",
"impedance": "RECOMMENDED",
}
) }}

### Example `electrodes.tsv`
