[SCHEMA] Add TSV column files #827

Merged
merged 45 commits into from
Nov 9, 2021
Changes from 16 commits
Commits
84bfd06
Add template.
tsalo Apr 10, 2021
066e2a2
Add first column.
tsalo Apr 10, 2021
10816d7
Add columns.
tsalo Apr 10, 2021
19901cd
More terms.
tsalo Apr 13, 2021
2a48ef6
Fill name field in all files.
tsalo Jul 9, 2021
2451b9a
More work.
tsalo Jul 9, 2021
de00cea
Merge branch 'master' into tsv-columns-schema
tsalo Aug 31, 2021
54458cd
Add remaining definitions.
tsalo Aug 31, 2021
b5754e7
Add macro to render column tables.
tsalo Aug 31, 2021
42d6e08
Fix YAML file.
tsalo Aug 31, 2021
034c6a8
Consolidate suffixes file.
tsalo Sep 20, 2021
b0b2c3f
Merge branch 'master' into tsv-columns-schema
tsalo Oct 5, 2021
4988676
Remove old individual files.
tsalo Oct 5, 2021
26e8ae9
Move columns file.
tsalo Oct 5, 2021
4615684
Fix things up a bit.
tsalo Oct 5, 2021
27168f5
Add columns I missed for modality-agnostic TSV files.
tsalo Oct 5, 2021
de9511a
Support n/a for duration.
tsalo Oct 6, 2021
59aea34
Apply suggestions from code review
tsalo Oct 6, 2021
a1148c0
Code formatting in stim_file definition.
tsalo Oct 6, 2021
02568cf
Allow numbers and strings for value.
tsalo Oct 6, 2021
c3f2558
Update src/schema/objects/columns.yaml
tsalo Oct 6, 2021
dc227d6
Allow n/a for "z" column.
tsalo Oct 6, 2021
d831c6c
Describe meanings of x, y, and z columns.
tsalo Oct 6, 2021
3955ed4
Allow n/a for status column.
tsalo Oct 6, 2021
9b4e8b9
Merge branch 'tsv-columns-schema' of https://github.com/tsalo/bids-sp…
tsalo Oct 6, 2021
0cb74ee
Add participant_id to participants.tsv table and append info for othe…
tsalo Oct 6, 2021
8d39711
Split type definitions into channels and electrodes versions.
tsalo Oct 6, 2021
e7db250
Update definitions for group based on file type.
tsalo Oct 6, 2021
3218e8e
Split reference column definition.
tsalo Oct 6, 2021
3918690
Clean up name_channels definition.
tsalo Oct 11, 2021
552b4ae
Draft new columns from #816
tsalo Oct 13, 2021
84241d4
Add new columns to table.
tsalo Oct 13, 2021
bd8c933
Merge branch 'master' into tsv-columns-schema
tsalo Oct 13, 2021
9161079
Remove list items.
tsalo Oct 13, 2021
a7356eb
Update src/04-modality-specific-files/04-intracranial-electroencephal…
tsalo Oct 26, 2021
fd54609
Apply suggestions from code review
tsalo Oct 26, 2021
efa6fa1
Use two underscores to delineate multiply-defined columns.
tsalo Oct 26, 2021
0a4880f
Remove text that is now in table.
tsalo Oct 26, 2021
48656fc
Update src/schema/objects/columns.yaml
tsalo Oct 26, 2021
7ff360a
Merge branch 'tsv-columns-schema' of https://github.com/tsalo/bids-sp…
tsalo Oct 26, 2021
de5b5f6
Add sections to README on columns file and on reused terms.
tsalo Oct 26, 2021
ba84f44
Merge branch 'master' into tsv-columns-schema
tsalo Oct 26, 2021
4ba9c25
Add EDF info to acq_time definition.
tsalo Oct 26, 2021
b510b7d
Remove hardcoded tables.
tsalo Nov 9, 2021
d562986
Remove unused links.
tsalo Nov 9, 2021
74 changes: 33 additions & 41 deletions src/03-modality-agnostic-files.md
@@ -175,31 +175,15 @@ Each participant MUST be described by one and only one row.
Commonly used *optional* columns in `participant.tsv` files are `age`, `sex`,
and `handedness`. We RECOMMEND to make use of these columns, and
in case that you do use them, we RECOMMEND to use the following values
for them:
for them.

- `age`: numeric value in years (float or integer value)

- `sex`: string value indicating phenotypical sex, one of "male", "female",
"other"

- for "male", use one of these values: `male`, `m`, `M`, `MALE`, `Male`

- for "female", use one of these values: `female`, `f`, `F`, `FEMALE`,
`Female`

- for "other", use one of these values: `other`, `o`, `O`, `OTHER`,
`Other`

- `handedness`: string value indicating one of "left", "right",
"ambidextrous"

- for "left", use one of these values: `left`, `l`, `L`, `LEFT`, `Left`

- for "right", use one of these values: `right`, `r`, `R`, `RIGHT`,
`Right`

- for "ambidextrous", use one of these values: `ambidextrous`, `a`, `A`,
`AMBIDEXTROUS`, `Ambidextrous`
{{ MACROS___make_columns_table(
{
"age": "RECOMMENDED",
"sex": "RECOMMENDED",
"handedness": "RECOMMENDED",
}
) }}
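
For readers unfamiliar with the macro, here is a minimal sketch of how a call like the one above could turn a mapping of column names to requirement levels into a Markdown table. It is not the PR's actual implementation: `COLUMN_DEFINITIONS` and this `make_columns_table` function are hypothetical stand-ins for the schema's column definitions and the real macro, and the `name_channels` description is a placeholder.

```python
# Hypothetical stand-in for the column definitions kept in the schema.
# The dictionary key identifies the definition; "name" is what gets displayed.
COLUMN_DEFINITIONS = {
    "age": {
        "name": "age",
        "description": "Numeric value in years (float or integer value).",
    },
    "sex": {
        "name": "sex",
        "description": "String value indicating phenotypical sex.",
    },
    "handedness": {
        "name": "handedness",
        "description": "String value indicating one of left, right, ambidextrous.",
    },
    # A column defined more than once gets a suffixed key but keeps its short name.
    "name_channels": {
        "name": "name",
        "description": "Placeholder description for the channel name column.",
    },
}


def make_columns_table(requirement_levels, definitions=COLUMN_DEFINITIONS):
    """Render a Markdown table from a {definition key: requirement level} mapping."""
    rows = [
        "| **Column name** | **Requirement level** | **Description** |",
        "| --- | --- | --- |",
    ]
    for key, level in requirement_levels.items():
        entry = definitions[key]
        rows.append(f"| {entry['name']} | {level} | {entry['description']} |")
    return "\n".join(rows)


table = make_columns_table(
    {"age": "RECOMMENDED", "sex": "RECOMMENDED", "handedness": "RECOMMENDED"}
)
print(table)
```

Keeping the requirement level in the call rather than in the definition is what lets the same column appear as REQUIRED in one file and OPTIONAL in another.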

Throughout BIDS you can indicate missing values with `n/a` (for "not
available").
@@ -266,6 +250,18 @@ samples.json

The purpose of this file is to describe properties of samples, indicated by the `sample` entity.
This file is REQUIRED if `sample-<label>` is present in any file name within the dataset.
Each sample MUST be described by one and only one row.

{{ MACROS___make_columns_table(
{
"sample_id": "REQUIRED",
"participant_id": "REQUIRED",
Member

probably wrong place to discuss this, but I don't get this (screenshot):

image

What if we have several samples for one participant?

participant_id    sample_id
sub-01    brain
sub-01    leg

Then there are two rows referring to sub-01, which should not happen according to the description of participant_id.

Maybe I am misunderstanding something.

Collaborator

Hi! I may be able to help here (my apologies if I'm missing some context for this PR).

sample_id and samples.tsv were introduced in PR #812.

In samples.tsv, the "unique" identifier per row is sample_id.

So the descriptions were:
sample_id: MUST consist of sample-<label> values identifying one row for each sample
and
participant_id: MUST consist of sub-<label> (without mention of one row per participant).

The resulting table for participants with more than 1 sample each looks like this:

sample_id participant_id
sample-01 sub-01
sample-02 sub-01
sample-03 sub-01
sample-04 sub-02
sample-05 sub-02
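
The rule described above (one row per `sample_id`, while `participant_id` may repeat) can be checked with a short script. This is only a sketch: the helper name `duplicated_values` is made up for illustration, and the table is the example from this comment written out as tab-separated text.

```python
import csv
import io
from collections import Counter

# The example table from the comment, as tab-separated text.
SAMPLES_TSV = "\n".join(
    "\t".join(row)
    for row in [
        ("sample_id", "participant_id"),
        ("sample-01", "sub-01"),
        ("sample-02", "sub-01"),
        ("sample-03", "sub-01"),
        ("sample-04", "sub-02"),
        ("sample-05", "sub-02"),
    ]
)


def duplicated_values(tsv_text, column):
    """Return the sorted values that occur more than once in the given column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = Counter(row[column] for row in reader)
    return sorted(value for value, count in counts.items() if count > 1)


# sample_id is the per-row identifier, so it must not repeat.
duplicate_samples = duplicated_values(SAMPLES_TSV, "sample_id")
# participant_id may repeat when a participant has several samples.
repeated_participants = duplicated_values(SAMPLES_TSV, "participant_id")

print("duplicate sample_id values:", duplicate_samples)
print("participants with several samples:", repeated_participants)
```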

Collaborator

@effigies effigies Oct 6, 2021

It seems we're getting participants.tsv-specific text in samples.tsv table, and more generally we're combining the term description and its usage in a specific file.

Here I think the correct fix is probably to make participant_id say something like "A participant identifier of the form sub-<label>." and then the participants.tsv macro can add text that says "There MUST be exactly one row for each participant."

Edit: I see participant_id is not in the participants.tsv macro. Seems fine for now.

Member Author

I think the problem should be addressed by 0cb74ee. WDYT?

tsalo marked this conversation as resolved.
"sample_type": "REQUIRED",
"pathology": "RECOMMENDED",
"derived_from": "RECOMMENDED",
}
) }}

If this file exists, it MUST contain the three following columns:

- `sample_id`: MUST consist of `sample-<label>` values identifying one row
@@ -278,7 +274,6 @@ If this file exists, it MUST contain the three following columns:
`technical sample` from [ENCODE Biosample Type](https://www.encodeproject.org/profiles/biosample_type)

Other optional columns MAY be used to describe the samples.
Each sample MUST be described by one and only one row.

Commonly used *optional* columns in `samples.tsv` files are `pathology` and
`derived_from`. We RECOMMEND to make use of these columns, and in case that
@@ -416,22 +411,12 @@ Some recordings consist of multiple parts, that span several files,
for example through `echo-`, `part-`, or `split-` entities.
Such recordings MUST be documented with one row per file.

Relative paths to files should be used under a compulsory `filename` header.

If acquisition time is included it should be listed under the `acq_time` header.
Acquisition time refers to when the first data point in each run was acquired.
Furthermore, if this header is provided, the acquisition times of all files that
belong to a recording MUST be identical.

Datetime should be expressed as described in [Units](./02-common-principles.md#units).

For anonymization purposes all dates within one subject should be shifted by a
randomly chosen (but consistent across all recordings) number of days.
This way relative timing would be preserved, but chances of identifying a
person based on the date and time of their scan would be decreased.
Dates that are shifted for anonymization purposes SHOULD be set to the year 1925
or earlier to clearly distinguish them from unmodified data.
Shifting dates is RECOMMENDED, but not required.
{{ MACROS___make_columns_table(
{
"filename": "REQUIRED",
"acq_time": "OPTIONAL",
}
) }}
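
The date-shifting guidance for `acq_time` can be illustrated with a small sketch. It assumes ISO 8601 datetime strings and an offset chosen once per subject; the function name `shift_acq_times` and the example values are hypothetical.

```python
from datetime import datetime, timedelta


def shift_acq_times(acq_times, offset_days):
    """Shift every datetime back by the same number of days.

    Using one offset for all recordings of a subject preserves relative timing.
    """
    delta = timedelta(days=offset_days)
    return [(datetime.fromisoformat(t) - delta).isoformat() for t in acq_times]


# Example acquisition times for one subject (made-up values).
original = ["2021-04-10T09:30:00", "2021-04-17T14:05:00"]

# In practice the offset would be drawn at random once per subject and kept secret;
# it is fixed here so the example is reproducible. It is large enough to move the
# dates to 1925 or earlier, which marks them as shifted.
shifted = shift_acq_times(original, offset_days=36500)

print(shifted)
```

Because every value moves by the same amount, the interval between the two recordings is unchanged after shifting.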

Additional fields can include external behavioral measures relevant to the
scan.
@@ -469,6 +454,13 @@ These files MUST include a `session_id` column and describe each session by one
Column names in `sessions.tsv` files MUST be different from group level participant key column names in the
[`participants.tsv` file](./03-modality-agnostic-files.md#participants-file).

{{ MACROS___make_columns_table(
{
"session_id": "REQUIRED",
"acq_time": "OPTIONAL",
}
) }}

`_sessions.tsv` example:

```Text
21 changes: 21 additions & 0 deletions src/04-modality-specific-files/02-magnetoencephalography.md
@@ -227,6 +227,14 @@ MUST be present **in this specific order**:
| type | REQUIRED | Type of channel; MUST use the channel types listed below. Note that the type MUST be in upper-case. |
| units | REQUIRED | Physical unit of the value represented in this channel, for example, `V` for Volt, or `fT/cm` for femto Tesla per centimeter (see [Units](../02-common-principles.md#units)). |

{{ MACROS___make_columns_table(
{
"name_channels": "REQUIRED",
"type": "REQUIRED",
"units": "REQUIRED",
}
) }}

SHOULD be present:

| **Column name** | **Requirement level** | **Description** |
@@ -240,6 +248,19 @@ SHOULD be present:
| status | OPTIONAL | Data quality observed on the channel `(good/bad)`. A channel is considered `bad` if its data quality is compromised by excessive noise. Description of noise type SHOULD be provided in `[status_description]`. |
| status_description | OPTIONAL | Freeform text description of noise or artifact affecting data quality on the channel. It is meant to explain why the channel was declared bad in `[status]`. |

{{ MACROS___make_columns_table(
{
"description": "OPTIONAL",
"sampling_frequency": "OPTIONAL",
"low_cutoff": "OPTIONAL",
"high_cutoff": "OPTIONAL",
"notch": "OPTIONAL",
"software_filters": "OPTIONAL",
"status": "OPTIONAL",
"status_description": "OPTIONAL",
}
) }}

Example:

```Text
38 changes: 38 additions & 0 deletions src/04-modality-specific-files/03-electroencephalography.md
@@ -222,6 +222,14 @@ MUST be present **in this specific order**:
| type | REQUIRED | Type of channel; MUST use the channel types listed below. Note that the type MUST be in upper-case. |
| units | REQUIRED | Physical unit of the value represented in this channel, for example, `V` for Volt, or `fT/cm` for femto Tesla per centimeter (see [Units](../02-common-principles.md#units)). |

{{ MACROS___make_columns_table(
{
"name_channels": "REQUIRED",
"type": "REQUIRED",
"units": "REQUIRED",
}
) }}

SHOULD be present:

| **Column name** | **Requirement level** | **Description** |
@@ -235,6 +243,19 @@ SHOULD be present:
| status | OPTIONAL | Data quality observed on the channel (`good`, `bad`). A channel is considered `bad` if its data quality is compromised by excessive noise. Description of noise type SHOULD be provided in `status_description`. |
| status_description | OPTIONAL | Free-form text description of noise or artifact affecting data quality on the channel. It is meant to explain why the channel was declared bad in `status`. |

{{ MACROS___make_columns_table(
{
"description": "OPTIONAL",
"sampling_frequency": "OPTIONAL",
"reference": "OPTIONAL",
"low_cutoff": "OPTIONAL",
"high_cutoff": "OPTIONAL",
"notch": "OPTIONAL",
"status": "OPTIONAL",
"status_description": "OPTIONAL",
}
) }}

Restricted keyword list for field `type` in alphabetic order (shared with the
MEG and iEEG modality; however, only the types that are common in EEG data are listed here).
Note that upper-case is REQUIRED:
@@ -295,6 +316,15 @@ MUST be present **in this specific order**:
| y | REQUIRED | Recorded position along the y-axis. |
| z | REQUIRED | Recorded position along the z-axis. |

{{ MACROS___make_columns_table(
{
"name_electrodes": "REQUIRED",
"x": "REQUIRED",
"y": "REQUIRED",
"z": "REQUIRED",
}
) }}

SHOULD be present:

| **Column name** | **Requirement level** | **Description** |
@@ -303,6 +333,14 @@ SHOULD be present:
| material | RECOMMENDED | Material of the electrode (for example, Tin, Ag/AgCl, Gold). |
| impedance | RECOMMENDED | Impedance of the electrode, units MUST be in `kOhm`. |

{{ MACROS___make_columns_table(
{
"type": "RECOMMENDED",
"material": "RECOMMENDED",
"impedance": "RECOMMENDED",
}
) }}

### Example `electrodes.tsv`

See also the corresponding [`electrodes.tsv` example](#example-channelstsv).
@@ -234,6 +234,16 @@ MUST be present **in this specific order**:
| low_cutoff | REQUIRED | Frequencies used for the low pass filter applied to the channel in Hz. If no low pass filter was applied, use `n/a`. Note that anti-alias is a low pass filter, specify its frequencies here if applicable. |
| high_cutoff | REQUIRED | Frequencies used for the high pass filter applied to the channel in Hz. If no high pass filter applied, use `n/a`. |

{{ MACROS___make_columns_table(
{
"name_channels": "REQUIRED",
"type": "REQUIRED",
"units": "REQUIRED",
"low_cutoff": "REQUIRED",
"high_cutoff": "REQUIRED",
}
) }}

SHOULD be present:

| **Column name** | **Requirement level** | **Description** |
@@ -246,6 +256,18 @@ SHOULD be present:
| status | OPTIONAL | Data quality observed on the channel (good/bad). A channel is considered bad if its data quality is compromised by excessive noise. Description of noise type SHOULD be provided in `[status_description]`. |
| status_description | OPTIONAL | Freeform text description of noise or artifact affecting data quality on the channel. It is meant to explain why the channel was declared bad in `[status]`. |

{{ MACROS___make_columns_table(
{
"reference": "OPTIONAL",
Member

this makes references to the EEG section here ... but we are in the iEEG section of the spec - so these references don't make sense here.

Member Author

The definitions for EEG and iEEG are really different, so I decided to split it into two schema objects. I don't love splitting definitions for the same column in the same file across modalities, so if there's any way to write a single definition that works for both that would be awesome. Do you think it's possible?

"group": "OPTIONAL",
"sampling_frequency": "OPTIONAL",
"description": "OPTIONAL",
"notch": "OPTIONAL",
"status": "OPTIONAL",
"status_description": "OPTIONAL",
}
) }}

**Example** `sub-01_channels.tsv`:

```Text
@@ -343,6 +365,16 @@ MUST be present **in this specific order**:
| z | REQUIRED | Z position. If electrodes are in 2D space this should be a column of `n/a` values. |
| size | REQUIRED | Surface area of the electrode, units MUST be in `mm^2`. |

{{ MACROS___make_columns_table(
{
"name_electrodes": "REQUIRED",
"x": "REQUIRED",
"y": "REQUIRED",
"z": "REQUIRED",
"size": "REQUIRED",
}
) }}

SHOULD be present:

| **Column name** | **Requirement level** | **Description** |
@@ -352,6 +384,15 @@ SHOULD be present:
| group | RECOMMENDED | The group that the electrode is a part of. Note that any group specified here should match a group specified in `_channels.tsv`. |
| hemisphere | RECOMMENDED | The hemisphere in which the electrode is placed, one of `['L' or 'R']` (use capital). |

{{ MACROS___make_columns_table(
{
"material": "RECOMMENDED",
"manufacturer": "RECOMMENDED",
"group": "RECOMMENDED",
"hemisphere": "RECOMMENDED",
}
) }}

MAY be present:

| **Column name** | **Requirement level** | **Description** |
@@ -360,6 +401,14 @@ MAY be present:
| impedance | OPTIONAL | Impedance of the electrode, units MUST be in `kOhm`. |
| dimension | OPTIONAL | Size of the group (grid/strip/probe) that this electrode belongs to. Must be of form `[AxB]` with the smallest dimension first (for example, `[1x8]`). |

{{ MACROS___make_columns_table(
{
"type": "OPTIONAL",
"impedance": "OPTIONAL",
"dimension": "OPTIONAL",
}
) }}
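
The `dimension` format described in the table above (`[AxB]`, smallest dimension first) lends itself to a small check. This is a sketch only: the pattern and the function name `is_valid_dimension` are assumptions for illustration, not part of the schema.

```python
import re

# Matches values such as "[1x8]": two positive integers separated by "x" in brackets.
DIMENSION_PATTERN = re.compile(r"^\[(\d+)x(\d+)\]$")


def is_valid_dimension(value):
    """Check the [AxB] form and that the smallest dimension comes first."""
    match = DIMENSION_PATTERN.match(value)
    if match is None:
        return False
    first, second = (int(group) for group in match.groups())
    return first <= second


for candidate in ["[1x8]", "[8x1]", "1x8", "[4x4]"]:
    print(candidate, is_valid_dimension(candidate))
```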

Example:

```Text
18 changes: 18 additions & 0 deletions src/04-modality-specific-files/05-task-events.md
@@ -44,6 +44,18 @@ and OPTIONAL columns:
| value | OPTIONAL | [string][] or [number][] | Marker value associated with the event (for example, the value of a TTL trigger that was recorded at the onset of the event). |
| HED | OPTIONAL | [string][] | Hierarchical Event Descriptor (HED) tag. See [Appendix III](../99-appendices/03-hed.md) for details. |

{{ MACROS___make_columns_table(
{
"onset": "REQUIRED",
"duration": "REQUIRED",
"sample": "OPTIONAL",
"trial_type": "OPTIONAL",
"response_time": "OPTIONAL",
"value": "OPTIONAL",
"HED": "OPTIONAL",
}
) }}

<sup>5</sup> Note for MRI data:
If any acquired scans have been discarded before forming the imaging data file,
ensure that an `onset` of 0 corresponds to the time the first image was stored.
@@ -141,6 +153,12 @@ but they should be stored in the `/stimuli` folder.
| --------------- | --------------------- | ------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| stim_file | OPTIONAL | [string][] | Represents the location of the stimulus file (such as an image, video, or audio file) presented at the given onset time. The values under the `stim_file` column correspond to a path relative to the folder `/stimuli`. For example `images/cat03.jpg` will be translated to `/stimuli/images/cat03.jpg`. |

{{ MACROS___make_columns_table(
{
"stim_file": "OPTIONAL",
}
) }}
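
The path translation described for `stim_file` (a value relative to `/stimuli`) can be sketched as follows. The helper name `resolve_stim_file` is hypothetical, and POSIX-style paths are assumed.

```python
from pathlib import PurePosixPath


def resolve_stim_file(stim_file, stimuli_dir="/stimuli"):
    """Translate a stim_file value into a path under the stimuli folder."""
    return str(PurePosixPath(stimuli_dir) / stim_file)


resolved = resolve_stim_file("images/cat03.jpg")
print(resolved)  # /stimuli/images/cat03.jpg
```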

### Stimuli databases

References to existing databases can also be encoded using additional columns.
@@ -113,6 +113,14 @@ following naming conventions SHOULD be used for the column names:
| respiratory | continuous breathing measurement |
| trigger | continuous measurement of the scanner trigger signal |

{{ MACROS___make_columns_table(
{
"cardiac": "OPTIONAL",
"respiratory": "OPTIONAL",
"trigger": "OPTIONAL",
}
) }}

For any other data to be specified in columns, the column names can be chosen
as deemed appropriate by the researcher.

11 changes: 11 additions & 0 deletions src/04-modality-specific-files/09-positron-emission-tomography.md
@@ -364,6 +364,17 @@ The `time` column MUST always be the first column.
| `hplc_recovery_fractions` | REQUIRED if `MetaboliteRecoveryCorrectionApplied` is `true` | HPLC recovery fractions (the fraction of activity that gets loaded onto the HPLC) | Unit of recovery fractions (for example, `"unitless"`) |
| `whole_blood_radioactivity` | REQUIRED if `WholeBloodAvail` is `true` | Radioactivity in whole blood samples | Unit of radioactivity measurements in whole blood samples (for example, `"kBq/mL"`) |

{{ MACROS___make_columns_table(
{
"time": "REQUIRED",
"plasma_radioactivity": "REQUIRED if `PlasmaAvail` is `true`",
"metabolite_parent_fraction": "REQUIRED if `MetaboliteAvail` is `true`",
"metabolite_polar_fraction": "RECOMMENDED if `MetaboliteAvail` is `true`",
"hplc_recovery_fractions": "REQUIRED if `MetaboliteRecoveryCorrectionApplied` is `true`",
"whole_blood_radioactivity": "REQUIRED if `WholeBloodAvail` is `true`",
}
) }}
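
The conditional requirement levels in the call above can be read as a small rule set. The sketch below assumes the flags (`PlasmaAvail`, `MetaboliteAvail`, and so on) have already been loaded into a plain dictionary; the function names and the mapping are illustrative and not part of the schema. `metabolite_polar_fraction` is left out because it is only RECOMMENDED.

```python
# Column -> metadata flag that makes the column REQUIRED when the flag is true.
CONDITIONAL_COLUMNS = {
    "plasma_radioactivity": "PlasmaAvail",
    "metabolite_parent_fraction": "MetaboliteAvail",
    "hplc_recovery_fractions": "MetaboliteRecoveryCorrectionApplied",
    "whole_blood_radioactivity": "WholeBloodAvail",
}


def required_blood_columns(metadata):
    """List the columns that are REQUIRED given the metadata flags."""
    required = ["time"]  # always required
    for column, flag in CONDITIONAL_COLUMNS.items():
        if metadata.get(flag) is True:
            required.append(column)
    return required


def check_blood_header(header, metadata):
    """Return a list of problems found in a blood TSV header."""
    problems = []
    if not header or header[0] != "time":
        problems.append("time must be the first column")
    for column in required_blood_columns(metadata):
        if column not in header:
            problems.append(f"missing required column: {column}")
    return problems


metadata = {"PlasmaAvail": True, "MetaboliteAvail": False, "WholeBloodAvail": True}
problems = check_blood_header(["time", "plasma_radioactivity"], metadata)
print(problems)
```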

As with all [tabular files](../02-common-principles.md#tabular-files),
additional columns MAY be defined in `_blood.json`.
For clarity, it is RECOMMENDED to include the above column definitions in `_blood.json`,
10 changes: 10 additions & 0 deletions src/05-derivatives/03-imaging.md
@@ -448,6 +448,16 @@ These TSV lookup tables contain the following columns:
| color | OPTIONAL | Hexadecimal. Label color for visualization |
| mapping | OPTIONAL | Corresponding integer label in the standard BIDS label lookup |

{{ MACROS___make_columns_table(
{
"index": "REQUIRED",
"name_segmentations": "REQUIRED",
"abbreviation": "OPTIONAL",
"color": "OPTIONAL",
"mapping": "OPTIONAL",
}
) }}

An example, custom `dseg.tsv` that defines three labels:

```Text