Merge branch 'master' into bep022
markmikkelsen authored Apr 8, 2024
2 parents 37620c9 + 6c52828 commit 25c1ed2
Showing 15 changed files with 148 additions and 79 deletions.
2 changes: 1 addition & 1 deletion Release_Protocol.md
Original file line number Diff line number Diff line change
Expand Up @@ -278,5 +278,5 @@ Update the following files in the BIDS website repository (https://github.com/bi

### 12. Sharing news of the release

Please share news of the release on the [identified platforms](https://docs.google.com/spreadsheets/d/16SAGK3zG93WM2EWuoZDcRIC7ygPc5b7PDNGpFyC3obA/edit#gid=0).
Please share news of the release on the [identified platforms](https://github.com/bids-standard/bids-specification?tab=readme-ov-file#BIDS-communication-channels).
Please use our previous release posts as a guide.
2 changes: 1 addition & 1 deletion pdf_build_src/remove_admonitions.py
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,7 @@ def remove_admonitions(
counter += 1
continue

if not line.startswith(indent):
if line != "\n" and not line.startswith(indent):
is_admonition = False

if is_admonition:
Expand Down
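To illustrate the effect of the changed condition, here is a minimal sketch, not the repository's actual `remove_admonitions` function. The function name `strip_admonitions`, the four-space indent, and the sample lines are assumptions made for the example. It shows that a blank line inside an admonition no longer ends the admonition; only a non-blank, non-indented line does.

```python
def strip_admonitions(lines, indent="    "):
    """Drop admonition headers and dedent their bodies (simplified sketch)."""
    out = []
    in_admonition = False
    for line in lines:
        if line.startswith(("!!!", "???")):
            in_admonition = True
            continue  # drop the header line itself
        # A blank line alone must not close the admonition; only a
        # non-blank line without the expected indent does.
        if line != "\n" and not line.startswith(indent):
            in_admonition = False
        if in_admonition and line.startswith(indent):
            line = line[len(indent):]
        out.append(line)
    return out


# Hypothetical input resembling the new test fixture.
sample = [
    '!!! example "list"\n',
    "\n",
    "    Let's see\n",
    "\n",
    "    - item\n",
    "\n",
    "Back to normal.\n",
]
result = strip_admonitions(sample)
```

With the previous condition (`if not line.startswith(indent)`), the first blank line would have reset the flag and the indented list would have been left untouched.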
11 changes: 11 additions & 0 deletions pdf_build_src/tests/data/expected/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,3 +19,14 @@ Collapsible admonitions start with 3 questions marks (`???`).

Collapsible admonitions that will be shown as expanded
start with 3 questions marks and a plus sign (`???+`).



Let's see

- [`UK biobank`](https://github.com/bids-standard/bids-examples/tree/master/genetics_ukbb)
- foo bar [`UK biobank`](https://github.com/bids-standard/bids-examples/tree/master/genetics_ukbb)

More of the admonition

And here we resume normal thing.
13 changes: 13 additions & 0 deletions pdf_build_src/tests/data/input/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -25,3 +25,16 @@ come in different type. In aaddtion of the classical admonitions show above you

Collapsible admonitions that will be shown as expanded
start with 3 questions marks and a plus sign (`???+`).



!!! example "non ordered list should be handle propeler"

Let's see

- [`UK biobank`](https://github.com/bids-standard/bids-examples/tree/master/genetics_ukbb)
- foo bar [`UK biobank`](https://github.com/bids-standard/bids-examples/tree/master/genetics_ukbb)

More of the admonition

And here we resume normal thing.
94 changes: 71 additions & 23 deletions src/common-principles.md
Original file line number Diff line number Diff line change
Expand Up @@ -238,10 +238,14 @@ distinguish partial results from the raw data and share the latter.
See [Storage of derived datasets](#storage-of-derived-datasets) for more on
organizing derivatives.

Similar rules apply to source data, which is defined as data before
harmonization, reconstruction, and/or file format conversion (for example, E-Prime event logs or DICOM files).
Storing actual source files with the data is preferred over links to
external source repositories to maximize long term preservation,
Similar rules apply to source data, which is defined as data
before harmonization, reconstruction, and/or file format conversion
(for example, E-Prime event logs or DICOM files).
Retaining the source data is especially valuable
when conversion fails to preserve crucial metadata
unique to a specific acquisition setup.
Storing actual source files with the data is preferred over links
to external source repositories to maximize long term preservation,
which would suffer if an external repository would not be available anymore.
This specification currently does not go into the details of
recommending a particular naming scheme for including different types of
Expand Down Expand Up @@ -426,36 +430,54 @@ NIfTI header.
### Tabular files
Tabular data MUST be saved as tab delimited values (`.tsv`) files, that is, CSV
files where commas are replaced by tabs. Tabs MUST be true tab characters and
MUST NOT be a series of space characters. Each TSV file MUST start with a header
line listing the names of all columns (with the exception of
[physiological and other continuous recordings](modality-specific-files/physiological-and-other-continuous-recordings.md)
as well as [motion recording data](modality-specific-files/motion.md)).
Tabular data MUST be saved as plain-text, tab-delimited values (TSV) files
(with [extension `.tsv`](glossary.md#tsv-extensions)),
that is, [CSV files](https://en.wikipedia.org/wiki/Comma-separated_values) where commas are replaced by tab characters.
Tabs MUST be true tab characters and MUST NOT be a series of space characters.
Tabular data such as continuous physiology recordings typically containing
large numbers of rows MAY be saved as
[compressed tabular files (with extension `.tsv.gz`)](#compressed-tabular-files),
which are introduced below.
Plain-text TSV and compressed TSV are not interchangeable, that is, each section
of the specification prescribes which one MUST be used for the data type at
hand.
Each TSV file MUST start with a header line listing the names of all columns
with two exceptions:
1. [compressed tabular files](#compressed-tabular-files),
for which column names are defined in a sidecar metadata
[JSON object](https://www.json.org/json-en.html) described below; and
1. [motion recording data](modality-specific-files/motion.md),
which use plain-text TSV with columns defined as described
in the corresponding section of the specification.
It is RECOMMENDED that the column names in the header of the TSV file are
written in [`snake_case`](https://en.wikipedia.org/wiki/Snake_case) with the
first letter in lower case (for example, `variable_name`, not `Variable_name`).
As for all other data in the TSV files, column names MUST be separated with tabs.
Column names defined in the header MUST be separated with tabs as for the data contents.
Furthermore, column names MUST NOT be blank (that is, an empty string) and MUST NOT
be duplicated within a single TSV file.
String values containing tabs MUST be escaped using double
quotes. Missing and non-applicable values MUST be coded as `n/a`. Numerical
values MUST employ the dot (`.`) as decimal separator and MAY be specified
String values containing tabs MUST be escaped using double quotes.
Missing and non-applicable values MUST be coded as `n/a`.
Numerical values MUST employ the dot (`.`) as decimal separator and MAY be specified
in scientific notation, using `e` or `E` to separate the significand from the
exponent. TSV files MUST be in UTF-8 encoding.
exponent.
TSV files MUST be in UTF-8 encoding.
Example:
```Text
onset duration response_time correct stop_trial go_trial
200 200 0 n/a n/a n/a
onset duration response_time trial_type trial_extra
200 20.0 15.8 word 中国人
240 5.0 17.34e-1 visual n/a
```

**Note**: The TSV examples in this document (like the one above this note)
are occasionally formatted using space characters instead of tabs to improve
human readability.
Directly copying and then pasting these examples from the specification
for use in new BIDS datasets can lead to errors and is discouraged.
!!! warning "Attention"

The TSV examples in this document (like the one above this note) are occasionally
formatted using space characters instead of tabs to improve human readability.
Directly copying and then pasting these examples from the specification
for use in new BIDS datasets can lead to errors and is discouraged.
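
The header and value rules for plain-text TSV files can be sketched with Python's standard `csv` module. This is an illustrative check under stated assumptions, not part of the specification or of any BIDS validator: the helper names `check_tsv_header` and `parse_value` are hypothetical, and the sample text reuses the example rows with true tab characters.

```python
import csv
import io


def check_tsv_header(text):
    """Return a list of problems found in the header of a plain-text TSV."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    problems = []
    if any(name == "" for name in header):
        problems.append("blank column name")
    if len(set(header)) != len(header):
        problems.append("duplicated column name")
    return problems


def parse_value(cell):
    """Map `n/a` to None and numeric cells to float; keep other strings."""
    if cell == "n/a":
        return None
    try:
        # Note: float() accepts more spellings (such as "nan") than the
        # specification describes; this is a simplification.
        return float(cell)
    except ValueError:
        return cell


tsv_text = (
    "onset\tduration\tresponse_time\ttrial_type\ttrial_extra\n"
    "200\t20.0\t15.8\tword\t中国人\n"
    "240\t5.0\t17.34e-1\tvisual\tn/a\n"
)
all_rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
rows = [[parse_value(cell) for cell in row] for row in all_rows[1:]]
```

The `csv` reader also honors double quotes by default, so a quoted string value containing a tab is read as a single cell.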

Tabular files MAY be optionally accompanied by a simple data dictionary
in the form of a JSON [object](https://www.json.org/json-en.html)
Expand Down Expand Up @@ -532,12 +554,38 @@ like in the example below.
"F": {
"Description": "Female",
"TermURL": "https://www.ncbi.nlm.nih.gov/mesh/68005260"
},
}
}
}
}
```

### Compressed tabular files

Large tabular information, such as physiological recordings, MUST be stored with
[compressed tab-delimited (TSVGZ) files](glossary.md#tsvgz-extensions) when
so established by the specifications.
Rules for formatting plain-text tabular files apply to TSVGZ files with three exceptions:

1. The contents of TSVGZ files MUST be compressed with
[gzip](https://datatracker.ietf.org/doc/html/rfc1952).
1. Compressed tabular files MUST NOT contain a header in the first row
indicating the column names.
1. TSVGZ files MUST have an associated JSON file that defines the columns in the tabular file.

!!! warning "Attention"

In contrast to plain-text TSV files,
compressed tabular files MUST NOT include a header line.
Column names MUST be provided in the JSON file with the
[`Columns`](glossary.md#columns-metadata) field.
Each column MAY additionally be described with a column description,
as described in [Tabular files](#tabular-files).

TSVGZ files are header-less to improve compatibility with existing software
(for example, FSL or PNM), and to facilitate the support for other file formats
in the future.
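
As a rough illustration of how a header-less TSVGZ file and its JSON sidecar fit together, the sketch below pairs each row with the names listed under `Columns`. The helper `read_tsvgz`, the column names, and all values are assumptions made for the example; the data is built in memory rather than read from real `*_physio.tsv.gz` and `*_physio.json` files.

```python
import csv
import gzip
import io
import json


def read_tsvgz(data, sidecar):
    """Decode gzip-compressed, header-less TSV bytes using sidecar column names."""
    columns = sidecar["Columns"]
    text = gzip.decompress(data).decode("utf-8")
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [dict(zip(columns, row)) for row in reader]


# Hypothetical stand-ins for a sidecar JSON file and a compressed data file.
sidecar = json.loads(
    '{"SamplingFrequency": 100.0, "StartTime": -22.345,'
    ' "Columns": ["cardiac", "respiratory", "trigger"]}'
)
payload = gzip.compress(b"34\t110\t0\n44\t112\t0\n23\t100\t1\n")
records = read_tsvgz(payload, sidecar)
```

Because the first row is data rather than a header, a reader that assumes a header line would silently lose the first sample, which is why the column names have to come from the sidecar.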

### Key-value files (dictionaries)

JavaScript Object Notation (JSON) files MUST be used for storing key-value
Expand Down
12 changes: 4 additions & 8 deletions src/modality-specific-files/electroencephalography.md
Original file line number Diff line number Diff line change
Expand Up @@ -42,14 +42,10 @@ It is RECOMMENDED to use the European data format, or the BrainVision data
format. It is furthermore discouraged to use the other accepted formats over
these RECOMMENDED formats, particularly because there are conversion scripts
available in most commonly used programming languages to convert data into the
RECOMMENDED formats. The data in their original format, if different from the
supported formats, can be stored in the [`/sourcedata` directory](../common-principles.md#source-vs-raw-vs-derived-data).

The original data format is especially valuable in case conversion elicits the
loss of crucial metadata specific to manufacturers and specific EEG systems. We
also encourage users to provide additional meta information extracted from the
manufacturer specific data files in the sidecar JSON file. Other relevant files
MAY be included alongside the original EEG data in `/sourcedata`.
RECOMMENDED formats.

We encourage users to provide additional metadata extracted from the
manufacturer-specific data files in the sidecar JSON file.

Note the `RecordingType`, which depends on whether the data stream on disk
is interrupted or not.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -53,12 +53,8 @@ packages. Other formats that may be considered in the future should have a clear
added advantage over the existing formats and should have wide adoption in the
BIDS community.

The data format in which the data was originally stored is especially valuable
in case conversion elicits the loss of crucial metadata specific to
manufacturers and specific iEEG systems. We also encourage users to provide
additional meta information extracted from the manufacturer-specific data files
in the sidecar JSON file. Other relevant files MAY be included alongside the
original iEEG data in the [`/sourcedata` directory](../common-principles.md#source-vs-raw-vs-derived-data).
We encourage users to provide additional metadata extracted from the
manufacturer-specific data files in the sidecar JSON file.

Note the RecordingType, which depends on whether the data stream on disk is interrupted or not.
Continuous data is by definition 1 segment without interruption.
Expand Down
3 changes: 0 additions & 3 deletions src/modality-specific-files/microscopy.md
Original file line number Diff line number Diff line change
Expand Up @@ -59,9 +59,6 @@ Microscopy raw data MUST be stored in one of the following formats:

- [OME-ZARR/NGFF](https://ngff.openmicroscopy.org/latest/) (`.ome.zarr` directories)

If different from PNG, TIFF, OME-TIFF, or OME-ZARR, the original unprocessed data in the native format MAY be
stored in the [`/sourcedata` directory](../common-principles.md#source-vs-raw-vs-derived-data).

### Modality suffixes
Microscopy data currently support the following imaging modalities:

Expand Down
8 changes: 2 additions & 6 deletions src/modality-specific-files/motion.md
Original file line number Diff line number Diff line change
Expand Up @@ -50,18 +50,14 @@ The number of columns in `_motion.tsv` files MUST equal the number of rows
in the associated `_channels.tsv` file.
All relevant metadata about a tracking system is stored in the accompanying sidecar `*_tracksys-<label>_motion.json` file.

The source data from each tracking system in their original format, if different from `.tsv`,
can be stored in the [`/sourcedata` directory](../common-principles.md#source-vs-raw-vs-derived-data).
The original data format MAY hold more metadata than currently specified in the `*_motion.json` file.

When multiple tracking systems are used to record motion or motion capture is used alongside the recording of other BIDS modalities and recordings should be interpreted together,
it is advised to provide a possibility to synchronize recordings.
The preferred way to do so is to use the acquisition time of the first data point of recordings and
to store this information in the `acq_time` column of the [`*_scans.tsv`](../modality-agnostic-files.md#scans-file) file.
The Note that the [BIDS date time format](../common-principles.md#units) allows optional fractional seconds, which SHOULD be used to maximize the precision of the synchronization.
Note that the [BIDS date time format](../common-principles.md#units) allows optional fractional seconds, which SHOULD be used to maximize the precision of the synchronization.
Only if the precision of the synchronization is not high enough, the `*_events.tsv` file SHOULD be used to synchronize recordings.
In this file, the start- and stop time of the recording of a system are specified in relation to a system to synchronize with.
If more than two systems are to be synchronized, it is up to the user to indntify the "main" system.
If more than two systems are to be synchronized, it is up to the user to identify the "main" system.

In case a tracking system provides time information with every recorded sample,
this time information MAY be stored in the form of latencies to recording onset (first sample) in the `*_motion.tsv` file.
Expand Down
9 changes: 0 additions & 9 deletions src/modality-specific-files/near-infrared-spectroscopy.md
Original file line number Diff line number Diff line change
Expand Up @@ -36,15 +36,6 @@ replicated in the BIDS specification. This redundancy allows the data to be
easily parsed by humans and machines that do not have a SNIRF reader at hand,
which improves findability and tooling development.

Raw NIRS data in the native format, if different from SNIRF, can also
be stored in the [`/sourcedata`](../common-principles.md#source-vs-raw-vs-derived-data)
directory along with code to convert the data to
SNIRF in the [`/code`](../common-principles.md#storage-of-derived-datasets) directory.
The unprocessed raw data should be stored in
the manufacturer's format before any additional processing or conversion is applied.
Retaining the native file format is especially valuable in a case when conversion elicits the
loss of crucial metadata unique to specific manufacturers and NIRS systems.

### Terminology

For proper documentation of NIRS recording metadata, it is important
Expand Down
Original file line number Diff line number Diff line change
@@ -1,13 +1,12 @@
# Physiological and other continuous recordings

## Physiological recordings

Physiological recordings such as cardiac and respiratory signals and other
continuous measures (such as parameters of a film or audio stimuli) MAY be
specified using two files:

1. a [gzip](https://datatracker.ietf.org/doc/html/rfc1952)
compressed TSV file with data (without header line)

1. a JSON file for storing metadata fields (see below)
specified using a [compressed tabular file](../common-principles.md#compressed-tabular-files)
([TSVGZ file](../glossary.md#tsvgz-extensions)) and a corresponding
JSON file for storing metadata fields (see below).

!!! example "Example datasets"

Expand Down Expand Up @@ -38,8 +37,12 @@ before the suffix.
For example for the file `sub-control01_task-nback_run-1_bold.nii.gz`,
`<matches>` would correspond to `sub-control01_task-nback_run-1`.

Note that when supplying a `*_<physio|stim>.tsv.gz` file, an accompanying
`*_<physio|stim>.json` MUST be supplied as well.
!!! note "TSVGZ headers are specified in metadata files."

TSVGZ files MUST NOT include a header line,
as established by the [common-principles](../common-principles.md#compressed-tabular-files).
As a result, when supplying a `*_<physio|stim>.tsv.gz` file, an accompanying
`*_<physio|stim>.json` MUST be supplied as well.

The [`recording-<label>`](../appendices/entities.md#recording)
entity MAY be used to distinguish between several recording files.
Expand All @@ -51,7 +54,19 @@ measurements in a different sampling frequency.
Physiological recordings (including eyetracking) SHOULD use the `_physio`
suffix, and signals related to the stimulus SHOULD use `_stim` suffix.

The following table specifies metadata fields for the `*_<physio|stim>.json` file.
The following tables specify metadata fields for the `*_<physio|stim>.json` file.

<!-- This block generates a metadata table.
These tables are defined in
src/schema/rules/sidecars
The definitions of the fields specified in these tables may be found in
src/schema/objects/metadata.yaml
A guide for using macros can be found at
https://github.com/bids-standard/bids-specification/blob/master/macros_doc.md
-->
{{ MACROS___make_sidecar_table(["continuous.Continuous"]) }}

## Hardware information

<!-- This block generates a metadata table.
These tables are defined in
Expand All @@ -61,20 +76,11 @@ The definitions of the fields specified in these tables may be found in
A guide for using macros can be found at
https://github.com/bids-standard/bids-specification/blob/master/macros_doc.md
-->
{{ MACROS___make_sidecar_table(["continuous.Continuous", "continuous.Physio"]) }}
{{ MACROS___make_sidecar_table(["continuous.PhysioHardware"]) }}

Additional metadata may be included as in
[any TSV file](../common-principles.md#tabular-files) to specify, for
example, the units of the recorded time series.
Please note that, in contrast to other TSV files in BIDS, the TSV files specified
for physiological and other continuous recordings *do not* include a header
line.
Instead the name of columns are specified in the JSON file (see `Columns` field).
This is to improve compatibility with existing software (for example, FSL, PNM)
as well as to make support for other file formats possible in the future.
As in any TSV file, column names MUST NOT be blank (that is, an empty string),
and MUST NOT be duplicated within a single JSON file describing a headerless
TSV file.

Example `*_physio.tsv.gz`:

Expand Down
2 changes: 2 additions & 0 deletions src/modality-specific-files/task-events.md
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,8 @@ and a guide for using macros can be found at
-->
{{ MACROS___make_columns_table("task.TaskEvents") }}

The content of `events.tsv` files SHOULD be sorted by values in the `onset` column.

Note for MRI data:
If any acquired scans have been discarded before forming the imaging data file,
ensure that an `onset` of 0 corresponds to the time the first image was stored.
Expand Down
4 changes: 2 additions & 2 deletions src/schema/objects/formats.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -22,12 +22,12 @@ integer:
display_name: Integer
description: |
An integer which may be positive or negative.
pattern: '[+-]?\d+'
pattern: ' *[+-]?\d+ *'
number:
display_name: Number
description: |
A number which may be an integer or float, positive or negative.
pattern: '[+-]?([0-9]+([.][0-9]*)?|[.][0-9]+)([eE][+-]?[0-9]+)?'
pattern: ' *[+-]?([0-9]+([.][0-9]*)?|[.][0-9]+)([eE][+-]?[0-9]+)? *'
string:
display_name: String
description: |
Expand Down
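The practical effect of the updated `integer` and `number` patterns can be sketched in Python. The sketch assumes the patterns are applied to the whole cell (here via `re.fullmatch`); how the schema consumers anchor them is not shown in this diff, and the helper names are hypothetical.

```python
import re

# Patterns copied from the new side of the diff.
INTEGER = r" *[+-]?\d+ *"
NUMBER = r" *[+-]?([0-9]+([.][0-9]*)?|[.][0-9]+)([eE][+-]?[0-9]+)? *"


def is_integer(cell):
    """True when the whole cell matches the integer pattern."""
    return re.fullmatch(INTEGER, cell) is not None


def is_number(cell):
    """True when the whole cell matches the number pattern."""
    return re.fullmatch(NUMBER, cell) is not None


# Leading and trailing spaces are now tolerated; inner spaces are not.
padded_ok = is_integer(" 42 ")
inner_space_ok = is_integer("4 2")
```

Only the space character is tolerated as padding; a tab would be a column separator and does not match.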
13 changes: 13 additions & 0 deletions src/schema/rules/checks/events.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -27,3 +27,16 @@ StimulusFileMissing:
- columns.stim_file != null
checks:
- exists(columns.stim_file, "stimuli") == length(columns.stim_file) - count(columns.stim_file, "n/a")

SortedOnsets:
issue:
code: EVENT_ONSET_ORDER
message: |
The onset column in events.tsv files should be sorted.
level: warning
selectors:
- suffix == "events"
- extension == ".tsv"
checks:
# n/a values will likely cause false alarms if encountered. Consider alternatives.
- sorted(columns.onset) == columns.onset
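The intent of the `SortedOnsets` check can be approximated in Python as follows. This is a sketch, not the schema expression language: the function name `onsets_sorted` is hypothetical, and unlike the rule above it skips `n/a` entries before comparing, which is one possible way to avoid the false alarms mentioned in the comment.

```python
def onsets_sorted(onsets):
    """True when numeric onsets are in non-decreasing order, ignoring n/a."""
    values = [float(value) for value in onsets if value != "n/a"]
    return values == sorted(values)


# Cells arrive as strings when read from an events.tsv file.
in_order = onsets_sorted(["0.5", "1.2", "1.2", "3"])
out_of_order = onsets_sorted(["2", "1"])
```

Converting to `float` before comparing matters: sorted as strings, `"10"` would come before `"9"`.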
2 changes: 1 addition & 1 deletion src/schema/rules/sidecars/continuous.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ Continuous:
Columns: required

# Other recommended metadata for physiological data
Physio:
PhysioHardware:
selectors:
- suffix == "physio"
fields:
Expand Down
