Skip to content

Commit

Permalink
enh: add compressed TSV files to the common principles
Browse files Browse the repository at this point in the history
Their description is hedged within the physiological recordings so
upcasting them to the common principles seems important.
  • Loading branch information
oesteban committed Mar 25, 2024
1 parent bd08602 commit ee4966b
Show file tree
Hide file tree
Showing 2 changed files with 33 additions and 17 deletions.
26 changes: 26 additions & 0 deletions src/common-principles.md
Original file line number Diff line number Diff line change
Expand Up @@ -542,6 +542,32 @@ like in the example below.
}
```

### Compressed tabular files

Large tabular information such as physiological recordings MAY be stored with
[compressed tab-delineated (TSVGZ) files](glossary.md#tsvgz-extensions).
Rules for formatting plain-text tabular files apply to TSVGZ files with three exceptions:

1. The contents of TSVGZ files SHOULD be compressed with
[gzip](https://datatracker.ietf.org/doc/html/rfc1952).
1. Compressed tabular files SHOULD NOT contain a header in the first row
indicating the column names.
1. TSVGZ files SHOULD be accompanied by a JSON file with the same name as their
corresponding tabular file but with a `.json` extension.

???+ warning "Columns of TSVGZ files MUST be defined in the corresponding JSON sidecar and the tabular content MUST NOT include a header line."

In contrast to plain-text TSV files, compressed tabular files files
MUST NOT include a header line.
Column names MUST be specified in the JSON file following the
[`Columns` metadata](glossary.md#columns-metadata) specifications.
As plain-text tabular data, column names MUST NOT be blank (that is, an empty string),
and MUST NOT be duplicated within a single JSON file describing a TSVGZ file.

TSVGZ are header-less to improve compatibility with existing software
(for example, FSL, or PNM), and to facilitate the support for other file formats
in the future.

### Key-value files (dictionaries)

JavaScript Object Notation (JSON) files MUST be used for storing key-value
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2,12 +2,9 @@

Physiological recordings such as cardiac and respiratory signals and other
continuous measures (such as parameters of a film or audio stimuli) MAY be
specified using two files:

1. a [gzip](https://datatracker.ietf.org/doc/html/rfc1952)
compressed TSV file with data (without header line)

1. a JSON file for storing metadata fields (see below)
specified using a [compressed tabular file](../common-principles.md#compressed-tabular-files)
([TSVGZ file](../glossary.md#tsvgz-extensions)) and a corresponding
JSON file for storing metadata fields (see below).

!!! example "Example datasets"

Expand Down Expand Up @@ -38,8 +35,10 @@ before the suffix.
For example for the file `sub-control01_task-nback_run-1_bold.nii.gz`,
`<matches>` would correspond to `sub-control01_task-nback_run-1`.

Note that when supplying a `*_<physio|stim>.tsv.gz` file, an accompanying
`*_<physio|stim>.json` MUST be supplied as well.
!!! warning "TSVGZ files SHOULD NOT include a header line (as established by the [common-principles](../common-principles.md#compressed-tabular-files))"

As a result, when supplying a `*_<physio|stim>.tsv.gz` file, an accompanying
`*_<physio|stim>.json` MUST be supplied as well.

The [`recording-<label>`](../appendices/entities.md#recording)
entity MAY be used to distinguish between several recording files.
Expand All @@ -66,15 +65,6 @@ A guide for using macros can be found at
Additional metadata may be included as in
[any TSV file](../common-principles.md#tabular-files) to specify, for
example, the units of the recorded time series.
Please note that, in contrast to other TSV files in BIDS, the TSV files specified
for physiological and other continuous recordings *do not* include a header
line.
Instead the name of columns are specified in the JSON file (see `Columns` field).
This is to improve compatibility with existing software (for example, FSL, PNM)
as well as to make support for other file formats possible in the future.
As in any TSV file, column names MUST NOT be blank (that is, an empty string),
and MUST NOT be duplicated within a single JSON file describing a headerless
TSV file.

Example `*_physio.tsv.gz`:

Expand Down

0 comments on commit ee4966b

Please sign in to comment.