From ee4966b48240102d00738a2b57c0fe47b76c2900 Mon Sep 17 00:00:00 2001
From: Oscar Esteban
Date: Mon, 25 Mar 2024 07:34:35 +0100
Subject: [PATCH 01/17] enh: add compressed TSV files to the common principles

Their description is hedged within the physiological recordings so upcasting
them to the common principles seems important.
---
 src/common-principles.md                      | 26 +++++++++++++++++++
 ...logical-and-other-continuous-recordings.md | 24 +++++------------
 2 files changed, 33 insertions(+), 17 deletions(-)

diff --git a/src/common-principles.md b/src/common-principles.md
index 1b8c09a407..df007b12d7 100644
--- a/src/common-principles.md
+++ b/src/common-principles.md
@@ -542,6 +542,32 @@ like in the example below.
 }
 ```
 
+### Compressed tabular files
+
+Large tabular information such as physiological recordings MAY be stored with
+[compressed tab-delineated (TSVGZ) files](glossary.md#tsvgz-extensions).
+Rules for formatting plain-text tabular files apply to TSVGZ files with three exceptions:
+
+1. The contents of TSVGZ files SHOULD be compressed with
+    [gzip](https://datatracker.ietf.org/doc/html/rfc1952).
+1. Compressed tabular files SHOULD NOT contain a header in the first row
+    indicating the column names.
+1. TSVGZ files SHOULD be accompanied by a JSON file with the same name as their
+    corresponding tabular file but with a `.json` extension.
+
+???+ warning "Columns of TSVGZ files MUST be defined in the corresponding JSON sidecar and the tabular content MUST NOT include a header line."
+
+    In contrast to plain-text TSV files, compressed tabular files
+    MUST NOT include a header line.
+    Column names MUST be specified in the JSON file following the
+    [`Columns` metadata](glossary.md#columns-metadata) specifications.
+    As with plain-text tabular data, column names MUST NOT be blank (that is, an empty string),
+    and MUST NOT be duplicated within a single JSON file describing a TSVGZ file.
+
+    TSVGZ are header-less to improve compatibility with existing software
+    (for example, FSL, or PNM), and to facilitate the support for other file formats
+    in the future.
+
 ### Key-value files (dictionaries)
 
 JavaScript Object Notation (JSON) files MUST be used for storing key-value
diff --git a/src/modality-specific-files/physiological-and-other-continuous-recordings.md b/src/modality-specific-files/physiological-and-other-continuous-recordings.md
index 38f9842ae3..f3bc4f27e2 100644
--- a/src/modality-specific-files/physiological-and-other-continuous-recordings.md
+++ b/src/modality-specific-files/physiological-and-other-continuous-recordings.md
@@ -2,12 +2,9 @@
 
 Physiological recordings such as cardiac and respiratory signals and other
 continuous measures (such as parameters of a film or audio stimuli) MAY be
-specified using two files:
-
-1. a [gzip](https://datatracker.ietf.org/doc/html/rfc1952)
-    compressed TSV file with data (without header line)
-
-1. a JSON file for storing metadata fields (see below)
+specified using a [compressed tabular file](../common-principles.md#compressed-tabular-files)
+([TSVGZ file](../glossary.md#tsvgz-extensions)) and a corresponding
+JSON file for storing metadata fields (see below).
 
 !!! example "Example datasets"
 
@@ -38,8 +35,10 @@ before the suffix.
 For example for the file `sub-control01_task-nback_run-1_bold.nii.gz`,
 `` would correspond to `sub-control01_task-nback_run-1`.
 
-Note that when supplying a `*_.tsv.gz` file, an accompanying
-`*_.json` MUST be supplied as well.
+!!! warning "TSVGZ files SHOULD NOT include a header line (as established by the [common-principles](../common-principles.md#compressed-tabular-files))"
+
+    As a result, when supplying a `*_.tsv.gz` file, an accompanying
+    `*_.json` MUST be supplied as well.
 
 The [`recording-