[FIX] Move rawdata/ into sourcedata/raw in alternative structure example, clarify on naming of datasets themselves (bids-standard#1741)

* RF: move `rawdata/` to `sourcedata/raw` in an example + make overall dataset to be BIDS dataset

  This is my take on an extended discussion about ambiguity of `rawdata/` example: https://github.com/bids-standard/bids-specification/pull/1734/files#r1534475631

* Minor rewording in description of sourcedata/ content

  Prior one bundled naming aspect under the same MUST. I separated into separate sentences, added explicit statement that BIDS does not prescribe a particular naming scheme for source data. And added explicit RECOMMENDED on the example how to organize/name files there.

* Add one dataset_description.json into an example to make it explicitly a BIDS dataset

* My take on dataset naming common principle

* [DATALAD RUNCMD] Replace use of rawdata in tests with explicit 'noncompliant'

  === Do not change lines below ===
  {
   "chain": [],
   "cmd": "sed -i -e s,rawdata,noncompliant,g tools/schemacode/bidsschematools/validator.py tools/schemacode/bidsschematools/tests/test_validator.py tools/schemacode/bidsschematools/tests/data/expected_bids_validator_xs_write.log",
   "exit": 0,
   "extra_inputs": [],
   "inputs": [],
   "outputs": [],
   "pwd": "."
  }
  ^^^ Do not change lines above ^^^

* Do not use e.g.

* Move dataset_description.json in the example to be listed after folders

* Remove the notion that example layout can in fact be a valid BIDS dataset

* Use lower case "recommended" as not part of BIDS spec, and recommend underscores too

* Make into a single sentence

  Co-authored-by: Chris Markiewicz

---------

Co-authored-by: Chris Markiewicz
markmikkelsen · Apr 25, 2024 · 90ec07f
1 parent cbb94e1
commit 90ec07f

Showing 4 changed files with 33 additions and 28 deletions.
diff --git a/src/common-principles.md b/src/common-principles.md
@@ -97,6 +97,12 @@ and/or files (like `events.tsv`) are fully omitted *when they are unavailable or
 instead of specified with an `n/a` value, or included as an empty file
 (for example an empty `events.tsv` file with only the headers included).
 
+## Dataset naming
+
+BIDS does not prescribe a particular naming scheme for directories containing individual BIDS datasets.
+However, it is recommended to use a short descriptive name that reflects the content of the dataset, avoid spaces in the name, and use hyphens or underscores to separate words.
+BIDS datasets embedded within a larger BIDS dataset MAY follow some convention (see for example [Storage of derived datasets](#storage-of-derived-datasets)).
+
 ## Filesystem structure
 
 Data for each subject are placed in subdirectories named "`sub-<label>`",
@@ -252,9 +258,10 @@ recommending a particular naming scheme for including different types of
 source data (such as the raw event logs or parameter files, before conversion to BIDS).
 However, in the case that these data are to be included:
 
-1.  These data MUST be kept in separate `sourcedata` directory with a similar
-    directory structure as presented below for the BIDS-managed data. For example:
-    `sourcedata/sub-01/ses-pre/func/sub-01_ses-pre_task-rest_bold.dicom.tgz` or
+1.  These data MUST be kept in separate `sourcedata` directory.
+    BIDS does not prescribe a particular naming scheme for source data,
+    but it is recommended for it to follow BIDS naming convention where possible.
+    For example: `sourcedata/sub-01/ses-pre/func/sub-01_ses-pre_task-rest_bold.dicom.tgz` or
     `sourcedata/sub-01/ses-pre/func/MyEvent.sce`.
 
 1.  A README file SHOULD be found at the root of the `sourcedata` directory or the
@@ -271,41 +278,38 @@ A guide for using macros can be found at
 -->
 {{ MACROS___make_filetree_example(
     {
-    "my_dataset-1": {
-            "sourcedata": "",
-            "...": "",
-            "rawdata": {
-                "dataset_description.json": "",
-                "participants.tsv": "",
+    "my_project-1": {
+        "sourcedata": {
+            "dicoms": {},
+            "raw": {
                 "sub-01": {},
                 "sub-02": {},
                 "...": "",
+                "dataset_description.json": "",
+				"...": "",
             },
-            "derivatives": {
-                "pipeline_1": {},
-                "pipeline_2": {},
-                "...": "",
-            },
+            "..." : "",
+        },
+        "derivatives": {
+            "pipeline_1": {},
+            "pipeline_2": {},
+            "...": "",
         }
     }
+   }
 ) }}
 
-In this example, where `sourcedata` and `derivatives` are not nested inside
-`rawdata`, **only the `rawdata` subdirectory** needs to be a BIDS-compliant
-dataset.
+In this example, `sourcedata/dicoms` is not nested inside
+`sourcedata/raw`, **and only the `sourcedata/raw` subdirectory** is a BIDS-compliant dataset among `sourcedata/` subfolders.
 The subdirectories of `derivatives` MAY be BIDS-compliant derivatives datasets
 (see [Non-compliant derivatives](#non-compliant-derivatives) for further discussion).
-This specification does not prescribe anything about the contents of `sourcedata`
-directories in the above example - nor does it prescribe the `sourcedata`,
-`derivatives`, or `rawdata` directory names.
-The above example is just a convention that can be useful for organizing raw,
-source, and derived data while maintaining BIDS compliance of the raw data
-directory. When using this convention it is RECOMMENDED to set the `SourceDatasets`
+The above example is just a convention useful for organizing source, raw BIDS, and derived BIDS data while maintaining BIDS compliance of the raw data directory.
+When using this convention it is RECOMMENDED to set the `SourceDatasets`
 field in `dataset_description.json` of each subdirectory of `derivatives` to:
 
 ```JSON
 {
-  "SourceDatasets": [ {"URL": "../../rawdata/"} ]
+  "SourceDatasets": [ {"URL": "../../sourcedata/raw/"} ]
 }
 ```
 
@@ -406,6 +410,7 @@ Derivatives can be stored/distributed in two ways:
             "sub-01": {},
             "sub-02": {},
             "...": "",
+            "dataset_description.json": "",
             }
         }
     ) }}

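The `SourceDatasets` URL changed in the hunk above is relative to the directory of each derivative dataset, so `../../sourcedata/raw/` climbs out of `derivatives/<pipeline>` and then descends into `sourcedata/raw`. A minimal sketch of that resolution, assuming the `my_project-1` layout from the example; `resolve_source_dataset` is a hypothetical helper for illustration, not part of BIDS or `bidsschematools`:

```python
import posixpath


def resolve_source_dataset(derivative_dir: str, url: str) -> str:
    """Resolve a relative SourceDatasets URL against a derivative dataset directory.

    Hypothetical helper: it only normalizes POSIX-style relative paths and
    does not touch the filesystem or handle absolute/remote URLs.
    """
    return posixpath.normpath(posixpath.join(derivative_dir, url))


# Layout assumed from the filetree example in the diff above.
derivative_dir = "my_project-1/derivatives/pipeline_1"
resolved = resolve_source_dataset(derivative_dir, "../../sourcedata/raw/")
print(resolved)
```

With the pre-change layout, the same helper applied to `../../rawdata/` would point at `my_project-1/rawdata`, which no longer exists in the updated example.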
diff --git a/tools/schemacode/bidsschematools/tests/data/expected_bids_validator_xs_write.log b/tools/schemacode/bidsschematools/tests/data/expected_bids_validator_xs_write.log
@@ -3,5 +3,5 @@ SUMMARY:
 0 out of 1 files were successfully validated, using the following regular expressions:
 	- `.*?/sub-(?P<subject>[0-9a-zA-Z]+)/(|ses-(?P<session>[0-9a-zA-Z]+)/)anat/sub-(?P=subject)(|_ses-(?P=session))(|_acq-(?P<acquisition>[0-9a-zA-Z]+))(|_ce-(?P<ceagent>[0-9a-zA-Z]+))(|_rec-(?P<reconstruction>[0-9a-zA-Z]+))(|_run-(?P<run>[0-9a-zA-Z]+))(|_part-(?P<part>(mag|phase|real|imag)))_(T1w|T2w|PDw|T2starw|FLAIR|inplaneT1|inplaneT2|PDT2|angio|T2star)\.(nii.gz|nii|json)$`
 The following files were not matched by any regex schema entry:
-	* `/home/chymera/.data2/datalad/000026/rawdata/sub-EXC022/anat/sub-EXC022_ses-MRI_flip-1_VFA.nii.gz
+	* `/home/chymera/.data2/datalad/000026/noncompliant/sub-EXC022/anat/sub-EXC022_ses-MRI_flip-1_VFA.nii.gz
 The following mandatory regex schema entries did not match any files:
diff --git a/tools/schemacode/bidsschematools/tests/test_validator.py b/tools/schemacode/bidsschematools/tests/test_validator.py
@@ -64,11 +64,11 @@ def test_write_report(tmp_path):
     ]
     validation_result["path_tracking"] = [
         "/home/chymera/.data2/datalad/000026/"
-        "rawdata/sub-EXC022/anat/sub-EXC022_ses-MRI_flip-1_VFA.nii.gz"
+        "noncompliant/sub-EXC022/anat/sub-EXC022_ses-MRI_flip-1_VFA.nii.gz"
     ]
     validation_result["path_listing"] = [
         "/home/chymera/.data2/datalad/000026/"
-        "rawdata/sub-EXC022/anat/sub-EXC022_ses-MRI_flip-1_VFA.nii.gz"
+        "noncompliant/sub-EXC022/anat/sub-EXC022_ses-MRI_flip-1_VFA.nii.gz"
     ]
 
     report_path = tmp_path / "output_bids_validator_xs_write.log"

diff --git a/tools/schemacode/bidsschematools/validator.py b/tools/schemacode/bidsschematools/validator.py
@@ -599,7 +599,7 @@ def validate_bids(
     ::
 
         from bidsschematools import validator
-        bids_paths = '~/.data2/datalad/000026/rawdata'
+        bids_paths = '~/.data2/datalad/000026/noncompliant'
         validator.validate_bids(bids_paths)
 
     Notes