[FIX] Move rawdata/ into sourcedata/raw in alternative structure example, clarify on naming of datasets themselves (bids-standard#1741)

* RF: move `rawdata/` to `sourcedata/raw` in an example + make overall dataset to be BIDS dataset

This is my take on an extended discussion about ambiguity of
`rawdata/` example:
https://github.com/bids-standard/bids-specification/pull/1734/files#r1534475631

* Minor rewording in description of sourcedata/ content

Prior one bundled naming aspect under the same MUST. I separated into
separate sentences, added explicit statement that BIDS does not
prescribe a particular naming scheme for source data. And added
explicit RECOMMENDED on the example how to organize/name files there.

* Add one dataset_description.json into an example to make it explicitly a BIDS dataset

* My take on dataset naming common principle

* [DATALAD RUNCMD] Replace use of rawdata in tests with explicit 'noncompliant'

=== Do not change lines below ===
{
 "chain": [],
 "cmd": "sed -i -e s,rawdata,noncompliant,g tools/schemacode/bidsschematools/validator.py tools/schemacode/bidsschematools/tests/test_validator.py tools/schemacode/bidsschematools/tests/data/expected_bids_validator_xs_write.log",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^

* Do not use e.g.

* Move dataset_description.json in the example to be listed after folders

* Remove the notion that example layout can in fact be a valid BIDS dataset

* Use lower case "recommended" as not part of BIDS spec, and recommend underscores too

* Make into a single sentence

Co-authored-by: Chris Markiewicz <[email protected]>

---------

Co-authored-by: Chris Markiewicz <[email protected]>
yarikoptic and effigies authored Apr 25, 2024
1 parent cbb94e1 commit 90ec07f
Showing 4 changed files with 33 additions and 28 deletions.
53 changes: 29 additions & 24 deletions src/common-principles.md
@@ -97,6 +97,12 @@ and/or files (like `events.tsv`) are fully omitted *when they are unavailable or
instead of specified with an `n/a` value, or included as an empty file
(for example an empty `events.tsv` file with only the headers included).

## Dataset naming

BIDS does not prescribe a particular naming scheme for directories containing individual BIDS datasets.
However, it is recommended to use a short descriptive name that reflects the content of the dataset, avoid spaces in the name, and use hyphens or underscores to separate words.
BIDS datasets embedded within a larger BIDS dataset MAY follow some convention (see for example [Storage of derived datasets](#storage-of-derived-datasets)).

## Filesystem structure

Data for each subject are placed in subdirectories named "`sub-<label>`",
@@ -252,9 +258,10 @@ recommending a particular naming scheme for including different types of
source data (such as the raw event logs or parameter files, before conversion to BIDS).
However, in the case that these data are to be included:

1. These data MUST be kept in separate `sourcedata` directory with a similar
directory structure as presented below for the BIDS-managed data. For example:
`sourcedata/sub-01/ses-pre/func/sub-01_ses-pre_task-rest_bold.dicom.tgz` or
1. These data MUST be kept in separate `sourcedata` directory.
BIDS does not prescribe a particular naming scheme for source data,
but it is recommended for it to follow BIDS naming convention where possible.
For example: `sourcedata/sub-01/ses-pre/func/sub-01_ses-pre_task-rest_bold.dicom.tgz` or
`sourcedata/sub-01/ses-pre/func/MyEvent.sce`.

1. A README file SHOULD be found at the root of the `sourcedata` directory or the
@@ -271,41 +278,38 @@ A guide for using macros can be found at
-->
{{ MACROS___make_filetree_example(
{
"my_dataset-1": {
"sourcedata": "",
"...": "",
"rawdata": {
"dataset_description.json": "",
"participants.tsv": "",
"my_project-1": {
"sourcedata": {
"dicoms": {},
"raw": {
"sub-01": {},
"sub-02": {},
"...": "",
"dataset_description.json": "",
"...": "",
},
"derivatives": {
"pipeline_1": {},
"pipeline_2": {},
"...": "",
},
"..." : "",
},
"derivatives": {
"pipeline_1": {},
"pipeline_2": {},
"...": "",
}
}
}
) }}

In this example, where `sourcedata` and `derivatives` are not nested inside
`rawdata`, **only the `rawdata` subdirectory** needs to be a BIDS-compliant
dataset.
In this example, `sourcedata/dicoms` is not nested inside
`sourcedata/raw`, **and only the `sourcedata/raw` subdirectory** is a BIDS-compliant dataset among `sourcedata/` subfolders.
The subdirectories of `derivatives` MAY be BIDS-compliant derivatives datasets
(see [Non-compliant derivatives](#non-compliant-derivatives) for further discussion).
This specification does not prescribe anything about the contents of `sourcedata`
directories in the above example - nor does it prescribe the `sourcedata`,
`derivatives`, or `rawdata` directory names.
The above example is just a convention that can be useful for organizing raw,
source, and derived data while maintaining BIDS compliance of the raw data
directory. When using this convention it is RECOMMENDED to set the `SourceDatasets`
The above example is just a convention useful for organizing source, raw BIDS, and derived BIDS data while maintaining BIDS compliance of the raw data directory.
When using this convention it is RECOMMENDED to set the `SourceDatasets`
field in `dataset_description.json` of each subdirectory of `derivatives` to:

```JSON
{
"SourceDatasets": [ {"URL": "../../rawdata/"} ]
"SourceDatasets": [ {"URL": "../../sourcedata/raw/"} ]
}
```

@@ -406,6 +410,7 @@ Derivatives can be stored/distributed in two ways:
"sub-01": {},
"sub-02": {},
"...": "",
"dataset_description.json": "",
}
}
) }}
@@ -3,5 +3,5 @@ SUMMARY:
0 out of 1 files were successfully validated, using the following regular expressions:
- `.*?/sub-(?P<subject>[0-9a-zA-Z]+)/(|ses-(?P<session>[0-9a-zA-Z]+)/)anat/sub-(?P=subject)(|_ses-(?P=session))(|_acq-(?P<acquisition>[0-9a-zA-Z]+))(|_ce-(?P<ceagent>[0-9a-zA-Z]+))(|_rec-(?P<reconstruction>[0-9a-zA-Z]+))(|_run-(?P<run>[0-9a-zA-Z]+))(|_part-(?P<part>(mag|phase|real|imag)))_(T1w|T2w|PDw|T2starw|FLAIR|inplaneT1|inplaneT2|PDT2|angio|T2star)\.(nii.gz|nii|json)$`
The following files were not matched by any regex schema entry:
* `/home/chymera/.data2/datalad/000026/rawdata/sub-EXC022/anat/sub-EXC022_ses-MRI_flip-1_VFA.nii.gz
* `/home/chymera/.data2/datalad/000026/noncompliant/sub-EXC022/anat/sub-EXC022_ses-MRI_flip-1_VFA.nii.gz
The following mandatory regex schema entries did not match any files:
4 changes: 2 additions & 2 deletions tools/schemacode/bidsschematools/tests/test_validator.py
@@ -64,11 +64,11 @@ def test_write_report(tmp_path):
]
validation_result["path_tracking"] = [
"/home/chymera/.data2/datalad/000026/"
"rawdata/sub-EXC022/anat/sub-EXC022_ses-MRI_flip-1_VFA.nii.gz"
"noncompliant/sub-EXC022/anat/sub-EXC022_ses-MRI_flip-1_VFA.nii.gz"
]
validation_result["path_listing"] = [
"/home/chymera/.data2/datalad/000026/"
"rawdata/sub-EXC022/anat/sub-EXC022_ses-MRI_flip-1_VFA.nii.gz"
"noncompliant/sub-EXC022/anat/sub-EXC022_ses-MRI_flip-1_VFA.nii.gz"
]

report_path = tmp_path / "output_bids_validator_xs_write.log"
2 changes: 1 addition & 1 deletion tools/schemacode/bidsschematools/validator.py
@@ -599,7 +599,7 @@ def validate_bids(
::
from bidsschematools import validator
bids_paths = '~/.data2/datalad/000026/rawdata'
bids_paths = '~/.data2/datalad/000026/noncompliant'
validator.validate_bids(bids_paths)
Notes
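
For readers applying the layout from the `src/common-principles.md` hunk above, the recommended `SourceDatasets` entry could be set along the following lines. This is a minimal sketch and not part of the commit: the helper name `set_source_datasets` is hypothetical, and it assumes that each pipeline directory sits directly under `derivatives/` at the project root, so that `../../sourcedata/raw/` resolves to the raw BIDS dataset.

```python
import json
from pathlib import Path


def set_source_datasets(project_root, url="../../sourcedata/raw/"):
    """Set SourceDatasets in derivatives/<pipeline>/dataset_description.json.

    Hypothetical helper: `project_root` is assumed to contain a
    `derivatives/` directory with one subdirectory per pipeline.
    Returns the list of description files that were written.
    """
    updated = []
    derivatives = Path(project_root) / "derivatives"
    for pipeline_dir in sorted(p for p in derivatives.iterdir() if p.is_dir()):
        desc_path = pipeline_dir / "dataset_description.json"
        # Preserve existing fields; start from an empty description otherwise.
        desc = json.loads(desc_path.read_text()) if desc_path.exists() else {}
        desc["SourceDatasets"] = [{"URL": url}]
        desc_path.write_text(json.dumps(desc, indent=2) + "\n")
        updated.append(desc_path)
    return updated
```

The sketch only touches the one field; a complete derivative `dataset_description.json` needs its other fields filled in separately.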