DOC: Update schema README
effigies committed Sep 20, 2022
1 parent d7bb3e7 commit 649c391
Showing 1 changed file with 116 additions and 52 deletions.
168 changes: 116 additions & 52 deletions src/schema/README.md
@@ -17,17 +17,16 @@ src/schema
├── meta
│   └── context.yaml
├── objects
│   ├── associated_data.yaml
│   ├── columns.yaml
│   ├── ...
│   └── top_level_files.yaml
│   └── suffixes.yaml
├── rules
│   ├── associated_data.yaml
│   ├── checks
│   │   ├── asl.yaml
│   │   ├── ...
│   │   └── mri.yaml
│   ├── ...
│   └── top_level_files.yaml
│   └── suffixes.yaml
└── SCHEMA_VERSION
```

@@ -310,8 +309,7 @@ The namespaces are:
| `objects.suffixes` | Filename suffixes that describe the contents of the file | Value terms |
| `objects.extensions` | Filename components that describe the format of the file | Value terms |
| `objects.formats` | Terms that define the forms values (for example, in metadata) might take | Formats |
| `objects.associated_data` | Directories that may appear at the root of a dataset | Files |
| `objects.top_level_files` | Files that may appear at the root of a dataset | Files |
| `objects.files` | Files and directories that may appear at the root of a dataset | Files |

Because these objects vary, the contents of each namespace can vary.
Fields common to all objects:
@@ -502,17 +500,12 @@ The convention can be summed up in the following rules:
| `description` | Term definition |
| `pattern` | Regular expression defining format |

- `objects.associated_data`
| Field | Description |
| -------------- | ------------------- |
| `display_name` | Human-friendly name |
| `description` | Term definition |

- `objects.top_level_files`
| Field | Description |
| -------------- | ------------------- |
| `display_name` | Human-friendly name |
| `description` | Term definition |
- `objects.files`
| Field | Description |
| -------------- | ------------------------------------------------------------------------------------ |
| `display_name` | Human-friendly name |
| `description` | Term definition |
| `file_type` | Indicator that the file is a regular file (`"regular"`) or directory (`"directory"`) |

## Rule files

@@ -559,10 +552,86 @@ duplication or conflict.
A significant portion of BIDS is devoted to the naming of files, and almost all file names consist
of entities, a suffix, an extension, and a data type. Exceptions will be noted below.

#### Data types
`rules.files` contains the following subdivisions.

| Namespace | Description |
| --------------------------- | ----------------------------------------------------------------------------------------- |
| `rules.files.common.core` | Files and directories that reside at the top level of datasets |
| `rules.files.common.tables` | Tabular metadata files that associate metadata with entities |
| `rules.files.raw.*` | Raw data and metadata files that have entities, suffixes, datatypes and extensions |
| `rules.files.deriv.*` | Derivative data and metadata files that have entities, suffixes, datatypes and extensions |

#### Core files and directories

`rules.datatypes` contains a series of related rules, grouped by the `datatype` path component.
All such files take the form:
`rules.files.common.core` describes files that have little-to-no variability in their form.
These either have a single `path` field, or a `stem` field and a list of `extensions`:

| Field | Description |
| ------------ | ------------------------------------------------------------------------------------------------------------- |
| `level` | Requirement level of file, one of (`optional`, `recommended`, `required`, `deprecated`) |
| `path` | Location of file, relative to dataset root; mutually exclusive with `stem` and `extensions` |
| `stem` | Name of file, relative to dataset root, up to but not including the extension; mutually exclusive with `path` |
| `extensions` | List of valid extension strings, including the initial dot (`.`); mutually exclusive with `path` |

These are the entries for `dataset_description.json` and `README`:

```YAML
dataset_description:
  level: required
  path: dataset_description.json
README:
  level: required
  stem: README
  extensions:
    - ''
    - .md
    - .rst
    - .txt
```

Here, `README` and `README.md` are both valid, while only `dataset_description.json` is permitted.
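
As a minimal sketch of how such a rule could be turned into a list of permitted file names, the following Python snippet expands the two entries above. The helper `valid_names` is hypothetical and is not part of the schema tooling; the rules are written out as plain dictionaries rather than loaded from the YAML files.

```python
def valid_names(rule):
    """Return the file names permitted by one core-file rule (hypothetical helper)."""
    if "path" in rule:
        # `path` is mutually exclusive with `stem` and `extensions`.
        if "stem" in rule or "extensions" in rule:
            raise ValueError("`path` cannot be combined with `stem` or `extensions`")
        return [rule["path"]]
    # Otherwise, every listed extension is appended to the stem.
    return [rule["stem"] + ext for ext in rule["extensions"]]


# The two entries shown above, written as Python dictionaries.
core = {
    "dataset_description": {
        "level": "required",
        "path": "dataset_description.json",
    },
    "README": {
        "level": "required",
        "stem": "README",
        "extensions": ["", ".md", ".rst", ".txt"],
    },
}

for name, rule in core.items():
    print(name, valid_names(rule))
```

The empty-string extension is what allows a bare `README` alongside `README.md`, `README.rst` and `README.txt`.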

#### Tabular metadata files

`rules.files.common.tables` describes TSV files and their associated metadata,
including `participants.tsv`, `samples.tsv`, `*_sessions.tsv` and `*_scans.tsv`.
The first two use the `stem` field, while the latter two specify the entities used
to construct the filename.
The valid fields are:

| Field | Description |
| ------------ | ----------------------------------------------------------------------------------------------------------------- |
| `level` | Requirement level of file, one of (`optional`, `recommended`, `required`, `deprecated`) |
| `stem` | Name of file, relative to dataset root, up to but not including the extension; mutually exclusive with `entities` |
| `entities` | Object where the keys are entries in `objects.entities`. The value is a requirement level. |
| `extensions` | List of valid extension strings, including the initial dot (`.`) |

For example:

```YAML
participants:
  level: optional
  stem: participants
  extensions:
    - .tsv
    - .json
sessions:
  suffixes:
    - sessions
  extensions:
    - .tsv
    - .json
  entities:
    subject: required
```

Note that these files do not have a `datatype`, but otherwise follow the same rules as above.

#### BIDS filenames

`rules.files.raw` and `rules.files.deriv` contain series of related rules.
These are largely grouped by datatype, but file types that appear in multiple locations may be grouped together.
The files described take the form:

```plain
[sub-<label>/][ses-<label>/]<datatype>/<entities>_<suffix><extension>
@@ -626,46 +695,41 @@ Specifically, files in the first group may have `task`, `run`, `processing`, and
while files in the second group may not.
Also, when files in the second group have the `acq` entity, the associated value MUST be `crosstalk`.

#### Tabular metadata

Sessions files (`_sessions.tsv`) and scans files (`_scans.tsv`) are defined in
`rules.tabular_metadata`, for example:
A common derivatives type is preprocessed data, where the type of the generated data is the same
as the input data. BIDS Derivatives specifies that these files may be distinguished from raw data
with the new entities `space-<label>` or `desc-<label>`. This rule is encoded as follows:

```YAML
sessions:
  suffixes:
    - sessions
  extensions:
    - .tsv
    - .json
```yaml
meg_meg_common:
  $ref: rules.files.raw.meg.meg
  entities:
    subject: required
    $ref: rules.files.raw.meg.meg.entities
    space: optional
    description: optional
```

Note that these files do not have a `datatype`, but otherwise follow the same rules as above.
When expanded, this becomes:

#### Top-level files

Top-level files follow somewhat different rules (and are likely to change). Currently, the
rule name is the name of the file, and they contain `required` and `extensions` properties.
For example:

```YAML
README:
  required: true
  extensions:
    - ''
    - .md
    - .rst
    - .txt
CHANGES:
  required: false
```yaml
meg_meg_common:
  suffixes:
    - meg
  extensions:
    - ''
    - .fif
  datatypes:
    - meg
  entities:
    subject: required
    session: optional
    task: required
    acquisition: optional
    run: optional
    processing: optional
    split: optional
    space: optional
    description: optional
```
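
The expansion of `$ref` can be sketched in a few lines of Python. This is an illustration only, not the schema tooling's actual implementation: `lookup` and `expand` are hypothetical helpers, and the contents of `rules.files.raw.meg.meg` are inferred from the expanded form shown above rather than copied from the schema. The sketch assumes that keys written alongside a `$ref` are layered on top of the referenced mapping, and it does not guard against circular references.

```python
from copy import deepcopy


def lookup(schema, dotted):
    """Follow a dotted address such as 'rules.files.raw.meg.meg'."""
    node = schema
    for key in dotted.split("."):
        node = node[key]
    return node


def expand(schema, node):
    """Recursively replace `$ref` keys with the mappings they point to."""
    if not isinstance(node, dict):
        return deepcopy(node)
    result = {}
    if "$ref" in node:
        # Start from the referenced mapping (itself expanded).
        result.update(expand(schema, lookup(schema, node["$ref"])))
    for key, value in node.items():
        if key != "$ref":
            # Sibling keys add to, or override, the referenced content.
            result[key] = expand(schema, value)
    return result


# Toy schema: the raw MEG rule as implied by the expanded example.
schema = {"rules": {"files": {"raw": {"meg": {"meg": {
    "suffixes": ["meg"],
    "extensions": ["", ".fif"],
    "datatypes": ["meg"],
    "entities": {
        "subject": "required", "session": "optional", "task": "required",
        "acquisition": "optional", "run": "optional",
        "processing": "optional", "split": "optional",
    },
}}}}}}

# The derivative rule, written as a Python dictionary.
meg_meg_common = {
    "$ref": "rules.files.raw.meg.meg",
    "entities": {
        "$ref": "rules.files.raw.meg.meg.entities",
        "space": "optional",
        "description": "optional",
    },
}

expanded = expand(schema, meg_meg_common)
print(list(expanded["entities"]))
```

Because the nested `entities` mapping carries its own `$ref`, the raw entities are preserved and `space` and `description` are appended after them, which matches the order in the expanded YAML.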

Here, `README` and `README.md` are valid, while only `CHANGES` is permitted.

### Sidecar and tabular data rules

Tabular data and JSON sidecar files follow a similar pattern: