Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SCHEMA] Add patterns for format validation #885

Merged
merged 34 commits into from
Mar 24, 2022
Merged
Changes from 11 commits
Commits
Show all changes
34 commits
Select commit Hold shift + click to select a range
7262fda
Draft formats file.
tsalo Sep 24, 2021
f9a5d15
Limit time zones to between 3 and 4 characters.
tsalo Sep 24, 2021
0d053d8
Add stimuli_relative format.
tsalo Sep 24, 2021
f53d806
Don't allow absolute paths or problematic start folders.
tsalo Sep 24, 2021
e6aa4b3
Use URI regex.
tsalo Sep 24, 2021
35cf7c2
Fix(?) BIDS URI regex.
tsalo Sep 24, 2021
40fa1d2
Update formats.yaml
tsalo Sep 24, 2021
00aef74
Don't allow just zeros in index regex.
tsalo Sep 24, 2021
7143ee8
pattern --> format.
tsalo Sep 24, 2021
a3fda8e
Document the rules for the different formats.
tsalo Sep 24, 2021
5b9ccbb
Merge branch 'bids-standard:master' into format-rules
tsalo Oct 5, 2021
dfd3057
Remove fancy n/a handling.
tsalo Oct 14, 2021
89b37e6
Move formats file into rules folder.
tsalo Oct 14, 2021
25944e7
Actually, the individual formats are really objects.
tsalo Oct 14, 2021
3189a88
Add patterns for rrid and time.
tsalo Oct 14, 2021
a2e56b0
Be more specific with RRIDs.
tsalo Oct 15, 2021
3a2eac3
Remove anchors around patterns and add quotes.
tsalo Dec 6, 2021
f9c316c
Replace double-quotes with single-quotes.
tsalo Dec 6, 2021
998742a
Merge branch 'master' into format-rules
tsalo Jan 5, 2022
cc502c5
Draft a test for formats.
tsalo Jan 5, 2022
30e7358
Add names for tests. They probably won't ever be used.
tsalo Jan 5, 2022
fa6256c
Add more valid pattern tests.
tsalo Jan 5, 2022
b57d891
Fix mistake in date/datetime patterns.
tsalo Jan 5, 2022
a9abc49
Make time part of datetimes more restrictive.
tsalo Jan 5, 2022
755285a
Make datetimes more robust.
tsalo Jan 5, 2022
b92aa0f
Add more bad-pattern checks.
tsalo Jan 5, 2022
9e2b6cf
Apply suggestions from code review
tsalo Jan 5, 2022
245e149
Merge branch 'format-rules' of https://github.com/tsalo/bids-specific…
tsalo Jan 5, 2022
e5e3563
Test the new pattern changes.
tsalo Jan 5, 2022
9284172
Merge branch 'master' into format-rules
tsalo Feb 1, 2022
aea7982
Apply suggestions from code review
tsalo Feb 2, 2022
84061cc
Reorder formats alphabetically within each section.
tsalo Feb 2, 2022
df0dd34
Run black on test and improve check.
tsalo Feb 2, 2022
4a0ad3f
Comment out failing pattern, but add a note about it.
tsalo Feb 3, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
88 changes: 88 additions & 0 deletions src/schema/formats.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,88 @@
---
# This file defines valid patterns for different formats
# Entity patterns
label:
description: |
Freeform labels without special characters.
pattern: ^[0-9a-zA-Z]+$
index:
description: |
Non-negative, non-zero integers, optionally prefixed with leading zeros for sortability.
An index may not be all zeros.
pattern: ^[0-9]*[1-9]+[0-9]*$
# Metadata types
string:
description: |
The basic string type (not a specific format).
This should allow any free-form string *except* "n/a".
pattern: ^(?!(n/a)$).*$
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is related to #876

This should allow any free-form string except "n/a".

I am not sure whether disallowing n/a as a string for free-form text fields would break some datasets. For example for the required EEGReference field, users could currently pass n/a according to bids-validator:

https://github.com/bids-standard/bids-validator/blob/75c54e4f7a17bf58c7cfc47c9698eb4e0841315b/bids-validator/validators/json/schemas/eeg.json#L25

Whether or not that's a good idea should be discussed in #876, this is just a note that what you introduce here has not been discussed and has not been formally part of the spec 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My hope is that any cases where "n/a" is acceptable will have "n/a" as an explicit option within the schema. I've tried to add it as an option in all of the cases that I've seen like that, but that doesn't mean we're close to having complete coverage. I can drop any fancy "n/a" handling from this PR if you'd rather wait to tackle it within #876 though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've dropped the special "n/a" handling in dfd3057, but I do think we will want to distinguish between "n/a"s and freeform strings in the long run.

integer:
description: |
An integer which may be positive or negative.
pattern: ^[+-]?\d+$
number:
description: |
A number which may be an integer or float, positive or negative.
pattern: ^[+-]?([0-9]+([.][0-9]*)?|[.][0-9]+)$
boolean:
description: |
A boolean.
Must be either "true" or "false".
pattern: ^(true|false)$
# String formats
date:
tsalo marked this conversation as resolved.
Show resolved Hide resolved
description: |
A date in the form YYYY-MM-DD[Z],
where [Z] is an optional, valid timezone code.
pattern: ^[0-9]{4}-[0-9]{2}-[0-9]{2}?([A-Z]{3,4})$
datetime:
description: |
A datetime in the form YYYY-MM-DDThh:mm:ss[.000000][Z],
where [.000000] is an optional subsecond resolution between 1 and 6 decimal points,
and [Z] is an optional, valid timezone code.
pattern: ^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}?(\.[0-9]{1,6})?([A-Z]{3,4})$
unit:
description: |
A unit.
Currently this matches any string that isn't "n/a".

TODO: Somehow reference the actual unit options in the Units appendix.
pattern: ^(?!(n/a)$).*$
stimuli_relative:
description: |
A path to a stimulus file, relative to a `/stimuli` folder somewhere.

The validation for this format is minimal.
It simply ensures that the value is a string with any characters that may appear in a valid path,
without starting with "/" (an absolute path) or "stimuli/"
(a relative path starting with the stimuli folder, rather than relative to that folder).
pattern: ^(?!/)(?!stimuli/)[0-9a-zA-Z/\_\-\.]+$
dataset_relative:
description: |
A path to a file, relative to the dataset folder.

The validation for this format is minimal.
It simply ensures that the value is a string with any characters that may appear in a valid path,
without starting with "/" (an absolute path).
pattern: ^(?!/)[0-9a-zA-Z/\_\-\.]+$
participant_relative:
description: |
A path to a file, relative to the participant's folder in the dataset.

The validation for this format is minimal.
It simply ensures that the value is a string with any characters that may appear in a valid path,
without starting with "/" (an absolute path) or "sub/"
(a relative path starting with the participant folder, rather than relative to that folder).
pattern: ^(?!/)(?!sub-)[0-9a-zA-Z/\_\-\.]+$
uri:
description: |
A uniform resource indicator.
pattern: ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
bids_uri:
description: |
A BIDS uniform resource indicator.

The validation for this format is minimal.
It simply ensures that the value is a string with any characters that may appear in a valid URI,
starting with "bids:".
pattern: ^bids:[0-9a-zA-Z/#:\?\_\-\.]+$