-
Notifications
You must be signed in to change notification settings - Fork 169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SCHEMA] Add patterns for format validation #885
Merged
Merged
Changes from 11 commits
Commits
Show all changes
34 commits
Select commit
Hold shift + click to select a range
7262fda
Draft formats file.
tsalo f9a5d15
Limit time zones to between 3 and 4 characters.
tsalo 0d053d8
Add stimuli_relative format.
tsalo f53d806
Don't allow absolute paths or problematic start folders.
tsalo e6aa4b3
Use URI regex.
tsalo 35cf7c2
Fix(?) BIDS URI regex.
tsalo 40fa1d2
Update formats.yaml
tsalo 00aef74
Don't allow just zeros in index regex.
tsalo 7143ee8
pattern --> format.
tsalo a3fda8e
Document the rules for the different formats.
tsalo 5b9ccbb
Merge branch 'bids-standard:master' into format-rules
tsalo dfd3057
Remove fancy n/a handling.
tsalo 89b37e6
Move formats file into rules folder.
tsalo 25944e7
Actually, the individual formats are really objects.
tsalo 3189a88
Add patterns for rrid and time.
tsalo a2e56b0
Be more specific with RRIDs.
tsalo 3a2eac3
Remove anchors around patterns and add quotes.
tsalo f9c316c
Replace double-quotes with single-quotes.
tsalo 998742a
Merge branch 'master' into format-rules
tsalo cc502c5
Draft a test for formats.
tsalo 30e7358
Add names for tests. They probably won't ever be used.
tsalo fa6256c
Add more valid pattern tests.
tsalo b57d891
Fix mistake in date/datetime patterns.
tsalo a9abc49
Make time part of datetimes more restrictive.
tsalo 755285a
Make datetimes more robust.
tsalo b92aa0f
Add more bad-pattern checks.
tsalo 9e2b6cf
Apply suggestions from code review
tsalo 245e149
Merge branch 'format-rules' of https://github.com/tsalo/bids-specific…
tsalo e5e3563
Test the new pattern changes.
tsalo 9284172
Merge branch 'master' into format-rules
tsalo aea7982
Apply suggestions from code review
tsalo 84061cc
Reorder formats alphabetically within each section.
tsalo df0dd34
Run black on test and improve check.
tsalo 4a0ad3f
Comment out failing pattern, but add a note about it.
tsalo File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,88 @@ | ||
--- | ||
# This file defines valid patterns for different formats | ||
# Entity patterns | ||
label: | ||
description: | | ||
Freeform labels without special characters. | ||
pattern: ^[0-9a-zA-Z]+$ | ||
index: | ||
description: | | ||
Non-negative, non-zero integers, optionally prefixed with leading zeros for sortability. | ||
An index may not be all zeros. | ||
pattern: ^[0-9]*[1-9]+[0-9]*$ | ||
# Metadata types | ||
string: | ||
description: | | ||
The basic string type (not a specific format). | ||
This should allow any free-form string *except* "n/a". | ||
pattern: ^(?!(n/a)$).*$ | ||
integer: | ||
description: | | ||
An integer which may be positive or negative. | ||
pattern: ^[+-]?\d+$ | ||
number: | ||
description: | | ||
A number which may be an integer or float, positive or negative. | ||
pattern: ^[+-]?([0-9]+([.][0-9]*)?|[.][0-9]+)$ | ||
boolean: | ||
description: | | ||
A boolean. | ||
Must be either "true" or "false". | ||
pattern: ^(true|false)$ | ||
# String formats | ||
date: | ||
tsalo marked this conversation as resolved.
Show resolved
Hide resolved
|
||
description: | | ||
A date in the form YYYY-MM-DD[Z], | ||
where [Z] is an optional, valid timezone code. | ||
pattern: ^[0-9]{4}-[0-9]{2}-[0-9]{2}?([A-Z]{3,4})$ | ||
datetime: | ||
description: | | ||
A datetime in the form YYYY-MM-DDThh:mm:ss[.000000][Z], | ||
where [.000000] is an optional subsecond resolution between 1 and 6 decimal points, | ||
and [Z] is an optional, valid timezone code. | ||
pattern: ^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}?(\.[0-9]{1,6})?([A-Z]{3,4})$ | ||
unit: | ||
description: | | ||
A unit. | ||
Currently this matches any string that isn't "n/a". | ||
|
||
TODO: Somehow reference the actual unit options in the Units appendix. | ||
pattern: ^(?!(n/a)$).*$ | ||
stimuli_relative: | ||
description: | | ||
A path to a stimulus file, relative to a `/stimuli` folder somewhere. | ||
|
||
The validation for this format is minimal. | ||
It simply ensures that the value is a string with any characters that may appear in a valid path, | ||
without starting with "/" (an absolute path) or "stimuli/" | ||
(a relative path starting with the stimuli folder, rather than relative to that folder). | ||
pattern: ^(?!/)(?!stimuli/)[0-9a-zA-Z/\_\-\.]+$ | ||
dataset_relative: | ||
description: | | ||
A path to a file, relative to the dataset folder. | ||
|
||
The validation for this format is minimal. | ||
It simply ensures that the value is a string with any characters that may appear in a valid path, | ||
without starting with "/" (an absolute path). | ||
pattern: ^(?!/)[0-9a-zA-Z/\_\-\.]+$ | ||
participant_relative: | ||
description: | | ||
A path to a file, relative to the participant's folder in the dataset. | ||
|
||
The validation for this format is minimal. | ||
It simply ensures that the value is a string with any characters that may appear in a valid path, | ||
without starting with "/" (an absolute path) or "sub/" | ||
(a relative path starting with the participant folder, rather than relative to that folder). | ||
pattern: ^(?!/)(?!sub-)[0-9a-zA-Z/\_\-\.]+$ | ||
uri: | ||
description: | | ||
A uniform resource indicator. | ||
pattern: ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))? | ||
bids_uri: | ||
description: | | ||
A BIDS uniform resource indicator. | ||
|
||
The validation for this format is minimal. | ||
It simply ensures that the value is a string with any characters that may appear in a valid URI, | ||
starting with "bids:". | ||
pattern: ^bids:[0-9a-zA-Z/#:\?\_\-\.]+$ |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is related to #876
I am not sure whether disallowing
n/a
as a string for free-form text fields would break some datasets. For example for the requiredEEGReference
field, users could currently passn/a
according to bids-validator:https://github.com/bids-standard/bids-validator/blob/75c54e4f7a17bf58c7cfc47c9698eb4e0841315b/bids-validator/validators/json/schemas/eeg.json#L25
Whether or not that's a good idea should be discussed in #876, this is just a note that what you introduce here has not been discussed and has not been formally part of the spec 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My hope is that any cases where "n/a" is acceptable will have "n/a" as an explicit option within the schema. I've tried to add it as an option in all of the cases that I've seen like that, but that doesn't mean we're close to having complete coverage. I can drop any fancy "n/a" handling from this PR if you'd rather wait to tackle it within #876 though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've dropped the special "n/a" handling in dfd3057, but I do think we will want to distinguish between "n/a"s and freeform strings in the long run.