[ENH] Added the specification for using HED libraries in BIDS #1106
Codecov Report
@@ Coverage Diff @@
## master #1106 +/- ##
=======================================
Coverage 88.33% 88.33%
=======================================
Files 6 6
Lines 1020 1020
=======================================
Hits 901 901
Misses 119 119
Thank you @VisLab, looks good to me!
Was just curious to review; left some questions/feedback.
src/99-appendices/03-hed.md
Outdated
"HEDVersion": {
    "base": "8.1.0",
    "libraries": {
        "sc": "score_0.0.1",
Is this some "official" identity specification combining a library and a version? If not, it may be worth separating them out, even if more verbose. Suggested change, replacing
"sc": "score_0.0.1",
with
"sc": {"library": "score", "version": "0.0.1"},
or the like, to make it explicit and to allow future expansion with optional extra information (e.g. a URL to a HED library that is not in the library of libraries, i.e. hed-schema-library, but is available from another library of libraries, or maybe even pointed to directly by that URL).
Some version of what you suggest was the original plan (though URIs were never planned to be supported other than bids: schema paths within the dataset), but it was simplified during development after feedback from the HED community. @VisLab can provide more detail, which is escaping me.
The suggestion above is one that we have considered (see hed-specification issue #156). The actual format is only proposed at this point in the HED specification; we would appreciate opinions and are open to changes.
We have gone through a number of versions of this, and originally we thought we would support a file within the bids data set itself and arbitrary URLs. The HED working group consensus was to start with the standard libraries and see how it goes, since the purpose is to have standardized vocabularies to make meaningful comparisons across datasets.
On a related note:
The hed-python tools allow a schema group to be passed into its BIDS data set constructor to override the specification in dataset_description. Right now the hed-javascript public interface (which is what BIDS uses) only passes in the data set and constructs the schema group internally.
I am mentioning this because another possibility is to allow a schema group to be passed in as part of the public interface to the hed-javascript validator (from bids), which would override the internal specification extracted from the dataset_description. This is relevant because we are going to have to do a major version bump to support the libraries because of another interface change, and we could add this in as an option and allow BIDS to decide. @sappelhoff. @rwblair?
The specification of which schemas are to be used for validating a dataset is provided in a separate parameter, since hed-javascript cannot handle relative paths within the BIDS dataset directory and needs them converted to absolute paths. However, the actual Schema and Schemas types used by the validator to represent schemas are considered implementation details (despite being returned by the validator.buildSchema function) and not part of the stable API. In theory, the object passed as the second argument to validateBidsDataset can be any object conforming to the defined API, regardless of what datasetDescription.json says.
@VisLab if you want to move on with this PR towards a merge, please let us know and remove the "draft" status.
I would like to hold off for another week to allow more internal discussion. In working on the example use cases using the score library, it has become apparent that many users will just want to use the score library when they are doing neurological annotation and not use the base schema at all. We have the option of allowing the following so that users wouldn't have to prefix the score tags:
"HEDVersion": {
    "base": "score_0.0.1",
    "libraries": {
        "bs": "8.0.0"
    }
}
or also:
"HEDVersion": "score_0.0.1"
@tpatpa @dorahermes comments?
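To make the two proposed shapes concrete, here is a minimal Python sketch that reduces either one to a mapping from tag prefix to schema identifier. The function name `schema_map` and the convention of using an empty string for unprefixed tags are assumptions made for illustration; they are not part of the HED or BIDS tooling, and the dictionary form is only a proposal in this thread.

```python
def schema_map(hed_version):
    """Map tag prefix -> schema identifier; "" stands for unprefixed tags.

    Hypothetical helper: accepts either the plain-string form or the
    proposed {"base": ..., "libraries": {...}} form discussed above.
    """
    if isinstance(hed_version, str):
        # A bare string means a single schema whose tags need no prefix.
        return {"": hed_version}
    if isinstance(hed_version, dict):
        mapping = {"": hed_version["base"]}
        for prefix, spec in hed_version.get("libraries", {}).items():
            if not prefix or prefix in mapping:
                raise ValueError(f"invalid or duplicate prefix: {prefix!r}")
            mapping[prefix] = spec
        return mapping
    raise TypeError("HEDVersion must be a string or an object")


if __name__ == "__main__":
    print(schema_map("score_0.0.1"))
    print(schema_map({"base": "score_0.0.1", "libraries": {"bs": "8.0.0"}}))
```

In both cases the score tags end up under the empty prefix, which is the point of the proposal: annotators using only the score library would not have to prefix its tags.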
@tpatpa @happy5214 @dorahermes @sappelhoff @dungscout96 @tsalo @yarikoptic
Thank you @VisLab for a great discussion!
After spending additional time putting together the bids example dataset (still WIP, available here) and working with the HED validation tools I feel the use of a string would be easier altogether.
Am I correct in assuming that underscores (_) are forbidden characters when naming an HED library? Or do you document somewhere how to unambiguously get the library name and library version from the suggested format <library-name>_<library-version>?
Relatedly: are : characters also forbidden at the start of an HED library? Otherwise I think the situation needs to be explicitly clarified for cases like this: people have a library called :myThing and then want to do "HEDVersion": ["8.0.0", "abbr::myThing"] (note the double :).
Overall I like the proposed format of specifying which HED library is being used 👍 it looks much simpler than before and still encodes the relevant info. We only need to take care that the "separator characters" (_ and :, apparently) are always unambiguous.
One question regarding that: Do you prescribe a versioning scheme for all "official" HED libraries? For example, MUST they always use semantic versioning?
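One way to see where the ambiguity can arise is to sketch the parsing step. The following Python is an illustration only: `SchemaRef` and `parse_schema_ref` are hypothetical names, and the rules applied (at most one `:`, split on the last `_`) are assumptions, not documented HED behavior.

```python
from typing import NamedTuple, Optional


class SchemaRef(NamedTuple):
    nickname: Optional[str]  # prefix before ":", if any
    library: Optional[str]   # library name before "_", if any
    version: str             # remaining version string


def parse_schema_ref(entry: str) -> SchemaRef:
    """Split '<nickname>:<library-name>_<library-version>' into parts."""
    nickname, sep, rest = entry.partition(":")
    if not sep:
        # No colon at all: the whole entry is the schema part.
        nickname, rest = None, entry
    elif not nickname or ":" in rest:
        # Covers ":lib_1.0.0" and the "abbr::myThing" case raised above.
        raise ValueError(f"ambiguous ':' in {entry!r}")

    # Assumption: split on the LAST underscore.
    library, sep, version = rest.rpartition("_")
    if not sep:
        library = None  # base schema, e.g. "8.0.0"
    elif not library:
        raise ValueError(f"empty library name in {entry!r}")
    if not version:
        raise ValueError(f"missing version in {entry!r}")
    return SchemaRef(nickname, library, version)


if __name__ == "__main__":
    for item in ["8.0.0", "score_0.0.1", "sc:score_0.0.1"]:
        print(item, "->", parse_schema_ref(item))
```

Splitting on the last underscore stays unambiguous as long as the version part cannot contain `_`, which holds for semantic versions, since their identifiers are limited to ASCII alphanumerics and hyphens. Whether library names themselves may contain `_` or `:` is exactly the open question.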
src/schema/objects/metadata.yaml
Outdated
- type: string
- type: array
  items:
    type: string
Might be beneficial to think about what validation we can build into the schema here apart from "string". E.g., checking for permissible versioning formats, occurrence of : and _ characters, etc.
That is a good idea. If someone knows how to do it, that would be great.
Try adding this behemoth regex to src/schema/objects/formats.yaml (the first two groups are for the library nickname and library name, and the rest was adapted from https://semver.org/#is-there-a-suggested-regular-expression-regex-to-check-a-semver-string):
^(?:[a-zA-Z]+:)?(?:[a-zA-Z]+_)?(?:0|[1-9]\d*)\.(?:0|[1-9]\d*)\.(?:0|[1-9]\d*)(?:-(?:(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+(?:[0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$
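For anyone who wants to try the pattern before adding it to the schema, here is a rough Python check. It rebuilds the expression from named pieces so the structure is easier to follow; the variable names, the helper `is_valid_hed_version`, and the sample strings are assumptions for illustration, and the assembled pattern is intended to accept the same strings as the one above.

```python
import re

# Building blocks adapted from the semver.org suggested regex.
NUM = r"(?:0|[1-9]\d*)"                              # no leading zeros
PRE_ID = r"(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"  # pre-release identifier
BUILD_ID = r"[0-9a-zA-Z-]+"                          # build-metadata identifier

HED_VERSION = re.compile(
    r"^(?:[a-zA-Z]+:)?"                    # optional library nickname
    r"(?:[a-zA-Z]+_)?"                     # optional library name
    rf"{NUM}\.{NUM}\.{NUM}"                # major.minor.patch
    rf"(?:-{PRE_ID}(?:\.{PRE_ID})*)?"      # optional pre-release
    rf"(?:\+{BUILD_ID}(?:\.{BUILD_ID})*)?$",  # optional build metadata
    re.ASCII,  # keep \d limited to 0-9
)


def is_valid_hed_version(entry: str) -> bool:
    """Return True if the entry matches the pattern in full."""
    return HED_VERSION.fullmatch(entry) is not None


if __name__ == "__main__":
    for item in ["8.0.0", "score_0.0.1", "sc:score_1.0.0",
                 "8.0.0-alpha.1", "8.0", "score-0.0.1"]:
        print(f"{item!r}: {is_valid_hed_version(item)}")
```

`re.ASCII` is passed because, in Python, `\d` would otherwise also match non-ASCII digits, and `fullmatch` is used so that a trailing newline is not silently accepted.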
Thanks @happy5214, what's your opinion on adding this regex for validation? Will it cause more pain than it's worth, or should we do it?
I was pinged, but I don't know if the question was directed toward me or if you were just thanking me. The tail end of the regex could be simplified to the following if we were to ban the use of pre-release schema versions:
^(?:[a-zA-Z]+:)?(?:[a-zA-Z]+_)?(?:0|[1-9]\d*)\.(?:0|[1-9]\d*)\.(?:0|[1-9]\d*)$
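A quick way to see what the simplified pattern changes is to run it over a few strings. The Python sketch below also adds named groups to pull out the nickname, library name, and version; the named groups and the helper `split_version` are assumptions for illustration and are not part of the proposed pattern.

```python
import re

# Simplified pattern (no pre-release or build metadata), with named groups
# added for illustration only.
SIMPLE = re.compile(
    r"^(?:(?P<nickname>[a-zA-Z]+):)?"
    r"(?:(?P<library>[a-zA-Z]+)_)?"
    r"(?P<version>(?:0|[1-9]\d*)\.(?:0|[1-9]\d*)\.(?:0|[1-9]\d*))$",
    re.ASCII,  # keep \d limited to 0-9
)


def split_version(entry):
    """Return the named parts as a dict, or None if the entry is rejected."""
    match = SIMPLE.fullmatch(entry)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for item in ["8.0.0", "sc:score_1.0.0", "8.0.0-alpha.1"]:
        print(f"{item!r}: {split_version(item)}")
```

With this variant a pre-release string such as `8.0.0-alpha.1` is rejected, which is the trade-off being discussed.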
both to thank you and to get your opinion, because you called it a "behemoth" and I wasn't sure whether that means you advise against adding this. :-)
Meanwhile this discussion progresses in the main thread
Co-authored-by: Stefan Appelhoff <[email protected]>
Co-authored-by: Stefan Appelhoff <[email protected]>
great! lgtm
- dataset_description.json fixed and waiting for [bids-specification pull request](bids-standard/bids-specification#1106)
- units added to _channels.tsv for TUH subjects
- Files truncated via [terminal](https://github.com/bids-standard/bids-examples/blob/master/CONTRIBUTING.md#why-do-we-only-host-truncated-data-with-0kb-size)
It seems to me the only big discussion that is left open is on how to define libraries (#1106 (comment)); the currently proposed form is <library-name>_<library-version>.
If in the meantime the HED team has converged and made a decision, we can resolve that discussion and proceed with this PR.
We'll also need to adjust the validator ... currently the score example is failing CI because of this:
This version is correct and has been agreed upon by the HED Working Group. The bids-validator part isn't quite ready yet. I'll try to work on a PR for the BIDS validator side, which just needs a small change. The modifications needed for the HED side aren't quite there yet, but we can probably write a temporary bypass. (We recommend that users check their validation using the Python validator until we get it completed.)
@happy5214 As for a link to a library: the official SCORE library 1.0.0 was finished and is out for community comment. @tpatpa @dorahermes need to give final approval, and then we can release it and provide the stable link. (We know what the link will be, but once we put the schema there, tools will start using it.)
It would be nice if this made it into the new release. What time frame do we have for finishing the validation side?
I think we should wait on the validation of the However, longer term we would support BIDS URIs, the PR of which I notice is also moving along.
The 1.8 release will probably come mid September. It'd be great to have the new hed features in by then! |
I believe BIDS has the primary responsibility to validate |
We have now released hed-validator 3.8.0 on npm supporting HED library schema. Could we move this PR forward now that it has supporting PRs bids-standard/bids-validator#1496 and bids-standard/bids-examples#332? I think all three PRs are ready to go. @sappelhoff @Remi-Gau @rwblair
I think the only open question is whether or not we want to add the regexp suggested by @happy5214 to the BIDS schema validation. If you have strict rules about how versions for HED need to look like, then I think it'd be a good idea ... even if the regexp looks a bit unwieldy, as described in #1106 (comment) cc @VisLab
I have added the |
Yes the RegEx is fine. I had a chance to test on our set of testcases. Thanks....
Thanks @VisLab et al. 🎉
Special thanks to @happy5214
I would prefer to defer adding this regular expression. We do use regular
expressions as part of the checking of schema for the hed-validator, but
they are not quite this regular expression. Let's put this on a todo list
for a future PR.
On Mon, Aug 29, 2022 at 3:24 PM Stefan Appelhoff wrote:
I think the only open question is whether or not we want to add the regexp suggested by @happy5214 to the BIDS schema validation.
If you have strict rules about how versions for HED need to look like, then I think it'd be a good idea ... even if the regexp looks a bit unwieldy, as described in #1106 (comment)
cc @VisLab
This is a preliminary request for change in the BIDS specification to accommodate HED libraries.
The first official HED library schema (SCORE library for clinical neurological annotation) is close to release.
We will prepare a PR for the bids-validator to validate datasets using HED libraries, as well as an example dataset for bids-examples, when there is agreement on the format.
We would appreciate reviews and comments: @sappelhoff @Remi-Gau @tsalo (the yaml could be improved, please help) @happy5214 @IanCa, @dorahermes @dungscout96 @smakeig, @tpatpa.