Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Added the specification for using HED libraries in BIDS #1106

Merged
merged 18 commits into from
Aug 31, 2022
Merged

[ENH] Added the specification for using HED libraries in BIDS #1106

merged 18 commits into from
Aug 31, 2022

Conversation

VisLab
Copy link
Member

@VisLab VisLab commented May 23, 2022

This is a preliminary request for change in the BIDS specification to accommodate HED libraries.

The first official HED library schema (SCORE library for clinical neurological annotation) is close to release.

We will prepare a PR for the bids-validator to validate datasets using HED libraries as well as an example datasets for bids-examples when there is agreement on the format.

We would appreciate reviews and comments: @sappelhoff @Remi-Gau @tsalo (the yaml could be improved--please help) @happy5214 @IanCa, @dorahermes @dungscout96 @smakeig, @tpatpa.

@codecov
Copy link

codecov bot commented May 23, 2022

Codecov Report

Merging #1106 (6101459) into master (b3afa2a) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #1106   +/-   ##
=======================================
  Coverage   88.33%   88.33%           
=======================================
  Files           6        6           
  Lines        1020     1020           
=======================================
  Hits          901      901           
  Misses        119      119           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@tpatpa
Copy link
Contributor

tpatpa commented May 24, 2022

Thank you @VisLab, looks good to me!

Copy link
Collaborator

@yarikoptic yarikoptic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was just curious to review , left some questions/feedback

src/99-appendices/03-hed.md Show resolved Hide resolved
"HEDVersion": {
"base": "8.1.0",
"libraries": {
"sc": "score_0.0.1",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this some "official" identity specification combining library and a version? if not, may be worth separating out, even if more verbose:

Suggested change
"sc": "score_0.0.1",
"sc": {"library": "score", "version": "0.0.1"},

or alike to make it explicit, and to allow in the future e.g. to expand with optional extra information (e.g. URL to the HED library which is not in the library of libraries i.e. hed-schema-library) but available from another library of libraries or may be even directly pointed to by that URL?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some version of what you suggest was the original plan (though URIs were never planned to be supported other than bids: schema paths within the dataset), but it was simplified during development after feedback from the HED community. @VisLab can provide more detail, which is escaping me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suggestion is above is one that we have considered (see hed-specification issue#156). The actual format is only proposed at this point in the HED specification, and we would appreciate opinions and are open to changes.

We have gone through a number of versions of this, and originally we thought we would support a file within the bids data set itself and arbitrary URLs. The HED working group consensus was to start with the standard libraries and see how it goes, since the purpose is to have standardized vocabularies to make meaningful comparisons across datasets.

On a related note:
The hed-python tools allow a schema group to be passed into its BIDS data set constructor to override the specification in dataset_description. Right now the hed-javascript public interface (which is what BIDS uses) only passes in the data set and constructs the schema group internally.

I am mentioning this because another possibility is to allow a schema group to be passed in as part of the public interface to the hed-javascript validator (from bids) which would override the internal specification extracted from the dataset_description. This is relevant because we are going to have to do a major version bump to support the libraries because of another interface change and we could add this in as an option and allow BIDS to decide. @sappelhoff. @rwblair?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The specification of which schemas are to be used to be used for validating a dataset are provided in a separate parameter, since hed-javascript cannot handle relative paths within the BIDS dataset directory and needs them parsed to use the absolute path. However, the actual Schema and Schemas types used by the validator to represent schemas are considered implementation details (despite being returned by the validator.buildSchema function) and not part of the stable API. In theory, the object passed as the second argument to validateBidsDataset can be any object conforming to the defined API, regardless of what datasetDescription.json says.

src/99-appendices/03-hed.md Outdated Show resolved Hide resolved
src/schema/objects/metadata.yaml Outdated Show resolved Hide resolved
src/schema/rules/tabular_metadata.yaml Outdated Show resolved Hide resolved
Copy link
Member

@sappelhoff sappelhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VisLab if you want to move on with this PR towards a merge, please let us know and remove the "draft" status.

@VisLab
Copy link
Member Author

VisLab commented Jun 7, 2022

I would like to hold off for another week to allow more internal discussion.

In working on the example use cases using the score library, it has become apparent that many users will just want to use the score library when they are doing neurological annotation and not use the base schema at all.

We have the option of allowing the following so that users wouldn't have to prefix the score tags:

"HEDVersion": {
      "base": "score_0.0.1",
      "libraries": {
          "bs": "8.0.0"
       }
}

or also:

"HEDVersion":  "score_0.0.1"

@tpatpa @dorahermes comments?

@VisLab VisLab marked this pull request as ready for review June 9, 2022 21:50
@VisLab
Copy link
Member Author

VisLab commented Jun 9, 2022

@tpatpa @happy5214 @dorahermes @sappelhoff @dungscout96 @tsalo @yarikoptic
Please review. After discussion.... the proposal has been simplified... for the better.

Copy link
Contributor

@tpatpa tpatpa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @VisLab for a great discussion!
After spending additional time putting together the bids example dataset (still WIP, available here) and working with the HED validation tools I feel the use of a string would be easier altogether.

Copy link
Member

@sappelhoff sappelhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I correct in assuming that underscores _ are forbidden characters when naming an HED library? Or do you document somewhere how to unambiguously get library name and library version from the suggested format <library-name>_<library-version>.

Relatedly: Are : characters also forbidden at the start of an HED library? Else I think the situation needs to be explicitly clarified when something like this happens: People having a library called :myThing and then want to do "HEDVersion": ["8.0.0", "abbr::myThing"] (note the double :)

Overall I like the proposed format of specifying which HED library is being used 👍 it looks much simpler than before and still encoding the relevant info. We only need to take care that the "separator characters" (_ and :, apparently) are always unambiguous.

One question regarding that: Do you prescribe a versioning scheme for all "official" HED libraries? For example, MUST they always use semantic versioning?

src/99-appendices/03-hed.md Outdated Show resolved Hide resolved
src/99-appendices/03-hed.md Outdated Show resolved Hide resolved
- type: string
- type: array
items:
type: string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be beneficial to think about what validation we can build into the schema here apart from "string". E.g., checking for permissible versioning formats, occurrence of : and _ characters, etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good idea if someone knows how to do it that would be great.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try adding this behemoth regex to src/schema/objects/formats.yaml (the first two groups are for the library nickname and library name, and the rest was adapted from https://semver.org/#is-there-a-suggested-regular-expression-regex-to-check-a-semver-string):

^(?:[a-zA-Z]+:)?(?:[a-zA-Z]+_)?(?:0|[1-9]\d*)\.(?:0|[1-9]\d*)\.(?:0|[1-9]\d*)(?:-(?:(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+(?:[0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @happy5214 what's your opinion on adding this regex for validation? Will it cause more pain than it's worth, or should we do it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was pinged, but I don't know if the question was directed toward me or if you were just thanking me. The tail end of the regex could be simplified to the following if we were to ban the use of pre-release schema versions:

^(?:[a-zA-Z]+:)?(?:[a-zA-Z]+_)?(?:0|[1-9]\d*)\.(?:0|[1-9]\d*)\.(?:0|[1-9]\d*)$

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both to thank you and to get your opinion, because you called it a "behemoth" and I wasn't sure whether that means you advise against adding this. :-)

Meanwhile this discussion progresses in the main thread

VisLab and others added 2 commits June 11, 2022 10:39
@sappelhoff sappelhoff added this to the 1.7.1 milestone Jun 22, 2022
@dorahermes
Copy link
Member

great! lgtm

@sappelhoff sappelhoff modified the milestones: 1.7.1, 1.8.0 Jul 13, 2022
tpatpa added a commit to tpatpa/bids-examples that referenced this pull request Jul 13, 2022
 - dataset_description.json fixed and waiting for [bids-specification pull request](bids-standard/bids-specification#1106)
 - units added to _channels.tsv for TUH subjects
 - Files truncated via [terminal](https://github.com/bids-standard/bids-examples/blob/master/CONTRIBUTING.md#why-do-we-only-host-truncated-data-with-0kb-size)
Copy link
Member

@sappelhoff sappelhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me the only big discussion that is left open is on how to define libraries (#1106 (comment)) -- currently proposed form is <library-name<_>library-version>.

If in the meantime the HED team has converged and made a decision, we can resolve that discussion and proceed with this PR.

We'll also need to adjust the validator ... currently the score example is failing CI because of this:

src/99-appendices/03-hed.md Outdated Show resolved Hide resolved
src/99-appendices/03-hed.md Show resolved Hide resolved
- type: string
- type: array
items:
type: string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @happy5214 what's your opinion on adding this regex for validation? Will it cause more pain than it's worth, or should we do it?

@VisLab
Copy link
Member Author

VisLab commented Jul 22, 2022

This version is correct and has been agreed upon by the Hed Working Group. The bids-validator part isn't quite ready yet. I'll try to work on a PR for the BIDS validator side, which just needs a small change.

The modifications needed for the HED side aren't quite there yet, but we can probably write a by-pass temporarily. (We recommend that users check their validation using the Python validator until we get it completed.) @happy5214

As for a link to a library: the SCORE library official 1.0.0 was finished and out for community comment. @tpatpa @dorahermes need to give final approval and then we can release it and provide the stable link. (We know what the link will be but once we put the schema there, tools will start using it.)

It would be nice if this made it into the new release. What time frame do we have for finishing the validation side?

@VisLab
Copy link
Member Author

VisLab commented Jul 22, 2022

I think we should wait on the validation of the HEDVersion using a regular expression unless this is standard for BIDS and we have to do it. I would rather just pass the HEDVersion to the hed-validator and let it resolve the version. On the first pass, we would only support the straight version strings as shown in the example.

However, longer term we would support BIDS URI's, the PR of which I notice is also moving along.

@sappelhoff
Copy link
Member

The 1.8 release will probably come mid September. It'd be great to have the new hed features in by then!

@happy5214
Copy link
Contributor

I think we should wait on the validation of the HEDVersion using a regular expression unless this is standard for BIDS and we have to do it. I would rather just pass the HEDVersion to the hed-validator and let it resolve the version. On the first pass, we would only support the straight version strings as shown in the example.

I believe BIDS has the primary responsibility to validate HEDVersion, whether or not we do our own validation, especially if it's as simple as adding a regex to an already-implemented framework. We implement the functionality on our end as a matter of convenience and expedience, but in the end this is a format defined by BIDS, not hed-validator.

@VisLab
Copy link
Member Author

VisLab commented Aug 29, 2022

We have now released hed-validator 3.8.0 on npm supporting HED library schema. Could we move this PR forward now that it has supporting PRs bids-standard/bids-validator#1496 and bids-standard/bids-examples#332?

I think all three PRs are ready to go. @sappelhoff @Remi-Gau @rwblair

@sappelhoff
Copy link
Member

I think the only open question is whether or not we want to add the regexp suggested by @happy5214 to the BIDS schema validation.

If you have strict rules about how versions for HED need to look like, then I think it'd be a good idea ... even if the regexp looks a bit unwieldy, as described in #1106 (comment)

cc @VisLab

@sappelhoff
Copy link
Member

I have added the hed_version format as suggested by @happy5214. If you disagree, we can discuss removing it again -- but I think it's a good idea to validate HED versions early on, and not only in the HED validator.

@VisLab
Copy link
Member Author

VisLab commented Aug 30, 2022

Yes the RegEx is fine. I had a chance to test on our set of testcases. Thanks....

@sappelhoff sappelhoff merged commit da68bd9 into bids-standard:master Aug 31, 2022
@sappelhoff
Copy link
Member

Thanks @VisLab et al.

🎉

@VisLab
Copy link
Member Author

VisLab commented Sep 1, 2022

Special thanks to @happy5214

@VisLab
Copy link
Member Author

VisLab commented Oct 11, 2022 via email

sappelhoff pushed a commit to tpatpa/bids-examples that referenced this pull request Oct 27, 2023
 - dataset_description.json fixed and waiting for [bids-specification pull request](bids-standard/bids-specification#1106)
 - units added to _channels.tsv for TUH subjects
 - Files truncated via [terminal](https://github.com/bids-standard/bids-examples/blob/master/CONTRIBUTING.md#why-do-we-only-host-truncated-data-with-0kb-size)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants