Convert entity table to machine-readable format #466
Comments
I would like to second this as someone largely outside the ecosystem. I wanted to make a program which could generate filenames programmatically since we have a big, complicated study, but without a file like this it's hard to build tooling to that effect IMO.
What about something like the following? I've formatted it as YAML because that was easier to write freehand into a file, but could easily switch to JSON for the real thing. The suffixes are organized into groups like the entity table to keep it reasonably short, but I could drop the groups and make each suffix a dictionary under the datatypes.

entities:
  sub:
    description: Subject
    format: label
  ses:
    description: Session
    format: label
  task:
    description: Task
    format: label
  acq:
    description: Acquisition
    format: label
  ce:
    description: Contrast Enhancing Agent
    format: label
  rec:
    description: Reconstruction
    format: label
  dir:
    description: Phase-Encoding Direction
    format: label
  run:
    description: Run
    format: index
  mod:
    description: Corresponding Modality
    format: label
  echo:
    description: Echo
    format: index
  recording:
    description: Recording
    format: label
  proc:
    description: Processed (on device)
    format: label
  space:
    description: Space
    format: label
datatypes:
  anat:
    group1:
      suffices:
        - T1w
        - T2w
        - T1rho
        - T1map
        - T2map
        - T2star
        - FLAIR
        - FLASH
        - PD
        - PDmap
        - PDT2
        - inplaneT1
        - inplaneT2
        - angio
      extensions:
        - nii.gz
        - nii
        - json
      entities:
        sub: required
        ses: optional
        acq: optional
        ce: optional
        rec: optional
    group2:
      suffices:
        - defacemask
      extensions:
        - nii.gz
        - nii
        - json
      entities:
        sub: required
        ses: optional
        acq: optional
        ce: optional
        rec: optional
        mod: optional
  func:
    group1:
      suffices:
        - bold
        - cbv
        - phase
        - sbref
      extensions:
        - nii.gz
        - nii
        - json
      entities:
        sub: required
        ses: optional
        task: required
        acq: optional
        ce: optional
        rec: optional
        dir: optional
        run: optional
        echo: optional
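To make the filename-generation use case concrete, here is a minimal Python sketch of how a tool might consume such a structure. It is only an illustration under stated assumptions: the `SCHEMA` dict is a hand-copied excerpt of the draft above (only `func`/`group1`), `build_filename` is a hypothetical helper, and the entity order in the filename is assumed to follow the order of keys in the group's `entities` mapping (which relies on Python 3.7+ dicts preserving insertion order). It does not check label or index formatting.

```python
# Hypothetical in-memory excerpt of the draft schema (func/group1 only).
SCHEMA = {
    "datatypes": {
        "func": {
            "group1": {
                "suffices": ["bold", "cbv", "phase", "sbref"],
                "extensions": ["nii.gz", "nii", "json"],
                "entities": {
                    "sub": "required", "ses": "optional", "task": "required",
                    "acq": "optional", "ce": "optional", "rec": "optional",
                    "dir": "optional", "run": "optional", "echo": "optional",
                },
            }
        }
    }
}


def build_filename(schema, datatype, suffix, extension, **values):
    """Assemble a filename, emitting entities in the order the group lists them."""
    # Find the group of this datatype that allows the suffix/extension pair.
    for group in schema["datatypes"][datatype].values():
        if suffix in group["suffices"] and extension in group["extensions"]:
            break
    else:
        raise ValueError(f"no group in {datatype!r} allows {suffix}.{extension}")

    unknown = set(values) - set(group["entities"])
    if unknown:
        raise ValueError(f"entities not allowed here: {sorted(unknown)}")

    parts = []
    for entity, requirement in group["entities"].items():
        if entity in values:
            parts.append(f"{entity}-{values[entity]}")
        elif requirement == "required":
            raise ValueError(f"missing required entity {entity!r}")
    return "_".join(parts + [suffix]) + "." + extension


print(build_filename(SCHEMA, "func", "bold", "nii.gz", sub="01", task="rest", run=1))
# prints: sub-01_task-rest_run-1_bold.nii.gz
```

Omitting a required entity (for example calling it without `task`) raises `ValueError`, which is the kind of check that is hard to write without a machine-readable table.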
@tsalo -- this looks beautiful to me! @jbteves : I do agree that the consistency which could be achieved by using .json is indeed a benefit. But IMHO YAML is so much nicer and more human-friendly that I simply can't resist it. It also got a feature of the XXI century -- support for

Here is a JSON view of the above YAML for comparison. It is not too bad yet, but as it grows I would find it easier and easier to orient myself in YAML than in JSON, and all the clutter from everything in "" really makes it less readable to me:

{
  "entities": {
    "task": {
      "description": "Task",
      "format": "label"
    },
    "ses": {
      "description": "Session",
      "format": "label"
    },
    "sub": {
      "description": "Subject",
      "format": "label"
    },
    "space": {
      "description": "Space",
      "format": "label"
    },
    "ce": {
      "description": "Contrast Enhancing Agent",
      "format": "label"
    },
    "echo": {
      "description": "Echo",
      "format": "index"
    },
    "recording": {
      "description": "Recording",
      "format": "label"
    },
    "acq": {
      "description": "Acquisition",
      "format": "label"
    },
    "rec": {
      "description": "Reconstruction",
      "format": "label"
    },
    "run": {
      "description": "Run",
      "format": "index"
    },
    "proc": {
      "description": "Processed (on device)",
      "format": "label"
    },
    "dir": {
      "description": "Phase-Encoding Direction",
      "format": "label"
    },
    "mod": {
      "description": "Corresponding Modality",
      "format": "label"
    }
  },
  "datatypes": {
    "anat": {
      "group1": {
        "suffices": [
          "T1w",
          "T2w",
          "T1rho",
          "T1map",
          "T2map",
          "T2star",
          "FLAIR",
          "FLASH",
          "PD",
          "PDmap",
          "PDT2",
          "inplaneT1",
          "inplaneT2",
          "angio"
        ],
        "extensions": [
          "nii.gz",
          "nii",
          "json"
        ],
        "entities": {
          "rec": "optional",
          "acq": "optional",
          "ses": "optional",
          "sub": "required",
          "ce": "optional"
        }
      },
      "group2": {
        "suffices": [
          "defacemask"
        ],
        "extensions": [
          "nii.gz",
          "nii",
          "json"
        ],
        "entities": {
          "acq": "optional",
          "ses": "optional",
          "sub": "required",
          "rec": "optional",
          "ce": "optional",
          "mod": "optional"
        }
      }
    },
    "func": {
      "group1": {
        "suffices": [
          "bold",
          "cbv",
          "phase",
          "sbref"
        ],
        "extensions": [
          "nii.gz",
          "nii",
          "json"
        ],
        "entities": {
          "task": "required",
          "ses": "optional",
          "sub": "required",
          "ce": "optional",
          "echo": "optional",
          "acq": "optional",
          "rec": "optional",
          "run": "optional",
          "dir": "optional"
        }
      }
    }
  }
}
Unrelated to this issue, just wanted to share 1c of no value here: a not really widely known fact is that YAML 1.2 is a superset of JSON (any JSON is also valid YAML). I.e. if at some point we decide "let's prepare for migration to YAML", conversion of .json into .yaml could be as easy as
Sorry for spamming... but I am just too excited! Such a spec could then be used to produce almost all, if not all, of the term tables we have. It could be used to produce target filename patterns. We could even manage to programmatically validate example filenames! It would reduce duplication and thus possible errors. Validators could avoid hardcoding, and there would be no need to change the validator upon addition of a term, entity, etc. It would also make it possible for the validator to validate against a specific version of BIDS, not just the latest!
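As a rough illustration of the "filename patterns" and "validate example filenames" ideas, the following Python sketch compiles one group of the draft schema into a regular expression. Everything named here is an assumption for illustration: `ANAT_GROUP1` is a shortened copy of the `anat`/`group1` entry, `filename_pattern` and `is_valid` are hypothetical helpers, entity values are assumed to be alphanumeric, and entities are assumed to appear in the order the group lists them.

```python
import re

# Hypothetical, shortened copy of the anat "group1" entry from the draft.
ANAT_GROUP1 = {
    "suffices": ["T1w", "T2w", "FLAIR"],
    "extensions": ["nii.gz", "nii", "json"],
    "entities": {
        "sub": "required",
        "ses": "optional",
        "acq": "optional",
        "ce": "optional",
        "rec": "optional",
    },
}


def filename_pattern(group):
    """Compile a regex accepting the filenames one schema group allows."""
    pieces = []
    for entity, requirement in group["entities"].items():
        # Assumption: labels and indices are both plain alphanumeric strings.
        piece = f"{re.escape(entity)}-[0-9a-zA-Z]+_"
        if requirement != "required":
            piece = f"(?:{piece})?"  # optional entities may be absent
        pieces.append(piece)
    suffixes = "|".join(re.escape(s) for s in group["suffices"])
    extensions = "|".join(re.escape(e) for e in group["extensions"])
    return re.compile("".join(pieces) + f"(?:{suffixes})\\.(?:{extensions})")


def is_valid(filename, group):
    """Return True if the whole filename matches the group's pattern."""
    return filename_pattern(group).fullmatch(filename) is not None


print(is_valid("sub-01_ses-02_T1w.nii.gz", ANAT_GROUP1))   # True
print(is_valid("sub-01_task-rest_T1w.nii", ANAT_GROUP1))   # False: task not allowed
```

Because the pattern is derived from data, validating against an older release would only mean loading that release's schema file, with no change to the code.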
Thanks for throwing in some ideas to improve the entity table @tsalo -> these are some related issues: #289 #290 re: the current proposal
@yarikoptic I was thinking the same thing! The versioning aspect will be awesome! @sappelhoff I agree that the file will end up being prohibitively long in its current form. What about splitting the files into the following:
I was also a little stuck on how the JSON/YAML file would be rendered as a table on the site. Will whatever rendering function is used need to be in a specific language?
Re length: We can partition at the top level into separate files. Unfortunately YAML, like JSON, has no native include mechanism, but solutions exist that would let us avoid implementing it ourselves: https://stackoverflow.com/questions/528281/how-can-i-include-a-yaml-file-inside-another . A similar approach is taken by the NWB standard, see https://github.com/NeurodataWithoutBorders/nwb-schema/blob/24fba6174ddbad171ee5bb824edfa31f86b1b16d/core/nwb.namespace.yaml which defines includes for different modalities. I am not yet sure if we want to partition by modality; I feel that we might better partition by concept/structure: entities, datatypes, terms, ... as prototyped by @tsalo.
And then partition per datatype (modality!) ;-)
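One way to get by without an include mechanism is to let the directory layout do the partitioning and have a small loader reassemble the pieces. The Python sketch below assumes a hypothetical layout (for example `entities.json` plus `datatypes/anat.json`, `datatypes/func.json`) and uses JSON with the standard library only; reading YAML files the same way would need a third-party parser such as PyYAML. `load_schema` is an illustrative name, not an existing tool.

```python
import json
from pathlib import Path


def load_schema(root):
    """Assemble one schema dict from a directory tree of JSON files.

    Each file contributes a key named after its stem and each subdirectory
    becomes a nested dict, so `datatypes/anat.json` ends up at
    schema["datatypes"]["anat"].
    """
    schema = {}
    for path in sorted(Path(root).iterdir()):  # sorted for a stable key order
        if path.is_dir():
            schema[path.name] = load_schema(path)
        elif path.suffix == ".json":
            schema[path.stem] = json.loads(path.read_text(encoding="utf-8"))
    return schema
```

With this layout, adding a datatype means adding one file, and partitioning first by concept and then by datatype falls out of the folder structure.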
@tsalo we will not render this structure directly. We would code a helper tool to render from it all the .md tables etc. to include in the spec upon compilation. edit 1: we could use something like https://pypi.org/project/tabulate/ to prepare such tables.
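A minimal sketch of such a helper, in Python, might look like the following. It uses plain string formatting rather than tabulate to stay dependency-free; `SMALL_SCHEMA` is a hypothetical, trimmed copy of the draft structure and `entity_table` is an illustrative name. The column order is assumed to follow the order of the top-level `entities` mapping.

```python
# Hypothetical, trimmed copy of the draft schema used only for this example.
SMALL_SCHEMA = {
    "entities": {"sub": {}, "ses": {}, "mod": {}},
    "datatypes": {
        "anat": {
            "group1": {
                "suffices": ["T1w", "T2w"],
                "entities": {"sub": "required", "ses": "optional"},
            },
            "group2": {
                "suffices": ["defacemask"],
                "entities": {"sub": "required", "ses": "optional", "mod": "optional"},
            },
        }
    },
}


def entity_table(schema, datatype):
    """Render one datatype's groups as a Markdown pipe table (one row per group)."""
    entities = list(schema["entities"])
    header = ["suffixes"] + entities
    rows = []
    for group in schema["datatypes"][datatype].values():
        row = [" ".join(group["suffices"])]
        # Leave the cell empty when the group does not allow the entity.
        row += [group["entities"].get(entity, "") for entity in entities]
        rows.append(row)
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in rows]
    return "\n".join(lines)


print(entity_table(SMALL_SCHEMA, "anat"))
```

The returned string could be written into the specification's Markdown sources at build time, so the rendered table never drifts from the schema.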
To not derail the discussion here, but to outline a possible mechanism for establishing historical versions of the schema etc. suitable for reuse by BIDS-aware tools, I have initiated https://github.com/bids-standard/bids-schema -- see its README.md, and you are welcome to open issues with questions/suggestions/notes (there is probably nothing really to be contributed in PRs until we get a schema going here).
I started working on the files in tsalo/bids-specification@ref/json-entity. The datatypes are split up in the
@yarikoptic If we'll be using a Python script to handle the rendering then that alleviates my concerns. Thanks! Regarding releases, I had assumed that we'd use the releases in the specification repository, but since the specification for the YAML/JSON files will probably change, it only makes sense to back up the schemas elsewhere and allow maintainers to adjust them as needed.
Yep, that is the purpose of that bids-schema. It is also meant to be more lightweight and not carry all the bids-specification history/images etc., so it could be included in tool distributions where desired... That thought triggered the need to file bids-standard/bids-schema#1 ;-)
Re your branch - please place all of the produced YAMLs into a dedicated folder (eg
Edit: I think it will be useful beyond appendices, so I would place it at the top level of the hierarchy.
Done!
Awesome! If it were a PR here I could try my hand at an entity table generation/embedding script (unless you just do it) ;-)
I just opened #475 as a draft PR.
I'm not sure if it's feasible, but it would be nice if the entity table was stored as a json file, in order to make it both programmatically accessible and centralized. I know that there is an equivalent file in bids-validator and pybids, but if the filename construction rules were centralized under the actual specification, then it would be much easier to update the specification across the ecosystem without having to update a range of other packages as well.