Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consolidate schema files into a single file for each term type #877

Closed
tsalo opened this issue Sep 13, 2021 · 9 comments · Fixed by #883
Closed

Consolidate schema files into a single file for each term type #877

tsalo opened this issue Sep 13, 2021 · 9 comments · Fixed by #883
Labels
discussion ongoing discussion schema Issues related to the YAML schema representation of the specification. Patch version release.

Comments

@tsalo
Copy link
Member

tsalo commented Sep 13, 2021

Multiple people have raised concerns with having the schema definitions separated into one file per term. I found that structure easier to work with as I started adding terms to the schema, but that doesn't mean we have to stick with it in the long run. One proposal would be to consolidate all terms for each term type (metadata, suffix, datatype) into their own file.

I'd like to know how people feel about this.

Here are some potential blockers that came to mind:

  • We currently use the filenames of metadata YAML files to support different definitions of the same fields (see EchoTime.yaml and EchoTime_fmap.yaml for an example). If all fields are in the same file, then we'll need some other way to support this behavior.
  • Inherited fields are stored in files starting with an underscore (for example, _EEGCoordSys.yaml). It's probably not a problem when placing them in a single file, but I want to make sure everything still works that way.
  • datatypes files have a very different structure than the metadata, suffixes, or columns (see [SCHEMA] Add TSV column files #827) files.
    • The closest I have to a solution for this is in [SCHEMA] Separate schema object definitions from rules #834, where I move most of the datatypes information to a new "rules" folder, and create new datatypes object files with just the names and definitions. We need to separate object definitions from rules anyway.

Some potential benefits:

  • Fewer files to manage. Though it's the same amount of information of course.
  • The schema loading code could be simplified if we don't need to walk through folders to load all files.

Tagging some folks who have expressed opinions on this or who I think might have an opinion: @erdalkaraca, @effigies, @yarikoptic, @CPernet, @Remi-Gau, and @Tokazama. Apologies to anyone I might have missed.

@tsalo tsalo added discussion ongoing discussion schema Issues related to the YAML schema representation of the specification. Patch version release. labels Sep 13, 2021
@erdalkaraca
Copy link
Collaborator

I would just reduce the amount of IO operations, especially when a file system is involved. But that may not be a blocker given that those file can be read once and hold in memory.

It seems that the file names hold additional information. Some other way to support that behavior would be to have an additional field in the object definition to denote the context where the fields are valid.
So, for EchoTime this could look like:

name: EchoTime
# omit context to denote global validity
context: fmap
description: |
  The time (in seconds) when the echo corresponding to this map was acquired.
type: number
unit: s
exclusiveMinimum: 0 

That additional 'context' field in the object definition could also contain arbitrary complex expressions which would not be allowed in a file name.

@yarikoptic
Copy link
Collaborator

well -- it is BIDS, so yes - files hold additional information ;) on on a serious note re

Multiple people have raised concerns ...

what are concerns - are they pragmatically relevant ? (e.g. is there some kind of indication that loading multiple files pragmatically impairs IO and can't otherwise be easily resolved to gain some speed up -- e.g. parallel threaded operation across files which is trivial to implement, but the question either it is really needed)

@Tokazama
Copy link
Member

Speed is not the issue because if I'm generating code from it I'd only do that once every time something changed and if I was creating a dictionary to look things up in I would reformat it into a bit format that reads in more quickly.

My primary concern is that a filesystem is not the best way to formalize the standard into a machine readable format. I would have to do some github wizardry to selectively look at the schema files instead of this entire repository. This doesn't mean that the many files approach is impossible to work with.

@tsalo
Copy link
Member Author

tsalo commented Sep 14, 2021

I incorporated the change into #834, so folks can see what the files would probably look like here.

@erdalkaraca
Copy link
Collaborator

I incorporated the change into #834, so folks can see what the files would probably look like here.

I realized I can now more efficiently navigate through the data/files, though for a machine program effectively the change is marginal as you already mentioned.

@erdalkaraca
Copy link
Collaborator

The only thing now still missing is something that glues all of the object definitions together to get a "coherent schema" (instead of the loosely coupled files).

@Tokazama
Copy link
Member

Tokazama commented Sep 14, 2021

I incorporated the change into #834, so folks can see what the files would probably look like here.

Personally I find this a lot more manageable.

@tsalo
Copy link
Member Author

tsalo commented Sep 18, 2021

Okay, it sounds like there are few real drawbacks or benefits, but since single files will be easier for folks to deal with I will open a PR that consolidates them. I want to make sure that this doesn't cause problems for anyone using the schema files, so it would be great if we could tag anyone who might need to change any code that depends on them.

@erdalkaraca this won't include the "glue" definitions you're talking about, and per our call the other day I think the YAMALE files will require their own issue.

EDIT: I've opened #883.

@erdalkaraca
Copy link
Collaborator

@tsalo we can discuss about the "glue" definitions (coherent schema) in issue #884

@tsalo tsalo closed this as completed in #883 Oct 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion ongoing discussion schema Issues related to the YAML schema representation of the specification. Patch version release.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants