Consolidate schema files into a single file for each term type #877

tsalo · 2021-09-13T17:07:49Z

Multiple people have raised concerns with having the schema definitions separated into one file per term. I found that structure easier to work with as I started adding terms to the schema, but that doesn't mean we have to stick with it in the long run. One proposal would be to consolidate all terms for each term type (metadata, suffix, datatype) into their own file.

I'd like to know how people feel about this.

Here are some potential blockers that came to mind:

We currently use the filenames of metadata YAML files to support different definitions of the same fields (see EchoTime.yaml and EchoTime_fmap.yaml for an example). If all fields are in the same file, then we'll need some other way to support this behavior.
Inherited fields are stored in files starting with an underscore (for example, _EEGCoordSys.yaml). It's probably not a problem when placing them in a single file, but I want to make sure everything still works that way.
datatypes files have a very different structure than the metadata, suffixes, or columns (see [SCHEMA] Add TSV column files #827) files.
- The closest I have to a solution for this is in [SCHEMA] Separate schema object definitions from rules #834, where I move most of the datatypes information to a new "rules" folder, and create new datatypes object files with just the names and definitions. We need to separate object definitions from rules anyway.

Some potential benefits:

Fewer files to manage. Though it's the same amount of information of course.
The schema loading code could be simplified if we don't need to walk through folders to load all files.

Tagging some folks who have expressed opinions on this or who I think might have an opinion: @erdalkaraca, @effigies, @yarikoptic, @CPernet, @Remi-Gau, and @Tokazama. Apologies to anyone I might have missed.

The text was updated successfully, but these errors were encountered:

erdalkaraca · 2021-09-13T17:26:37Z

I would just reduce the amount of IO operations, especially when a file system is involved. But that may not be a blocker given that those file can be read once and hold in memory.

It seems that the file names hold additional information. Some other way to support that behavior would be to have an additional field in the object definition to denote the context where the fields are valid.
So, for EchoTime this could look like:

name: EchoTime
# omit context to denote global validity
context: fmap
description: |
  The time (in seconds) when the echo corresponding to this map was acquired.
type: number
unit: s
exclusiveMinimum: 0

That additional 'context' field in the object definition could also contain arbitrary complex expressions which would not be allowed in a file name.

yarikoptic · 2021-09-13T18:22:17Z

well -- it is BIDS, so yes - files hold additional information ;) on on a serious note re

Multiple people have raised concerns ...

what are concerns - are they pragmatically relevant ? (e.g. is there some kind of indication that loading multiple files pragmatically impairs IO and can't otherwise be easily resolved to gain some speed up -- e.g. parallel threaded operation across files which is trivial to implement, but the question either it is really needed)

Tokazama · 2021-09-13T18:45:22Z

Speed is not the issue because if I'm generating code from it I'd only do that once every time something changed and if I was creating a dictionary to look things up in I would reformat it into a bit format that reads in more quickly.

My primary concern is that a filesystem is not the best way to formalize the standard into a machine readable format. I would have to do some github wizardry to selectively look at the schema files instead of this entire repository. This doesn't mean that the many files approach is impossible to work with.

tsalo · 2021-09-14T16:23:49Z

I incorporated the change into #834, so folks can see what the files would probably look like here.

erdalkaraca · 2021-09-14T16:47:19Z

I incorporated the change into #834, so folks can see what the files would probably look like here.

I realized I can now more efficiently navigate through the data/files, though for a machine program effectively the change is marginal as you already mentioned.

erdalkaraca · 2021-09-14T17:14:07Z

The only thing now still missing is something that glues all of the object definitions together to get a "coherent schema" (instead of the loosely coupled files).

Tokazama · 2021-09-14T17:37:00Z

I incorporated the change into #834, so folks can see what the files would probably look like here.

Personally I find this a lot more manageable.

tsalo · 2021-09-18T21:34:47Z

Okay, it sounds like there are few real drawbacks or benefits, but since single files will be easier for folks to deal with I will open a PR that consolidates them. I want to make sure that this doesn't cause problems for anyone using the schema files, so it would be great if we could tag anyone who might need to change any code that depends on them.

@erdalkaraca this won't include the "glue" definitions you're talking about, and per our call the other day I think the YAMALE files will require their own issue.

EDIT: I've opened #883.

erdalkaraca · 2021-09-19T09:27:09Z

@tsalo we can discuss about the "glue" definitions (coherent schema) in issue #884

tsalo added discussion ongoing discussion schema Issues related to the YAML schema representation of the specification. Patch version release. labels Sep 13, 2021

tsalo mentioned this issue Sep 18, 2021

[SCHEMA] Consolidate schema files by term type #883

Merged

tsalo closed this as completed in #883 Oct 5, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consolidate schema files into a single file for each term type #877

Consolidate schema files into a single file for each term type #877

tsalo commented Sep 13, 2021

erdalkaraca commented Sep 13, 2021

yarikoptic commented Sep 13, 2021

Tokazama commented Sep 13, 2021

tsalo commented Sep 14, 2021

erdalkaraca commented Sep 14, 2021

erdalkaraca commented Sep 14, 2021

Tokazama commented Sep 14, 2021 •

edited

Loading

tsalo commented Sep 18, 2021 •

edited

Loading

erdalkaraca commented Sep 19, 2021

Consolidate schema files into a single file for each term type #877

Consolidate schema files into a single file for each term type #877

Comments

tsalo commented Sep 13, 2021

erdalkaraca commented Sep 13, 2021

yarikoptic commented Sep 13, 2021

Tokazama commented Sep 13, 2021

tsalo commented Sep 14, 2021

erdalkaraca commented Sep 14, 2021

erdalkaraca commented Sep 14, 2021

Tokazama commented Sep 14, 2021 • edited Loading

tsalo commented Sep 18, 2021 • edited Loading

erdalkaraca commented Sep 19, 2021

Tokazama commented Sep 14, 2021 •

edited

Loading

tsalo commented Sep 18, 2021 •

edited

Loading