Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SCHEMA: Reorganize file objects and rules #1288

Merged
merged 20 commits into from
Oct 10, 2022

Conversation

effigies
Copy link
Collaborator

This is a significant refactoring of the schema. I'm not sure if it justifies a minor version bump, since it will break things, but we haven't released a minor release since the last breakage.

Builds off of #1287, which was separated out from these drastic changes.

Summary:

  • objects.files
    • Combines old objects.top_level_files and objects.associated_data
    • Removes name field, adds file_type, which is either regular or directory
  • rules.files
    • Contains common, raw and deriv
    • rules.files.common
      • Adds level field, replacing required: true
      • rules.files.common.core combines rules.top_level_files and rules.associated_data, minus participants and samples
      • rules.files.common.tables combines rules.tabular_metadata and participants and samples
    • rules.files.raw
      • Mostly copies of rules.datatypes.*
      • Separates out associated data that spans more than one modality
        • rules.files.raw.channels - contains channels, coordsystem, electrodes (will contain optodes with nirs)
        • rules.files.raw.photo - contains photos
        • rules.files.raw.task - contains events and timeseries
    • rules.files.deriv
      • rules.files.deriv.preprocessed_data - was rules.datatypes.derivatives.common_derivatives, but removed a lot of derivatives of things where space wouldn't apply (such as events and photos). We can consider re-adding these with desc-, but I'm not sure how much these will ever occur. They're also not "data files" in the way that derivatives discusses, so it's not clear that they're specifically provided for, either.
      • rules.files.deriv.imaging - was rules.datatypes.derivatives.common_imaging_derivatives
        • This could probably be cleaned up further. Some of the things are actually supersets of what's in preprocessed_data. I've run out of energy in this effort, though, and it's already pretty big.

This will break tests. I'll get to them, but I wanted to get comments. @bids-standard/schema-users

@effigies effigies added schema-structure Changes to the fundamental organization/structure of the YAML schema. Minor version release. exclude-from-changelog This item will not feature in the automatically generated changelog labels Sep 16, 2022
@effigies effigies force-pushed the schema/file_reorg branch 3 times, most recently from 649c391 to 46e0755 Compare September 21, 2022 21:10
Copy link
Member

@tsalo tsalo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the changes.

Just going to make this note to refer back to whenever the file_type thing confuses me:

A Linux system, just like UNIX, makes no difference between a file and a directory, since a directory is just a file containing names of other files.

@codecov
Copy link

codecov bot commented Sep 25, 2022

Codecov Report

Base: 88.69% // Head: 88.44% // Decreases project coverage by -0.25% ⚠️

Coverage data is based on head (ea010f6) compared to base (7240c43).
Patch coverage: 99.15% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1288      +/-   ##
==========================================
- Coverage   88.69%   88.44%   -0.26%     
==========================================
  Files          11       11              
  Lines        1150     1073      -77     
==========================================
- Hits         1020      949      -71     
+ Misses        130      124       -6     
Impacted Files Coverage Δ
tools/schemacode/bidsschematools/render/utils.py 85.18% <ø> (-0.14%) ⬇️
tools/schemacode/bidsschematools/render/tables.py 86.84% <95.65%> (-1.86%) ⬇️
tools/schemacode/bidsschematools/render/text.py 97.39% <100.00%> (+0.02%) ⬆️
tools/schemacode/bidsschematools/validator.py 95.89% <100.00%> (+1.04%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@effigies effigies force-pushed the schema/file_reorg branch 2 times, most recently from bf448b5 to 3f15c26 Compare September 25, 2022 04:16
@effigies effigies force-pushed the schema/file_reorg branch 2 times, most recently from d42b30d to f679c09 Compare September 25, 2022 19:24
@effigies effigies force-pushed the schema/file_reorg branch 3 times, most recently from e5f4d5c to 098c3f2 Compare September 30, 2022 02:04
@tsalo tsalo self-requested a review October 4, 2022 16:58
Copy link
Member

@tsalo tsalo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just have one thought about the new make_filename_template parameter.

tools/schemacode/bidsschematools/render/text.py Outdated Show resolved Hide resolved
Comment on lines +13 to +17
- nirs
entities:
subject: required
session: optional
acquisition: optional
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rob-luke @sappelhoff I didn't follow the discussion of photo.* closely. Was there a reason that nirs photos were missing the optional acquisition flag? I can break this back out into a separate rule, but I wanted to check that this wasn't an oversight first.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, now I'm seeing that the rule was never rendered, and there's no mention in the text. Should I just remove it?

@rwblair
Copy link
Member

rwblair commented Oct 7, 2022

Some entries in rules.files.* have corresponding entries in objects (everything in common) and others don't (everything in raw and deriv). This is not a problem. What do we imagine happening if a directory becomes a valid entry in a datatype directory? And then we need to specify rules for can be in that directory? I suppose there's no reason that any given raw or deriv rule couldn't have a corresponding entry in objects if it really needs to. Not something that I'd like to write for every single thing.

Also with an eye towards the idea of something like group-<label> becoming a valid directory in the root directory of derivatives, how would we specify that in the rules.deriv folder? Could add a key to the rules specifying that the file should be at root location: '/' or that its parent is a specific entry in objects (or another rule). Under the latter regime dataset_description.json might get a new key in its rule entry likeparent: $ref.objects.directories.root.

None of this is blocking. All the rules in this PR are interpretable and make a lot of things easier as a consumer of the schema. (My secret agenda is to get rules specifying that subject directories are legal at the root of the dataset. and whats legal inside of them and so on.)

@@ -360,20 +306,16 @@ def load_all(
-------
all_regex : list of dict
A list of dictionaries, with keys including 'regex' and 'mandatory'.
my_schema : list of dict
my_schema : Mapping
Nested dictionaries representing the full schema.
"""
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TheChymera Do you want to look over the changes to the validator code at all?

@effigies
Copy link
Collaborator Author

effigies commented Oct 7, 2022

What do we imagine happening if a directory becomes a valid entry in a datatype directory? And then we need to specify rules for can be in that directory?

I guess you're talking about #1280? From the rule perspective, then the thing that would change is adding model: {optional,required} to relevant rules. From the name-construction perspective, it does argue that we should move away from the hard coding approach. You could imagine something like:

file_template:
  directories:
    - type: entity
      key: subject
    - type: entity
      key: session
    - type: datatype
  parts:
    - type: entity
      key: subject
    - type: entity
      key: session
    - type: entity
      key: sample
    - type: entity
      key: task
    ...
    - type: suffix
    - type: extension

Then we would just change it to:

file_template:
  directories:
    - type: entity
      key: subject
    - type: entity
      key: session
    - type: datatype
    - type: entity
      key: model
  ...

Not sure if this is the best way to go, but at the very least, this would extend the current schema.rules.entities list to describe how a file is put together.

Also with an eye towards the idea of something like group-<label> becoming a valid directory in the root directory of derivatives, how would we specify that in the rules.deriv folder?

If we went with something like the above, we'd need to permit multiple construction rules, where we replace subject with group.

I can't see a plausible way to do something like what we're currently doing with references to replace subject with group.

(My secret agenda is to get rules specifying that subject directories are legal at the root of the dataset. and whats legal inside of them and so on.)

I'll try to think more about this, especially looking at #1108.

@effigies effigies added the schema-code Updates or changes to the code used to parse, filter, and render the schema. label Oct 10, 2022
@effigies effigies merged commit 0b8bff5 into bids-standard:master Oct 10, 2022
@effigies effigies deleted the schema/file_reorg branch October 10, 2022 16:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
exclude-from-changelog This item will not feature in the automatically generated changelog schema-code Updates or changes to the code used to parse, filter, and render the schema. schema-structure Changes to the fundamental organization/structure of the YAML schema. Minor version release.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants