BIDS validation #259

kaitj · 2023-02-28T13:00:32Z

Proposed changes

An attempt at implementing bids validation (see #222). As suggested, this first attempts to perform a system call to the bids-validator prior to running _gen_bids_layout. If this first attempt fails, it will fall back running the validation via pybids. Validation, in both cases, are skipped if all custom paths are given. I wasn't quite sure about how to go and add a test for the system call and haven't had a chance to test this locally. There are tests for pybids.

Drawing some inspiration from fmriprep, a temporary config file is generated for the system call in (_validate_bids_dir) which includes files to ignore during this validation. Was also wondering if we want to limit the validation to the subject data themselves rather than a full dataset (e.g. ignore participants.tsv, dataset_description.json)?

Note to self:

pybids validation
Add to snakebids app
Tests

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you are unsure about any of the choices, don't hesitate to ask!

Changes have been tested to ensure that fix is effective or that a feature works.
Changes pass the unit tests
I have included necessary documentation or comments (as necessary)
Any dependent changes have been merged and published

Notes

All PRs will undergo the unit testing before being reviewed. You may be requested to explain or make additional changes before the PR is accepted.

pvandyken

Thanks @kaitj. I should have made this opinion known earlier, but I was thinking this type of validation might be better to do within run.py (à la #255).

There's two main reasons for this: first, it removes it from the snakebids "core", keeping things a bit more modular and hopefully easier to maintain. The second is more technical: as I mentioned in #256, generate_inputs() can potentially be called hundreds of times in a workflow, and running the full validator every single time would not be a great idea. So we could have tons of logic to figure out if this is the "first" run, or we could move validation to run.py and it would inherently only be run once.

This still leaves some implementation ambiguity: it could be a stand-alone function you import and call, or it could be part of Snakebids app. I'm not opinionated either way. Favouring the functional approach, it would leave space for potential future complexity (maybe you want to be able to pass a custom path to the validator executable). Favouring SnakebidsApp integration, the two share common arguments (database dir, derivative indexing, etc).

Anyway, that's my opinion, so you can let me know what you think.

pvandyken · 2023-02-28T15:02:44Z

One more thought... this type of validation seems most relevant to containers, where the developer can guarantee bids-validator will be installed. For a python installation, where bids-validator will likely not be installed (or potentially installed in a less accessible way, like a container), a developer may want more customized ways of handling validation (skip by default, print a warning only when running the app for the first time (e.g. outputs don't exist), link to installation instructions, etc). This type of thing becomes easier when validation is more of an external module.

pvandyken · 2023-02-28T15:18:02Z

Thought on implementation: maybe the most flexible approach would be to have a BidsValidator class containing the basic configuration options (executable, ignore files, config file, schema, etc), all with sensible defaults. This class gets passed to SnakebidsApp (maybe via a SnakebidsApp.add_plugin()?), which runs validation in run_snakemake(), providing the bids directory and other cli parameters.

Just looking through the source, it looks like we haven't actually standardized a lot of the basic CLI parameters like source_dir, output_dir, etc (they're still provided in the config). So this might require some architecture change.

kaitj · 2023-02-28T15:38:22Z

There's two main reasons for this: first, it removes it from the snakebids "core", keeping things a bit more modular and hopefully easier to maintain. The second is more technical: as I mentioned in https://github.com/akhanf/snakebids/issues/256, generate_inputs() can potentially be called hundreds of times in a workflow, and running the full validator every single time would not be a great idea. So we could have tons of logic to figure out if this is the "first" run, or we could move validation to run.py and it would inherently only be run once.

Just some initial thoughts, and can discuss more during a dev lunch. I think moving it to a separate module is fine. I also think it should be an opt-out feature however, so this poses a potential problem. Would also have need to have a way to differentiate between a "re-run" vs a "first-run" with additional data, which in the latter case, I would imagine you would still want to perform a validation.

Thought on implementation: maybe the most flexible approach would be to have a BidsValidator class containing the basic configuration options (executable, ignore files, config file, schema, etc), all with sensible defaults. This class gets passed to SnakebidsApp (maybe via a SnakebidsApp.add_plugin()?), which runs validation in run_snakemake(), providing the bids directory and other cli parameters.

I like this idea. The tricky bit I had trouble with initially was figuring out an implementation that falls back on using the pybids implementation if the system call fails. At the moment, we rely on whats already implemented in pybids, but I didn't realize there was also bids-validator python library. I think if we can use that directly for validation, bypassing pybids (i.e. set validate=False always in _gen_bids_layout). We can still have the first pass be to the system call, with this as the fallback in this plugin.

pvandyken · 2023-02-28T16:58:00Z

Would also have need to have a way to differentiate between a "re-run" vs a "first-run" with additional data, which in the latter case, I would imagine you would still want to perform a validation.

True. Just to clarify, that aspect was just spit-balling. The core issue is multiple validations within a single snakemake run, which would occur if validation was done in generate_inputs(). Anything else is just trying to keep space for future flexibility.

but I didn't realize there was also bids-validator python library.

The python library, AFAIK, does no more validation than pybids does. It's probably possible to install a node library with pip, but doubtfully advisable

pvandyken · 2023-02-28T17:07:51Z

I also think it should be an opt-out feature however, so this poses a potential problem.

Should still be possible within my scheme, although the api for opting out might be messy.

I think ultimately we need a broader discussion about handling these types of extras (perhaps jumping off #255). There's a number of things we could do to make snakebids apps work better: validation, dataset_description.json copying #54, automatic pybidsdb #256, logging #184, and there's potentially any number of more plugin-like extras people could imagine. It would be good to have a uniform way of handling these, especially since many will require information parsed from the CLI, so we'll need the cooperation of SnakeBidsApp. Once we have that framework, it will be easier to decide which things should be "loaded" by default, and how that should occur.

kaitj · 2023-02-28T18:50:52Z

The python library, AFAIK, does no more validation than pybids does. It's probably possible to install a node library with pip, but doubtfully advisable.

Just for clarification, pybids calls the python library (I believe), so the validation should be the same. I agree about (not) trying to install the node library via pip. This is more of option to ensure that validation is still performed if it hasn't been opted out of, and if there is node library installed.

pvandyken · 2023-03-03T20:21:59Z

Just opened #260. The goal is to use the architecture there to implement this.

- Currently performs validation for entire dataset, can look into performing validation for only subjects that are included in run

- also includes quality fixes"

- update docs to include information about skipping bids validation - renamed function from validate_input_dir -> validate_bids_dir - added exception to skipping bids-validation with node if all paths provided are custom - quality fixes

kaitj · 2023-05-09T12:59:40Z

Closing this in favour of #291.

kaitj added the enhancement New feature or request label Feb 28, 2023

kaitj changed the title ~~BIDS valiation~~ BIDS validation Feb 28, 2023

github-actions bot requested review from akhanf, pvandyken and tkkuehn February 28, 2023 13:00

github-actions bot assigned kaitj Feb 28, 2023

kaitj linked an issue Feb 28, 2023 that may be closed by this pull request

integrate bids validator by default into snake bids #222

Closed

pvandyken requested changes Feb 28, 2023

View reviewed changes

pvandyken mentioned this pull request Mar 3, 2023

Basic Plugin Architecture #260

Closed

kaitj added 10 commits March 10, 2023 12:15

add skip-validation to snakebids cli

21aa974

Add opt-out validation of bids_dir

6aa4f1a

- Currently performs validation for entire dataset, can look into performing validation for only subjects that are included in run

skip bids validation for existing tests

b2bbf93

add skip_validation options to app

a3cc17c

fix unbound var + vars with no defaults

737bb02

update mock config

a4ff1a9

add initial tests for bids-validation

b2239c4

skip_validation -> skip_bids_validation for clarity

92ea984

add test + description.json for passing validation

9ccfd1a

- also includes quality fixes"

update docs, rename validation fn

430f28a

- update docs to include information about skipping bids validation - renamed function from validate_input_dir -> validate_bids_dir - added exception to skipping bids-validation with node if all paths provided are custom - quality fixes

kaitj force-pushed the bids-validation branch from 97f5768 to 430f28a Compare March 10, 2023 19:03

wip - bids validation plugin

e9a5155

kaitj force-pushed the bids-validation branch from 15e46db to e9a5155 Compare March 14, 2023 09:47

kaitj removed a link to an issue May 9, 2023

integrate bids validator by default into snake bids #222

Closed

kaitj closed this May 9, 2023

kaitj deleted the bids-validation branch May 9, 2023 12:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BIDS validation #259

BIDS validation #259

kaitj commented Feb 28, 2023 •

edited

Loading

pvandyken left a comment •

edited

Loading

pvandyken commented Feb 28, 2023

pvandyken commented Feb 28, 2023

kaitj commented Feb 28, 2023 •

edited

Loading

pvandyken commented Feb 28, 2023

pvandyken commented Feb 28, 2023

kaitj commented Feb 28, 2023

pvandyken commented Mar 3, 2023

kaitj commented May 9, 2023

BIDS validation #259

BIDS validation #259

Conversation

kaitj commented Feb 28, 2023 • edited Loading

Proposed changes

Checklist

Notes

pvandyken left a comment • edited Loading

Choose a reason for hiding this comment

pvandyken commented Feb 28, 2023

pvandyken commented Feb 28, 2023

kaitj commented Feb 28, 2023 • edited Loading

pvandyken commented Feb 28, 2023

pvandyken commented Feb 28, 2023

kaitj commented Feb 28, 2023

pvandyken commented Mar 3, 2023

kaitj commented May 9, 2023

kaitj commented Feb 28, 2023 •

edited

Loading

pvandyken left a comment •

edited

Loading

kaitj commented Feb 28, 2023 •

edited

Loading