Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BIDS validation #259

Closed
wants to merge 11 commits into from
Closed

BIDS validation #259

wants to merge 11 commits into from

Conversation

kaitj
Copy link
Contributor

@kaitj kaitj commented Feb 28, 2023

Proposed changes

An attempt at implementing bids validation (see #222). As suggested, this first attempts to perform a system call to the bids-validator prior to running _gen_bids_layout. If this first attempt fails, it will fall back running the validation via pybids. Validation, in both cases, are skipped if all custom paths are given. I wasn't quite sure about how to go and add a test for the system call and haven't had a chance to test this locally. There are tests for pybids.

Drawing some inspiration from fmriprep, a temporary config file is generated for the system call in (_validate_bids_dir) which includes files to ignore during this validation. Was also wondering if we want to limit the validation to the subject data themselves rather than a full dataset (e.g. ignore participants.tsv, dataset_description.json)?

Note to self:

  • pybids validation
  • Add to snakebids app
  • Tests

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you are unsure about any of the choices, don't hesitate to ask!

  • Changes have been tested to ensure that fix is effective or that a feature works.
  • Changes pass the unit tests
  • I have included necessary documentation or comments (as necessary)
  • Any dependent changes have been merged and published

Notes

All PRs will undergo the unit testing before being reviewed. You may be requested to explain or make additional changes before the PR is accepted.

@kaitj kaitj added the enhancement New feature or request label Feb 28, 2023
@kaitj kaitj changed the title BIDS valiation BIDS validation Feb 28, 2023
@kaitj kaitj linked an issue Feb 28, 2023 that may be closed by this pull request
Copy link
Contributor

@pvandyken pvandyken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kaitj. I should have made this opinion known earlier, but I was thinking this type of validation might be better to do within run.py (à la #255).

There's two main reasons for this: first, it removes it from the snakebids "core", keeping things a bit more modular and hopefully easier to maintain. The second is more technical: as I mentioned in #256, generate_inputs() can potentially be called hundreds of times in a workflow, and running the full validator every single time would not be a great idea. So we could have tons of logic to figure out if this is the "first" run, or we could move validation to run.py and it would inherently only be run once.

This still leaves some implementation ambiguity: it could be a stand-alone function you import and call, or it could be part of Snakebids app. I'm not opinionated either way. Favouring the functional approach, it would leave space for potential future complexity (maybe you want to be able to pass a custom path to the validator executable). Favouring SnakebidsApp integration, the two share common arguments (database dir, derivative indexing, etc).

Anyway, that's my opinion, so you can let me know what you think.

@pvandyken
Copy link
Contributor

One more thought... this type of validation seems most relevant to containers, where the developer can guarantee bids-validator will be installed. For a python installation, where bids-validator will likely not be installed (or potentially installed in a less accessible way, like a container), a developer may want more customized ways of handling validation (skip by default, print a warning only when running the app for the first time (e.g. outputs don't exist), link to installation instructions, etc). This type of thing becomes easier when validation is more of an external module.

@pvandyken
Copy link
Contributor

Thought on implementation: maybe the most flexible approach would be to have a BidsValidator class containing the basic configuration options (executable, ignore files, config file, schema, etc), all with sensible defaults. This class gets passed to SnakebidsApp (maybe via a SnakebidsApp.add_plugin()?), which runs validation in run_snakemake(), providing the bids directory and other cli parameters.

Just looking through the source, it looks like we haven't actually standardized a lot of the basic CLI parameters like source_dir, output_dir, etc (they're still provided in the config). So this might require some architecture change.

@kaitj
Copy link
Contributor Author

kaitj commented Feb 28, 2023

There's two main reasons for this: first, it removes it from the snakebids "core", keeping things a bit more modular and hopefully easier to maintain. The second is more technical: as I mentioned in https://github.com/akhanf/snakebids/issues/256, generate_inputs() can potentially be called hundreds of times in a workflow, and running the full validator every single time would not be a great idea. So we could have tons of logic to figure out if this is the "first" run, or we could move validation to run.py and it would inherently only be run once.

Just some initial thoughts, and can discuss more during a dev lunch. I think moving it to a separate module is fine. I also think it should be an opt-out feature however, so this poses a potential problem. Would also have need to have a way to differentiate between a "re-run" vs a "first-run" with additional data, which in the latter case, I would imagine you would still want to perform a validation.

Thought on implementation: maybe the most flexible approach would be to have a BidsValidator class containing the basic configuration options (executable, ignore files, config file, schema, etc), all with sensible defaults. This class gets passed to SnakebidsApp (maybe via a SnakebidsApp.add_plugin()?), which runs validation in run_snakemake(), providing the bids directory and other cli parameters.

I like this idea. The tricky bit I had trouble with initially was figuring out an implementation that falls back on using the pybids implementation if the system call fails. At the moment, we rely on whats already implemented in pybids, but I didn't realize there was also bids-validator python library. I think if we can use that directly for validation, bypassing pybids (i.e. set validate=False always in _gen_bids_layout). We can still have the first pass be to the system call, with this as the fallback in this plugin.

@pvandyken
Copy link
Contributor

Would also have need to have a way to differentiate between a "re-run" vs a "first-run" with additional data, which in the latter case, I would imagine you would still want to perform a validation.

True. Just to clarify, that aspect was just spit-balling. The core issue is multiple validations within a single snakemake run, which would occur if validation was done in generate_inputs(). Anything else is just trying to keep space for future flexibility.

but I didn't realize there was also bids-validator python library.

The python library, AFAIK, does no more validation than pybids does. It's probably possible to install a node library with pip, but doubtfully advisable

@pvandyken
Copy link
Contributor

I also think it should be an opt-out feature however, so this poses a potential problem.

Should still be possible within my scheme, although the api for opting out might be messy.

I think ultimately we need a broader discussion about handling these types of extras (perhaps jumping off #255). There's a number of things we could do to make snakebids apps work better: validation, dataset_description.json copying #54, automatic pybidsdb #256, logging #184, and there's potentially any number of more plugin-like extras people could imagine. It would be good to have a uniform way of handling these, especially since many will require information parsed from the CLI, so we'll need the cooperation of SnakeBidsApp. Once we have that framework, it will be easier to decide which things should be "loaded" by default, and how that should occur.

@kaitj
Copy link
Contributor Author

kaitj commented Feb 28, 2023

The python library, AFAIK, does no more validation than pybids does. It's probably possible to install a node library with pip, but doubtfully advisable.

Just for clarification, pybids calls the python library (I believe), so the validation should be the same. I agree about (not) trying to install the node library via pip. This is more of option to ensure that validation is still performed if it hasn't been opted out of, and if there is node library installed.

@pvandyken
Copy link
Contributor

Just opened #260. The goal is to use the architecture there to implement this.

kaitj added 10 commits March 10, 2023 12:15
- Currently performs validation for entire dataset, can look into
  performing validation for only subjects that are included in run
- update docs to include information about skipping bids validation
- renamed function from validate_input_dir -> validate_bids_dir
- added exception to skipping bids-validation with node if all paths
  provided are custom
- quality fixes
@kaitj kaitj force-pushed the bids-validation branch from 97f5768 to 430f28a Compare March 10, 2023 19:03
@kaitj
Copy link
Contributor Author

kaitj commented May 9, 2023

Closing this in favour of #291.

@kaitj kaitj closed this May 9, 2023
@kaitj kaitj deleted the bids-validation branch May 9, 2023 12:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants