-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BIDS validation #259
BIDS validation #259
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kaitj. I should have made this opinion known earlier, but I was thinking this type of validation might be better to do within run.py
(à la #255).
There's two main reasons for this: first, it removes it from the snakebids "core", keeping things a bit more modular and hopefully easier to maintain. The second is more technical: as I mentioned in #256, generate_inputs()
can potentially be called hundreds of times in a workflow, and running the full validator every single time would not be a great idea. So we could have tons of logic to figure out if this is the "first" run, or we could move validation to run.py
and it would inherently only be run once.
This still leaves some implementation ambiguity: it could be a stand-alone function you import and call, or it could be part of Snakebids
app. I'm not opinionated either way. Favouring the functional approach, it would leave space for potential future complexity (maybe you want to be able to pass a custom path to the validator executable). Favouring SnakebidsApp
integration, the two share common arguments (database dir, derivative indexing, etc).
Anyway, that's my opinion, so you can let me know what you think.
One more thought... this type of validation seems most relevant to containers, where the developer can guarantee |
Thought on implementation: maybe the most flexible approach would be to have a Just looking through the source, it looks like we haven't actually standardized a lot of the basic CLI parameters like source_dir, output_dir, etc (they're still provided in the config). So this might require some architecture change. |
Just some initial thoughts, and can discuss more during a dev lunch. I think moving it to a separate module is fine. I also think it should be an opt-out feature however, so this poses a potential problem. Would also have need to have a way to differentiate between a "re-run" vs a "first-run" with additional data, which in the latter case, I would imagine you would still want to perform a validation.
I like this idea. The tricky bit I had trouble with initially was figuring out an implementation that falls back on using the pybids implementation if the system call fails. At the moment, we rely on whats already implemented in pybids, but I didn't realize there was also |
True. Just to clarify, that aspect was just spit-balling. The core issue is multiple validations within a single snakemake run, which would occur if validation was done in
The python library, AFAIK, does no more validation than pybids does. It's probably possible to install a node library with pip, but doubtfully advisable |
Should still be possible within my scheme, although the api for opting out might be messy. I think ultimately we need a broader discussion about handling these types of extras (perhaps jumping off #255). There's a number of things we could do to make |
Just for clarification, pybids calls the python library (I believe), so the validation should be the same. I agree about (not) trying to install the node library via pip. This is more of option to ensure that validation is still performed if it hasn't been opted out of, and if there is node library installed. |
Just opened #260. The goal is to use the architecture there to implement this. |
- Currently performs validation for entire dataset, can look into performing validation for only subjects that are included in run
- also includes quality fixes"
- update docs to include information about skipping bids validation - renamed function from validate_input_dir -> validate_bids_dir - added exception to skipping bids-validation with node if all paths provided are custom - quality fixes
Closing this in favour of #291. |
Proposed changes
An attempt at implementing bids validation (see #222). As suggested, this first attempts to perform a system call to the bids-validator prior to running
_gen_bids_layout
. If this first attempt fails, it will fall back running the validation via pybids. Validation, in both cases, are skipped if all custom paths are given. I wasn't quite sure about how to go and add a test for the system call and haven't had a chance to test this locally. There are tests for pybids.Drawing some inspiration from fmriprep, a temporary config file is generated for the system call in (
_validate_bids_dir
) which includes files to ignore during this validation. Was also wondering if we want to limit the validation to the subject data themselves rather than a full dataset (e.g. ignoreparticipants.tsv, dataset_description.json
)?Note to self:
pybids
validationChecklist
Put an
x
in the boxes that apply. You can also fill these out after creating the PR. If you are unsure about any of the choices, don't hesitate to ask!Notes
All PRs will undergo the unit testing before being reviewed. You may be requested to explain or make additional changes before the PR is accepted.