running spec2bids without saved changes in sourcedata should throw a warning #156

pvavra · 2020-04-18T09:37:20Z

disclaimer: I'm not sure this is actually a bug or just a misunderstanding of the workflow by me.

my situation:

I've imported dicoms into a sourcedata
sourcedata is inside the bids folder
I've created a custom datalad-procedure for converting a tarball containing the behavioral log files into bids-conforming events.tsv files (located in sourcedata). this has a bug, though.
my studyspec.json files for each subject are ready

Calling datalad hirni-spec2bids sourcedata/sub-001/studyspec.json from within the bids-folder, the behavior conversion doesn't work and throws an error.

After some debugging, I manage to fix the error and re-run the above call.
The conversion procedure also saves the newly created *_events.tsv files - and only those, by calling datalad save .. sub-001/*_events.tsv

And this last step is what I'm wondering about:
on the one hand, using save with the PATH arguments makes sense, as this would enables me to call this conversion in parallel.
However, from what I understand, the git-log now contains a reference to files (events.tsv) which are "out of sync" with the record of the sourcedata checksum, because in the above workflow I didn't save the sourcedata state yet (I forgot..).

Not sure what the "best" solution would be, but one that comes to mind is having "hirni-spec2bids" check whether the sourcedata dataset is in modified state and ask the user for confirmation to continue anyway. That is, give a warning in some form.

The goal, I think, should be to have the conversion to bids only work with a saved state of the used scripts/files, and not use things that are changed in the working tree.

@bpoldrack What are your thoughts on this?

The text was updated successfully, but these errors were encountered:

bpoldrack · 2020-04-18T12:20:43Z

disclaimer: I'm not sure this is actually a bug or just a misunderstanding of the workflow by me.

Neither. Behaves as expected to me, so - no bug. And I can't see anything really wrong with your workflow. It simply didn't occur to me that this could be unintended. There's a slight difference in your workflow, that I do consider valid but didn't think of. That is: I expected people to work on their sourcedata repo at an independent location, while sourcedata within bids would be a clone thereof (that would be updated if you changed something and committed it). So, this situation wouldn't happen.

But I think you have a point there! Of course you should be able to work directly within that subdataset. While I wouldn't enforce it since you should have the freedom to do so (for whatever reason), a warning or confirmation makes sense to me. Need to think how to handle a non-interactive execution though.

bpoldrack · 2020-04-18T12:30:46Z

Just for clarification in case this is needed:
As shown in the demo http://docs.datalad.org/projects/hirni/en/latest/demos/conversion.html
I'd expect people to have a location for the raw dataset, say - studyXY. And then have the bids dataset at studyXY_BIDS wherein you'd datalad install -d . -s studyXY sourcedata, referencing the raw dataset. After conversion you could uninstall sourcedata. Therefore you can have several converted datasets based on different states of the raw dataset or targeting different layouts (like different BIDS version for example), while having the one place where you'd maintain your raw dataset.

However - your approach is valid.

pvavra · 2020-04-22T20:36:45Z

I understand the "different location" workflow. The advantage of having such a nested structure is less disk-space usage, I think (without having to add "drops" everywhere..).

concerning the non-interactive execution: I cannot think of a use-case (limited imagination?) where one would like to convert automatically to bids from an unclean state (or at least unsaved files -- the rest of the repo might be irrelevant). But maybe having a --no-check analogous to drop could work?

Thinking about relative frequencies of use-cases, I'm quite confident that "clean sourcedata" will be more common, and should be hence the default..

bpoldrack · 2020-04-23T10:49:02Z

cannot think of a use-case (limited imagination?) where one would like to convert automatically to bids from an unclean state

Kind of the point. It's a warning after all. I always edit specfiles in a scripted fashion. Then convert. I can make a mistake in such scripts leading to uncommitted changes. Thus running into the same issue.

But maybe having a --no-check analogous to drop could work

Yes. The concern is the other way around: If the check only leads to a warning or an interactive confirmation request, you can't easily react to it non-interactively. For the same "protection" level in scripted usage the command would need to actually fail. That's an option of course: Fail except --no-check is given. However, that might be a bit annoying in the interactive use case.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

running spec2bids without saved changes in sourcedata should throw a warning #156

running spec2bids without saved changes in sourcedata should throw a warning #156

pvavra commented Apr 18, 2020

bpoldrack commented Apr 18, 2020

bpoldrack commented Apr 18, 2020

pvavra commented Apr 22, 2020

bpoldrack commented Apr 23, 2020

running spec2bids without saved changes in sourcedata should throw a warning #156

running spec2bids without saved changes in sourcedata should throw a warning #156

Comments

pvavra commented Apr 18, 2020

bpoldrack commented Apr 18, 2020

bpoldrack commented Apr 18, 2020

pvavra commented Apr 22, 2020

bpoldrack commented Apr 23, 2020