This repository has been archived by the owner on Nov 2, 2021. It is now read-only.

FR: batch import dicoms of multiple acquisitions #148

Open
pvavra opened this issue Dec 12, 2019 · 4 comments

pvavra commented Dec 12, 2019

When importing multiple tarballs (of multiple subjects), it would be convenient to have a "batch mode" for calling hirni-import-dcm.

I guess exactly how to specify such a batch might vary substantially between circumstances, but even then a simple helper-script template would be convenient.

pvavra commented Dec 12, 2019

I've written a simple procedure that tries to achieve the above. It is not particularly robust, but it seems to work for our specific use case.
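
For reference, a minimal sketch of what such a helper could look like (not the actual procedure; the paths, the tarball naming scheme, and the exact hirni-import-dcm arguments are assumptions and should be checked against `datalad hirni-import-dcm --help`):

```python
# Minimal sketch of a batch-import helper, not the actual procedure.
# Assumes one tarball per acquisition in a single inbox directory,
# named like "<subject>_<session>.tar.gz", so that the acquisition ID
# can be derived from the file name.
import subprocess
from pathlib import Path

STUDY_DS = Path("/data/my-study")   # hypothetical study dataset
INBOX = Path("/data/incoming")      # hypothetical location of the tarballs

for tarball in sorted(INBOX.glob("*.tar.gz")):
    acq_id = tarball.name.split(".")[0]   # e.g. "sub01_ses01"
    subprocess.run(
        ["datalad", "hirni-import-dcm",
         "-d", str(STUDY_DS),
         str(tarball), acq_id],
        check=True,
    )
```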

pvavra commented Dec 12, 2019

@bpoldrack I also have a conceptual question:
Running the imports in parallel results in "interleaved" commits (each import seems to generate 3 separate commits: one for the dicoms, one for the specs, and one for the updated metadata).

Do you foresee any issues we could run into doing this? Maybe during the metadata aggregation step?

pvavra commented Dec 13, 2019

So, running several imports in parallel doesn't seem to work well.

I noticed two main issues:

  • studyspec.json files were not created for all acquisitions - this is a major issue
  • commit messages are "mixed", as ds.save() calls do not use path=..

Using a structure like the datalad run --explicit call should make this work in parallel, assuming that no two imports target the same dicoms folder.

To ensure the latter, it would be good to have the hirni-import-dcm call handle the submission of jobs to condor itself instead of relying on the --pbs-runner condor argument. That way, some basic sanity checks could be run over the whole set of imports. Then, ds.save could use the aforementioned path=.. argument to make sure that only the relevant files are added.
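
To illustrate the ds.save part, a sketch of what I have in mind (Python API; the acquisition layout and file names are made up, not necessarily what hirni-import-dcm produces):

```python
# Sketch: commit only the files belonging to one import, so that
# parallel imports do not pick up each other's intermediate states.
# Paths and the acquisition layout are assumptions.
from datalad.api import Dataset

study = Dataset("/data/my-study")   # hypothetical study dataset
acq = "sub01_ses01"                 # hypothetical acquisition ID

study.save(
    path=[f"{acq}/dicoms", f"{acq}/studyspec.json"],
    message=f"Import DICOMs for acquisition {acq}",
    recursive=True,
)
```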

@bpoldrack

commit messages are "mixed", as ds.save() calls do not use path=..

Agree. save calls - particularly in the superdataset - should do that. Otherwise they could commit intermediate states of other imports running in parallel.

Do you foresee any issues we could run into doing this? Maybe during the metadata aggregation step?

Metadata aggregation could have a very similar issue as those save calls. It should "fix itself" with the last run, but I guess it's safer to properly account for that in hirni-import-dcm.
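
A rough idea of what accounting for that could look like, restricting aggregation to the acquisition that was just imported (command and option names are from DataLad core's aggregate-metadata of that era; verify against your installation, and the paths are hypothetical):

```python
# Sketch: re-aggregate metadata only for one acquisition instead of the
# whole study dataset, to reduce interference between parallel imports.
import subprocess

study_ds = "/data/my-study"   # hypothetical study dataset
acq = "sub01_ses01"           # hypothetical acquisition ID

subprocess.run(
    ["datalad", "aggregate-metadata",
     "-d", study_ds,
     f"{study_ds}/{acq}"],
    check=True,
)
```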

studyspec.json files were not created for all acquisitions - this is a major issue

That's interesting, as I don't immediately see where this issue is coming from.

Generally, importing should be easier to parallelize - I agree. While at it, addressing this should also include allowing several archives to be imported into the same acquisition and supporting updates of an already imported archive (which currently would be doable only with more low-level tools).
Not quite sure about the condor-related part yet. There might be a better way, making use of https://github.com/datalad/datalad-htcondor. I need to think that through. Ideally we can come up with something that generalizes beyond condor.
