This repository has been archived by the owner on Nov 2, 2021. It is now read-only.

FR: batch import dicoms of multiple acquisitions #148

Open
pvavra opened this issue Dec 12, 2019 · 4 comments

pvavra commented Dec 12, 2019

When importing multiple tarballs (of multiple subjects), it would be convenient to have a "batch mode" for calling hirni-import-dcm.

I guess exactly how to specify such a batch might vary substantially between circumstances, but even then a simple helper-script template would be convenient.

pvavra commented Dec 12, 2019

I've written a simple procedure that tries to achieve the above. It is not particularly robust, but it seems to work for our specific use case.
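
For reference, a minimal sketch of what such a helper could look like (not the actual procedure; the paths, the tarball naming scheme, and the exact hirni-import-dcm arguments are assumptions and should be checked against `datalad hirni-import-dcm --help`):

```python
# Minimal sketch of a batch-import helper, not the actual procedure.
# Assumes one tarball per acquisition in a single inbox directory,
# named like "<subject>_<session>.tar.gz", so that the acquisition ID
# can be derived from the file name.
import subprocess
from pathlib import Path

STUDY_DS = Path("/data/my-study")   # hypothetical study dataset
INBOX = Path("/data/incoming")      # hypothetical location of the tarballs

for tarball in sorted(INBOX.glob("*.tar.gz")):
    acq_id = tarball.name.split(".")[0]   # e.g. "sub01_ses01"
    subprocess.run(
        ["datalad", "hirni-import-dcm",
         "-d", str(STUDY_DS),
         str(tarball), acq_id],
        check=True,
    )
```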

pvavra commented Dec 12, 2019

@bpoldrack I also have a conceptual question:
Running the imports in parallel results in "interleaved" commits (each import seems to generate 3 separate commits: one for the dicoms, one for the specs, and one for the updated metadata).

Do you foresee any issues we could run into doing this? Maybe during the metadata aggregation step?

pvavra commented Dec 13, 2019

So, running several imports in parallel doesn't seem to work well.

I noticed two main issues:

  • studyspec.json files were not created for all acquisitions - this is a major issue
  • commit messages are "mixed", as ds.save() calls do not use path=..

Using a structure like the datalad run --explicit call should make this work in parallel, assuming that no two imports target the same dicoms folder.

To ensure the latter, it would be good to have the hirni-import-dcm call handle the submission of jobs to condor itself instead of relying on the --pbs-runner condor argument. That way, some basic sanity checks could be run over the whole set of imports. Then, ds.save could use the aforementioned path=.. argument to make sure that only the relevant files are added.
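
To illustrate the ds.save part, a sketch of what I have in mind (Python API; the acquisition layout and file names are made up, not necessarily what hirni-import-dcm produces):

```python
# Sketch: commit only the files belonging to one import, so that
# parallel imports do not pick up each other's intermediate states.
# Paths and the acquisition layout are assumptions.
from datalad.api import Dataset

study = Dataset("/data/my-study")   # hypothetical study dataset
acq = "sub01_ses01"                 # hypothetical acquisition ID

study.save(
    path=[f"{acq}/dicoms", f"{acq}/studyspec.json"],
    message=f"Import DICOMs for acquisition {acq}",
    recursive=True,
)
```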

@bpoldrack

commit messages are "mixed", as ds.save() calls do not use path=..

Agree. save calls - particularly in the superdataset - should do that. Otherwise they could commit intermediate states of other imports running in parallel.

Do you foresee any issues we could run into doing this? Maybe during the metadata aggregation step?

Metadata aggregation could have a very similar issue as those save calls. It should "fix itself" with the last run, but I guess it's safer to properly account for that in hirni-import-dcm.
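
A rough idea of what accounting for that could look like, restricting aggregation to the acquisition that was just imported (command and option names are from DataLad core's aggregate-metadata of that era; verify against your installation, and the paths are hypothetical):

```python
# Sketch: re-aggregate metadata only for one acquisition instead of the
# whole study dataset, to reduce interference between parallel imports.
import subprocess

study_ds = "/data/my-study"   # hypothetical study dataset
acq = "sub01_ses01"           # hypothetical acquisition ID

subprocess.run(
    ["datalad", "aggregate-metadata",
     "-d", study_ds,
     f"{study_ds}/{acq}"],
    check=True,
)
```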

studyspec.json files were not created for all acquisitions - this is a major issue

That's interesting, as I don't immediately see where this issue is coming from.

Generally, importing should be easier to parallelize - I agree. While at it, addressing this should also include allowing several archives to be imported into the same acquisition and supporting updates of an already imported archive (which currently would be doable only with more low-level tools).
Not quite sure about the condor-related part yet. There might be a better way, making use of https://github.com/datalad/datalad-htcondor. I need to think that through. Ideally we can come up with something that generalizes beyond condor.
