Feature idea: `datalad get` from other local source without copying files again #7674

lnnrtwttkhn · 2024-10-23T07:56:09Z

Description

Hi everyone, following YODA principles, I regularly run into the following "issue": To keep datasets modular, I usually add them as subdatasets in an inputs directory while they also exist at the project-directory level. Here is an example: I have a BIDS DataLad dataset (bids) that I add as a subdataset to my fmriprep DataLad dataset:

myproject
.
├── fmriprep
│   ├── code
│   └── inputs
│       └── bids (5992f12) # same DataLad dataset as below
└── bids (5992f12)

Now when I run fMRIPrep, I give it `./fmriprep/inputs/bids` as the input path. But this involves running `datalad get` to actually get the files of the BIDS dataset into that place. To speed this up, I usually configure a local DataLad sibling for `./fmriprep/inputs/bids` like this: `datalad siblings add -s local --url ../../../bids`. Then `datalad get` can retrieve the data from `local`. But then I have the full BIDS dataset in two locations, which takes up additional disk space. Of course, I could `datalad drop` the files again, but here comes the idea: maybe there is a way to adjust the path so that the data does not have to be retrieved and copied again, while still staying in line with YODA principles.

I am not even sure if this is something that can or should be handled on the DataLad side, but maybe you know other nice workarounds for this? Thanks!
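For reference, the current workaround described above can be written down as a small shell helper. This is only a sketch: the function name `get_from_local_sibling` is made up for illustration, and it assumes the subdataset is already installed at the given path and that `datalad` is on the `PATH`.

```shell
# Hypothetical helper (not part of DataLad): register the project-level
# dataset as a sibling named "local" and fetch file content from it.
# $1: path to the installed subdataset, e.g. fmriprep/inputs/bids
# $2: URL/path of the source dataset, relative to the subdataset,
#     e.g. ../../../bids
get_from_local_sibling() {
    subds=$1
    source_url=$2
    (
        # Run inside the subdataset so the relative URL resolves as in the post
        cd "$subds" &&
        datalad siblings add -s local --url "$source_url" &&
        # -s selects the sibling to retrieve from; this still copies the
        # content, so the data ends up on disk twice
        datalad get -s local .
    )
}
```

From `myproject`, this would be called as `get_from_local_sibling fmriprep/inputs/bids ../../../bids`. It reproduces the duplication problem rather than solving it.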


yarikoptic · 2024-10-23T13:38:10Z

Isn't `--reckless=ephemeral` mode exactly what you need? There, `.git/annex/objects` would be shared from the original repository, so you would not need to actually "get" any load. Related issues worth reviewing/chiming in
(mostly unrelated to the question 1) Within fmriprep you should have `sourcedata/raw`, not `inputs/bids`, to follow the hierarchy recommended in BIDS:
- https://bids-specification.readthedocs.io/en/stable/common-principles.html#source-vs-raw-vs-derived-data
(mostly unrelated to the question 2) Please consider following the BIDS specification even for the project-level dataset. See/chime in on
- Add DatasetType="project" and rework existing "layout" example into a proper BIDS dataset bids-standard/bids-specification#1861

edit: for the organization of subdatasets, have a look at the derived datasets produced by the OpenNeuro folks: https://github.com/OpenNeuroDerivatives/
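A minimal sketch of the `--reckless=ephemeral` suggestion, using the layout from the original post. The helper name `clone_ephemeral_subdataset` is hypothetical, and the sketch assumes the subdataset is not yet registered at the target path (an existing `inputs/bids` would have to be removed first).

```shell
# Hypothetical helper (not part of DataLad): clone a local dataset as a
# subdataset in ephemeral mode, so no annexed content is copied.
# $1: superdataset, e.g. fmriprep
# $2: source dataset, relative to the superdataset or absolute, e.g. ../bids
# $3: target path inside the superdataset, e.g. inputs/bids
clone_ephemeral_subdataset() {
    superds=$1
    source=$2
    relpath=$3
    (
        cd "$superds" &&
        # -d . registers the clone as a subdataset of the current dataset;
        # --reckless ephemeral shares the annex with the source clone
        datalad clone -d . --reckless ephemeral "$source" "$relpath"
    )
}
```

From `myproject`, this would be `clone_ephemeral_subdataset fmriprep ../bids inputs/bids`. Because the annex is shared rather than copied, anything dropped in the ephemeral clone is also gone from the source, so this mode is best suited to throwaway or read-only use.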

yarikoptic · 2024-10-23T17:21:52Z

What struck me afterwards is that `--reckless=ephemeral` wouldn't work if used with datalad-containers without ad-hoc adjustments to bind mount some higher-level folder from which `.git/annex` would be symlinked, so it wouldn't be a complete solution as is.
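One possible shape of such an ad-hoc adjustment, as a sketch only: register the container with a custom call format that bind mounts the folder holding the source dataset. The helper name `add_container_with_bind` is hypothetical, the sketch assumes Singularity as the container runtime, and it is untested against an actual ephemeral clone.

```shell
# Hypothetical helper (not part of datalad-containers): register a container
# whose call format bind mounts an extra directory, so that symlinks pointing
# into the source dataset's .git/annex still resolve inside the container.
# $1: container name
# $2: image URL (whatever image the project actually uses)
# $3: directory to bind mount, e.g. the absolute path of myproject
add_container_with_bind() {
    name=$1
    image_url=$2
    bind_dir=$3
    # {img} and {cmd} are placeholders filled in by datalad-containers
    datalad containers-add "$name" --url "$image_url" \
        --call-fmt "singularity exec --bind $bind_dir {img} {cmd}"
}
```

The bind directory has to be high enough in the tree to contain the dataset that the ephemeral clone's annex points to, which in the layout above would be `myproject`.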
