Skip to content

Commit

Permalink
enh: add datalad & containers section
Browse files Browse the repository at this point in the history
  • Loading branch information
oesteban committed Dec 26, 2024
1 parent 5fb9d15 commit 9af55c7
Show file tree
Hide file tree
Showing 2 changed files with 176 additions and 0 deletions.
175 changes: 175 additions & 0 deletions docs/apps/datalad.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,175 @@
Apps may be able to identify if the input dataset is handled with
*DataLad* or *Git-Annex*, and pull down linked data that has not
been fetched yet.
One example of one such application is *MRIQC*, and all the examples
on this documentation page will refer to it.

!!! important "Summary"

Executing *BIDS-Apps* leveraging *DataLad*-controlled datasets
within containers can be tricky.
In particular, one of our general recommendations involves mounting
or binding folders into the container in **read-only mode**, which
will disallow *DataLad* from writing to the dataset tree.
Similarly, and depending on the specific runtime settings of the
container framework, *DataLad* may encounter issues with file ownership too.
This section guides users through ensuring smooth execution of
*BIDS-Apps* on *DataLad*/*Git-annex*-managed datasets.

## *DataLad* and *Docker*

When executing *MRIQC* within *Docker* on a *DataLad* dataset
(for instance, installed from [*OpenNeuro*](https://openneuro.org)),
we will need to ensure the following settings are observed:

* the user id (uid) who *installed* the *DataLad* dataset must match
the uid who is *executing MRIQC* within the container runtime
* the uid who is *executing MRIQC* within the container must
have sufficient permissions to write in the tree.

### Setting execution uid

If the uid is not correct, we will likely encounter the following error:

```
datalad.runner.exception.CommandError: CommandError: 'git -c diff.ignoreSubmodules=none -c core.quotepath=false -c annex.merge-annex-branches=false annex find --not --in . --json --json-error-messages -c annex.dotfiles=true -- sub-0001/func/sub-0001_task-restingstate_acq-mb3_bold.nii.gz sub-0002/func/sub-0002_task-emomatching_acq-seq_bold.nii.gz sub-0002/func/sub-0002_task-restingstate_acq-mb3_bold.nii.gz sub-0001/func/sub-0001_task-emomatching_acq-seq_bold.nii.gz sub-0001/func/sub-0001_task-faces_acq-mb3_bold.nii.gz sub-0001/dwi/sub-0001_dwi.nii.gz sub-0002/func/sub-0002_task-workingmemory_acq-seq_bold.nii.gz sub-0001/anat/sub-0001_T1w.nii.gz sub-0002/anat/sub-0002_T1w.nii.gz sub-0001/func/sub-0001_task-gstroop_acq-seq_bold.nii.gz sub-0002/func/sub-0002_task-faces_acq-mb3_bold.nii.gz sub-0002/func/sub-0002_task-anticipation_acq-seq_bold.nii.gz sub-0002/dwi/sub-0002_dwi.nii.gz sub-0001/func/sub-0001_task-anticipation_acq-seq_bold.nii.gz sub-0001/func/sub-0001_task-workingmemory_acq-seq_bold.nii.gz sub-0002/func/sub-0002_task-gstroop_acq-seq_bold.nii.gz' failed with exitcode 1 under /data [info keys: stdout_json] [err: 'git-annex: Git refuses to operate in this repository, probably because it is owned by someone else.
To add an exception for this directory, call:
git config --global --add safe.directory /data
git-annex: automatic initialization failed due to above problems']
```

Confusingly, following the suggestion from *DataLad* directly on the host
(`git config --global --add safe.directory /data`) will not work in this
case, because this line must be executed within the container.

Instead, we can override the default user executing within the container
(which is `root`, or uid = 0).
This can be achieved with
[*Docker*'s `-u`/`--user` option](https://docs.docker.com/engine/containers/run/#user):

```
--user=[ user | user:group | uid | uid:gid | user:gid | uid:group ]
```

We can combine this option with *Bash*'s `id` command to ensure the current user's uid and group id (gid) are being set.
Let's update the last example in the previous
[*Docker* execution section](docker.md#running-a-niprep-directly-interacting-with-the-docker-engine):


``` {.shell hl_lines="5"}
$ docker run -ti --rm \
-v $HOME/ds002785:/data:ro \
-v $HOME/ds002785/derivatives:/out \
-v $HOME/tmp/ds002785-workdir:/work \
-u $(id -u):$(id -g) \ # set execution uid:gid
nipreps/mriqc:<latest-version> \
\
/data /out/mriqc-<latest-version> \
participant \
-w /work
```

The above command line will ensure *MRIQC* to be executed with the current
uid and gid, which will match the filesystem's permissions if the dataset
was installed with the same user.

!!! danger "Match uid and gid with those corresponding to the user who installed the dataset"

When different users are to install the dataset and
execute the application, *Docker* must be executed with the
uid and gid corresponding to the user who installed the dataset.
The uid corresponding to a given username (for instance `janedoe`)
can be obtained as follows:

```
getent passwd "janedoe" | cut -f 3 -d ":"
```

and her gid:

```
getent passwd "janedoe" | cut -f 4 -d ":"
```

### Mounting the dataset folder without *read-only* permissions

If the dataset is protected with *read-only* permissions, then *MRIQC*
will hit the following error
([see `nipreps/mriqc#1363`](https://github.com/nipreps/mriqc/issues/1363)):

```
get(error): sub-0001/func/sub-0001_task-restingstate_acq-mb3_bold.nii.gz (file) [git-annex: .git/annex/tmp: createDirectory: permission denied (Read-only file system)]
action summary:
get (error: 1)
Traceback (most recent call last):
File "/opt/conda/bin/mriqc", line 8, in <module>
sys.exit(main())
^^^^^^
File "/opt/conda/lib/python3.11/site-packages/mriqc/cli/run.py", line 43, in main
parse_args(argv)
File "/opt/conda/lib/python3.11/site-packages/mriqc/cli/parser.py", line 658, in parse_args
initialize_meta_and_data()
File "/opt/conda/lib/python3.11/site-packages/mriqc/utils/misc.py", line 447, in initialize_meta_and_data
_datalad_get(dataset)
File "/opt/conda/lib/python3.11/site-packages/mriqc/utils/misc.py", line 282, in _datalad_get
return get(
^^^^
File "/opt/conda/lib/python3.11/site-packages/datalad/interface/base.py", line 773, in eval_func
return return_func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/datalad/interface/base.py", line 763, in return_func
results = list(results)
^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/datalad_next/patches/interface_utils.py", line 287, in _execute_command_
raise IncompleteResultsError(
datalad.support.exceptions.IncompleteResultsError: Command did not complete successfully. 1 failed:
[{'action': 'get',
'annexkey': 'MD5E-s76037251--344f061a3165c71e36b98ad1649c3c8c.nii.gz',
'error_message': 'git-annex: .git/annex/tmp: createDirectory: permission '
'denied (Read-only file system)',
'path': '/data/sub-0001/func/sub-0001_task-restingstate_acq-mb3_bold.nii.gz',
'refds': '/data',
'status': 'error',
'type': 'file'}]
```

This error indicates that the container is executed with
the appropriate uid and gid pair.
In this case, we will need to ensure *DataLad* can write
to the dataset installation when obtaining new data.
This is easily achieved by **removing the read-only parameters** of the
mount option:

``` {.shell hl_lines="2 5"}
$ docker run -ti --rm \
-v $HOME/ds002785:/data \ # mount data WITHOUT :ro
-v $HOME/ds002785/derivatives:/out \
-v $HOME/tmp/ds002785-workdir:/work \
-u $(id -u):$(id -g) \ # set execution uid:gid
nipreps/mriqc:<latest-version> \
\
/data /out/mriqc-<latest-version> \
participant \
-w /work
```

## *DataLad* and *Singularity*/*Apptainer*

In the case of *Singularity* and *Apptainer*, ensuring the uid that
executes the container [involves using user namespace mappings](https://apptainer.org/docs/admin/1.0/user_namespace.html#user-namespace-requirementsn).
Therefore, you will need to contact your system administrator to figure
out a convenient solution to the problem.

Since most of *Singularity*/*Apptainer* deployments automatically bind
the user's `$HOME` directory, *DataLad*'s suggested direction may
work:

```
git config --global --add safe.directory <path-to-dataset-in-host>
```

Allowing the container to write on the dataset's tree is straightforward
and homologous to *Docker*, by removing the `:ro` setting in the binding
option (`-B`).
1 change: 1 addition & 0 deletions mkdocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@ nav:
- Introduction: apps/framework.md
- Executing with Docker: apps/docker.md
- Executing with Singularity: apps/singularity.md
- Git-Annex and DataLad within containers: apps/datalad.md
- Presentations: users/talks.md
- Educational: users/educational.md
- Documentation (devs):
Expand Down

0 comments on commit 9af55c7

Please sign in to comment.