Add a HPC troubleshooting section to the documentation #1349

Merged on Jan 13, 2025 (4 commits).
File: docs/source/usage.rst (52 additions, 10 deletions)

This will separate *DataLad* management from *MRIQC*'s operation,
which can be an effective way of debugging issues and averting
erroneous conditions.

Running *MRIQC* on HPC with *Singularity*/*Apptainer*
-----------------------------------------------------
We have profiled CPU core and memory usage with the *resource profiler*
tool of *Nipype*.

[...] on ds030 of OpenfMRI: [...]
the 1-task example, a rule of thumb may be that each task takes around
1 GB of memory.
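
When sizing a job, the rule of thumb above reduces to simple arithmetic: divide the memory available to the job by roughly 1 GB per task, then cap the result at the number of cores. The following is a minimal sketch under those assumptions; ``max_parallel_tasks`` and its ``reserve_gb`` safety margin are hypothetical and are not *MRIQC* settings:

.. code-block:: python

   import math


   def max_parallel_tasks(mem_gb, n_cpus, gb_per_task=1.0, reserve_gb=0.0):
       """Estimate how many MRIQC tasks fit on one node at the same time.

       Hypothetical helper: assumes the ~1 GB-per-task rule of thumb, and
       ``reserve_gb`` is memory kept aside for everything else in the job.
       """
       if gb_per_task <= 0:
           raise ValueError("gb_per_task must be positive")
       # How many tasks the memory budget allows, rounded down.
       by_memory = math.floor((mem_gb - reserve_gb) / gb_per_task)
       # Cap at the number of cores; 0 means not even one task fits.
       return max(0, min(n_cpus, by_memory))

For instance, ``max_parallel_tasks(16, 32, reserve_gb=4)`` gives 12. On a 64 GB, 8-core node the cores are the limit, so ``max_parallel_tasks(64, 8)`` gives 8. Check the command line interface reference for the option your *MRIQC* version uses to set the number of processes.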

Known issues with HPC
.....................

#. No internet access

   The container needs to download templates from *TemplateFlow* over the internet.
   If the container has no internet access, download the templates in advance with
   the ``templateflow`` Python library, from a machine that does have internet access:

   .. code-block:: python

      # TemplateFlow reads TEMPLATEFLOW_HOME at import time (default:
      # ~/.cache/templateflow), so set it beforehand to choose where files go.
      import templateflow.api

      templateflow.api.TF_S3_ROOT = 'http://templateflow.s3.amazonaws.com'
      templateflow.api.get('MNI152NLin2009cAsym')  # change the template if needed

   Then make the templates available to the container. Bind-mount the directory
   they were downloaded into, and set the ``TEMPLATEFLOW_HOME`` environment
   variable to the mount point inside the container:

   .. code-block:: bash

      apptainer run -B /path/to/templateflow:/path/to/templates --env TEMPLATEFLOW_HOME=/path/to/templates ...

#. Socket error

   When running multiple instances of *MRIQC* on an HPC cluster, you may encounter
   the following error:

   .. code-block:: text

      OSError: [Errno 98] Address already in use

   To solve this issue, try isolating the container network from the host network
   with the ``--net`` and ``--network none`` options:

   .. code-block:: bash

      apptainer run --net --network none ...

   With this setting the container can no longer reach the internet, so it cannot
   download templates. In that case, download the templates manually and make them
   available to the container as explained in the previous item:

   .. code-block:: bash

      apptainer run --net --network none -B /path/to/templateflow:/path/to/templates --env TEMPLATEFLOW_HOME=/path/to/templates ...
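
Before submitting jobs that run without network access, you may want to check that the directory you plan to bind-mount already contains the templates you downloaded. This is a minimal sketch. It assumes TemplateFlow's layout of one ``tpl-<name>`` folder per template under ``TEMPLATEFLOW_HOME``. ``missing_templates`` is a hypothetical helper that is not part of *MRIQC* or *TemplateFlow*, and it only checks that each folder exists and is not empty:

.. code-block:: python

   from pathlib import Path


   def missing_templates(tf_home, templates):
       """Return the templates with no non-empty ``tpl-<name>`` folder.

       Hypothetical helper: it does not verify that every file MRIQC
       needs is present, only that each template folder has some content.
       """
       root = Path(tf_home).expanduser()
       missing = []
       for name in templates:
           folder = root / f"tpl-{name}"
           # Treat an absent or empty folder as a template still to download.
           if not folder.is_dir() or not any(folder.iterdir()):
               missing.append(name)
       return missing

For example, ``missing_templates('/path/to/templateflow', ['MNI152NLin2009cAsym'])`` returns an empty list once that template has been downloaded there.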
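
If many jobs are submitted from a script, the network-isolation and template-mounting options discussed in these items can be built in one place. This is a minimal Python sketch. ``build_apptainer_cmd``, the image name ``mriqc.sif`` and the ``/templateflow`` mount point are hypothetical choices for illustration:

.. code-block:: python

   def build_apptainer_cmd(image, tf_host_dir, tf_container_dir="/templateflow",
                           isolate_network=False, mriqc_args=()):
       """Assemble an ``apptainer run`` argument list for an MRIQC job.

       Hypothetical helper: paths and defaults are illustrative only.
       """
       cmd = ["apptainer", "run"]
       if isolate_network:
           # New network namespace with only a loopback interface (no internet),
           # so concurrent jobs do not bind the same host address.
           cmd += ["--net", "--network", "none"]
       # Bind the pre-downloaded TemplateFlow directory and point TemplateFlow at it.
       cmd += ["-B", f"{tf_host_dir}:{tf_container_dir}"]
       cmd += ["--env", f"TEMPLATEFLOW_HOME={tf_container_dir}"]
       # Apptainer options come before the image; MRIQC arguments come after it.
       cmd.append(str(image))
       cmd.extend(mriqc_args)
       return cmd

For example, ``build_apptainer_cmd("mriqc.sif", "/path/to/templateflow", isolate_network=True, mriqc_args=["/data/bids", "/data/out", "participant"])`` returns a list that can be passed to ``subprocess.run(cmd, check=True)``. Alternatively, print it with ``shlex.join(cmd)`` and paste the result into a batch script.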

Command line interface
----------------------
.. argparse::
   :ref: mriqc.cli.parser._build_parser
   :prog: mriqc
   :nodefault:
   :nodefaultconst:
