Isolate singularity containers more thoroughly for better reproducibility. #18628
Previously the parameter was unused
@@ -604,6 +604,12 @@
</destination>
<destination id="singularity_local" runner="local">
<param id="singularity_enabled">true</param>
<!--Galaxy requests an isolated /tmp directory from singularity, which means
Yes. No mount /tmp means /tmp is not mounted. There is still a /tmp dir in the container itself though; it is made read-only.
From the discussion you shared:
instead (by default) provide a writable /tmp via $_GALAXY_JOB_TMP_DIR/:/tmp:rw note that this dir is still also available as $_GALAXY_JOB_TMP_DIR:rw
I found that this /tmp mount is only provided when tmp_dir is set to true. Is that intentional behavior?
I found that this /tmp mount is only provided when tmp_dir is set to true. Is that intentional behavior?
Guess so... and if one of TMP, TEMP, TMPDIR is /tmp, or?
Regarding FastQC: we should set --dir in the tool wrapper: https://github.com/s-andrews/FastQC/blob/1faeea0412093224d7f6a07f777fad60a5650795/fastqc#L480 .. the culprit here seems to be JAVA (just another reason to dislike it).
it will be mounted as read-only. This can cause some problems with tools that
do not use the TMP, TEMP, TMPDIR variable family properly. Setting
tools that do not use the TMP, TEMP, TMPDIR variable family properly
What does this mean?
Proper applications create a temporary directory/file in the directory designated by TMPDIR. Shitty applications just hardcode /tmp, which causes problems when /tmp is mounted read-only because --no-mount tmp is chosen and no other directory is mounted to the /tmp path.
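To make the difference concrete, here is a minimal Python sketch of the "proper" behavior. The helper names `resolve_tmp_dir` and `write_scratch` are made up for illustration and are not part of Galaxy or any tool wrapper. Python's own `tempfile` module already consults TMPDIR, TEMP and TMP in that order; the lookup is spelled out here only to show what a tool should do instead of hardcoding /tmp.

```python
import os
import tempfile

# Variable family discussed in this thread, in the order Python's tempfile checks them.
TMP_VARS = ("TMPDIR", "TEMP", "TMP")


def resolve_tmp_dir(environ=None):
    """Return the first existing, writable directory named by TMPDIR, TEMP or TMP.

    Hypothetical helper for illustration. Falls back to /tmp only when none
    of the variables points at a usable directory.
    """
    environ = os.environ if environ is None else environ
    for name in TMP_VARS:
        candidate = environ.get(name)
        if candidate and os.path.isdir(candidate) and os.access(candidate, os.W_OK):
            return candidate
    return "/tmp"


def write_scratch(data, environ=None):
    """Write bytes to a scratch file inside the resolved temporary directory."""
    fd, path = tempfile.mkstemp(prefix="scratch_", dir=resolve_tmp_dir(environ))
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path
```

A tool written this way keeps working when /tmp is read-only, as long as the job temporary directory is exported through one of the three variables. A tool that opens a path such as /tmp/scratch directly does not.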
I see now that the mount_home setting is wrong. On newer tool_profiles galaxy will set the
This should be determined by the tool profile
<!-- Singularity by default inherits the PID namespace, this can give
issues with multiprocessing, hence galaxy passes the `pid` argument
by default to isolate the PID namespace. You can turn this off by setting
singularity_pid to `false`.
-->
<!-- <param id="singularity_pid">true</param> -->
<!-- Singularity by default inherits the IPC namespace, this can give
issues with multiprocessing, hence galaxy passes the `ipc` argument
by default to isolate the IPC namespace. You can turn this off by setting
singularity_ipc to `false`.
-->
<!-- <param id="singularity_ipc">true</param> -->
Do we need to expose these 2 arguments as options, or should we just always enable them as you are proposing to do for --contain?
--contain is not an invasive param. You can undo its effects by simply manually adding the volumes.
This is not the case with --ipc and --pid (as far as I can see). So having an option to disable the flags might be very useful if they cause problems in some tools.
We have never experienced problems with --ipc and --pid at our institute; in fact, we experienced problems when they were not used. However, I still take the conservative approach here and allow admins to at least turn them off if they cause problems.
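As a rough sketch of the behavior argued for here (not the code in this PR): --contain is always passed, while --ipc and --pid default to on and can each be switched off through a destination param. The function names `as_bool` and `build_singularity_command` are hypothetical; only the flags and the `singularity_ipc` / `singularity_pid` param ids come from this thread.

```python
def as_bool(value, default):
    """Interpret a destination param value such as "true" or "false"."""
    if value is None:
        return default
    return str(value).strip().lower() in ("true", "yes", "1", "on")


def build_singularity_command(image, tool_argv, params):
    """Assemble a `singularity exec` argument list (hypothetical helper).

    --contain is unconditional because its effects can be undone by binding
    volumes explicitly. --ipc and --pid are on by default but can be disabled
    per destination, since their effects cannot be undone another way.
    """
    argv = ["singularity", "exec", "--contain"]
    if as_bool(params.get("singularity_ipc"), default=True):
        argv.append("--ipc")
    if as_bool(params.get("singularity_pid"), default=True):
        argv.append("--pid")
    argv.append(image)
    argv.extend(tool_argv)
    return argv
```

With an empty params dict both namespace flags are present. Setting `singularity_pid` to `false` drops only `--pid` and leaves `--contain` and `--ipc` in place.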
* fastqc: set tmpdir xref galaxyproject/galaxy#18628
* Update tools/fastqc/rgFastQC.xml
Co-authored-by: Marius van den Beek
Co-authored-by: Nicola Soranzo
This fixes #18620 and several other issues.
* singularity exec is now always passed the --contain flag. This ensures only volumes explicitly requested by galaxy are mounted. Singularity mounts $HOME and $PWD by default, but that is redundant since galaxy also mounts the relevant directories.
* singularity exec is now passed the --ipc and --pid flags that isolate the IPC and PID namespace respectively. This does not matter much for reproducibility but may fix issues that can occur in tools using the Python multiprocessing module, such as cutadapt. This behavior can be turned off by setting singularity_ipc and singularity_pid to false.
* $HOME is no longer mounted by default, see "Mounting home in singularity containers is bad for security, reproducibility and creates race conditions" #18620. This can be re-enabled by setting singularity_mount_home to true.
* Galaxy now requests an isolated /tmp directory from singularity, while simultaneously mounting a specific job temporary directory and setting the appropriate TMP and TMPDIR environment variables. Unfortunately some problematic tools (I am looking at you FastQC!) always write to /tmp and crash as a result. Setting the job tmp_dir variable to true alleviates this problem. This is now documented in the advanced job xml conf sample.