Docs and tests for environment variable setting in containerized execution #16666
base: dev
From the dev channel on matrix:
Isn't this a better option? I worry about changing the defaults and leaking secrets into containers.
IMO there needs to be a layer of indirection anyway when translating these (and other) values from the shared database; I don't think that's necessarily a reason to expose all
Thanks for the feedback.
At least a working option. And I share your concerns. Let's ask the TPV developers if there is a way... We should work on the docs for env and document how to set environment variables for containers.
I had not realised, until @bernt-matthias pointed it out, that containers would not receive env vars available to non-containerized jobs, e.g.: https://github.com/galaxyproject/tpv-shared-database/blob/fd48e2d8970e672bd9cb56cbf5965af45c45f2b9/tools.yml#L2390.
IMO, part of the advantage of having the shared-database is to make it easier to bridge the divide between containerised and non-containerised tools. That way, if the settings are tweaked for a non-containerised tool, they would be tweaked for a containerised tool as well. This is specifically a problem for environments like AnVIL, because most of the tweaking happens on non-AnVIL/slurm based environments, and we do not have enough bandwidth to tweak it again for containerized environments.
I see the problem with security, and it looks like the essence of the problem here is that there are two types of envs: system and tool (more if we consider pre-job, post-job and tool, but that might be overkill). Perhaps the solution then is to explicitly acknowledge that divide? In TPV, we could have:
@mvdbeek Can you elaborate on this?
Sure, you want to be able to modify and filter environment variables in TPV. Take secrets for instance, those should not be passed to containers by default.
There is one thing that I do not get yet: For non-containerized destinations, there is currently no way to make this distinction. So if there are some variables for which we do not want the tool to see them, then there should be a way to do this for all kinds of destinations. Currently, there is only a way to define variables that should only be available in containers (via
I think you're taking the wrong angle here. Right now it is possible to set variables that are not seen by containers. We cannot change this for anyone relying on this. It doesn't matter that there is no distinction for non-containerized tools. Unless we collectively and consciously decide that this doesn't pose a risk ...
OK. Then along with the idea of @nuwang
So on a conda destination, we would ignore
And on a containerized destination, we would
? For TPV's shared database, probably only tool variables are of interest, since system variables likely depend on the respective Galaxy installations.
That looks good, but I'd maybe call it
OK. Cool. Then my suggestion for this PR would be:
I guess we have to document this, but I don't think this is a good state of affairs. I think env, file and execute should be available in the containers. It's just the random stuff that might be floating around like PYTHONPATH and secrets that shouldn't automatically bleed into the container.
@@ -12,6 +12,8 @@ echo \$(pwd) > '$pwd' &&
echo "\$HOME" > '$home' &&
echo "\$TMP" > '$tmp' &&
echo "\$SOME_ENV_VAR" > '$some_env_var' &&
echo "\${JOBCONF_ENV_VAR:-UNSET}" > '$jobconf_env_var' &&
Can you remove the JOBCONF_ENV_VAR tests? We should fix this, not cement it with tests.
Can we leave this? It just tests the documented behavior (as per this PR :) ) and you gave good reasons for not changing it.
If at some point we allow certain env variables to be passed to the container, e.g. via ToolInfo, then we can easily add another variable for this behavior.
Then we should not merge it :)
Like what I have implemented in the commits that I reverted later on? I guess we should adapt the docs and the tests to what it should be and then fix the implementation.
Then we're back to passing all secrets an admin might have set to the container. I think a reasonable middle ground is to implement https://github.com/galaxyproject/galaxy/blob/release_24.0/lib/galaxy/tool_util/deps/dependencies.py#L52-L54 ? Those can be safely passed into the container.
by passing the complete JobDestination to the container. Before the container only had the information on the params of the job destination, i.e. `env` was not accessible. has the additional advantage (IMO) to replace a few of the ominous `destination_info: Dict[str, Any]` objects where it was hard to find out what this actually is.
they should be ignored in the context of this PR
like path and exec and ignore them
This reverts commit b21640d.
…ion" This reverts commit 8b5655b.
for container env variable setting
not be available in the container, but only to the pre-and-post-tool-execution job environment.
Instead, for containerized destinations variables that should only be available in the container
can be set with ``<param id="docker_env_VARIABLE">VALUE</param>`` and
``<param id="singularity_env_VARIABLE">VALUE</param>``, respectively.
Maybe if you mention this is a known bug - it would help alleviate @mvdbeek's concerns below somewhat?
Seems that this never worked (for singularity it might work if cleanenv is disabled).
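For readers unfamiliar with the `docker_env_...` / `singularity_env_...` params mentioned in the docs excerpt above, here is a minimal Python sketch of the idea: destination params carrying a known prefix are turned into `NAME=VALUE` pairs for the container, while every other param is ignored. The function name `container_env_pairs` and the sample params are hypothetical and only illustrate the naming convention; this is not Galaxy's actual implementation.

```python
from typing import Dict, List


def container_env_pairs(params: Dict[str, str], prefix: str) -> List[str]:
    """Hypothetical helper: pick destination params that start with `prefix`
    and return them as NAME=VALUE strings with the prefix stripped."""
    pairs = []
    for key, value in params.items():
        if key.startswith(prefix):
            name = key[len(prefix):]
            pairs.append(f"{name}={value}")
    return pairs


# Assumed example params of a containerized destination (illustrative only).
example_params = {
    "docker_enabled": "true",
    "docker_env_SOME_VAR": "42",
    "singularity_env_OTHER_VAR": "x",
}

docker_pairs = container_env_pairs(example_params, "docker_env_")
singularity_pairs = container_env_pairs(example_params, "singularity_env_")
print(docker_pairs)
print(singularity_pairs)
```

Under this sketch, a variable only reaches the container if an admin opted in by using the prefixed param, which matches the concern raised in the thread about not leaking arbitrary variables.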
The environment variables that are passed explicitly to the container are constructed in the two for loops starting here and consider `<param id="docker_env_...">...` (resp. `singularity_env_...`), which also needs docs.

The fix is done by passing the complete `JobDestination` to the container. Before, the container only had the information on the params of the job destination, i.e. `env` was not accessible. This has the additional advantage (IMO) of replacing a few of the ominous `destination_info: Dict[str, Any]` objects where it was hard to find out what this actually is.

A workaround is to set environment variables via `<param docker_env_...>...</param>` (resp. `singularity_env_...`) in the job configuration.

The added test may fix "Add <environment_variables> section to job_properties test tool" #11348. No it does not.

TODO:
Should we backport this? I would say "yes" and suggest 23.0.
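To make the distinction between a destination's params and its `env` concrete, here is a small Python sketch. `SketchDestination` is a hypothetical stand-in, not Galaxy's `JobDestination`; the assumption that `env` is a list of dicts with `name`/`value` or `file` keys is made purely for illustration. It shows why code that receives only the params dict cannot see variables configured under `env`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SketchDestination:
    """Hypothetical stand-in for a job destination (not Galaxy's class)."""
    params: Dict[str, Any] = field(default_factory=dict)
    env: List[Dict[str, str]] = field(default_factory=list)


def named_env(destination: SketchDestination) -> Dict[str, str]:
    """Collect only name/value entries; other entry kinds are skipped here."""
    return {
        entry["name"]: entry["value"]
        for entry in destination.env
        if "name" in entry and "value" in entry
    }


destination = SketchDestination(
    params={"docker_enabled": "true"},
    env=[
        {"name": "JOBCONF_ENV_VAR", "value": "1"},
        {"file": "/path/to/env.sh"},  # assumed non name/value entry, ignored
    ],
)

# Code that is handed only `destination.params` has no access to env entries.
seen_from_params = "JOBCONF_ENV_VAR" in destination.params
seen_from_destination = named_env(destination)
print(seen_from_params, seen_from_destination)
```

Whether such entries should then be forwarded into the container is exactly the open question in the discussion; the sketch only shows that access to the full destination object is a precondition for making that choice.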