Forward arbitrary environment variables over SSH #5709

Merged
merged 7 commits into from
Oct 29, 2023
Changes from 5 commits
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -64,7 +64,7 @@ requests_).
- Prasanna Challuri
- David Matthews
- Tim Whitcomb
- (Scott Wales)
Member commented:

🎉

- Scott Wales
- Tomek Trzeciak
- Thomas Coleman
- Bruno Kinoshita
1 change: 1 addition & 0 deletions changes.d/5709.feat.md
@@ -0,0 +1 @@
Forward arbitrary environment variables over SSH connections
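
As a rough illustration of what the feature does (a minimal sketch, not Cylc's actual `construct_ssh_cmd`): the configured names are looked up in the local environment, and any that are set are passed to the remote side as `NAME=value` arguments to `env`. The helper name `build_forwarding_ssh_cmd`, its parameters, and the use of `shlex.quote` are assumptions made for this example.

```python
import os
import shlex
from typing import Iterable, List, Mapping, Optional


def build_forwarding_ssh_cmd(
    host: str,
    remote_cmd: List[str],
    forward: Iterable[str],
    environ: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return an ssh command list that forwards selected variables.

    Hypothetical helper for illustration only; it is not part of Cylc.
    """
    if environ is None:
        environ = os.environ
    cmd = ['ssh', host, 'env']
    for name in forward:
        # Only forward variables that are actually set locally.
        if name in environ:
            # Quoting is an assumption of this sketch: ssh joins its
            # arguments into one string that the remote shell parses.
            cmd.append(f'{name}={shlex.quote(environ[name])}')
    return cmd + remote_cmd


if __name__ == '__main__':
    # FOO is set, BAR is not, so only FOO is forwarded.
    print(build_forwarding_ssh_cmd(
        'example.com', ['cylc', 'play'], ['FOO', 'BAR'], {'FOO': 'BAR'}
    ))
```

Unset variables are skipped rather than forwarded as empty strings, which matches the `if envvar in os.environ` check in the diff and the behaviour asserted by the unit test in this PR.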
8 changes: 8 additions & 0 deletions cylc/flow/cfgspec/globalcfg.py
@@ -784,6 +784,14 @@ def default_for(

{REPLACES}``[suite servers][run host select]rank``.
''')
Conf('ssh forward environment variables', VDR.V_STRING_LIST, '',
desc='''
A list containing the names of the environment variables to
forward with SSH connections to the workflow host from
the host running 'cylc play'

.. versionchanged:: 8.3.0
''')

with Conf('host self-identification', desc=f'''
How Cylc determines and shares the identity of the workflow host.
6 changes: 5 additions & 1 deletion cylc/flow/remote.py
@@ -35,6 +35,7 @@
from cylc.flow.option_parsers import verbosity_to_opts
from cylc.flow.platforms import get_platform, get_host_from_platform
from cylc.flow.util import format_cmd
from cylc.flow.cfgspec.glbl_cfg import glbl_cfg


def get_proc_ancestors():
@@ -298,7 +299,10 @@ def construct_ssh_cmd(
'CYLC_CONF_PATH',
'CYLC_COVERAGE',
'CLIENT_COMMS_METH',
- 'CYLC_ENV_NAME'
+ 'CYLC_ENV_NAME',
+ *(glbl_cfg().get(['scheduler'])
+ ['run hosts']
+ ['ssh forward environment variables']),

@oliver-sanders (Member) commented on Sep 27, 2023:

This configuration will apply to all SSH commands made by Cylc, not just the one made by cylc play, and not just SSH connections to or from the scheduler run hosts.

For context, here are some examples of SSH use in Cylc:

  • play (client => scheduler-host): Automatic distribution of workflows onto scheduler hosts.
  • clean (client => remote-platform): Removal of files on remote platforms.
  • job-submission (scheduler-host => remote-platform): Submit jobs to remote platforms.

Suggest moving the configuration into the [platforms] section:

[platforms]
  [[myplatform]]
    ssh forward environment variables = FOO, BAR, PROJECT

It can then be used here like so:

Suggested change
- *(glbl_cfg().get(['scheduler'])
- ['run hosts']
- ['ssh forward environment variables']),
+ *platform['ssh forward environment variables'],

Ping @hjoliver, whose earlier comment led in this direction. To configure this in run hosts and have it apply only to run-host comms, we would need to compare the FQDN of the host we are contacting against the run hosts to determine whether it is a run host in the first place. The [platforms][localhost] section is already used for all run-host SSH connections, where it configures the ssh command, etc., for things including cylc play and workflow auto-migration (which this feature would also need to cover). So we might as well configure this in platforms, opening this functionality up to other uses, right?
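
To make the FQDN comparison mentioned above concrete, here is a minimal sketch of what such a check could look like. It is not taken from Cylc: the function name `is_run_host` and the injectable `resolve` parameter are hypothetical, and `socket.getfqdn` is used only as a plausible default resolver.

```python
import socket
from typing import Callable, Iterable


def is_run_host(
    host: str,
    run_hosts: Iterable[str],
    resolve: Callable[[str], str] = socket.getfqdn,
) -> bool:
    """Return True if host resolves to the same FQDN as any run host.

    Hypothetical helper for illustration only; it is not part of Cylc.
    """
    target = resolve(host).lower()
    # Compare fully qualified names so that short names and aliases
    # for the same machine are treated as equal.
    return any(
        resolve(candidate).lower() == target
        for candidate in run_hosts
    )
```

Every SSH call would need a lookup like this before deciding whether to forward variables, which is part of why configuring the setting per platform may be the simpler option.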

Contributor Author commented:

Does it make sense to have different platforms with different forwarded variables? Any variable used by a specific platform will also need to be sent to the scheduler for it to work properly, so I can see things becoming confusing if they are out of sync.

@oliver-sanders (Member) commented on Sep 28, 2023:

I don't have any use cases in mind for per-platform configuration. It could make sense, e.g. for your use case, if the project codes differ from one platform to another. There might be other use cases for this sort of functionality, e.g. configuring things at the Cylc level which you might otherwise have to configure in shell profile files.

The options for implementation are either a per-platform configuration or a global configuration (as implemented). IMO it would make more sense to colocate this with the other SSH/rsync configurations, but a global config is OK too. I think putting the global configuration in the run hosts section is misleading, as it also configures SSH commands which are neither to nor from the run hosts.

Note we don't currently have platform inheritance, which makes the per-platform configuration a little clunkier to configure than it strictly needs to be. Inheritance was planned as a more convenient way of sharing configuration between multiple platforms; however, we haven't got around to it yet.
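
One way to picture the trade-off between the two options is a per-platform lookup that falls back to a global default. The sketch below is purely illustrative: Cylc has no such fallback according to the comment above, and the function name `forwarded_names` and its `global_default` parameter are hypothetical.

```python
from typing import Any, List, Mapping, Sequence

# Setting name as used in this PR.
KEY = 'ssh forward environment variables'


def forwarded_names(
    platform: Mapping[str, Any],
    global_default: Sequence[str] = (),
) -> List[str]:
    """Return the variable names to forward for a platform.

    Hypothetical helper for illustration only; it is not part of Cylc.
    A platform that sets the key (even to an empty list) overrides
    the global default; a platform that omits it inherits the default.
    """
    value = platform.get(KEY)
    if value is None:
        return list(global_default)
    return list(value)
```

With a fallback of this kind, a variable needed everywhere would be listed once in the default, while a platform with different project codes could still override it, which addresses the concern about platforms drifting out of sync.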

]:
if envvar in os.environ:
command.append(
37 changes: 36 additions & 1 deletion tests/unit/test_remote.py
@@ -15,7 +15,9 @@
# along with this program. If not, see <http://www.gnu.org/licenses/>.
"""Test the cylc.flow.remote module."""

- from cylc.flow.remote import run_cmd, construct_rsync_over_ssh_cmd
+ from cylc.flow.remote import run_cmd, construct_rsync_over_ssh_cmd, construct_ssh_cmd
from unittest import mock
import cylc.flow


def test_run_cmd_stdin_str():
@@ -86,3 +88,36 @@ def test_construct_rsync_over_ssh_cmd():
'/foo/',
'miklegard:/bar/',
]


def test_construct_ssh_cmd_forward_env(mock_glbl_cfg):
    """ Test for 'ssh forward environment variables'
    """
    import os

    mock_glbl_cfg(
        'cylc.flow.remote.glbl_cfg',
        '''
        [scheduler]
            [[run hosts]]
                ssh forward environment variables = FOO, BAR
        '''
    )

    host = 'example.com'
    config = {
        'ssh command': 'ssh',
        'use login shell': None,
        'cylc path': None,
    }

    # Variable isn't set, no change to command
    expect = ['ssh', host, 'env', f'CYLC_VERSION={cylc.flow.__version__}', 'cylc', 'play']
    cmd = construct_ssh_cmd(['play'], config, host)
    assert cmd == expect

    # Variable is set, appears in `env` list
    with mock.patch.dict(os.environ, {'FOO': 'BAR'}):
        expect = ['ssh', host, 'env', f'CYLC_VERSION={cylc.flow.__version__}', 'FOO=BAR', 'cylc', 'play']
        cmd = construct_ssh_cmd(['play'], config, host)
        assert cmd == expect