From 2f7710c22e6209c5767a03b7674e6451d955842f Mon Sep 17 00:00:00 2001
From: Bo Peng
Date: Sat, 17 Feb 2024 16:11:39 -0600
Subject: [PATCH] Update doc to remove remote path mapping features vatlab/sos#1535

---
 doc/user_guide/remote_target.html | 15529 -----------------------
 doc/user_guide/task_files.html | 16139 ------------------------
 src/user_guide/cli.ipynb | 10 +-
 src/user_guide/remote_execution.ipynb | 15 +-
 src/user_guide/remote_target.ipynb | 202 -
 src/user_guide/sos_remote.ipynb | 14 +-
 src/user_guide/sos_targets.ipynb | 1 -
 src/user_guide/targets.ipynb | 145 +-
 src/user_guide/task_files.ipynb | 759 --
 src/user_guide/task_statement.ipynb | 63 +-
 10 files changed, 58 insertions(+), 32819 deletions(-)
 delete mode 100644 doc/user_guide/remote_target.html
 delete mode 100644 doc/user_guide/task_files.html
 delete mode 100644 src/user_guide/remote_target.ipynb
 delete mode 100644 src/user_guide/task_files.ipynb

diff --git a/doc/user_guide/remote_target.html b/doc/user_guide/remote_target.html
deleted file mode 100644
index ab2106fc..00000000
--- a/doc/user_guide/remote_target.html
+++ /dev/null
@@ -1,15529 +0,0 @@
-remote_target

Working with remote files

  • Difficulty level: intermediate
  • Time needed to learn: 10 minutes or less
  • Key points:
    • Input targets marked by remote are considered remote files and are not copied to the remote host.
    • Output targets marked by remote are considered remote files and are not copied to the local host.

Remote Targets


The task execution model automatically synchronizes input and output files between local and remote hosts. This is convenient, but unnecessary in two cases:

  1. If the input files are large and already reside on the remote host, there is no need to keep them on the local host and copy them to the remote host before they are processed.

  2. If the output files are large and do not need to be processed locally, there is no need to copy them to the local host.

To solve these problems, you can use remote targets to specify that the targets are on the remote host and do not need to be synchronized. For example, you could use the following step to process a large input file and synchronize only the small output file to the local desktop for further analysis:

[10]
input: remote('/path/to/large/input/file')
output: remote('large_output'), 'summary.stat'
task:
sh:
    # script to generate large_output and summary.stat
    # from large input files

The remote function accepts one or more SoS targets (e.g. remote('input.txt'), remote('input1.txt', 'input2.txt'), remote(fastq_files), or remote(R_Library('ggplot2'))).
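The effect of the marker can be modeled with a small Python sketch. It is hypothetical and simplified: RemoteTarget, mark_remote and targets_to_sync are illustrative names rather than SoS internals, targets are reduced to file names, and the only behavior modeled is that marked targets are left out of synchronization.

```python
# Hypothetical model of remote-target marking; not the SoS implementation.
from dataclasses import dataclass
from typing import Iterable, List, Tuple, Union


@dataclass(frozen=True)
class RemoteTarget:
    """Targets that already live on (or should stay on) the remote host."""
    targets: Tuple[str, ...]


def mark_remote(*targets: Union[str, Iterable[str]]) -> RemoteTarget:
    """Accept one or more names, or lists of names such as fastq_files."""
    flat: List[str] = []
    for t in targets:
        if isinstance(t, str):
            flat.append(t)
        else:
            flat.extend(t)
    return RemoteTarget(tuple(flat))


def targets_to_sync(*specs: Union[str, RemoteTarget]) -> List[str]:
    """Keep only the targets that would be copied between hosts."""
    return [s for s in specs if not isinstance(s, RemoteTarget)]


# Mirrors the step above: large_output stays remote, summary.stat is copied back.
print(targets_to_sync(mark_remote("large_output"), "summary.stat"))  # ['summary.stat']
```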

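In [1]:

%run -q bcb

output: remote('result.png')
task: walltime='1h', mem='2G', nodes=1, cores=1

R:
    set.seed(1)
    x <- 1:100
    y <- -0.03*x + rnorm(100)
    png("result.png", height=400, width=600)
    plot(x, y, pch=19, col=rgb(0.5, 0.5, 0.5, 0.5), cex=1.5)
    abline(lm(y ~ x))
    dev.off()

INFO: No matching tasks are identified. Use option -a to check all tasks.
INFO: 68398bb67cbef06a started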

The task is executed successfully on the remote host bcb, but the result file result.png, marked as remote('result.png'), is not synchronized to localhost after the task completes.

In [2]:

!ls result.png

ls: result.png: No such file or directory

Option host to remote targets


SoS needs to know where the remote targets reside to check their existence and signatures. The remote() target has an option host, which defaults to the host that you specified with option -q.

If you specify the host on which the task is executed with task option queue, you also need to pass the same host to the remote() targets through their host option.

diff --git a/doc/user_guide/task_files.html b/doc/user_guide/task_files.html
deleted file mode 100644
index 70393bb0..00000000
--- a/doc/user_guide/task_files.html
+++ /dev/null
@@ -1,16139 +0,0 @@
-task_files

Specifying and synchronizing remote files

  • Difficulty level: intermediate
  • Time needed to learn: 20 minutes or less
  • Key points:
    • Paths that are relative to the current working directory are portable across hosts.
    • Use named paths (#name) to specify absolute paths that are different across local and remote hosts.
    • Options to_host and from_host specify files and directories to be sent before task execution and retrieved after task execution, respectively.

Path definitions and named paths


When local and remote hosts do not share file systems (or share only some file systems), things can get a bit complicated because SoS will need to decide what paths to use on the remote host. The most important thing to remember here is that paths across local and remote hosts are linked by named paths defined in the SoS host definition file.

For example, a host definition file (usually ~/.sos/hosts.yml) could have the following (incomplete) paths definitions:

localhost: office
hosts:
    office:
        paths:
            home:  /Users/{user_name}
            projects: /Users/{user_name}/projects
            scratch: /Users/{user_name}/scratch
    cluster:
        paths:
            home:  /home/{user_name}
            projects: /home/projects/{user_name}
            scratch: /mount/scratch

so that paths under home, projects, or scratch could be linked across office and cluster.

Just as ~/result.txt refers to result.txt under the user's home directory, which can differ across hosts, named paths, namely paths that start with #name such as #projects/RNASeq, are context specific. If you specify _output='#projects/RNASeq/genes.txt', the path refers to different files on hosts with different definitions of #projects.
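Named-path expansion can be sketched in Python. This is an illustration under stated assumptions, not SoS code: the paths tables from the host definition example are hard-coded as dictionaries, {user_name} is filled in with a sample value, and expand_named_path is a hypothetical helper.

```python
# Hypothetical sketch: expand a leading "#name" with a host's `paths` table.
# The dictionaries mirror the office/cluster host definition example;
# `expand_named_path` is illustrative and not part of the SoS API.
from pathlib import PurePosixPath

HOSTS = {
    "office": {
        "home": "/Users/{user_name}",
        "projects": "/Users/{user_name}/projects",
        "scratch": "/Users/{user_name}/scratch",
    },
    "cluster": {
        "home": "/home/{user_name}",
        "projects": "/home/projects/{user_name}",
        "scratch": "/mount/scratch",
    },
}


def expand_named_path(path: str, host: str, user_name: str) -> str:
    """Replace a leading '#name' with the host-specific definition of `name`."""
    if not path.startswith("#"):
        return path  # ordinary paths are left untouched
    name, _, rest = path[1:].partition("/")
    base = HOSTS[host][name].format(user_name=user_name)
    return str(PurePosixPath(base, rest)) if rest else base


for host in ("office", "cluster"):
    print(host, expand_named_path("#projects/RNASeq/genes.txt", host, "bpeng"))
# office /Users/bpeng/projects/RNASeq/genes.txt
# cluster /home/projects/bpeng/RNASeq/genes.txt
```

The same named path yields a different absolute path on each host, which is what makes #projects portable while ~ alone would not cover /mount/scratch.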


Use of relative paths


Let us execute a task on a remote host defined in a Docker image. The task does nothing but report the value of _output and its current working directory. The output file result.txt is sent back to the local host after the task completes.

As expected, the value of _output is the relative path result.txt. The working directory is vatlab/sos-docs/src/user_guide under /root, which corresponds to the local working directory.

In [1]:

%preview result.txt

%run -c ~/docker.yml -q docker

output: 'result.txt'
task:

sh: expand=True
    echo {_output} > {_output}
    echo `pwd` >> {_output}

INFO: Running default:
d87b0aa3308b3ec9  7c0789b29bfba84e  default  user_guide  missing
INFO: d87b0aa3308b3ec9 received 'result.txt' from docker
INFO: default output: result.txt
INFO: Workflow default (ID=7c0789b29bfba84e) is executed successfully with 1 completed step and 1 completed task.
> result.txt (48 B):
2 lines
result.txt
/root/vatlab/sos-docs/src/user_guide

Absolute paths and named paths


If you would like to specify an absolute path, you can use either ~ for the home directory or any of the named paths.

In the following workflow, the output is specified as #home/result.txt (which is the same as ~/result.txt). It is /root/result.txt on the remote host, and the current working directory remains the same.

In [2]:

INFO: Running default:
652942b23342ae82  87a9056943dc9a91  default  user_guide  missing
INFO: 652942b23342ae82 received '/Users/bpeng/result.txt' from docker
INFO: default output: /Users/bpeng/result.txt
INFO: Workflow default (ID=87a9056943dc9a91) is executed successfully with 1 completed step and 1 completed task.
> /Users/bpeng/result.txt (54 B):
2 lines
/root/result.txt
/root/vatlab/sos-docs/src/user_guide

Working directory of tasks (Option workdir)

The working directory of a task defaults to the current working directory or, for remote execution, to the remote counterpart of the current working directory. Option workdir changes the working directory of the task.

The following example adds workdir='#home' to the task. The working directory of the shell script changes to /root, and _output remains #home/result.txt.

In [3]:

INFO: Running default:
dbc63a4bf9416b58  default  f27dc1e18f432a10  user_guide  missing
INFO: dbc63a4bf9416b58 received '/Users/bpeng/result.txt' from docker
INFO: default output: /Users/bpeng/result.txt
INFO: Workflow default (ID=f27dc1e18f432a10) is executed successfully with 1 completed step and 1 completed task.
> /Users/bpeng/result.txt (23 B):
2 lines
/root/result.txt
/root

However, changing workdir can misplace output files. For example, if we remove #home from _output but keep workdir, _output is written to the specified workdir, while SoS still assumes that _output is under the current project directory and fails to retrieve the file.
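The failure comes down to joining the same relative path with two different directories. The sketch below is plain Python rather than SoS code; it assumes workdir is '#home' as in the previous example (/root in the container) and takes the project directory from the error message of this example.

```python
# Illustrative sketch (not SoS code): one relative path, two base directories.
from pathlib import PurePosixPath

output = PurePosixPath("result_error.txt")  # relative _output, no #home prefix
workdir = PurePosixPath("/root")  # assumed: workdir='#home' inside the container
project_dir = PurePosixPath("/root/vatlab/sos-docs/src/user_guide")  # remote counterpart of the local cwd

written_to = workdir / output  # the task writes relative to its workdir
expected_at = project_dir / output  # SoS retrieves relative to the project directory

print(written_to)  # /root/result_error.txt
print(expected_at)  # /root/vatlab/sos-docs/src/user_guide/result_error.txt
print(written_to == expected_at)  # False
```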

In [5]:

INFO: Running default:
260346ee4c42cce3  672fffefa5d36516  default  user_guide  missing
ERROR: [default]: Failed to copy /root/vatlab/sos-docs/src/user_guide/result_error.txt from docker using command "rsync -a --no-g -e 'ssh -o 'ControlMaster=auto' -o 'ControlPath=/Users/bpeng/.ssh/controlmasters/%r@%h:%p' -o 'ControlPersist=10m' -p 32798' root@localhost:/root/vatlab/sos-docs/src/user_guide/result_error.txt "/Users/bpeng/vatlab/sos-docs/src/user_guide"": command return 23
Workflow exited with code 1

Sending additional files before task execution (Option to_host)


Option to_host specifies additional files or directories that will be synchronized to the remote host before tasks are executed. It can be specified as

  • A single file or directory (with respect to the local file system), or
  • A list of files or directories, or

The files or directories are translated using the host-specific path maps. Note that if a symbolic link is specified in to_host, both the symbolic link and the path it refers to are synchronized to the remote host.
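Path-map translation can be sketched as a longest-prefix rewrite. This is a hypothetical illustration, not the SoS implementation: translate is an illustrative helper, and the office and cluster maps from the host definition example are hard-coded with {user_name} already filled in as bpeng.

```python
# Hypothetical sketch of host path-map translation; not SoS internals.
from pathlib import PurePosixPath

OFFICE = {
    "home": "/Users/bpeng",
    "projects": "/Users/bpeng/projects",
    "scratch": "/Users/bpeng/scratch",
}
CLUSTER = {
    "home": "/home/bpeng",
    "projects": "/home/projects/bpeng",
    "scratch": "/mount/scratch",
}


def translate(path: str, src: dict, dst: dict) -> str:
    """Rewrite `path` from the `src` host's paths to the `dst` host's paths."""
    p = PurePosixPath(path)
    # Try the longest prefix first so /Users/bpeng/projects wins over /Users/bpeng.
    for name, prefix in sorted(src.items(), key=lambda kv: len(kv[1]), reverse=True):
        try:
            rel = p.relative_to(prefix)
        except ValueError:
            continue  # not under this named path
        return str(PurePosixPath(dst[name]) / rel)
    raise ValueError(f"{path} is not under any named path")


print(translate("/Users/bpeng/projects/RNASeq/genes.txt", OFFICE, CLUSTER))
# /home/projects/bpeng/RNASeq/genes.txt
```

Swapping the two maps, as in translate(path, CLUSTER, OFFICE), gives the reverse direction, which is what retrieving files from the remote host would need. A path under none of the named paths raises an error here, since there is no counterpart to map it to.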

To demonstrate how to use this option, let us copy notebooks in this directory to a remote host and count the lines in each of them.

In [3]:

9e7b75df6a5d3767  5b7627b1ac52aa8f  scratch_0  user_guide  missing
INFO: 9e7b75df6a5d3767 sent 'task_files.ipynb', ... (5 items) to bcb
INFO: 9e7b75df6a5d3767 received 'wc.txt' from bcb
> wc.txt (156 B):
6 lines (5 displayed, see --limit)
     363 task_files.ipynb
     386 task_management.ipynb
     817 task_statement.ipynb
     223 task_tags.ipynb
     390 task_template.ipynb

Retrieving additional files after task completion (Option from_host)


Option from_host specifies additional files or directories that will be synchronized from the remote host after tasks are executed. It can be specified as

  • A single file or directory (with respect to the local file system), or
  • A list of files or directories

The files or directories will be translated using the host-specific path maps to determine what remote files to retrieve.

diff --git a/src/user_guide/cli.ipynb b/src/user_guide/cli.ipynb
index 324b29c2..463ccb3d 100644
--- a/src/user_guide/cli.ipynb
+++ b/src/user_guide/cli.ipynb
@@ -51,7 +51,7 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": 2, "metadata": { "kernel": "SoS", "tags": [] 
@@ -62,25 +62,25 @@ "output_type": "stream", "text": [ "usage: sos [-h] [--version]\n", - " {install,run,dryrun,status,push,pull,execute,kill,purge,config,convert,remove}\n", + " {install,run,dryrun,status,execute,kill,purge,config,convert,remove}\n", " ...\n", "\n", "A workflow system for the execution of commands and scripts in different\n", "languages.\n", "\n", - "optional arguments:\n", + "options:\n", " -h, --help show this help message and exit\n", " --version show program's version number and exit\n", "\n", "subcommands:\n", - " {install,run,dryrun,status,push,pull,execute,kill,purge,config,convert,remove}\n", + " {install,run,dryrun,status,execute,kill,purge,config,convert,remove}\n", " run Execute default or specified workflow in script\n", " dryrun Execute workflow in dryrun mode\n", " status Check the status of specified tasks\n", " remote Listing and testing remote configurations\n", " execute Execute a packages task\n", " kill Stop the execution of running task\n", - " purge Remove local or remote tasks\n", + " purge Remove local or remote tasks or workflows\n", " config Read and write sos configuration files\n", " convert Convert between .sos, .ipynb and other formats\n", " remove Remove specified files and/or their signatures\n", 
diff --git a/src/user_guide/remote_execution.ipynb b/src/user_guide/remote_execution.ipynb
index dec77163..396ff5b0 100644
--- a/src/user_guide/remote_execution.ipynb
+++ b/src/user_guide/remote_execution.ipynb
@@ -35,7 +35,8 @@ "* **Time need to lean**: 30 minutes or less\n", "* **Key points**:\n", " * Option `-r 
host` executes workflow on `host`, optionally through a `workflow_template` specified through host configuration.\n", - " * The remote host could be a regular server, or a cluster system, in which case the workflow could be executed using multiple computing nodes." + " * The remote host could be a regular server, or a cluster system, in which case the workflow could be executed using multiple computing nodes.\n", + " * Local and remote hosts should share directories (e.g. via NFS mount) for input and output files." ] }, { @@ -489,7 +490,9 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "kernel": "SoS" + }, "source": [ "Note that local configrations, including the ones specified with option `-c` will be transferred and used on the remote host, with only the `localhost` definition switched to be the remote host. It is therefore safe to use local configurations with option `-r`." ] @@ -513,7 +516,9 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "kernel": "SoS" + }, "source": [ "If the remote host specified by option `-r host` is a cluster system, the workflow will be submitted to the cluster as a regular cluster job. 
The `workflow_template` of `host` will be used, using options specified from command line (`-r host KEY=VALUE KEY=VALUE`.\n", "\n", @@ -525,7 +530,9 @@ { "cell_type": "code", "execution_count": 2, - "metadata": {}, + "metadata": { + "kernel": "SoS" + }, "outputs": [], "source": [ "## Executing entire workflows on remote cluster systems with job submission" diff --git a/src/user_guide/remote_target.ipynb b/src/user_guide/remote_target.ipynb deleted file mode 100644 index b0ee935f..00000000 --- a/src/user_guide/remote_target.ipynb +++ /dev/null @@ -1,202 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "kernel": "SoS", - "tags": [] - }, - "source": [ - "# Working with remote files" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "kernel": "SoS", - "tags": [] - }, - "source": [ - "* **Difficulty level**: intermediate\n", - "* **Time need to lean**: 10 minutes or less\n", - "* **Key points**:\n", - " * Input targets marked by `remote` will be considered as remote files and will not be copied to remote host.\n", - " * Output targets marked by `remote` will be considered as remote files and will not be copied to local host\n", - " " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "kernel": "SoS", - "tags": [] - }, - "source": [ - "## Remote Targets" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "kernel": "SoS", - "tags": [] - }, - "source": [ - "The task execution model automatically synchronize input and output files between local and remote hosts. This can be convenient but \n", - "\n", - "1. If the input files are large and reside on remote host already, there is no need to make the input files available on local host for them to be processed.\n", - "\n", - "2. 
If the output files are large and do not need to be processed locally, there is no need to copy output files to local host.\n", - "\n", - "To solve these problems, you can use `remote` targets to specify that the targets are on remote host, and do not need to be synchronized. For example, you could use the following step to process a large input file and only synchronize small output files to local desktop for further analysis:\n", - "\n", - "```\n", - "[10]\n", - "input: remote('/path/to/large/input/file')\n", - "output: remote('large_output'), 'summary.stat'\n", - "task:\n", - "sh:\n", - " script to generate large_output and summary.stat\n", - " from large input files.\n", - "```" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "kernel": "SoS", - "tags": [] - }, - "source": [ - "The `remote` function accept any one or more SoS targets (e.g. `remote('input.txt')`, `remote('input1.txt', 'input2.txt')`, `remote(fastq_files)`, or `remote(R_Library('ggplot2'))`. " - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": { - "kernel": "SoS", - "tags": [] - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO: No matching tasks are identified. 
Use option -a to check all tasks.\n", - "INFO: 68398bb67cbef06a \u001b[32mstarted\u001b[0m\n" - ] - } - ], - "source": [ - "%run -q bcb\n", - "\n", - "output: remote('result.png')\n", - "task: walltime='1h', mem='2G', nodes=1, cores=1\n", - "\n", - "R:\n", - " set.seed(1)\n", - " x <- 1:100\n", - " y <- -0.03*x + rnorm(50)\n", - " png(\"result.png\", height=400, width=600)\n", - " plot(x, y, pch=19, col=rgb(0.5, 0.5, 0.5, 0.5), cex=1.5)\n", - " abline(lm(y ~ x))\n", - " dev.off()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "kernel": "SoS", - "tags": [] - }, - "source": [ - "The task is executed successfully on remote host `bcb` but the result file `result.png`, marked as `remote('result.png')` is not synchronized to localhost after the completion of the task. " - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": { - "kernel": "SoS", - "tags": [] - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ls: result.png: No such file or directory\n" - ] - } - ], - "source": [ - "!ls result.png" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "kernel": "SoS" - }, - "source": [ - "## Option `host` to `remote` targets" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "kernel": "SoS" - }, - "source": [ - "SoS needs to know where the remote targets reside to check their existence and signatures. The `remote()` target has an option `host`, which defaults to the `host` that you specified with option `-q`.\n", - "\n", - "In case that you specify the host on which the task will be executed with task option `queue`, you will need to specify the option also to the `remote()` targets." 
- ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "SoS", - "language": "sos", - "name": "sos" - }, - "language_info": { - "codemirror_mode": "sos", - "file_extension": ".sos", - "mimetype": "text/x-sos", - "name": "sos", - "nbconvert_exporter": "sos_notebook.converter.SoS_Exporter", - "pygments_lexer": "sos" - }, - "sos": { - "kernels": [ - [ - "Bash", - "bash", - "Bash", - "#E6EEFF" - ], - [ - "R", - "ir", - "R", - "#DCDCDA" - ], - [ - "SoS", - "sos", - "", - "" - ] - ], - "panel": { - "displayed": true, - "height": 0 - }, - "version": "0.22.4" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/src/user_guide/sos_remote.ipynb b/src/user_guide/sos_remote.ipynb index 307ad4df..858ef1dc 100644 --- a/src/user_guide/sos_remote.ipynb +++ b/src/user_guide/sos_remote.ipynb @@ -28,7 +28,7 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": 2, "metadata": { "kernel": "SoS", "tags": [] @@ -38,15 +38,14 @@ "name": "stdout", "output_type": "stream", "text": [ - "usage: sos remote [-h] [-c CONFIG] [-p PASSWORD] [--files [FILES [FILES ...]]]\n", + "usage: sos remote [-h] [-c CONFIG] [-p PASSWORD] [--files [FILES ...]]\n", " [--cmd ...] 
[-v {0,1,2,3,4}]\n", - " {list,status,setup,test,login,push,pull,run}\n", - " [hosts [hosts ...]]\n", + " {list,status,setup,test,login,run} [hosts ...]\n", "\n", "Listing and testing remote configurations\n", "\n", "positional arguments:\n", - " {list,status,setup,test,login,push,pull,run}\n", + " {list,status,setup,test,login,run}\n", " List (list), check status of tasks (status), setup\n", " public-key authentication (setup), test configuration\n", " (test), login (login), push files to one or more\n", @@ -59,7 +58,7 @@ " acceptable even if it is defined in configuration\n", " file.\n", "\n", - "optional arguments:\n", + "options:\n", " -h, --help show this help message and exit\n", " -c CONFIG, --config CONFIG\n", " A configuration file with host definitions, in case\n", @@ -72,8 +71,7 @@ " password will be used for all specified hosts so you\n", " will need to use separate setup commands for hosts\n", " with different passwords.\n", - " --files [FILES [FILES ...]]\n", - " files or directories to be push or pulled for action\n", + " --files [FILES ...] files or directories to be push or pulled for action\n", " \"push\" or \"pull\"\n", " --cmd ... commands to be executed by action \"run\" or tested by\n", " action \"test\". 
This option takes all remaining options\n", diff --git a/src/user_guide/sos_targets.ipynb b/src/user_guide/sos_targets.ipynb index 1fac1b50..2de19d26 100644 --- a/src/user_guide/sos_targets.ipynb +++ b/src/user_guide/sos_targets.ipynb @@ -560,7 +560,6 @@ "| `q` |quote | `quoted()` | `file 1.txt` | `'file 1.txt'`|\n", "| `r` | repr | `repr()` | `file.txt` | `'file.txt'` |\n", "| `s` | str | `str()` | `file.txt` | `file.txt` |\n", - "| `R` | resolve remote and other targets | `.resolve()`| `remote('a.txt')` | `a.txt`|\n", "| `U` | undo expanduser | `replace(expanduser('~'), '~')` | `/home/user/test.sos` | `~/test.sos` |\n", "| `x` | file extension | `splitext()[1]` | `~/SoS/test.sos` | `.sos` |\n", "| `,` | join with comma | `','.join()` | `['a.txt', 'b.txt']` | `a.txt,b.txt`|\n" diff --git a/src/user_guide/targets.ipynb b/src/user_guide/targets.ipynb index 79323492..f1592ee7 100644 --- a/src/user_guide/targets.ipynb +++ b/src/user_guide/targets.ipynb @@ -55,7 +55,6 @@ "\n", "* is derived from Python [pathlib.Path](https://docs.python.org/3/library/pathlib.html)\n", "* automatically expands user from path starting with `~`\n", - "* automatically expands named path from `CONFIG['hosts'][current_host]['paths'][name]`\n", "* allows you to extend `path` with a `+` operation\n", "* has a special `zap` operation to replace (large) files with their signatures\n", "* accepts a list of format options to easily format path in different formats" @@ -689,7 +688,6 @@ "| `q` |quote | `quoted()` | `file 1.txt` | `'file 1.txt'`|\n", "| `r` | repr | `repr()` | `file.txt` | `'file.txt'` |\n", "| `s` | str | `str()` | `file.txt` | `file.txt` |\n", - "| `R` | resolve remote and other targets | `.resolve()`| `remote('a.txt')` | `a.txt`|\n", "| `U` | undo expanduser | `replace(expanduser('~'), '~')` | `/home/user/test.sos` | `~/test.sos` |\n", "| `x` | file extension | `splitext()[1]` | `~/SoS/test.sos` | `.sos` |\n", "| `,` | join with comma | `','.join()` | `['a.txt', 'b.txt']` | 
`a.txt,b.txt`|\n" @@ -862,74 +860,6 @@ "The last example is pretty interesting because it applies three converters and gets the name of grand-parent directory using an equivalence of `basename(dirname(dirname(file)))`." ] }, - { - "cell_type": "markdown", - "metadata": { - "kernel": "SoS", - "tags": [] - }, - "source": [ - "Note that, for completeness, we list the `R` formatter here although this formatter is used to resolve special targets, for example a `remote` target `remote(target)`, to a regular `BaseTarget`. For example," - ] - }, - { - "cell_type": "code", - "execution_count": 22, - "metadata": { - "kernel": "SoS", - "tags": [] - }, - "outputs": [], - "source": [ - "a = sos_targets(remote('file.txt'), sos_variable('some'))" - ] - }, - { - "cell_type": "code", - "execution_count": 23, - "metadata": { - "kernel": "SoS", - "tags": [] - }, - "outputs": [ - { - "data": { - "text/plain": [ - "'remote(\"file.txt\") sos_variable(\"some\")'" - ] - }, - "execution_count": 23, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "f\"{a}\"" - ] - }, - { - "cell_type": "code", - "execution_count": 24, - "metadata": { - "kernel": "SoS", - "tags": [] - }, - "outputs": [ - { - "data": { - "text/plain": [ - "'file.txt,some'" - ] - }, - "execution_count": 24, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "f\"{a:R,}\"" - ] - }, { "cell_type": "markdown", "metadata": { @@ -1035,79 +965,6 @@ "_output.touch()" ] }, - { - "cell_type": "markdown", - "metadata": { - "kernel": "SoS" - }, - "source": [ - "### Named paths (experimental)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "kernel": "SoS" - }, - "source": [ - "SoS uses [configuration files](config_files.html) to [define local and external hosts](host_setup.html). One of the host properties are `paths`, which defines paths on a host that can be mapped to other hosts to [facilitate file synchronization](task_files.html). 
These `paths` are host-specific and can be used to specify paths of a `file_target`.\n", - "\n", - "
\n", - "

Named paths

\n", - " Named paths are paths that are defined in host configurations and is represented by #name as the first piece of a file_target\n", - "
" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "kernel": "SoS" - }, - "source": [ - "Similar to `~` that represent home directory of the current user and `~user` represents home directory of user `user`, named paths are presented as `#name` as the first piece of the path. For example, with a `localhost` defined with\n", - "\n", - "```yaml\n", - "hosts:\n", - " my_host:\n", - " address: localhost\n", - " paths:\n", - " home: /home/{user_name}\n", - "```\n", - "\n", - "`#home` is expanded to `/home/bpeng1` in the following example." - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": { - "kernel": "SoS" - }, - "outputs": [ - { - "data": { - "text/plain": [ - "file_target('/Users/bpeng1/sos/sos-docs')" - ] - }, - "execution_count": 1, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "file_target('#home/sos/sos-docs')" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "kernel": "SoS" - }, - "source": [ - "Named paths could make your SoS workflow more portable because it allows you to move a workflow to another machine, and refer to paths such as `#home`, `#project` or `#scratch` directly on that machine, provided that the named paths exists on that machine. This feature is mostly used for the use of absolute paths for remote tasks and is discussed in more details in [path translation and file synchronization](task_files.html)." 
- ] - }, { "cell_type": "markdown", "metadata": { @@ -1140,6 +997,7 @@ { "cell_type": "markdown", "metadata": { + "collapsed": true, "jupyter": { "outputs_hidden": true }, @@ -1155,6 +1013,7 @@ "execution_count": 28, "metadata": { "kernel": "SoS", + "scrolled": true, "tags": [] }, "outputs": [ diff --git a/src/user_guide/task_files.ipynb b/src/user_guide/task_files.ipynb deleted file mode 100644 index ba487508..00000000 --- a/src/user_guide/task_files.ipynb +++ /dev/null @@ -1,759 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "kernel": "SoS", - "tags": [] - }, - "source": [ - "# Specifying and synchronization of remote files" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "kernel": "SoS", - "tags": [] - }, - "source": [ - "* **Difficulty level**: intermediate\n", - "* **Time need to lean**: 20 minutes or less\n", - " * Paths that are relative to the current working directory are portable across hosts.\n", - " * Use named paths (`#name`) to specify absolute paths that are different across local and remote hosts.\n", - " * Options `to_host` and `from_host` specify files and directories send before task execution and retrieve after task execution, respectively." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "kernel": "SoS" - }, - "source": [ - "## Path definitions and named paths" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "kernel": "SoS" - }, - "source": [ - "When local and remote hosts do not share file systems (or share only some file systems), things can get a bit complicated because SoS will need to decide what paths to use on the remote host. 
The most important thing to remember here is that **paths across local and remote hosts are linked by named paths defined in the SoS host definition file**.\n", - "\n", - "For example, a host definition file (usually `~/.sos/hosts.yml`) could have the following `paths` definitions (incomplete)\n", - "\n", - "```yaml\n", - "localhost: office\n", - "hosts:\n", - " office:\n", - " paths:\n", - " home: /Users/{user_name}\n", - " projects: /Users/{user_name}/projects\n", - " scratch: /Users/{user_name}/scratch\n", - " cluster:\n", - " paths:\n", - " home: /home/{user_name}\n", - " projects: /home/projects/{user_name}\n", - " scratch: /mount/scratch\n", - "```\n", - "\n", - "so that paths under `home`, `projects`, or `scratch` could be linked across `office` and `cluster`.\n", - "\n", - "Similar to `~/result.txt` that indicates `result.txt` under the user's home directory, which can be different across different hosts, **named path, namely paths that starts with `#name`, such as `#projects/RNASeq` are paths that are context specific**. If you specify `_output='#projects/RNASeq/genes.txt`, the paths will refer to different files on different hosts with different definitions for `#projects`." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "kernel": "SoS" - }, - "source": [ - "## Use of relative path" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "kernel": "SoS" - }, - "source": [ - "Let us execute an task on a remote host defined in a docker image. The task does nothing but reporting the value of `_output` and its current working directory. The output file `result.txt` is sent back to the local host after the completion of the task.\n", - "\n", - "As expected, the value of `_output` is a relative path `result.txt`. The working directory is `vatlab/sos-docs/src/user_guide` under `/root`, which corresponds to the locally working directory." 
- ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": { - "kernel": "SoS" - }, - "outputs": [ - { - "data": { - "text/html": [ - "
INFO: Running default: \n", - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - "\n", - "
\n", - " \n", - " \n", - "
d87b0aa3308b3ec9
\n", - "
\n", - "
7c0789b29bfba84edefaultuser_guide
\n", - "
\n", - "
\n", - "
\n", - "
missing
\n", - "
\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "
INFO: d87b0aa3308b3ec9 received 'result.txt' from docker\n", - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "
INFO: default output: result.txt\n", - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "
INFO: Workflow default (ID=7c0789b29bfba84e) is executed successfully with 1 completed step and 1 completed task.\n", - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "
> result.txt (48 B):
" - ], - "text/plain": [ - "\n", - "> result.txt (48 B):" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "
2 lines
" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "result.txt\n", - "/root/vatlab/sos-docs/src/user_guide" - ] - } - ], - "source": [ - "%preview result.txt\n", - "\n", - "%run -c ~/docker.yml -q docker \n", - "\n", - "output: 'result.txt'\n", - "task:\n", - "\n", - "sh: expand=True\n", - " echo {_output} > {_output}\n", - " echo `pwd` >> {_output}" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "kernel": "SoS" - }, - "source": [ - "## Absolute paths and named paths" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "kernel": "SoS" - }, - "source": [ - "If you would like to specify an absolute path, you can use either `~` as home directory, or any of the named paths.\n", - "\n", - "In the following workflow, the output is specified as `#home/result.txt` (which is the same as `~/result.txt`. It is `/root/result.txt` on the remote host, and the current working directory remains the same." - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": { - "kernel": "SoS" - }, - "outputs": [ - { - "data": { - "text/html": [ - "
INFO: Running default: \n", - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - "\n", - "
\n", - " \n", - " \n", - "
652942b23342ae82
\n", - "
\n", - "
87a9056943dc9a91defaultuser_guide
\n", - "
\n", - "
\n", - "
\n", - "
missing
\n", - "
\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "
INFO: 652942b23342ae82 received '/Users/bpeng/result.txt' from docker\n", - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "
INFO: default output: /Users/bpeng/result.txt\n", - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "
INFO: Workflow default (ID=87a9056943dc9a91) is executed successfully with 1 completed step and 1 completed task.\n", - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "
> /Users/bpeng/result.txt (54 B):
" - ], - "text/plain": [ - "\n", - "> /Users/bpeng/result.txt (54 B):" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "
2 lines
" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "/root/result.txt\n", - "/root/vatlab/sos-docs/src/user_guide" - ] - } - ], - "source": [ - "%preview ~/result.txt\n", - "\n", - "%run -c ~/docker.yml -q docker -s force\n", - "\n", - "output: '#home/result.txt'\n", - "task:\n", - "\n", - "sh: expand=True\n", - " echo {_output} > {_output}\n", - " echo `pwd` >> {_output}" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "kernel": "SoS", - "tags": [] - }, - "source": [ - "## Working directory of tasks (Option `workdir`)\n", - "\n", - "The `workdir` of task is default to the current working directory, or, in the case of remote execution, the remote counterpart of the current working directory. Option `workdir` controls the working directory of the task.\n", - "\n", - "For example, the following example adds `workdir='#home` to the task. The current working directory of the shell script is changed to `/root`, and the `_output` remains at `#home/result.txt`." - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": { - "kernel": "SoS", - "tags": [] - }, - "outputs": [ - { - "data": { - "text/html": [ - "
INFO: Running default: \n", - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - "\n", - "
\n", - " \n", - " \n", - "
dbc63a4bf9416b58
\n", - "
\n", - "
defaultf27dc1e18f432a10user_guide
\n", - "
\n", - "
\n", - "
\n", - "
missing
\n", - "
\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "
INFO: dbc63a4bf9416b58 received '/Users/bpeng/result.txt' from docker\n", - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "
INFO: default output: /Users/bpeng/result.txt\n", - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "
INFO: Workflow default (ID=f27dc1e18f432a10) is executed successfully with 1 completed step and 1 completed task.\n", - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "
> /Users/bpeng/result.txt (23 B):
" - ], - "text/plain": [ - "\n", - "> /Users/bpeng/result.txt (23 B):" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "
2 lines
" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "/root/result.txt\n", - "/root" - ] - } - ], - "source": [ - "%preview ~/result.txt\n", - "\n", - "%run -c ~/docker.yml -q docker -s force\n", - "\n", - "output: '#home/result.txt'\n", - "task:\n", - "\n", - "sh: expand=True, workdir='#home'\n", - " echo {_output} > {_output}\n", - " echo `pwd` >> {_output}" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "kernel": "SoS", - "tags": [] - }, - "source": [ - "However, **change of `workdir` might result in the misplace of the output files**. For example, if we remove `#home` from `_output` and specify `workdir`, the `_output` will be written to specified `workdir` but SoS still assumes that the `_output` is under the current project directory and will fail to retrieve the file." - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": { - "kernel": "SoS" - }, - "outputs": [ - { - "data": { - "text/html": [ - "
INFO: Running default: \n", - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - "\n", - "
\n", - " \n", - " \n", - "
260346ee4c42cce3
\n", - "
\n", - "
672fffefa5d36516defaultuser_guide
\n", - "
\n", - "
\n", - "
\n", - "
missing
\n", - "
\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\u001b[91mERROR\u001b[0m: \u001b[91m[default]: Failed to copy /root/vatlab/sos-docs/src/user_guide/result_error.txt from docker using command \"rsync -a --no-g -e 'ssh -o 'ControlMaster=auto' -o 'ControlPath=/Users/bpeng/.ssh/controlmasters/%r@%h:%p' -o 'ControlPersist=10m' -p 32798' root@localhost:/root/vatlab/sos-docs/src/user_guide/result_error.txt \"/Users/bpeng/vatlab/sos-docs/src/user_guide\"\": command return 23\u001b[0m\n" - ] - }, - { - "ename": "RuntimeError", - "evalue": "Workflow exited with code 1", - "execution_count": 5, - "output_type": "error", - "status": "error", - "traceback": [ - "\u001b[91mWorkflow exited with code 1\u001b[0m" - ] - } - ], - "source": [ - "%run -c ~/docker.yml -q docker -s force\n", - "\n", - "output: 'result_error.txt'\n", - "task:\n", - "\n", - "sh: expand=True, workdir='#home'\n", - " echo {_output} > {_output}\n", - " echo `pwd` >> {_output}" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "kernel": "SoS", - "tags": [] - }, - "source": [ - "## Sending additional files before task execution (Option `to_host`)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "kernel": "SoS", - "tags": [] - }, - "source": [ - "Option `to_host` specifies additional files or directories that would be synchronized to the remote host before tasks are executed. It can be specified as\n", - "\n", - "* A single file or directory (with respect to local file system), or\n", - "* A list of files or directories, or\n", - "\n", - "The files or directories will be translated using the host-specific path maps. Note that if a symbolic link is specified in `to_host`, both the symbolic link and the path it refers to would be synchronized to the remote host.\n", - "\n", - "Just to demontrate how to use this option, let us copy all notebooks in this directory to a remote host and count the number of them." 
- ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": { - "kernel": "SoS" - }, - "outputs": [ - { - "data": { - "text/html": [ - "\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - "\n", - "
\n", - " \n", - " \n", - "
9e7b75df6a5d3767
\n", - "
\n", - "
5b7627b1ac52aa8fscratch_0user_guide
\n", - "
\n", - "
\n", - "
\n", - "
missing
\n", - "
\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "
INFO: 9e7b75df6a5d3767 sent 'task_files.ipynb', ... (5 items) to bcb
" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "
INFO: 9e7b75df6a5d3767 received 'wc.txt' from bcb
" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "
> wc.txt (156 B):
" - ], - "text/plain": [ - "\n", - "> wc.txt (156 B):" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "
6 lines (5 displayed, see --limit)
" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " 363 task_files.ipynb\n", - " 386 task_management.ipynb\n", - " 817 task_statement.ipynb\n", - " 223 task_tags.ipynb\n", - " 390 task_template.ipynb" - ] - } - ], - "source": [ - "%preview -n wc.txt \n", - "output: 'wc.txt'\n", - "task: to_host='task*.ipynb', queue='bcb' \n", - "sh: expand=True\n", - " wc -l *.ipynb > {_output}" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "kernel": "SoS", - "tags": [] - }, - "source": [ - "## Retrieving additional files after task completion (Option `from_host`)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "kernel": "SoS", - "tags": [] - }, - "source": [ - "Option `from_host` specifies additional files or directories that would be synchronized from the remote host after tasks are executed. It can be specified as\n", - "\n", - "* A single file or directory (with respect to local file system), or\n", - "* A list of files or directories\n", - "\n", - "The files or directories will be translated using the host-specific path maps to determine what remote files to retrieve." 
- ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "SoS", - "language": "sos", - "name": "sos" - }, - "language_info": { - "codemirror_mode": "sos", - "file_extension": ".sos", - "mimetype": "text/x-sos", - "name": "sos", - "nbconvert_exporter": "sos_notebook.converter.SoS_Exporter", - "pygments_lexer": "sos" - }, - "sos": { - "kernels": [ - [ - "Bash", - "bash", - "Bash", - "#E6EEFF" - ], - [ - "R", - "ir", - "R", - "#DCDCDA" - ], - [ - "SoS", - "sos", - "", - "" - ] - ], - "panel": { - "displayed": true, - "height": 0 - }, - "version": "0.22.4" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/src/user_guide/task_statement.ipynb b/src/user_guide/task_statement.ipynb index 623ca642..ad212dea 100644 --- a/src/user_guide/task_statement.ipynb +++ b/src/user_guide/task_statement.ipynb @@ -322,13 +322,42 @@ "\n", "1. Creates a task with an unique ID (`c600532cd99c02c2`) determined by the content of the task.\n", "2. Copy the task file to remote host, translating `input` and `output` paths if two systems have different file systems.\n", - "3. Copy input files to remote host if needed.\n", - "4. Execute the task by executing something like\n", + "3. Execute the task by running something like\n", " ```\n", " ssh host \"bash --login -c sos execute c600532cd99c02c2\"\n", " ```\n", - "5. Monitor the status of the task and update the status in the notebook.\n", - "6. Copy the output file to local host if needed." + "4. Monitor the status of the task and update the status in the notebook." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "kernel": "SoS", + "tags": [] + }, + "source": [ + "SoS assumes that **all input and output files are available with the same pathname on local and remote hosts**. In practice, this means that both local and remote hosts should mount the same (NFS) volumes from a file server, or one host should export drives for the other hosts to mount. The remote task, however, can use disks (e.g.
scratch disk space) that are unavailable on the local host.\n", + "\n", + "\n", + "A host definition file (usually `~/.sos/hosts.yml`) could have the following `shared` definitions (incomplete):\n", + "\n", + "```yaml\n", + "localhost: office\n", + "hosts:\n", + " office:\n", + " shared:\n", + " data: /mount/data\n", + " project: /mount/project\n", + " cluster:\n", + " shared:\n", + " data: /mount/data\n", + " project: /mount/project\n", + " scratch: /mount/scratch\n", + "```\n", + "\n", + "In this configuration, scripts executed on the remote server `cluster` should keep their input and output files on `/mount/data` or `/mount/project`. The task can write to `/mount/scratch` as long as the workflow on the `office` side does not refer to that content.\n", + "\n", + "**NOTE**: SoS currently does not check whether input and output files are on shared drives, so defining these shared drives is not mandatory. This could change in the future, so specifying shared drives is recommended." ] }, { "cell_type": "markdown", "metadata": { @@ -590,28 +619,6 @@ "was used to submit the job to the cluster system." ] }, - { - "cell_type": "markdown", - "metadata": { - "kernel": "SoS", - "tags": [] - }, - "source": [ - "The entire workflow for executing tasks on a remote cluster system contains five steps and can be depicted as follows. The process would be a lot simpler if you are allowed to execute workflows and submit tasks on the headnode of a cluster system since no file synchronization would be needed." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "kernel": "SoS", - "tags": [] - }, - "source": [ - "

\n", - " \n", - "

" - ] - }, { "cell_type": "markdown", "metadata": { @@ -640,9 +647,7 @@ "| `mem` | Memory required for task | [task template](task_template.html) |\n", "| `nodes` | Number of computing nodes | [task template](task_template.html) |\n", "| `cores` | Number of cores per node | [task template](task_template.html) |\n", - "| `workdir` | | [path translation and file synchronization ](task_files.html) |\n", - "| `to_host` | | [path translation and file synchronization ](task_files.html) |\n", - "| `from_host` | | [path translation and file synchronization ](task_files.html) |\n", + "| `workdir` | | [path translation and file synchronization ](task_files.html) | \n", "| `map_vars` | | [path translation and file synchronization ](task_files.html) |\n", "| ANY | Any template variable | [task template](task_template.html) |\n", "\n",