
Specific instructions to run harvester with ssh rpc middleware

FaHui Lin edited this page May 23, 2019 · 6 revisions

Instructions

Basic description of SSH RPC middleware: see here

The queue configuration depends on the architecture of the cluster (HPC) and the use case.

Limitations

Plugins that run remotely through SSH + RPC (via rpc_bot) must be able to run independently of the Harvester server. That is, they CANNOT access the Harvester DB (cannot call dbInterface methods), and they cannot access file paths on the Harvester server when there is no shared file system between the Harvester server and the HPC.

Generic use case

Consider an HPC:

  • No outbound connectivity on login nodes and worker nodes
  • One can access login nodes via SSH
  • Login nodes have the same environment and mount the same shared filesystem as worker nodes do
  • Service processes are allowed to run on login nodes (no CPU-time limit per process or other limitations)
  • DTNs (data transfer nodes) are available, with outbound connectivity and grid data transfer tools (globus, gfal, xroot, etc.)
  • DTNs are accessible from login nodes

Then it suffices to run the harvester rpc_bot process on a login node of the HPC and let all harvester plugins run on that login node (i.e. remotely from the harvester server).

That is, harvester runs the following plugins remotely:

  • submitter
  • monitor
  • sweeper
  • messenger
  • preparator
  • stager

The queue configuration (partial) may look like this:

                "preparator": {
                        "name": "SomePreparator",
                        "module": "pandaharvester.harvesterpreparator.some_preparator",
                        "basePath": "/some/remote/base/path",
                        "middleware": "rpc"
                },
                "submitter": {
                        "name": "SlurmSubmitter",
                        "module": "pandaharvester.harvestersubmitter.slurm_submitter",
                        "nCore": 9600,
                        "nCorePerNode": 48,
                        "templateFile": "/some/remote/template.sh",
                        "middleware": "rpc"
                },
                "messenger": {
                        "name": "SharedFileMessenger",
                        "module": "pandaharvester.harvestermessenger.shared_file_messenger",
                        "accessPoint": "/some/remote/path/${workerID}",
                        "middleware": "rpc"
                },
                "stager": {
                        "name": "SomeStager",
                        "module": "pandaharvester.harvesterstager.some_stager",
                        "middleware": "rpc"
                },
                "monitor": {
                        "name": "SlurmMonitor",
                        "module": "pandaharvester.harvestermonitor.slurm_monitor",
                        "middleware": "rpc"
                },
                "sweeper": {
                        "name": "SlurmSweeper",
                        "module": "pandaharvester.harvestersweeper.slurm_sweeper",
                        "middleware": "rpc"
                },
                "rpc": {
                        "name": "RpcHerder",
                        "module": "pandaharvester.harvestermiddleware.rpc_herder",
                        "remoteHost": "some.remote.host",
                        "remoteBindPort": 18861,
                        "numTunnels": 3,
                        "sshUserName": "someusername",
                        "sshPassword": null,
                        "privateKey": "/some/private/key",
                        "passPhrase": "somepassphrase",
                        "jumpHost": "some.jump.host",
                        "jumpPort": 22
                }
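
As a sanity check on a configuration like the one above, the following minimal sketch lists which plugin sections are routed through a middleware and verifies that the referenced middleware section exists in the same queue. It is not part of Harvester: the function name `find_rpc_plugins`, the `PLUGIN_SECTIONS` tuple and the trimmed-down `queue_config` dictionary are assumptions made for illustration. The reading that the value of `middleware` names the section holding the middleware settings is inferred from the example above.

```python
import json

# Plugin sections that this page lists as running remotely
# (hypothetical constant, not a Harvester name).
PLUGIN_SECTIONS = ("preparator", "submitter", "messenger",
                   "stager", "monitor", "sweeper")


def find_rpc_plugins(queue_config):
    """Return the plugin sections that declare a middleware, after checking
    that each referenced middleware section is present in the same queue."""
    routed = []
    for section in PLUGIN_SECTIONS:
        plugin = queue_config.get(section)
        if not plugin or "middleware" not in plugin:
            continue  # plugin absent, or it runs locally on the harvester server
        middleware_key = plugin["middleware"]
        if middleware_key not in queue_config:
            raise KeyError(
                f"{section} refers to missing middleware section '{middleware_key}'"
            )
        routed.append(section)
    return routed


# Trimmed-down version of the partial configuration shown above.
queue_config = json.loads("""
{
    "submitter": {"name": "SlurmSubmitter", "middleware": "rpc"},
    "monitor": {"name": "SlurmMonitor", "middleware": "rpc"},
    "rpc": {"name": "RpcHerder", "remoteHost": "some.remote.host",
            "remoteBindPort": 18861}
}
""")

print(find_rpc_plugins(queue_config))  # ['submitter', 'monitor']
```

A check of this kind can catch a plugin that points at a middleware section that was renamed or left out of the queue.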

Note that the paths set for plugins that use rpc (e.g. the messenger accessPoint) are remote paths, i.e. paths on the HPC side.
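
To make the point about remote paths concrete, here is a small sketch of how a placeholder such as `${workerID}` could expand into a per-worker directory. It uses Python's standard `string.Template` only to illustrate the idea. How Harvester itself performs the substitution is not described on this page, and the function name `expand_access_point` and the worker ID value are hypothetical.

```python
from string import Template


def expand_access_point(access_point, worker_id):
    """Substitute ${workerID} in an accessPoint pattern (illustration only)."""
    return Template(access_point).substitute(workerID=worker_id)


# The resulting path is expected to exist on the HPC side,
# not on the harvester server.
remote_path = expand_access_point("/some/remote/path/${workerID}", 1234)
print(remote_path)  # /some/remote/path/1234
```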
