Specific instructions to run harvester with ssh rpc middleware
Basic description of SSH RPC middleware: see here
The queue configuration depends on the architecture of the cluster (HPC) and the use case.
The plugins that run remotely through SSH + RPC (by rpc_bot) should be able to run independently of the harvester server. That is, remotely run plugins CANNOT access the Harvester DB (cannot call dbInterface methods) or file paths on the harvester server (when there is no shared file system between the harvester server and the HPC).
Consider an HPC:
- No outbound connectivity on login nodes and worker nodes
- One can access login nodes via SSH
- Login nodes have the same environment and mount the same shared filesystem as worker nodes do
- Service processes are allowed to run on login nodes (no CPU time limit per process or other limitations)
- DTNs (data transfer nodes) are available, with outbound connectivity and grid data transfer tools (globus, gfal, xroot, etc.)
- DTNs are accessible from login nodes
Then it suffices to run the harvester rpc_bot process on a login node of the HPC, and let all harvester plugins run on that login node (i.e. run remotely).
That is, harvester runs the following plugins remotely:
- submitter
- monitor
- sweeper
- messenger
- preparator
- stager
The queue configuration (partial) may look like this:
"preparator": {
    "name": "SomePreparator",
    "module": "pandaharvester.harvesterpreparator.some_preparator",
    "basePath": "/some/remote/base/path",
    "middleware": "rpc"
},
"submitter": {
    "name": "SlurmSubmitter",
    "module": "pandaharvester.harvestersubmitter.slurm_submitter",
    "nCore": 9600,
    "nCorePerNode": 48,
    "templateFile": "/some/remote/template.sh",
    "middleware": "rpc"
},
"messenger": {
    "name": "SharedFileMessenger",
    "module": "pandaharvester.harvestermessenger.shared_file_messenger",
    "accessPoint": "/some/remote/path/${workerID}",
    "middleware": "rpc"
},
"stager": {
    "name": "SomeStager",
    "module": "pandaharvester.harvesterstager.some_stager",
    "middleware": "rpc"
},
"monitor": {
    "name": "SlurmMonitor",
    "module": "pandaharvester.harvestermonitor.slurm_monitor",
    "middleware": "rpc"
},
"sweeper": {
    "name": "SlurmSweeper",
    "module": "pandaharvester.harvestersweeper.slurm_sweeper",
    "middleware": "rpc"
},
"rpc": {
    "name": "RpcHerder",
    "module": "pandaharvester.harvestermiddleware.rpc_herder",
    "remoteHost": "some.remote.host",
    "remoteBindPort": 18861,
    "numTunnels": 3,
    "sshUserName": "someusername",
    "sshPassword": null,
    "privateKey": "/some/private/key",
    "passPhrase": "somepassphrase",
    "jumpHost": "some.jump.host",
    "jumpPort": 22
}
Note that the paths set for a plugin with rpc (e.g. the messenger accessPoint) are the remote ones, i.e. on the HPC side.
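Before deploying a queue, it can help to check the configuration for the two points above: every plugin that sets a middleware refers to a section that is actually defined, and every path-like value of a remote plugin is one that should exist on the HPC side. The following is only a minimal sketch for that purpose. The helper names (check_rpc_middleware, remote_paths) are hypothetical and not part of harvester, and the assumption that the "middleware" value names a sibling section of the same queue is inferred from the example configuration above.

```python
import json

# Plugin sections named on this page; each may carry a "middleware" key.
PLUGIN_SECTIONS = ("preparator", "submitter", "messenger", "stager", "monitor", "sweeper")


def check_rpc_middleware(queue_config):
    """Return (remote_plugins, problems) for one queue's configuration dict.

    Hypothetical helper: a plugin counts as remote if it sets "middleware",
    and a problem is reported if the named middleware section is missing.
    """
    remote_plugins = []
    problems = []
    for section in PLUGIN_SECTIONS:
        plugin = queue_config.get(section)
        if not isinstance(plugin, dict) or "middleware" not in plugin:
            continue
        remote_plugins.append(section)
        middleware = plugin["middleware"]
        if middleware not in queue_config:
            problems.append(f"{section}: middleware section '{middleware}' is not defined")
    return remote_plugins, problems


def remote_paths(queue_config):
    """Map each remote plugin to its path-like string values.

    These are the values to verify on the HPC side, not on the harvester server.
    """
    remote_plugins, _ = check_rpc_middleware(queue_config)
    result = {}
    for section in remote_plugins:
        paths = {
            key: value
            for key, value in queue_config[section].items()
            if isinstance(value, str) and value.startswith("/")
        }
        if paths:
            result[section] = paths
    return result


if __name__ == "__main__":
    # A cut-down version of the partial configuration shown above.
    example = json.loads("""
    {
        "messenger": {
            "name": "SharedFileMessenger",
            "accessPoint": "/some/remote/path/${workerID}",
            "middleware": "rpc"
        },
        "monitor": {"name": "SlurmMonitor", "middleware": "rpc"},
        "rpc": {"name": "RpcHerder", "remoteHost": "some.remote.host"}
    }
    """)
    plugins, problems = check_rpc_middleware(example)
    print("remote plugins:", plugins)
    print("problems:", problems)
    print("paths to verify on the HPC side:", remote_paths(example))
```

The rpc section itself is deliberately not scanned for paths, since the sketch only looks at plugin sections that run remotely.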