The first and simplest approach to HPC integration is to use SSH access and SSH keys, so that our application user can log in to the cluster as individual users and start Slurm jobs on their behalf.

Note that CAS integration (included in #10) is a prerequisite for this.

Implementation details:

- Request SSH access without Duo from the test VM to the HPC machine
- Ensure SSH access from the VM to the HPC machine (may require a PUL .lib domain firewall change)
- Add a vaulted SSH key to the deploy, and write instructions for adding it to authorized_keys on the HPC machine
- Write remote equivalents of the GPU Celery tasks to kick off training jobs: export the needed data/model, use scp/rsync to transfer files, SSH in as the current user, and start the Slurm job
- Modify eScriptorium to call our remote version of the task instead of running it locally (consider how to make this configurable, but this version doesn't have to be elegant)
- Implement a method to check the status of a remote Slurm job
- Modify eScriptorium task monitoring to handle remote Slurm jobs
- When the job completes, load the refined model back into eScriptorium and report its status
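The remote kickoff and status-check steps above could be sketched roughly as below. This is a minimal illustration only, not the actual implementation: the function names, host/path arguments, and the sbatch script are all hypothetical placeholders, and a real version would need error handling, the vaulted key configuration, and integration with the Celery task machinery.

```python
import subprocess


def parse_sbatch_output(stdout):
    """Extract the job id from `sbatch --parsable` output ("jobid" or "jobid;cluster")."""
    return stdout.strip().split(";")[0]


def submit_remote_training(user, host, local_dir, remote_dir, sbatch_script):
    """Transfer exported data/model to the cluster and submit a Slurm job as `user`.

    All paths and the sbatch script name are illustrative; the real values
    depend on the eScriptorium deployment and the cluster layout.
    """
    # Transfer the exported data/model to the cluster over SSH.
    subprocess.run(
        ["rsync", "-az", local_dir + "/", f"{user}@{host}:{remote_dir}/"],
        check=True,
    )
    # Submit the job; `sbatch --parsable` prints just the job id (and
    # optionally the cluster name after a semicolon).
    result = subprocess.run(
        ["ssh", f"{user}@{host}", "sbatch", "--parsable", sbatch_script],
        check=True, capture_output=True, text=True,
    )
    return parse_sbatch_output(result.stdout)


def remote_job_state(user, host, job_id):
    """Return the Slurm state string (e.g. RUNNING, COMPLETED) for job_id."""
    result = subprocess.run(
        ["ssh", f"{user}@{host}", "sacct", "-j", job_id,
         "--format=State", "--noheader", "-X"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()
```

A monitoring task could poll `remote_job_state` periodically and, once the state is `COMPLETED`, rsync the refined model back and update the eScriptorium record.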