From 1c4ddefd42ce8d5d809e94bbf5ae2080420224fa Mon Sep 17 00:00:00 2001
From: Zebula Sampedro
Date: Sat, 25 Aug 2018 09:01:57 -0600
Subject: [PATCH] Issues/5 (#1) (#6)

* Added initial subset of config for hub, spawner, and singularity notebooks. Not much in the way of documentation yet, mostly just bare config lifted out of our deployment for jupyterhub/jupyterhub-deploy-hpc#5. Still a WIP.

* Added deployment summary.

* Documented the jupyterhub config files.

* Documented Singularity config and ipyparallel profile.

* Added future plans section.

* Corrected regex.
---
 .../README.md                                | 62 +++++++++++++
 .../jupyterhub-config/form_config.py         | 90 +++++++++++++++++++
 .../jupyterhub-config/jupyterhub_config.py   | 57 ++++++++++++
 .../jupyterhub-config/slurm_config.py        | 40 +++++++++
 .../Singularity                              | 58 ++++++++++++
 .../profile_example-shas/ipcluster_config.py | 34 +++++++
 .../ipcontroller_config.py                   |  3 +
 .../profile_example-shas/startup/README      | 11 +++
 8 files changed, 355 insertions(+)
 create mode 100644 optionsspawner-slurm-singularity-rmaccsummit/README.md
 create mode 100644 optionsspawner-slurm-singularity-rmaccsummit/jupyterhub-config/form_config.py
 create mode 100644 optionsspawner-slurm-singularity-rmaccsummit/jupyterhub-config/jupyterhub_config.py
 create mode 100644 optionsspawner-slurm-singularity-rmaccsummit/jupyterhub-config/slurm_config.py
 create mode 100644 optionsspawner-slurm-singularity-rmaccsummit/singularity-notebook-ipyparallel/Singularity
 create mode 100644 optionsspawner-slurm-singularity-rmaccsummit/singularity-notebook-ipyparallel/profile_example-shas/ipcluster_config.py
 create mode 100644 optionsspawner-slurm-singularity-rmaccsummit/singularity-notebook-ipyparallel/profile_example-shas/ipcontroller_config.py
 create mode 100644 optionsspawner-slurm-singularity-rmaccsummit/singularity-notebook-ipyparallel/profile_example-shas/startup/README

diff --git a/optionsspawner-slurm-singularity-rmaccsummit/README.md b/optionsspawner-slurm-singularity-rmaccsummit/README.md
new file mode 100644
index 0000000..a2bad5e
--- /dev/null
+++ b/optionsspawner-slurm-singularity-rmaccsummit/README.md
@@ -0,0 +1,62 @@
+# Jupyter-Summit deployment at University of Colorado Boulder Research Computing
+---
+
+Author: [Zebula Sampedro](https://github.com/zebulasampedro) <sampedro@colorado.edu>
+
+## Summary
+
+Abridged configuration of the Jupyter deployment at CU Research Computing, which runs JupyterHub on a single node and spawns Notebook servers on the [RMACC Summit](https://www.colorado.edu/rc/resources/summit) cluster via the Slurm scheduler. To allow users to configure Notebook server scheduling parameters during spawn, we use [OptionsFormSpawner](https://github.com/ResearchComputing/jupyterhub-options-spawner) to wrap [SlurmSpawner](https://github.com/jupyterhub/batchspawner).
+
+The Notebook servers are run in Singularity containers configured to use certain host services directly, like Slurm and SSSD. These containerized Notebook servers use the host Slurm to run multi-node IPyParallel clusters, with each IPEngine also running in a container.
+
+## JupyterHub Config Files
+The `jupyterhub-config/` directory contains the three configuration files used in our deployment.
+
+### `jupyterhub_config.py`
+The primary configuration file, loaded by `jupyterhub -f`. A rough outline of the configuration:
+* Increases the spawner timeout to accommodate longer queue waits.
+* Sets up the `OptionsFormSpawner` to wrap the `SlurmSpawner`.
+* Sets the default location of the Notebook to the user's home directory.
+* Makes sure the JupyterHub server is configured to be contacted by remote Notebook servers running on the compute cluster.
+
+We are currently using JupyterHub's built-in SSL, but are moving towards Nginx for SSL and HTTP -> HTTPS redirection. The Hub uses our PAM stack for auth.
+
+### `form_config.py`
+Configuration for the `OptionsFormSpawner` form fields that allow for user configuration of spawn options and Slurm scheduling parameters. The values specified in these fields will be applied to the corresponding traits of the wrapped child spawner, `SlurmSpawner`.
+
+SlurmSpawner will expose any trait prefixed with `req_` to the Slurm command templates it uses. The variable is usable in the template with the prefix removed. The options form config leverages this to expand the number of queueing options available to the Sbatch script.
+
+### `slurm_config.py`
+Configuration for the `SlurmSpawner`, applied by the `OptionsFormSpawner.child_config` trait. There are two major components to this file: the Sbatch script that will start the Notebook server, and the Slurm command configuration that the spawner will use to control the notebook server's lifecycle.
+
+## Singularity Notebook Servers
+The `singularity-notebook-ipyparallel/` directory contains the files and configuration for the Jupyter Notebook containers, as well as the IPyParallel profile.
+
+### Jupyter Notebook Image
+Singularity base image for Jupyter Notebook servers running on RC resources. Can be started either via JupyterHub or directly by the end-user in a tunneling setup.
+
+The container is started with a number of non-standard bind mounts _(example below)_ to allow for direct access to host services like Slurm, SSSD, and PAM. The first section of the Singularity recipe `%post` block configures the container-internal dependencies for these mounted services.
+```
+singularity shell \
+    --bind /var/lib/sss/pipes \
+    --bind /home/slurm \
+    --bind /var/run/munge \
+    --bind /etc/slurm \
+    --bind /curc/slurm \
+    jupyter-notebook.img
+```
+
+### IPyParallel
+The `singularity-notebook-ipyparallel/profile_example-shas/` directory contains the IPyParallel profile we preload into the notebook image as an example for our end-users. This profile uses the `SlurmEngineSetLauncher` to start ipengines using Slurm. The `$CONTAINER_PATH` environment variable set by the initial SlurmSpawner script ensures that the ipengines start in a new instance of the same image the notebook server is running on.
+
+## Omissions
+Our deployment also containerizes JupyterHub using Docker and Docker Compose to improve change management and automation capabilities. We omitted this bit of the config for brevity, and also because there exist other _(likely better and more generic)_ examples for this deployment strategy.
+
+## Future Plans
+* Add more dynamic form capabilities to OptionsFormSpawner.
+* Make more Singularity stacks available to our end-users. High on the list are R and PySpark.
+* Extend SlurmSpawner to provide more informative error messaging and status feedback to end-users.
+* Place JupyterHub behind Nginx for SSL, redirection, and better outage messaging.
+* Make JupyterLab available to end-users.
+* Develop extensions for JupyterLab to allow for interactive creation of Sbatch job scripts, and job queue management.
+* Research, evaluate, and document strategies for making Jupyter the central component in our science gateway efforts moving forward.
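
The `req_` convention described in the README's `form_config.py` section can be illustrated with a small, self-contained sketch. This is a simplified model of the idea, not batchspawner's actual implementation: `FakeSpawner`, `template_vars`, and `SCRIPT_TEMPLATE` are hypothetical names introduced here, and the values are sample form selections.

```python
# Illustrative sketch only: models the "req_" prefix convention.
# FakeSpawner, template_vars and SCRIPT_TEMPLATE are hypothetical names,
# not part of batchspawner or OptionsFormSpawner.


class FakeSpawner:
    """Stand-in for a spawner whose traits were filled in from the options form."""

    req_partition = "shas"
    req_qos = "jupyterhub"
    req_runtime = "02:00:00"
    req_nodes = 1
    unrelated_setting = "ignored"  # no "req_" prefix, so never reaches the template


def template_vars(spawner):
    """Collect every attribute named 'req_<name>' and expose it as '<name>'."""
    prefix = "req_"
    return {
        name[len(prefix):]: getattr(spawner, name)
        for name in dir(spawner)
        if name.startswith(prefix)
    }


SCRIPT_TEMPLATE = """#!/bin/bash
#SBATCH --partition={partition}
#SBATCH --qos={qos}
#SBATCH --time={runtime}
#SBATCH --nodes={nodes}
"""

# Fill the template with the prefix-stripped values.
rendered = SCRIPT_TEMPLATE.format(**template_vars(FakeSpawner()))
print(rendered)
```

Under this model, adding a form field named `req_account` would be enough to make `{account}` available to the script template, which is how the options form extends the set of scheduling parameters without changes to the spawner itself.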
diff --git a/optionsspawner-slurm-singularity-rmaccsummit/jupyterhub-config/form_config.py b/optionsspawner-slurm-singularity-rmaccsummit/jupyterhub-config/form_config.py
new file mode 100644
index 0000000..aedd41e
--- /dev/null
+++ b/optionsspawner-slurm-singularity-rmaccsummit/jupyterhub-config/form_config.py
@@ -0,0 +1,90 @@
+from optionsspawner.forms import (
+    FormField,
+    TextInputField,
+    NumericalInputField,
+    CheckboxInputField,
+    SelectField,
+)
+
+
+partition_select = SelectField('req_partition',
+    label='Select a partition',
+    attr_required=True,
+    choices=[
+        ('shas', "Summit - Haswell"),
+        ('sknl', "Summit - Knight's Landing"),
+        ('blanca-csdms', "Blanca - CSDMS"),
+        ('blanca-sol', "Blanca - Sol"),
+    ],
+    default='shas'
+)
+
+qos_select = SelectField('req_qos',
+    label='Select a QoS',
+    attr_required=True,
+    choices=[
+        ('jupyterhub', "Summit - All Partitions"),
+        ('blanca-csdms', "Blanca - CSDMS"),
+        ('blanca-sol', "Blanca - Sol"),
+    ],
+    default='jupyterhub'
+)
+
+account_input = TextInputField('req_account',
+    label='Specify an account to charge (Required for Blanca users)'
+)
+
+cluster_select = SelectField('req_cluster',
+    label='Select a cluster',
+    attr_required=True,
+    choices=[
+        ('summit', "Summit"),
+        ('blanca', "Blanca"),
+    ],
+)
+
+runtime_input = TextInputField('req_runtime',
+    label='Specify runtime (HH:MM:SS format, 12hr max)',
+    attr_required=True,
+    attr_value='02:00:00',
+    attr_pattern="(0[0-9]|1[0-2]):[0-5]{1}[0-9]{1}:[0-5]{1}[0-9]{1}"
+)
+
+nodes_input = NumericalInputField('req_nodes',
+    label='Specify node count',
+    attr_required=True,
+    attr_value=1,
+    attr_min=1,
+    attr_max=4
+)
+
+ntasks_input = NumericalInputField('req_ntasks',
+    label='Specify tasks per node',
+    attr_required=True,
+    attr_value=1,
+    attr_min=1,
+    attr_max=24
+)
+
+image_select = SelectField('req_image_path',
+    label='Select a notebook image',
+    attr_required=True,
+    choices=[
+        ('/curc/tools/images/jupyter-notebook-base/jupyter-notebook-baselatest.simg', "Python3"),
+        # TODO: Make more environments available to our end-users.
+        # ('/curc/tools/images/jupyter-notebook-base/jupyter-notebook-pysparklatest.simg', "PySpark"),
+        # ('/curc/tools/images/jupyter-notebook-base/jupyter-notebook-rlatest.simg', "R"),
+    ],
+    default='/curc/tools/images/jupyter-notebook-base/jupyter-notebook-baselatest.simg'
+)
+
+form_fields = [
+    cluster_select,
+    partition_select,
+    qos_select,
+    account_input,
+    image_select,
+    runtime_input,
+    nodes_input,
+    ntasks_input,
+]
diff --git a/optionsspawner-slurm-singularity-rmaccsummit/jupyterhub-config/jupyterhub_config.py b/optionsspawner-slurm-singularity-rmaccsummit/jupyterhub-config/jupyterhub_config.py
new file mode 100644
index 0000000..5e6986c
--- /dev/null
+++ b/optionsspawner-slurm-singularity-rmaccsummit/jupyterhub-config/jupyterhub_config.py
@@ -0,0 +1,57 @@
+import os
+import importlib.machinery
+
+slurm_config = importlib.machinery.SourceFileLoader('slurm_config','/opt/jupyterhub/config/slurm_config.py').load_module()
+form_config = importlib.machinery.SourceFileLoader('form_config','/opt/jupyterhub/config/form_config.py').load_module()
+
+
+
+# SPAWNER CONFIGURATION
+
+# Increase spawner timeout to be tolerant of longer queue wait times.
+c.Spawner.http_timeout = 300
+c.Spawner.start_timeout = 300
+
+# https://github.com/jupyterhub/jupyterhub/issues/929
+c.Spawner.notebook_dir = '/'
+c.Spawner.default_url = '/tree/home/{username}'
+
+# OptionsSpawner: Attach options forms to any JupyterHub spawner using only configuration.
+# https://github.com/ResearchComputing/jupyterhub-options-spawner
+c.JupyterHub.spawner_class = 'optionsspawner.OptionsFormSpawner'
+
+# The OptionsSpawner wraps SlurmSpawner from https://github.com/jupyterhub/batchspawner
+c.OptionsFormSpawner.child_class = 'batchspawner.SlurmSpawner'
+c.OptionsFormSpawner.child_config = slurm_config.spawner_config
+c.OptionsFormSpawner.form_fields = form_config.form_fields
+
+# HUB CONFIGURATION
+
+# Set the log level by value or name.
+c.JupyterHub.log_level = 'DEBUG'
+c.JupyterHub.extra_log_file = '/var/log/jupyterhub/debug-log.log'
+
+# Allow servers to persist between restarts of the hub itself
+c.JupyterHub.cleanup_servers = False
+
+# We're doing SSL through the Hub's built-in capabilities, so host on 443
+c.JupyterHub.ip = '0.0.0.0'
+c.JupyterHub.port = 443
+
+# Make sure that the remote notebook servers can contact the hub
+jupyterhub_hostname = os.environ.get('HOSTNAME')
+c.JupyterHub.hub_ip = jupyterhub_hostname
+
+c.JupyterHub.cookie_secret_file = '/opt/jupyterhub/jupyterhub_cookie_secret'
+
+c.JupyterHub.db_url = '/opt/jupyterhub/jupyterhub.sqlite'
+
+# SSL Config
+c.JupyterHub.ssl_cert = os.environ.get('JUPYTERHUB_CERT_PATH')
+c.JupyterHub.ssl_key = os.environ.get('JUPYTERHUB_KEY_PATH')
+
+# Configure the admin interface
+admins_env = os.environ.get('JUPYTERHUB_ADMINS', '')
+admins = tuple(admins_env.split()) if admins_env else ()
+c.Authenticator.admin_users = admins
+c.JupyterHub.admin_access = True
diff --git a/optionsspawner-slurm-singularity-rmaccsummit/jupyterhub-config/slurm_config.py b/optionsspawner-slurm-singularity-rmaccsummit/jupyterhub-config/slurm_config.py
new file mode 100644
index 0000000..5a6a650
--- /dev/null
+++ b/optionsspawner-slurm-singularity-rmaccsummit/jupyterhub-config/slurm_config.py
@@ -0,0 +1,40 @@
+summit_script = """#!/bin/bash
+#SBATCH --partition={partition}
+#SBATCH --qos={qos}
+#SBATCH --account={account}
+#SBATCH --time={runtime}
+#SBATCH --nodes={nodes}
+#SBATCH --ntasks-per-node={ntasks}
+#SBATCH --output={homedir}/.jupyterhub-slurmspawner.log
+#SBATCH --open-mode=append
+#SBATCH --job-name=spawner-jupyterhub
+#SBATCH --workdir={homedir}
+#SBATCH --export={keepvars}
+#SBATCH --uid={username}
+
+ml singularity/2.4.2
+
+# jupyter-singleuser anticipates that environment will be dropped during sudo, however
+# it is retained by batchspawner. The XDG_RUNTIME_DIR variable must be unset to force a
+# fallback, otherwise a permissions error occurs when starting the notebook.
+# https://github.com/jupyter/notebook/issues/1318
+
+export SINGULARITYENV_JUPYTERHUB_API_TOKEN=$JUPYTERHUB_API_TOKEN
+export SINGULARITYENV_XDG_RUNTIME_DIR=$HOME/.singularity-jupyter-run
+export SINGULARITYENV_CONTAINER_PATH={image_path}
+singularity run \
+    --bind /var/lib/sss/pipes \
+    --bind /home/slurm \
+    --bind /var/run/munge \
+    --bind /etc/slurm \
+    --bind /curc/slurm \
+    --bind /etc/pam.d \
+    $SINGULARITYENV_CONTAINER_PATH {cmd}
+"""
+
+spawner_config = {
+    'batch_script': summit_script,
+    'batch_submit_cmd': """sudo -E -u {username} SLURM_CONF=/curc/slurm/{cluster}/etc/slurm.conf sbatch""",
+    'batch_query_cmd': """sudo -E -u {username} SLURM_CONF=/curc/slurm/{cluster}/etc/slurm.conf squeue -h -j {job_id} -o "%T %B" """,
+    'batch_cancel_cmd': """sudo -E -u {username} SLURM_CONF=/curc/slurm/{cluster}/etc/slurm.conf scancel {job_id}""",
+}
diff --git a/optionsspawner-slurm-singularity-rmaccsummit/singularity-notebook-ipyparallel/Singularity b/optionsspawner-slurm-singularity-rmaccsummit/singularity-notebook-ipyparallel/Singularity
new file mode 100644
index 0000000..ec4b930
--- /dev/null
+++ b/optionsspawner-slurm-singularity-rmaccsummit/singularity-notebook-ipyparallel/Singularity
@@ -0,0 +1,58 @@
+Bootstrap: docker
+From: centos:7
+
+%labels
+    MAINTAINER sampedro@colorado.edu
+
+%environment
+    export JUPYTER_DATA_DIR=$HOME/.singularity-jupyter
+
+%setup
+    mkdir -p /usr/local/share/profile_example-shas
+
+%files
+    profile_example-shas/ /usr/local/share/profile_example-shas/
+
+%post
+    # Install dependencies for Slurm, SSSD, and PAM
+    useradd -u 515 -m slurm
+    useradd -u 992 -m munge
+    yum -y update
+    yum -y install epel-release
+    yum -y groupinstall 'Development Tools'
+    yum -y install sssd curl wget strace iproute munge munge-devel pam-devel openssl openssl-devel readline-devel perl-devel
+    cd ~ && wget https://download.schedmd.com/slurm/slurm-17.02.9.tar.bz2
+    rpmbuild -ta slurm-17.02.9.tar.bz2
+    cd ~/rpmbuild/RPMS/x86_64
+    rpm -ivh slurm-pam_slurm-17.02.9-1.el7.centos.x86_64.rpm slurm-plugins-17.02.9-1.el7.centos.x86_64.rpm slurm-munge-17.02.9-1.el7.centos.x86_64.rpm slurm-perlapi-17.02.9-1.el7.centos.x86_64.rpm slurm-17.02.9-1.el7.centos.x86_64.rpm slurm-devel-17.02.9-1.el7.centos.x86_64.rpm
+
+    # Install Omnipath and OpenMPI user libraries for Summit
+    yum install -y libhfi1 libpsm2 libpsm2-devel libpsm2-compat
+    yum install -y perftest qperf
+    yum install -y libibverbs libibverbs-devel rdma
+    yum install -y numactl-libs numactl-devel
+    yum install -y pciutils
+    yum install -y which
+    wget https://download.open-mpi.org/release/open-mpi/v2.0/openmpi-2.0.1.tar.gz
+    tar -xf openmpi-2.0.1.tar.gz
+    cd openmpi-2.0.1/
+    ./configure \
+        --with-verbs \
+        --with-psm2 \
+        --enable-mpi-thread-multiple
+    make -j2
+    make install
+
+    # Install Jupyter*
+    yum -y install python34-devel python34-pip
+    pip3 install --upgrade pip
+    pip3 install jupyterhub==0.7.2
+    pip3 install --upgrade notebook
+    pip3 install ipyparallel==6.2
+    pip3 install matplotlib numpy scipy pandas
+
+%runscript
+    # Copy example ipyparallel profile to home directory
+    mkdir -p $HOME/.ipython
+    cp -rf /usr/local/share/profile_example-shas $HOME/.ipython/profile_example-shas
+    exec "$@"
diff --git a/optionsspawner-slurm-singularity-rmaccsummit/singularity-notebook-ipyparallel/profile_example-shas/ipcluster_config.py b/optionsspawner-slurm-singularity-rmaccsummit/singularity-notebook-ipyparallel/profile_example-shas/ipcluster_config.py
new file mode 100644
index 0000000..03cc728
--- /dev/null
+++ b/optionsspawner-slurm-singularity-rmaccsummit/singularity-notebook-ipyparallel/profile_example-shas/ipcluster_config.py
@@ -0,0 +1,34 @@
+# Configuration file for ipcluster.
+
+c.IPClusterEngines.engine_launcher_class = 'ipyparallel.apps.launcher.SlurmEngineSetLauncher'
+c.SlurmLauncher.qos = 'normal'
+c.SlurmLauncher.timelimit = '1:00:00'
+
+c.SlurmEngineSetLauncher.batch_template = """#!/bin/bash
+#SBATCH --partition shas
+#SBATCH --qos {qos}
+#SBATCH --job-name ipengine
+#SBATCH --ntasks {n}
+# This will run a single ipengine per CPU
+#SBATCH --cpus-per-task 1
+# Use ntasks-per-node=1 to run one ipengine per node
+#SBATCH --time {timelimit}
+#SBATCH --output {profile_dir}/log/slurm.out
+
+ml gcc singularity/2.4.2
+ml openmpi/2.0.1
+
+# Run each IPEngine in a new instance of the current container.
+
+export SINGULARITYENV_JUPYTERHUB_API_TOKEN=$JUPYTERHUB_API_TOKEN
+export SINGULARITYENV_XDG_RUNTIME_DIR=$HOME/.singularity-jupyter-run
+export SINGULARITYENV_CONTAINER_PATH=$CONTAINER_PATH
+mpirun singularity run \
+    --bind /var/lib/sss/pipes \
+    --bind /home/slurm \
+    --bind /var/run/munge \
+    --bind /etc/slurm \
+    --bind /etc/pam.d \
+    $CONTAINER_PATH \
+    ipengine --profile-dir="{profile_dir}" --cluster-id="{cluster_id}"
+"""
diff --git a/optionsspawner-slurm-singularity-rmaccsummit/singularity-notebook-ipyparallel/profile_example-shas/ipcontroller_config.py b/optionsspawner-slurm-singularity-rmaccsummit/singularity-notebook-ipyparallel/profile_example-shas/ipcontroller_config.py
new file mode 100644
index 0000000..a46bda2
--- /dev/null
+++ b/optionsspawner-slurm-singularity-rmaccsummit/singularity-notebook-ipyparallel/profile_example-shas/ipcontroller_config.py
@@ -0,0 +1,3 @@
+# Configuration file for ipcontroller.
+
+c.HubFactory.ip = '*'
diff --git a/optionsspawner-slurm-singularity-rmaccsummit/singularity-notebook-ipyparallel/profile_example-shas/startup/README b/optionsspawner-slurm-singularity-rmaccsummit/singularity-notebook-ipyparallel/profile_example-shas/startup/README
new file mode 100644
index 0000000..61d4700
--- /dev/null
+++ b/optionsspawner-slurm-singularity-rmaccsummit/singularity-notebook-ipyparallel/profile_example-shas/startup/README
@@ -0,0 +1,11 @@
+This is the IPython startup directory
+
+.py and .ipy files in this directory will be run *prior* to any code or files specified
+via the exec_lines or exec_files configurables whenever you load this profile.
+
+Files will be run in lexicographical order, so you can control the execution order of files
+with a prefix, e.g.::
+
+  00-first.py
+  50-middle.py
+  99-last.ipy
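
The `req_runtime` field in `form_config.py` relies on an HTML `pattern` attribute to constrain input to `HH:MM:SS` with a 12 hour maximum. Because a `pattern` attribute must match the whole value, a candidate expression can be checked offline with Python's `re.fullmatch`. The sketch below is an assumption-laden illustration: `RUNTIME_PATTERN`, `is_valid_runtime`, `to_seconds`, and `within_limit` are hypothetical names, and the pattern is one possible expression for hours 00 through 12, not necessarily the one deployed.

```python
import re

# Assumed pattern for illustration: hours 00-12, minutes and seconds 00-59.
RUNTIME_PATTERN = r"(0[0-9]|1[0-2]):[0-5][0-9]:[0-5][0-9]"


def is_valid_runtime(value):
    """Return True if the whole string matches HH:MM:SS with HH in 00-12."""
    return re.fullmatch(RUNTIME_PATTERN, value) is not None


def to_seconds(value):
    """Convert an HH:MM:SS string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def within_limit(value, max_hours=12):
    """Check both the format and the total duration against the limit."""
    return is_valid_runtime(value) and to_seconds(value) <= max_hours * 3600


for candidate in ["02:00:00", "09:30:00", "12:30:00", "13:00:00", "2:00:00"]:
    print(candidate, is_valid_runtime(candidate), within_limit(candidate))
```

A format-only pattern of this shape still accepts values such as `12:30:00`, which exceed a 12 hour cap, so the duration check in `within_limit` (or the scheduler's own QoS limits) is what actually bounds the request.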