Commit 1c4ddef: Issues/5 (#1) (#6)
* Added initial subset of config for hub, spawner, and singularity notebooks.

Not much in the way of documentation yet, mostly just bare config lifted out of our deployment for #5. Still a WIP.

* Added deployment summary.

* Documented the jupyterhub config files.

* Documented Singularity config and ipyparallel profile.

* Added future plans section.

* Corrected regex.
zebulasampedro authored and mbmilligan committed Aug 25, 2018
1 parent fc9fc52 commit 1c4ddef
Showing 8 changed files with 355 additions and 0 deletions.
62 changes: 62 additions & 0 deletions optionsspawner-slurm-singularity-rmaccsummit/README.md
# Jupyter-Summit deployment at University of Colorado Boulder Research Computing
---

Author: [Zebula Sampedro](https://github.com/zebulasampedro) <[email protected]>

## Summary

Abridged configuration of the Jupyter deployment at CU Research Computing, which runs JupyterHub on a single node and spawns Notebook servers on the [RMACC Summit](https://www.colorado.edu/rc/resources/summit) cluster via the Slurm scheduler. To allow users to configure Notebook server scheduling parameters during spawn, we use [OptionsFormSpawner](https://github.com/ResearchComputing/jupyterhub-options-spawner) to wrap [SlurmSpawner](https://github.com/jupyterhub/batchspawner).

The Notebook servers are run in Singularity containers configured to use certain host services directly, like Slurm and SSSD. These containerized Notebook servers use the host Slurm to run multi-node IPyParallel clusters, with each IPEngine also running in a container.

## JupyterHub Config Files
The `jupyterhub-config/` directory contains the three configuration files used in our deployment.

### `jupyterhub_config.py`
The primary configuration file, loaded via `jupyterhub -f`. A rough outline of the configuration:
* Increases the spawner timeout to accommodate longer queue waits.
* Sets up the `OptionsFormSpawner` to wrap the `SlurmSpawner`.
* Sets the default location of the Notebook to the user's home directory.
* Ensures the JupyterHub server can be contacted by the remote Notebook servers running on the compute cluster.

We are currently using JupyterHub's built-in SSL, but are moving towards Nginx for SSL and HTTP -> HTTPS redirection. The Hub uses our PAM stack for auth.
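
Because the deployment relies on JupyterHub's default PAM authenticator, no authenticator settings appear in the config file below. A minimal explicit equivalent would look like this; the `service` value is an assumption for illustration, not taken from our deployment:
```
# Assumed sketch only -- our shipped jupyterhub_config.py relies on the defaults.
c.JupyterHub.authenticator_class = 'jupyterhub.auth.PAMAuthenticator'
c.PAMAuthenticator.service = 'login'  # hypothetical PAM service name
```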

### `form_config.py`
Configuration for the `OptionsFormSpawner` form fields that allow for user-configuration of spawn options and Slurm scheduling parameters. The values specified in these fields will be applied to the corresponding traits of the wrapped child spawner, `SlurmSpawner`.

SlurmSpawner exposes any trait prefixed with `req_` to the Slurm command templates it uses; each such variable is available in the template with the prefix removed. The options form config leverages this to expand the number of queueing options available to the Sbatch script, as sketched below.
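
As a rough illustration (this is not batchspawner's actual implementation; the class and sample values are hypothetical), the prefix stripping behaves roughly like this:
```
# Hypothetical sketch of batchspawner's `req_` trait expansion.
class FakeSpawner:
    req_partition = 'shas'    # would be set from the options form
    req_runtime = '02:00:00'

def template_vars(spawner):
    # Collect every attribute prefixed with `req_` and strip the prefix,
    # so `req_partition` is referenced as `{partition}` in the templates.
    return {name[len('req_'):]: getattr(spawner, name)
            for name in dir(spawner) if name.startswith('req_')}

print('#SBATCH --partition={partition}'.format(**template_vars(FakeSpawner())))
# -> #SBATCH --partition=shas
```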

### `slurm_config.py`
Configuration for the `SlurmSpawner`, applied by the `OptionsFormSpawner.child_config` trait. There are two major components to this file: the Sbatch script that will start the Notebook server, and the Slurm command configuration that the spawner will use to control the notebook server's lifecycle.
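
For orientation, here is a hedged sketch (not batchspawner's actual code; the sample values are hypothetical) of how the spawner fills in its Slurm command templates: `{username}` comes from the authenticated user and `{cluster}` from the `req_cluster` form field, so `sudo -u` submits the job as the end-user while `SLURM_CONF` selects the per-cluster Slurm configuration.
```
# Hypothetical illustration of the template substitution batchspawner performs
# before executing the submit command.
subvars = {'username': 'jdoe', 'cluster': 'summit'}  # assumed sample values
submit_cmd = ('sudo -E -u {username} '
              'SLURM_CONF=/curc/slurm/{cluster}/etc/slurm.conf sbatch').format(**subvars)
print(submit_cmd)
# -> sudo -E -u jdoe SLURM_CONF=/curc/slurm/summit/etc/slurm.conf sbatch
```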

## Singularity Notebook Servers
The `singularity-notebook-ipyparallel/` directory contains the files and configuration for the Jupyter Notebook containers, as well as the IPyParallel profile.

### Jupyter Notebook Image
Singularity base image for Jupyter Notebook servers running on RC resources. Can be started either via JupyterHub or directly by the end-user in a tunneling setup.

The container is started with a number of non-standard bind mounts _(example below)_ to allow for direct access to host services like Slurm, SSSD, and PAM. The first section of the Singularity recipe `%post` block configures the container-internal dependencies for these mounted services.
```
singularity shell \
    --bind /var/lib/sss/pipes \
    --bind /home/slurm \
    --bind /var/run/munge \
    --bind /etc/slurm \
    --bind /curc/slurm \
    jupyter-notebook.img
```

### IPyParallel
The `singularity-notebook-ipyparallel/profile_example-shas/` directory contains the IPyParallel profile we preload into the notebook image as an example for our end-users. This profile uses the `SlurmEngineSetLauncher` to start ipengines via Slurm. The `$CONTAINER_PATH` environment variable set by the initial SlurmSpawner script ensures that the ipengines start in new instances of the same image the notebook server is running in.
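
As a hedged sketch of the resulting end-user workflow (the engine count and the example function are illustrative, not part of the shipped profile):
```
# Hypothetical session in the containerized notebook server, assuming engines
# were started with:  ipcluster start --profile=example-shas -n 4
import ipyparallel as ipp

client = ipp.Client(profile='example-shas')  # connect via the profile's files
view = client[:]                             # DirectView over all engines

# Each engine runs in its own container instance on a compute node.
print(view.apply_sync(lambda: __import__('socket').gethostname()))
```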

## Omissions
Our deployment also containerizes JupyterHub using Docker and Docker Compose to improve change management and automation capabilities. We omitted this bit of the config for brevity, and also because there exist other _(likely better and more generic)_ examples for this deployment strategy.

## Future Plans
* Add more dynamic form capabilities to OptionsFormSpawner.
* Make more Singularity stacks available to our end-users. High on the list are R and PySpark.
* Extend SlurmSpawner to provide more informative error messaging and status feedback to end-users.
* Place JupyterHub behind Nginx for SSL, redirection, and better outage messaging.
* Make JupyterLab available to end-users.
* Develop extensions for JupyterLab to allow for interactive creation of Sbatch job scripts, and job queue management.
* Research, evaluate, and document strategies for making Jupyter the central component in our science gateway efforts moving forward.
90 changes: 90 additions & 0 deletions optionsspawner-slurm-singularity-rmaccsummit/jupyterhub-config/form_config.py
from optionsspawner.forms import (
FormField,
TextInputField,
NumericalInputField,
CheckboxInputField,
SelectField,
)


partition_select = SelectField('req_partition',
label='Select a partition',
attr_required=True,
choices=[
('shas', "Summit - Haswell"),
('sknl', "Summit - Knight's Landing"),
('blanca-csdms', "Blanca - CSDMS"),
('blanca-sol', "Blanca - Sol"),
],
default='shas'
)

qos_select = SelectField('req_qos',
label='Select a QoS',
attr_required=True,
choices=[
('jupyterhub', "Summit - All Partitions"),
('blanca-csdms', "Blanca - CSDMS"),
('blanca-sol', "Blanca - Sol"),
],
default='jupyterhub'
)

account_input = TextInputField('req_account',
label='Specify an account to charge (Required for Blanca users)'
)

cluster_select = SelectField('req_cluster',
label='Select a cluster',
attr_required=True,
choices=[
('summit', "Summit"),
('blanca', "Blanca"),
],
)

runtime_input = TextInputField('req_runtime',
label='Specify runtime (HH:MM:SS format, 12hr max)',
attr_required=True,
attr_value='02:00:00',
    # Hours 00-12; note the HTML pattern attribute is implicitly anchored.
    attr_pattern="(0[0-9]|1[0-2]):[0-5][0-9]:[0-5][0-9]"
)

nodes_input = NumericalInputField('req_nodes',
label='Specify node count',
attr_required=True,
attr_value=1,
attr_min=1,
attr_max=4
)

ntasks_input = NumericalInputField('req_ntasks',
label='Specify tasks per node',
attr_required=True,
attr_value=1,
attr_min=1,
attr_max=24
)

image_select = SelectField('req_image_path',
label='Select a notebook image',
attr_required=True,
choices=[
('/curc/tools/images/jupyter-notebook-base/jupyter-notebook-baselatest.simg', "Python3"),
# TODO: Make more environments available to our end-users.
# ('/curc/tools/images/jupyter-notebook-base/jupyter-notebook-pysparklatest.simg', "PySpark"),
# ('/curc/tools/images/jupyter-notebook-base/jupyter-notebook-rlatest.simg', "R"),
],
default='/curc/tools/images/jupyter-notebook-base/jupyter-notebook-baselatest.simg'
)

form_fields = [
cluster_select,
partition_select,
qos_select,
account_input,
image_select,
runtime_input,
nodes_input,
ntasks_input,
]
57 changes: 57 additions & 0 deletions optionsspawner-slurm-singularity-rmaccsummit/jupyterhub-config/jupyterhub_config.py
import os
import importlib.machinery

# Load the spawner and options-form configuration from sibling files in the deployment's config directory.
slurm_config = importlib.machinery.SourceFileLoader('slurm_config', '/opt/jupyterhub/config/slurm_config.py').load_module()
form_config = importlib.machinery.SourceFileLoader('form_config', '/opt/jupyterhub/config/form_config.py').load_module()



# SPAWNER CONFIGURATION

# Increase spawner timeout to be tolerant of longer queue wait times.
c.Spawner.http_timeout = 300
c.Spawner.start_timeout = 300

# Default the notebook UI to the user's home directory
# (workaround, see https://github.com/jupyterhub/jupyterhub/issues/929).
c.Spawner.notebook_dir = '/'
c.Spawner.default_url = '/tree/home/{username}'

# OptionsSpawner: Attach options forms to any JupyterHub spawner using only configuration.
# https://github.com/ResearchComputing/jupyterhub-options-spawner
c.JupyterHub.spawner_class = 'optionsspawner.OptionsFormSpawner'

# The OptionsSpawner wraps SlurmSpawner from https://github.com/jupyterhub/batchspawner
c.OptionsFormSpawner.child_class = 'batchspawner.SlurmSpawner'
c.OptionsFormSpawner.child_config = slurm_config.spawner_config
c.OptionsFormSpawner.form_fields = form_config.form_fields

# HUB CONFIGURATION

# Set the log level by value or name.
c.JupyterHub.log_level = 'DEBUG'
c.JupyterHub.extra_log_file = '/var/log/jupyterhub/debug-log.log'

# Allow servers to persist between restarts of the hub itself
c.JupyterHub.cleanup_servers = False

# We're doing SSL through the Hub's built-in capabilities, so host on 443
c.JupyterHub.ip = '0.0.0.0'
c.JupyterHub.port = 443

# Make sure that the remote notebook servers can contact the hub
jupyterhub_hostname = os.environ.get('HOSTNAME')
c.JupyterHub.hub_ip = jupyterhub_hostname

c.JupyterHub.cookie_secret_file = '/opt/jupyterhub/jupyterhub_cookie_secret'

c.JupyterHub.db_url = '/opt/jupyterhub/jupyterhub.sqlite'

# SSL Config
c.JupyterHub.ssl_cert = os.environ.get('JUPYTERHUB_CERT_PATH')
c.JupyterHub.ssl_key = os.environ.get('JUPYTERHUB_KEY_PATH')

# Configure the admin interface
admins_env = os.environ.get('JUPYTERHUB_ADMINS', '')
admins = tuple(admins_env.split()) if admins_env else ()
c.Authenticator.admin_users = admins
c.JupyterHub.admin_access = True
40 changes: 40 additions & 0 deletions optionsspawner-slurm-singularity-rmaccsummit/jupyterhub-config/slurm_config.py
summit_script = """#!/bin/bash
#SBATCH --partition={partition}
#SBATCH --qos={qos}
#SBATCH --account={account}
#SBATCH --time={runtime}
#SBATCH --nodes={nodes}
#SBATCH --ntasks-per-node={ntasks}
#SBATCH --output={homedir}/.jupyterhub-slurmspawner.log
#SBATCH --open-mode=append
#SBATCH --job-name=spawner-jupyterhub
#SBATCH --workdir={homedir}
#SBATCH --export={keepvars}
#SBATCH --uid={username}
ml singularity/2.4.2
# jupyter-singleuser anticipates that the environment will be dropped during sudo; batchspawner,
# however, retains it. XDG_RUNTIME_DIR must be pointed at a user-writable location (forcing a
# fallback from the host value), otherwise a permissions error occurs when starting the notebook.
# https://github.com/jupyter/notebook/issues/1318
export SINGULARITYENV_JUPYTERHUB_API_TOKEN=$JUPYTERHUB_API_TOKEN
export SINGULARITYENV_XDG_RUNTIME_DIR=$HOME/.singularity-jupyter-run
export SINGULARITYENV_CONTAINER_PATH={image_path}
singularity run \
--bind /var/lib/sss/pipes \
--bind /home/slurm \
--bind /var/run/munge \
--bind /etc/slurm \
--bind /curc/slurm \
--bind /etc/pam.d \
$SINGULARITYENV_CONTAINER_PATH {cmd}
"""

spawner_config = {
'batch_script': summit_script,
'batch_submit_cmd': """sudo -E -u {username} SLURM_CONF=/curc/slurm/{cluster}/etc/slurm.conf sbatch""",
'batch_query_cmd': """sudo -E -u {username} SLURM_CONF=/curc/slurm/{cluster}/etc/slurm.conf squeue -h -j {job_id} -o "%T %B" """,
'batch_cancel_cmd': """sudo -E -u {username} SLURM_CONF=/curc/slurm/{cluster}/etc/slurm.conf scancel {job_id}""",
}
58 changes: 58 additions & 0 deletions optionsspawner-slurm-singularity-rmaccsummit/singularity-notebook-ipyparallel/Singularity
Bootstrap: docker
From: centos:7

%labels
MAINTAINER [email protected]

%environment
export JUPYTER_DATA_DIR=$HOME/.singularity-jupyter

%setup
mkdir -p /usr/local/share/profile_example-shas

%files
profile_example-shas/ /usr/local/share/profile_example-shas/

%post
# Install dependencies for Slurm, SSSD, and PAM
useradd -u 515 -m slurm
useradd -u 992 -m munge
yum -y update
yum -y install epel-release
yum -y groupinstall 'Development Tools'
yum -y install sssd curl wget strace iproute munge munge-devel pam-devel openssl openssl-devel readline-devel perl-devel
cd ~ && wget https://download.schedmd.com/slurm/slurm-17.02.9.tar.bz2
rpmbuild -ta slurm-17.02.9.tar.bz2
cd ~/rpmbuild/RPMS/x86_64
rpm -ivh slurm-pam_slurm-17.02.9-1.el7.centos.x86_64.rpm slurm-plugins-17.02.9-1.el7.centos.x86_64.rpm slurm-munge-17.02.9-1.el7.centos.x86_64.rpm slurm-perlapi-17.02.9-1.el7.centos.x86_64.rpm slurm-17.02.9-1.el7.centos.x86_64.rpm slurm-devel-17.02.9-1.el7.centos.x86_64.rpm

# Install Omnipath and OpenMPI user libraries for Summit
yum install -y libhfi1 libpsm2 libpsm2-devel libpsm2-compat
yum install -y perftest qperf
yum install -y libibverbs libibverbs-devel rdma
yum install -y numactl-libs numactl-devel
yum install -y pciutils
yum install -y which
wget https://download.open-mpi.org/release/open-mpi/v2.0/openmpi-2.0.1.tar.gz
tar -xf openmpi-2.0.1.tar.gz
cd openmpi-2.0.1/
./configure \
--with-verbs \
--with-psm2 \
--enable-mpi-thread-multiple
make -j2
make install

# Install Jupyter*
yum -y install python34-devel python34-pip
pip3 install --upgrade pip
pip3 install jupyterhub==0.7.2
pip3 install --upgrade notebook
pip3 install ipyparallel==6.2
pip3 install matplotlib numpy scipy pandas

%runscript
# Copy example ipyparallel profile to home directory
mkdir -p $HOME/.ipython
cp -rf /usr/local/share/profile_example-shas $HOME/.ipython/profile_example-shas
exec "$@"
34 changes: 34 additions & 0 deletions optionsspawner-slurm-singularity-rmaccsummit/singularity-notebook-ipyparallel/profile_example-shas/ipcluster_config.py
# Configuration file for ipcluster.

c.IPClusterEngines.engine_launcher_class = 'ipyparallel.apps.launcher.SlurmEngineSetLauncher'
c.SlurmLauncher.qos = 'normal'
c.SlurmLauncher.timelimit = '1:00:00'

c.SlurmEngineSetLauncher.batch_template = """#!/bin/bash
#SBATCH --partition shas
#SBATCH --qos {qos}
#SBATCH --job-name ipengine
#SBATCH --ntasks {n}
# This will run a single ipengine per CPU
#SBATCH --cpus-per-task 1
# Use ntasks-per-node=1 to run one ipengine per node
#SBATCH --time {timelimit}
#SBATCH --output {profile_dir}/log/slurm.out
ml gcc singularity/2.4.2
ml openmpi/2.0.1
# Run each IPEngine in a new instance of the current container.
export SINGULARITYENV_JUPYTERHUB_API_TOKEN=$JUPYTERHUB_API_TOKEN
export SINGULARITYENV_XDG_RUNTIME_DIR=$HOME/.singularity-jupyter-run
export SINGULARITYENV_CONTAINER_PATH=$CONTAINER_PATH
mpirun singularity run \
--bind /var/lib/sss/pipes \
--bind /home/slurm \
--bind /var/run/munge \
--bind /etc/slurm \
--bind /etc/pam.d \
$CONTAINER_PATH \
ipengine --profile-dir="{profile_dir}" --cluster-id="{cluster_id}"
"""
3 changes: 3 additions & 0 deletions optionsspawner-slurm-singularity-rmaccsummit/singularity-notebook-ipyparallel/profile_example-shas/ipcontroller_config.py
# Configuration file for ipcontroller.

c.HubFactory.ip = '*'
11 changes: 11 additions & 0 deletions optionsspawner-slurm-singularity-rmaccsummit/singularity-notebook-ipyparallel/profile_example-shas/startup/README
This is the IPython startup directory

.py and .ipy files in this directory will be run *prior* to any code or files specified
via the exec_lines or exec_files configurables whenever you load this profile.

Files will be run in lexicographical order, so you can control the execution order of files
with a prefix, e.g.::

00-first.py
50-middle.py
99-last.ipy
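
For example, a hypothetical 00-imports.py placed here would run on every
kernel or engine that loads this profile (numpy and pandas are installed
in the notebook image)::

    # 00-imports.py -- runs before exec_lines/exec_files
    import numpy as np
    import pandas as pd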

1 comment on commit 1c4ddef

@wixaw commented on 1c4ddef, Mar 7, 2024:
Hello,

It no longer works in 4.0.2; would you know how I could fix this?
I noticed that instead of
`singularity exec --bind $PWD:/run/user $SINGULARITYENV_CONTAINER_PATH jupyterhub-singleuser --ip=$HOSTNAME --port=42440 --notebook-dir=~ --NotebookApp.default_url=/lab`
we now have
`singularity exec --bind $PWD:/run/user $SINGULARITYENV_CONTAINER_PATH batchspawner-singleuser jupyterhub-singleuser`.
Thank you in advance
Best regards.
