
Research Clusters


Computing clusters allow us to outsource computationally expensive jobs to external computer banks. If you are unsure whether a job should be submitted to a cluster, first ask another RA, then escalate to MG and JMS if necessary.

SLURM

The Simple Linux Utility for Resource Management (SLURM) is the job scheduler used by most computing clusters. A comprehensive guide is available on the SLURM homepage, and a quick cheatsheet is available on the Slurm 101 website.

The basic workflow on these clusters is the same as on a local computer. Always check and test your code carefully before submitting it to the clusters.

Sherlock

Sherlock is Stanford's computing cluster, and it uses SLURM. The following tips will help you get started quickly on the cluster:

Logging in

You can access Sherlock from your terminal by running:
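For example (the login host below is the one documented on the Sherlock website; replace <SUNetID> with your own SUNet ID):

ssh <SUNetID>@login.sherlock.stanford.edu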

When logging in for the first time, check this Sherlock guide to set up your credentials.

On Sherlock, clone your repositories into your personal folder in the gentzkow group space, which you can reach by running cd $OAK.
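A minimal sketch of that workflow (the folder and repository names here are hypothetical placeholders):

# move to the lab's group storage on Oak
cd $OAK
# create a personal folder and clone the repository into it
mkdir -p <YourSUNetID>
cd <YourSUNetID>
git clone https://github.com/gentzkow/<YourRepo>.git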

Installing and loading software

Upon first logging into the cluster, install the software you need by trying the following, in order (a sketch of this sequence follows the list):

  • Use module avail to see whether the software is already installed on the cluster; if so, load it with module load.
  • Use brew install <name> if the software is available in Homebrew.
  • Download the software with wget and unpack it with unzip or tar.
  • Contact Sherlock support if all of the above methods fail.
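As a rough sketch of that sequence (the software name and download URL below are placeholders, not specific recommendations):

# 1. check whether the software ships with the cluster, then load it
module avail <software>
module load <software>
# 2. otherwise, try Homebrew
brew install <software>
# 3. otherwise, download and unpack it manually
wget https://<software-download-url>/<software>.tar.gz
tar -xzf <software>.tar.gz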

Setting up the environment

You should follow the protocol defined by the points below to set up your environment.

  1. Explicitly specify your Python or Conda path at the beginning of your .bashrc script by setting:
export PYTHONPATH=$OAK/<YourPythonPathInSherlock>/python3.6/site-packages:$PYTHONPATH

or, if using Conda:

source <YourCondaPathInSherlock>/conda.sh
conda activate <CONDAENVIRONMENT>
  2. Load the needed modules.
  3. Initialize your environment as required by your project (usually documented in the repository README or wiki).
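Putting steps 2 and 3 together with the path setup above, a typical login sequence might look like the following sketch (the module names and setup script are illustrative; check module avail and your repository README for the exact ones):

# 1. path setup: handled in ~/.bashrc as shown above
# 2. load the modules the project needs (names are illustrative)
module load python/3.6.1
module load gcc
# 3. run the project's own initialization, e.g. a hypothetical setup script
bash setup_environment.sh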

Testing

There are two steps of testing that you should follow.

  1. Test your bash script before submitting a job to be sure that the environment is set up correctly.

One way to do this is to have your sbatch script call only a specific lib file. These files gather functions used throughout the repository, so they run quickly; however, they need a fully working environment to run, which makes them a fast check of your environment setup.

In general, your jobs are assigned a node depending on their priority and the resources needed. If you allocate your job only to the gentzkow partition (see below) then it will run whenever the partition has a free node, regardless of the time or memory requested. However, if you opt for multiple partitions, then the allocation depends on the time and memory requested.

Since your environment test is fast, we suggest that you adjust the requested time and memory accordingly and then allocate your job to multiple partitions with #SBATCH --partition=gentzkow,owners to access whichever is free (a sketch follows below). Remember to remove the owners partition when you run your full job, as you do not have priority on it.
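A minimal sketch of such an environment-test script (the file name and resource amounts are illustrative):

#!/bin/bash
#SBATCH --job-name=env_test
#SBATCH --partition=gentzkow,owners
#SBATCH --time=00:10:00
#SBATCH --mem=4G
#SBATCH --output=env_test.log

# run a single lib file: it finishes quickly but exercises the full environment
python lib/<SomeLibFile>.py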

  2. Test your full script by running it on a subset of the data. You can also test your code interactively by running the command sdev -m16GB (see the example below).
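For example (the data path and script name are hypothetical; the sdev options follow the Sherlock documentation):

# request an interactive development session with 16 GB of memory
sdev -m 16GB
# inside the session, run the code on a small subset of the data
head -n 1000 $OAK/<YourProject>/data/full_sample.csv > subset.csv
python <YourScript>.py subset.csv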

Running jobs

After setting up your environment and testing your code, you can submit a job. Cluster-specific guides on submitting and running jobs are available in the Sherlock documentation.

Before submitting a job, review the partition options described below.
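A typical submit-and-monitor cycle looks like the following (the script name is a placeholder):

sbatch <YourJobScript>.sbatch   # submit the job script
squeue -u $USER                 # check the status of your queued and running jobs
scancel <JobID>                 # cancel a job if something looks wrong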

Stanford gentzkow node

MG has purchased resources on Stanford's Sherlock 2 cluster. Jobs submitted to this partition only compete with other lab members for resources, and interactive jobs can request all available resources. The partition consists of:

  • A single Dell C6320 server with 20 cores, 256G of RAM, and a 200G SSD.

You can submit your jobs to multiple partitions by setting #SBATCH --partition=gentzkow,hns,normal at the beginning of your sbatch script (see the sketch below).
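A sketch of a full-job header using multiple partitions (the resource requests are illustrative and should stay within the node limits described on this page):

#!/bin/bash
#SBATCH --job-name=full_run
#SBATCH --partition=gentzkow,hns,normal
#SBATCH --time=12:00:00
#SBATCH --cpus-per-task=8
#SBATCH --mem=64G
#SBATCH --output=full_run.log

python run/<YourMainScript>.py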

Stanford Humanities and Sciences nodes

The Humanities and Sciences Dean's Office at Stanford has purchased Sherlock nodes for its researchers' exclusive use. These nodes belong to the hns partition. This partition consists of:

  • 10 CPU nodes, each with 64 GB of RAM and 16 multi-core CPUs,
  • hns_gpu, a GPU node with 128 GB of RAM, 16 CPUs, and 8 Tesla K80 GPUs rated at 1.87 Tflops (double precision) and 5.60 Tflops (single precision), and
  • a large-memory node with 1.5 TB of RAM; job requests that require more than 64 GB of RAM are automatically sent to this node.

Access these nodes by adding -p hns to your job submission requests. Similarly, run graphical processing unit (GPU) jobs by adding -p hns_gpu --gres gpu:1 to your request commands.
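For instance, a GPU job script could request the hns_gpu partition as sketched below (the script name is a placeholder, and the exact CUDA module to load should be confirmed with module avail):

#!/bin/bash
#SBATCH --job-name=gpu_job
#SBATCH --partition=hns_gpu
#SBATCH --gres=gpu:1
#SBATCH --time=04:00:00
#SBATCH --mem=32G

# load the CUDA toolkit before running GPU code
module load cuda
python <YourGPUScript>.py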
