diff --git a/.github/workflows/test_and_build.yml b/.github/workflows/test_and_build.yml index 58e77188..ab5ea86d 100644 --- a/.github/workflows/test_and_build.yml +++ b/.github/workflows/test_and_build.yml @@ -62,7 +62,7 @@ jobs: HPC_JEKYLL_CONFIG: - Birmingham_Baskerville_slurm - ComputeCanada_Graham_slurm - - EPCC_Cirrus_pbs + - EPCC_Cirrus_slurm - HPCC_MagicCastle_slurm - Magic_Castle_EESSI_slurm - NIST_CTCMS_slurm diff --git a/_config.yml b/_config.yml index 36f9c871..b2785dfa 100644 --- a/_config.yml +++ b/_config.yml @@ -11,9 +11,9 @@ # `_includes/snippets_library`. To use one, replace options # below with those in `_config_options.yml` from the # library. E.g, to customise for Cirrus at EPCC, running -# PBS, we could replace the options below with those from +# Slurm, we could replace the options below with those from # -# _includes/snippets_library/EPCC_Cirrus_pbs/_config_options.yml +# _includes/snippets_library/EPCC_Cirrus_slurm/_config_options.yml # # If your cluster is not represented in the library, please # copy an existing folder, rename it, and customize for your @@ -74,6 +74,7 @@ sched: info: "sinfo" comment: "#SBATCH" hist: "sacct -u yourUsername" + hist_filter: "" episode_order: - 10-hpc-intro diff --git a/_episodes/14-modules.md b/_episodes/14-modules.md index c62828ec..08a80f6a 100644 --- a/_episodes/14-modules.md +++ b/_episodes/14-modules.md @@ -27,16 +27,16 @@ understand the reasoning behind this approach. The three biggest factors are: Software incompatibility is a major headache for programmers. Sometimes the presence (or absence) of a software package will break others that depend on -it. Two of the most famous examples are Python 2 and 3 and C compiler versions. +it. Two well-known examples are Python and C compiler versions. Python 3 famously provides a `python` command that conflicts with that provided by Python 2. 
Software compiled against a newer version of the C libraries and -then used when they are not present will result in a nasty `'GLIBCXX_3.4.20' -not found` error, for instance. +then run on a machine that has older C libraries installed will result in a +nasty `'GLIBCXX_3.4.20' not found` error. Software versioning is another common issue. A team might depend on a certain package version for their research project - if the software version was to change (for instance, if a package was updated), it might affect their results. -Having access to multiple software versions allow a set of researchers to +Having access to multiple software versions allows a set of researchers to prevent software versioning issues from affecting their results. Dependencies are where a particular software package (or even a particular @@ -89,10 +89,7 @@ message telling you so ``` {: .language-bash} -``` -No Modulefiles Currently Loaded. -``` -{: .output} +{% include {{ site.snippets }}/modules/default-modules.snip %} ## Loading and Unloading Software @@ -198,7 +195,10 @@ Let's examine the output of `module avail` more closely. 
> > > > ``` > > {{ site.remote.bash_shebang }} -> > +> > {{ site.sched.comment }} {{ site.sched.flag.partition }}{% if site.sched.flag.qos %} +> > {{ site.sched.comment }} {{ site.sched.flag.qos }} +> > {% endif %}{{ site.sched.comment }} {{ site.sched.flag.time }} 00:00:30 +> > > > module load {{ site.remote.module_python3 }} > > > > python3 --version diff --git a/_includes/snippets_library/Birmingham_Baskerville_slurm/_config_options.yml b/_includes/snippets_library/Birmingham_Baskerville_slurm/_config_options.yml index e91ba0b3..471f4fa0 100644 --- a/_includes/snippets_library/Birmingham_Baskerville_slurm/_config_options.yml +++ b/_includes/snippets_library/Birmingham_Baskerville_slurm/_config_options.yml @@ -55,6 +55,7 @@ sched: info: "sinfo" comment: "#SBATCH" hist: "sacct -u $USER" + hist_filter: "" episode_order: - 10-hpc-intro diff --git a/_includes/snippets_library/Birmingham_Baskerville_slurm/modules/default-modules.snip b/_includes/snippets_library/Birmingham_Baskerville_slurm/modules/default-modules.snip new file mode 100644 index 00000000..a448dd96 --- /dev/null +++ b/_includes/snippets_library/Birmingham_Baskerville_slurm/modules/default-modules.snip @@ -0,0 +1,4 @@ +``` +No Modulefiles Currently Loaded. 
+``` +{: .output} diff --git a/_includes/snippets_library/EPCC_Cirrus_pbs/scheduler/terminate-multiple-jobs.snip b/_includes/snippets_library/Birmingham_Baskerville_slurm/modules/sbatch-options.snip similarity index 100% rename from _includes/snippets_library/EPCC_Cirrus_pbs/scheduler/terminate-multiple-jobs.snip rename to _includes/snippets_library/Birmingham_Baskerville_slurm/modules/sbatch-options.snip diff --git a/_includes/snippets_library/ComputeCanada_Graham_slurm/_config_options.yml b/_includes/snippets_library/ComputeCanada_Graham_slurm/_config_options.yml index b29ad4f4..2236091e 100644 --- a/_includes/snippets_library/ComputeCanada_Graham_slurm/_config_options.yml +++ b/_includes/snippets_library/ComputeCanada_Graham_slurm/_config_options.yml @@ -8,9 +8,9 @@ # `_includes/snippets_library`. To use one, replace options # below with those in `_config_options.yml` from the # library. E.g, to customise for Cirrus at EPCC, running -# PBS, we could replace the options below with those from +# Slurm, we could replace the options below with those from # -# _includes/snippets_library/EPCC_Cirrus_pbs/_config_options.yml +# _includes/snippets_library/EPCC_Cirrus_slurm/_config_options.yml # # If your cluster is not represented in the library, please # copy an existing folder, rename it, and customize for your @@ -55,6 +55,7 @@ sched: info: "sinfo" comment: "#SBATCH" hist: "sacct -u yourUsername" + hist_filter: "" episode_order: - 10-hpc-intro diff --git a/_includes/snippets_library/ComputeCanada_Graham_slurm/cluster/root-folders.snip b/_includes/snippets_library/ComputeCanada_Graham_slurm/cluster/root-folders.snip new file mode 100644 index 00000000..715de741 --- /dev/null +++ b/_includes/snippets_library/ComputeCanada_Graham_slurm/cluster/root-folders.snip @@ -0,0 +1,6 @@ +``` +bin etc lib64 proc sbin sys var +boot {{ site.remote.homedir | replace: "/", "" }} mnt root scratch tmp working +dev lib opt run srv usr +``` +{: .output} diff --git 
a/_includes/snippets_library/ComputeCanada_Graham_slurm/modules/default-modules.snip b/_includes/snippets_library/ComputeCanada_Graham_slurm/modules/default-modules.snip new file mode 100644 index 00000000..a448dd96 --- /dev/null +++ b/_includes/snippets_library/ComputeCanada_Graham_slurm/modules/default-modules.snip @@ -0,0 +1,4 @@ +``` +No Modulefiles Currently Loaded. +``` +{: .output} diff --git a/_includes/snippets_library/ComputeCanada_Graham_slurm/resources/hist-fields.snip b/_includes/snippets_library/ComputeCanada_Graham_slurm/resources/hist-fields.snip new file mode 100644 index 00000000..f0e215ba --- /dev/null +++ b/_includes/snippets_library/ComputeCanada_Graham_slurm/resources/hist-fields.snip @@ -0,0 +1,6 @@ +* **Hostname**: Where did your job run? +* **MaxRSS**: What was the maximum amount of memory used? +* **Elapsed**: How long did the job take? +* **State**: What is the job currently doing/what happened to it? +* **MaxDiskRead**: Amount of data read from disk. +* **MaxDiskWrite**: Amount of data written to disk. diff --git a/_includes/snippets_library/ComputeCanada_Graham_slurm/scheduler/email-notifications.snip b/_includes/snippets_library/ComputeCanada_Graham_slurm/scheduler/email-notifications.snip new file mode 100644 index 00000000..e681b3c0 --- /dev/null +++ b/_includes/snippets_library/ComputeCanada_Graham_slurm/scheduler/email-notifications.snip @@ -0,0 +1,19 @@ +> Jobs on an HPC system might run for days or even weeks. We probably have +> better things to do than constantly check on the status of our job with +> `{{ site.sched.status }}`. Looking at the manual page for +> `{{ site.sched.submit.name }}`, can you set up our test job to send you an email +> when it finishes? +> +> > ## Hint +> > +> > You can use the *manual pages* for {{ site.sched.name }} utilities to find +> > more about their capabilities. On the command line, these are accessed +> > through the `man` utility: run `man <program-name>`. 
You can find the same +> > information online by searching "man <program-name>". +> > +> > ``` +> > {{ site.remote.prompt }} man {{ site.sched.submit.name }} +> > ``` +> > {: .language-bash} +> {: .solution} +{: .challenge} diff --git a/_includes/snippets_library/EPCC_Cirrus_pbs/_config_options.yml b/_includes/snippets_library/EPCC_Cirrus_pbs/_config_options.yml deleted file mode 100644 index 6a9cb8b5..00000000 --- a/_includes/snippets_library/EPCC_Cirrus_pbs/_config_options.yml +++ /dev/null @@ -1,69 +0,0 @@ -#------------------------------------------------------------ -# EPCC, The University of Edinburgh: Cirrus + PBS Pro -#------------------------------------------------------------ - -# Cluster host and scheduler options: the defaults come from -# Graham at Compute Canada, running Slurm. Other options can -# be found in the library of snippets, -# `_includes/snippets_library`. To use one, replace options -# below with those in `_config_options.yml` from the -# library. E.g, to customise for Cirrus at EPCC, running -# PBS, we could replace the options below with those from -# -# _includes/snippets_library/EPCC_Cirrus_pbs/_config_options.yml -# -# If your cluster is not represented in the library, please -# copy an existing folder, rename it, and customize for your -# installation. Remember to keep the leading slash on the -# `snippets` variable below!
- -snippets: "/snippets_library/EPCC_Cirrus_pbs" - -local: - prompt: "[user@laptop ~]$" - bash_shebang: "#!/usr/bin/env bash" - -remote: - name: "Cirrus" - login: "login.cirrus.ac.uk" - host: "cirrus-login0" - node: "r1i0n32" - location: "EPCC, The University of Edinburgh" - homedir: "/lustre/home/tc001/lola" - user: "yourUsername" - prompt: "[yourUsername@cirrus-login0 ~]$" - bash_shebang: "#!/usr/bin/env bash" - -sched: - name: "PBS Pro" - submit: - name: "qsub" - options: "-A tc001 -q R387726" - iopt: "" - queue: - debug: "standard" - testing: "standard" - status: "qstat" - flag: - user: "-u yourUsername" - interactive: "-IVl select=1:ncpus=1" - name: "-N" - histdetail: "-f" - time: "-l walltime" - queue: "-q" - del: "qdel" - interactive: "qsub" - info: "pbsnodes -a" - comment: "#PBS" - hist: "tracejob" - -episode_order: - - 10-hpc-intro - - 11-connecting - - 12-cluster - - 13-scheduler - - 14-modules - - 15-transferring-files - - 16-parallel - - 17-resources - - 18-responsibility diff --git a/_includes/snippets_library/EPCC_Cirrus_pbs/cluster/queue-info.snip b/_includes/snippets_library/EPCC_Cirrus_pbs/cluster/queue-info.snip deleted file mode 100644 index 3953b85e..00000000 --- a/_includes/snippets_library/EPCC_Cirrus_pbs/cluster/queue-info.snip +++ /dev/null @@ -1,23 +0,0 @@ -``` -{{ site.remote.node }} - Mom = {{ site.remote.node }}.ib0.icexa.epcc.ed.ac.uk - ntype = PBS - state = offline - pcpus = 72 - resources_available.arch = linux - resources_available.host = {{ site.remote.node }} - resources_available.mem = 263773892kb - resources_available.ncpus = 36 - resources_available.vnode = {{ site.remote.node }} - resources_assigned.accelerator_memory = 0kb - resources_assigned.mem = 0kb - resources_assigned.naccelerators = 0 - resources_assigned.ncpus = 0 - resources_assigned.netwins = 0 - resources_assigned.vmem = 0kb - resv_enable = True - sharing = default_shared - license = l -... 
-``` -{: .output} diff --git a/_includes/snippets_library/EPCC_Cirrus_pbs/modules/available-modules.snip b/_includes/snippets_library/EPCC_Cirrus_pbs/modules/available-modules.snip deleted file mode 100644 index 049d3076..00000000 --- a/_includes/snippets_library/EPCC_Cirrus_pbs/modules/available-modules.snip +++ /dev/null @@ -1,12 +0,0 @@ -``` ------------- /usr/share/Modules/modulefiles ----------- -dot module-info mpt_2.16 perfboost use.own -module-git modules null perfcatcher - ---------------- /lustre/sw/modulefiles ---------------- -abinit/8.2.3-intel17-mpt214(default) -allinea/7.0.0(default) -altair-hwsolvers/13.0.213 -... -``` -{: .output} diff --git a/_includes/snippets_library/EPCC_Cirrus_pbs/modules/missing-python.snip b/_includes/snippets_library/EPCC_Cirrus_pbs/modules/missing-python.snip deleted file mode 100644 index 7d6ef7c4..00000000 --- a/_includes/snippets_library/EPCC_Cirrus_pbs/modules/missing-python.snip +++ /dev/null @@ -1,7 +0,0 @@ -``` -/usr/bin/which: no python3 in (/lustre/home/z04/aturner/miniconda2/bin: -/opt/sgi/sbin:/opt/sgi/bin:/usr/lib64/qt-3.3/bin:/opt/pbs/default/bin: -/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/c3/bin:/sbin:/bin: -/lustre/home/z04/aturner/bin) -``` -{: .output} diff --git a/_includes/snippets_library/EPCC_Cirrus_pbs/modules/module-load-python.snip b/_includes/snippets_library/EPCC_Cirrus_pbs/modules/module-load-python.snip deleted file mode 100644 index bc13fe96..00000000 --- a/_includes/snippets_library/EPCC_Cirrus_pbs/modules/module-load-python.snip +++ /dev/null @@ -1,5 +0,0 @@ -``` -{{ site.remote.prompt }} module load anaconda/python3 -{{ site.remote.prompt }} which python3 -``` -{: .language-bash} diff --git a/_includes/snippets_library/EPCC_Cirrus_pbs/modules/python-executable-dir.snip b/_includes/snippets_library/EPCC_Cirrus_pbs/modules/python-executable-dir.snip deleted file mode 100644 index cefef21f..00000000 --- a/_includes/snippets_library/EPCC_Cirrus_pbs/modules/python-executable-dir.snip 
+++ /dev/null @@ -1,4 +0,0 @@ -``` -/lustre/sw/anaconda/anaconda3-5.1.0/bin/python3 -``` -{: .output} diff --git a/_includes/snippets_library/EPCC_Cirrus_pbs/modules/python-ls-dir-command.snip b/_includes/snippets_library/EPCC_Cirrus_pbs/modules/python-ls-dir-command.snip deleted file mode 100644 index e7019b19..00000000 --- a/_includes/snippets_library/EPCC_Cirrus_pbs/modules/python-ls-dir-command.snip +++ /dev/null @@ -1,4 +0,0 @@ -``` -{{ site.remote.prompt }} ls /lustre/sw/anaconda/anaconda3-5.1.0/bin -``` -{: .language-bash} diff --git a/_includes/snippets_library/EPCC_Cirrus_pbs/modules/python-ls-dir-output.snip b/_includes/snippets_library/EPCC_Cirrus_pbs/modules/python-ls-dir-output.snip deleted file mode 100644 index 6235d6d8..00000000 --- a/_includes/snippets_library/EPCC_Cirrus_pbs/modules/python-ls-dir-output.snip +++ /dev/null @@ -1,15 +0,0 @@ -``` -[output truncated] - -2to3 Modules_module-info pip3 python3.5m -2to3-3.5 Modules_modules pip3.5 python3.5m-config -easy_install Modules_mpt_2.16 pydoc3 python3-config -easy_install-3.5 Modules_null pydoc3.5 pyvenv -idle3 Modules_perfboost python pyvenv-3.5 -idle3.5 Modules_perfcatcher python3 virtualenv -Modules_dot Modules_use.own python3.5 wheel -Modules_module-git pip python3.5-config - -[output truncated] -``` -{: .output} diff --git a/_includes/snippets_library/EPCC_Cirrus_pbs/modules/python-module-path.snip b/_includes/snippets_library/EPCC_Cirrus_pbs/modules/python-module-path.snip deleted file mode 100644 index 15e6235d..00000000 --- a/_includes/snippets_library/EPCC_Cirrus_pbs/modules/python-module-path.snip +++ /dev/null @@ -1,4 +0,0 @@ -``` -/lustre/home/z04/aturner/miniconda2/bin:/opt/sgi/sbin:/opt/sgi/bin:/usr/lib64/qt-3.3/bin:/opt/pbs/default/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/c3/bin:/sbin:/bin:/lustre/home/z04/aturner/bin -``` -{: .output} diff --git a/_includes/snippets_library/EPCC_Cirrus_pbs/modules/software-dependencies.snip 
b/_includes/snippets_library/EPCC_Cirrus_pbs/modules/software-dependencies.snip deleted file mode 100644 index a6cae294..00000000 --- a/_includes/snippets_library/EPCC_Cirrus_pbs/modules/software-dependencies.snip +++ /dev/null @@ -1,53 +0,0 @@ -To demonstrate, let's load the `abinit` module and then use the `module list` -command to show which modules we currently have loaded in our environment. -([Abinit](https://www.abinit.org/) is an open source materials science -modelling software package.) - -``` -{{ site.remote.prompt }} module load abinit -{{ site.remote.prompt }} module list -``` -{: .language-bash} - -``` -Currently Loaded Modulefiles: - 1) anaconda/python3 6) intel-cmkl-17/17.0.2.174 - 2) mpt/2.16 7) gcc/6.2.0 - 3) intel-cc-17/17.0.2.174 8) fftw-3.3.5-intel-17.0.2-dxt2dzn - 4) intel-fc-17/17.0.2.174 9) netcdf/4.4.1 - 5) intel-compilers-17/17.0.2.174 10) abinit/8.2.3-intel17-mpt214 -``` -{: .output} - -So in this case, loading the `abinit` module also loaded a variety of other -modules. Let's try unloading the `abinit` package. - -``` -{{ site.remote.prompt }} module unload abinit -{{ site.remote.prompt }} module list -``` -{: .language-bash} - -``` -Currently Loaded Modulefiles: - 1) anaconda/python3 -``` -{: .output} - -So using `module unload` "un-loads" a module along with its dependencies. If we -wanted to unload everything at once, we could run `module purge` (unloads -everything). - -``` -{{ site.remote.prompt }} module load abinit -{{ site.remote.prompt }} module purge -``` -{: .language-bash} - -``` -No Modulefiles Currently Loaded. -``` -{: .output} - -Note that `module purge` has removed the `anaconda/python3` module as well as -`abinit` and its dependencies. 
diff --git a/_includes/snippets_library/EPCC_Cirrus_pbs/parallel/four-tasks-jobscript.snip b/_includes/snippets_library/EPCC_Cirrus_pbs/parallel/four-tasks-jobscript.snip deleted file mode 100644 index e1ba5c5c..00000000 --- a/_includes/snippets_library/EPCC_Cirrus_pbs/parallel/four-tasks-jobscript.snip +++ /dev/null @@ -1,13 +0,0 @@ -``` -{{ site.remote.bash_shebang }} -{{ site.sched.comment }} {{ site.sched.flag.name }} parallel-job -{{ site.sched.comment }} {{ site.sched.flag.queue }} {{ site.sched.queue.testing }} -{{ site.sched.comment }} -l nodes=1:ppn=4 - -# Load the computing environment we need -module load python3 - -# Execute the task -mpiexec amdahl -``` -{: .language-bash} diff --git a/_includes/snippets_library/EPCC_Cirrus_pbs/resources/account-history.snip b/_includes/snippets_library/EPCC_Cirrus_pbs/resources/account-history.snip deleted file mode 100644 index fffcbf19..00000000 --- a/_includes/snippets_library/EPCC_Cirrus_pbs/resources/account-history.snip +++ /dev/null @@ -1,12 +0,0 @@ -``` -{{ site.remote.host }}: - Req'd Req'd -Job ID Username Queue Jobname SessID NDS TSK Memory Time S -------------------- -------- -------- ---------- ------ --- --- ------ ----- - -324396.{{ site.remote.host }} user workq test1 57348 1 1 - -324397.{{ site.remote.host }} user workq test2 57456 1 1 - -324401.{{ site.remote.host }} user workq test3 58159 1 1 - -324410.{{ site.remote.host }} user workq test4 34027 1 1 - -324418.{{ site.remote.host }} user workq test5 35243 1 1 - -``` -{: .output} diff --git a/_includes/snippets_library/EPCC_Cirrus_pbs/resources/hist_fields.snip b/_includes/snippets_library/EPCC_Cirrus_pbs/resources/hist_fields.snip deleted file mode 100644 index 54a46fef..00000000 --- a/_includes/snippets_library/EPCC_Cirrus_pbs/resources/hist_fields.snip +++ /dev/null @@ -1,6 +0,0 @@ -* **exec_vnode** - Where did your job run? -* **resources_used.walltime** - How long did the job take? 
-* **comment** - Any notes on success or errors in the job -* **Output_Path** - The file that stdout from the job was sent to -* **Resource_List.** - Set of resources requested by the job -* **resources_used.** - Set of resources used by the job diff --git a/_includes/snippets_library/EPCC_Cirrus_pbs/scheduler/basic-job-status.snip b/_includes/snippets_library/EPCC_Cirrus_pbs/scheduler/basic-job-status.snip deleted file mode 100644 index 78151c7d..00000000 --- a/_includes/snippets_library/EPCC_Cirrus_pbs/scheduler/basic-job-status.snip +++ /dev/null @@ -1,12 +0,0 @@ -``` -{{ site.remote.host }}: - Req'd Req'd Elap -Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time ------- -------- -------- -------------- ------ --- --- ------ ----- - ----- -387775 yourUser workq example-job.sh 50804 1 1 -- 96:00 R 00:00 -``` -{: .output} - -We can see all the details of our job, most importantly that it is in the `R` -or `RUNNING` state. Sometimes our jobs might need to wait in a queue -(`PENDING`) or have an error (`E`). 
diff --git a/_includes/snippets_library/EPCC_Cirrus_pbs/scheduler/job-with-name-status.snip b/_includes/snippets_library/EPCC_Cirrus_pbs/scheduler/job-with-name-status.snip deleted file mode 100644 index fe467110..00000000 --- a/_includes/snippets_library/EPCC_Cirrus_pbs/scheduler/job-with-name-status.snip +++ /dev/null @@ -1,8 +0,0 @@ -``` -38778.{{ site.remote.host }} - Req'd Req'd Elap -Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time ------- -------- -------- ---------- ------ --- --- ------ ----- - ----- -38778 yourUser workq hello-worl 51536 1 1 -- 96:00 R 00:00 -``` -{: .output} diff --git a/_includes/snippets_library/EPCC_Cirrus_pbs/scheduler/option-flags-list.snip b/_includes/snippets_library/EPCC_Cirrus_pbs/scheduler/option-flags-list.snip deleted file mode 100644 index 93977169..00000000 --- a/_includes/snippets_library/EPCC_Cirrus_pbs/scheduler/option-flags-list.snip +++ /dev/null @@ -1,10 +0,0 @@ -* `-l select=:ncpus=` — how many nodes does your - job need and how many cores per node? Note that there are 36 cores per node - on Cirrus. - -* `-l walltime=` — How much real-world time - (walltime) will your job take to run? - -* `-l place=scatter:excl` — Reserve your nodes just for yourself. (If you - are using full nodes, you should include this as it stops other users from - interfering with the performance of your job.) 
diff --git a/_includes/snippets_library/EPCC_Cirrus_pbs/scheduler/runtime-exceeded-job.snip b/_includes/snippets_library/EPCC_Cirrus_pbs/scheduler/runtime-exceeded-job.snip deleted file mode 100644 index d275d2e9..00000000 --- a/_includes/snippets_library/EPCC_Cirrus_pbs/scheduler/runtime-exceeded-job.snip +++ /dev/null @@ -1,4 +0,0 @@ -``` -{{ site.remote.prompt }} cat example-job.sh.e387798 -``` -{: .language-bash} diff --git a/_includes/snippets_library/EPCC_Cirrus_pbs/scheduler/runtime-exceeded-output.snip b/_includes/snippets_library/EPCC_Cirrus_pbs/scheduler/runtime-exceeded-output.snip deleted file mode 100644 index f33a7dc4..00000000 --- a/_includes/snippets_library/EPCC_Cirrus_pbs/scheduler/runtime-exceeded-output.snip +++ /dev/null @@ -1,4 +0,0 @@ -``` -=>> PBS: job killed: walltime 33 exceeded limit 30 -``` -{: .output} diff --git a/_includes/snippets_library/EPCC_Cirrus_pbs/scheduler/terminate-job-begin.snip b/_includes/snippets_library/EPCC_Cirrus_pbs/scheduler/terminate-job-begin.snip deleted file mode 100644 index 53666196..00000000 --- a/_includes/snippets_library/EPCC_Cirrus_pbs/scheduler/terminate-job-begin.snip +++ /dev/null @@ -1,10 +0,0 @@ -``` -38759.{{ site.remote.host }} - -indy2-login0: - Req'd Req'd Elap -Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time ------- -------- -------- -------------- ------ --- --- ------ ----- - ----- -38759 yourUser workq example-job.sh 32085 1 1 -- 00:10 R 00:00 -``` -{: .output} diff --git a/_includes/snippets_library/EPCC_Cirrus_pbs/scheduler/terminate-job-cancel.snip b/_includes/snippets_library/EPCC_Cirrus_pbs/scheduler/terminate-job-cancel.snip deleted file mode 100644 index 69753894..00000000 --- a/_includes/snippets_library/EPCC_Cirrus_pbs/scheduler/terminate-job-cancel.snip +++ /dev/null @@ -1,4 +0,0 @@ -``` -...(no output from qstat when there are no jobs to display)... 
-``` -{: .output} diff --git a/_includes/snippets_library/EPCC_Cirrus_slurm/_config_options.yml b/_includes/snippets_library/EPCC_Cirrus_slurm/_config_options.yml new file mode 100644 index 00000000..2a4587e3 --- /dev/null +++ b/_includes/snippets_library/EPCC_Cirrus_slurm/_config_options.yml @@ -0,0 +1,75 @@ +#------------------------------------------------------------ +# EPCC, The University of Edinburgh: Cirrus + Slurm +#------------------------------------------------------------ + +# Cluster host and scheduler options: the defaults come from +# Graham at Compute Canada, running Slurm. Other options can +# be found in the library of snippets, +# `_includes/snippets_library`. To use one, replace options +# below with those in `_config_options.yml` from the +# library. E.g, to customise for Cirrus at EPCC, running +# Slurm, we could replace the options below with those from +# +# _includes/snippets_library/EPCC_Cirrus_slurm/_config_options.yml +# +# If your cluster is not represented in the library, please +# copy an existing folder, rename it, and customize for your +# installation. Remember to keep the leading slash on the +# `snippets` variable below! 
+ +snippets: "/snippets_library/EPCC_Cirrus_slurm" + +local: + prompt: "[auser@laptop ~]$" + bash_shebang: "#!/bin/bash" + +remote: + name: "Cirrus" + login: "login.cirrus.ac.uk" + host: "cirrus-login1" + node: "r1i0n32" + location: "EPCC, The University of Edinburgh" + homedir: "/lustre/home/tc001" + user: "auser" + group: "tc001" + prompt: "[auser@cirrus-login1 ~]$" + bash_shebang: "#!/bin/bash" + module_python3: "anaconda/python3-2021.11" + +sched: + name: "Slurm" + submit: + name: "sbatch" + options: "--partition=standard --qos=standard --time=00:02:00" + queue: + debug: "debug" + testing: "testing" + status: "squeue" + flag: + user: "-u auser" + interactive: "--time=00:20:00 --partition=standard --qos=standard --pty /usr/bin/bash --login" + histdetail: "-l -j" + name: "-J" + partition: "-p standard" + qos: "-q standard" + time: "-t" + queue: "-p" + nodes: "-N" + tasks: "-n" + del: "scancel" + interactive: "srun" + info: "sinfo" + comment: "#SBATCH" + hist: "sacct" + hist_filter: "--format=JobID,JobName,State,Elapsed,NodeList,MaxRSS,MaxDiskRead,MaxDiskWrite" + +episode_order: + - 10-hpc-intro + - 11-connecting + - 12-cluster + - 13-scheduler + - 14-modules + - 15-transferring-files + - 16-parallel + - 17-resources + - 18-responsibility diff --git a/_includes/snippets_library/EPCC_Cirrus_slurm/cluster/queue-info.snip b/_includes/snippets_library/EPCC_Cirrus_slurm/cluster/queue-info.snip new file mode 100644 index 00000000..43331c97 --- /dev/null +++ b/_includes/snippets_library/EPCC_Cirrus_slurm/cluster/queue-info.snip @@ -0,0 +1,17 @@ +``` +PARTITION AVAIL TIMELIMIT NODES STATE NODELIST +standard up 4-00:00:00 4 resv r1i0n[0-1],r1i2n[18-19] +standard up 4-00:00:00 6 mix r1i0n[31,33],r1i3n30,r1i4n5,r1i5n... +standard up 4-00:00:00 187 alloc r1i0n[2,5,13,18-30,32,34-35],r1i1... +standard up 4-00:00:00 83 idle r1i0n[3-4,6-12,14-17],r1i3n[4,9,1... 
+gpu-skylake up 20:00 1 mix r2i3n0 +gpu-skylake up 20:00 1 idle r2i3n1 +gpu-cascade up 20:00 2 maint r2i7n[7-8] +gpu-cascade up 20:00 1 resv r2i5n5 +gpu-cascade up 20:00 4 mix r2i4n[3,8],r2i5n[0,4] +gpu-cascade up 20:00 10 alloc r2i4n[0,2,4,6-7],r2i5n[6-8],r2i6n... +gpu-cascade up 20:00 19 idle r2i4n[1,5],r2i5n[1-3],r2i6n[0-2,4... +tds up 4-00:00:00 4 idle r1i4n[8,17,26,35] +gpu-tds up 10:00 2 maint r2i7n[7-8] +``` +{: .output} diff --git a/_includes/snippets_library/EPCC_Cirrus_slurm/cluster/root-folders.snip b/_includes/snippets_library/EPCC_Cirrus_slurm/cluster/root-folders.snip new file mode 100644 index 00000000..7324c9c5 --- /dev/null +++ b/_includes/snippets_library/EPCC_Cirrus_slurm/cluster/root-folders.snip @@ -0,0 +1,7 @@ +``` +backports beegfs bin boot data dev etc +home lib lib64 lost+found lustre media mnt +opt proc root run sbin scratch srv +sys tmp usr var +``` +{: .output} diff --git a/_includes/snippets_library/EPCC_Cirrus_pbs/cluster/specific-node-info.snip b/_includes/snippets_library/EPCC_Cirrus_slurm/cluster/specific-node-info.snip similarity index 66% rename from _includes/snippets_library/EPCC_Cirrus_pbs/cluster/specific-node-info.snip rename to _includes/snippets_library/EPCC_Cirrus_slurm/cluster/specific-node-info.snip index ca334755..6810e285 100644 --- a/_includes/snippets_library/EPCC_Cirrus_pbs/cluster/specific-node-info.snip +++ b/_includes/snippets_library/EPCC_Cirrus_slurm/cluster/specific-node-info.snip @@ -2,10 +2,10 @@ > > Finally, let's look at the resources available on the worker nodes where your > jobs will actually run. 
Try running this command to see the name, CPUs and -> memory available on the worker nodes: +> memory (in MB) available on the worker nodes: > > ``` -> {{ site.remote.prompt }} pbsnodes {{ site.remote.node }} +> {{ site.remote.prompt }} sinfo -n {{ site.remote.node }} -o "%n %c %m" > ``` > {: .language-bash} {: .challenge} diff --git a/_includes/snippets_library/EPCC_Cirrus_slurm/modules/available-modules.snip b/_includes/snippets_library/EPCC_Cirrus_slurm/modules/available-modules.snip new file mode 100644 index 00000000..57e19238 --- /dev/null +++ b/_includes/snippets_library/EPCC_Cirrus_slurm/modules/available-modules.snip @@ -0,0 +1,10 @@ +``` +------------------------------------------------- /lustre/sw/modulefiles -------------------------------------------------- +altair-hwsolvers/13.0.213 flacs-cfd/21.1 intel-19.5/mpi libxkbcommon/1.0.1(default) openmpi/4.1.0-cuda-11.2 +altair-hwsolvers/14.0.210 flacs-cfd/21.2 intel-19.5/pxse matlab/R2019a perf/1.0.0 +anaconda/python2 flacs/10.9.1 intel-19.5/tbb matlab/R2019b petsc/3.13.2-intel-mpi-18 +anaconda/python3 flex/2.6.4 intel-19.5/vtune matlab/R2020b(default) petsc/3.13.2-mpt +anaconda/python3-2021.11 forge/20.0.0(default) intel-20.4/cc matlab/R2021b quantum-espresso/6.5-intel +... ... ... ... ... 
+``` +{: .output} diff --git a/_includes/snippets_library/EPCC_Cirrus_slurm/modules/default-modules.snip b/_includes/snippets_library/EPCC_Cirrus_slurm/modules/default-modules.snip new file mode 100644 index 00000000..b4237781 --- /dev/null +++ b/_includes/snippets_library/EPCC_Cirrus_slurm/modules/default-modules.snip @@ -0,0 +1,5 @@ +``` +Currently Loaded Modulefiles: + 1) git/2.21.0(default) 2) epcc/utils 3) /lustre/sw/modulefiles/epcc/setup-env +``` +{: .output} diff --git a/_includes/snippets_library/EPCC_Cirrus_pbs/scheduler/basic-job-script.snip b/_includes/snippets_library/EPCC_Cirrus_slurm/modules/missing-python.snip similarity index 54% rename from _includes/snippets_library/EPCC_Cirrus_pbs/scheduler/basic-job-script.snip rename to _includes/snippets_library/EPCC_Cirrus_slurm/modules/missing-python.snip index 01363e40..584eae91 100644 --- a/_includes/snippets_library/EPCC_Cirrus_pbs/scheduler/basic-job-script.snip +++ b/_includes/snippets_library/EPCC_Cirrus_slurm/modules/missing-python.snip @@ -1,4 +1,4 @@ ``` -387775 +/usr/bin/python3 ``` {: .output} diff --git a/_includes/snippets_library/EPCC_Cirrus_slurm/modules/module-load-python.snip b/_includes/snippets_library/EPCC_Cirrus_slurm/modules/module-load-python.snip new file mode 100644 index 00000000..d9bab7b4 --- /dev/null +++ b/_includes/snippets_library/EPCC_Cirrus_slurm/modules/module-load-python.snip @@ -0,0 +1,5 @@ +``` +{{ site.remote.prompt }} module load {{ site.remote.module_python3 }} +{{ site.remote.prompt }} which python3 +``` +{: .language-bash} diff --git a/_includes/snippets_library/EPCC_Cirrus_slurm/modules/python-executable-dir.snip b/_includes/snippets_library/EPCC_Cirrus_slurm/modules/python-executable-dir.snip new file mode 100644 index 00000000..f04c8908 --- /dev/null +++ b/_includes/snippets_library/EPCC_Cirrus_slurm/modules/python-executable-dir.snip @@ -0,0 +1,4 @@ +``` +/lustre/sw/anaconda/anaconda3-2021.11/bin/python3 +``` +{: .output} diff --git 
a/_includes/snippets_library/EPCC_Cirrus_slurm/modules/python-ls-dir-command.snip b/_includes/snippets_library/EPCC_Cirrus_slurm/modules/python-ls-dir-command.snip new file mode 100644 index 00000000..f299be46 --- /dev/null +++ b/_includes/snippets_library/EPCC_Cirrus_slurm/modules/python-ls-dir-command.snip @@ -0,0 +1,4 @@ +``` +{{ site.remote.prompt }} ls /lustre/sw/anaconda/anaconda3-2021.11/bin +``` +{: .language-bash} diff --git a/_includes/snippets_library/EPCC_Cirrus_slurm/modules/python-ls-dir-output.snip b/_includes/snippets_library/EPCC_Cirrus_slurm/modules/python-ls-dir-output.snip new file mode 100644 index 00000000..637ea953 --- /dev/null +++ b/_includes/snippets_library/EPCC_Cirrus_slurm/modules/python-ls-dir-output.snip @@ -0,0 +1,13 @@ +``` +2to3 derb h5fc libtool pyflakes sphinx-quickstart +2to3-3.9 designer h5format_convert libtoolize pyftmerge spyder +acountry djpeg h5import linguist pyftsubset sqlite3 +activate dltest h5jam linkicc pygmentize sqlite3_analyzer +adig dwebp h5ls list_instances pyjson5 symilar +aec dynamodb_dump h5mkgrp lrelease pylint syncqt.pl +ahost dynamodb_load h5perf_serial lsm2bin pylsp tabs +anaconda elbadmin h5redeploy lss3 pylupdate5 taskadmin +anaconda-navigator epylint h5repack lupdate pyrcc5 tclsh +... ... ... ... ... ... 
+``` +{: .output} diff --git a/_includes/snippets_library/EPCC_Cirrus_slurm/modules/python-module-path.snip b/_includes/snippets_library/EPCC_Cirrus_slurm/modules/python-module-path.snip new file mode 100644 index 00000000..60de9cfd --- /dev/null +++ b/_includes/snippets_library/EPCC_Cirrus_slurm/modules/python-module-path.snip @@ -0,0 +1,4 @@ +``` +/lustre/sw/anaconda/anaconda3-2021.11/bin:/lustre/sw/spack-cirrus/opt/spack/linux-centos7-x86_64/gcc-8.2.0/git-2.21.0-rcchd4zgfdherdlklrr2y3amq7p73svi/bin:/lustre/sw/epcc-utils/bin:/opt/clmgr/sbin:/opt/clmgr/bin:/opt/sgi/sbin:/opt/sgi/bin:/usr/share/Modules/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/c3/bin:/sbin:/bin:/lustre/home/tc001/auser/.local/bin +``` +{: .output} diff --git a/_includes/snippets_library/EPCC_Cirrus_slurm/modules/software-dependencies.snip b/_includes/snippets_library/EPCC_Cirrus_slurm/modules/software-dependencies.snip new file mode 100644 index 00000000..7a8e5e3b --- /dev/null +++ b/_includes/snippets_library/EPCC_Cirrus_slurm/modules/software-dependencies.snip @@ -0,0 +1,35 @@ +To demonstrate, let's load the `namd` module and then use the `module list` +command to show which modules we currently have loaded in our environment. +([NAMD](https://www.ks.uiuc.edu/Research/namd/) is a parallel molecular dynamics code +designed for high-performance simulation of large biomolecular systems.) + +``` +{{ site.remote.prompt }} module load namd +{{ site.remote.prompt }} module list +``` +{: .language-bash} + +``` +Currently Loaded Modulefiles: + 1) git/2.21.0(default) 2) epcc/utils 3) /lustre/sw/modulefiles/epcc/setup-env + 4) gcc/8.2.0 5) intel-license 6) intel-mpi-19/19.0.0.117 + 7) fftw/3.3.9-impi19-gcc8 8) namd/2.14(default) +``` +{: .output} + +So in this case, loading the `namd` module also loaded a variety of other +modules. Let's try unloading the `namd` package.
+ +``` +{{ site.remote.prompt }} module unload namd +{{ site.remote.prompt }} module list +``` +{: .language-bash} + +``` +Currently Loaded Modulefiles: + 1) git/2.21.0(default) 2) epcc/utils 3) /lustre/sw/modulefiles/epcc/setup-env +``` +{: .output} + +So using `module unload` "un-loads" a module along with its dependencies. diff --git a/_includes/snippets_library/EPCC_Cirrus_pbs/modules/wrong-gcc-version.snip b/_includes/snippets_library/EPCC_Cirrus_slurm/modules/wrong-gcc-version.snip similarity index 73% rename from _includes/snippets_library/EPCC_Cirrus_pbs/modules/wrong-gcc-version.snip rename to _includes/snippets_library/EPCC_Cirrus_slurm/modules/wrong-gcc-version.snip index 39f37a33..424703d5 100644 --- a/_includes/snippets_library/EPCC_Cirrus_pbs/modules/wrong-gcc-version.snip +++ b/_includes/snippets_library/EPCC_Cirrus_slurm/modules/wrong-gcc-version.snip @@ -1,10 +1,10 @@ Let's take a closer look at the `gcc` module. GCC is an extremely widely used C/C++/Fortran compiler. Lots of software is dependent on the GCC version, and might not compile or run if the wrong version is loaded. In this case, there -are three different versions: `gcc/6.2.0`, `gcc/6.3.0` and `gcc/7.2.0`. How do -we load each copy and which copy is the default? +are four different versions: `gcc/6.2.0`, `gcc/6.3.0`, `gcc/8.2.0` and `gcc/10.2.0`. +How do we load each copy and which copy is the default? -In this case, `gcc/6.2.0` has a `(default)` next to it. This indicates that it +In this case, `gcc/6.3.0` has a `(default)` next to it. This indicates that it is the default - if we type `module load gcc`, this is the copy that will be loaded. @@ -15,7 +15,7 @@ loaded. {: .language-bash} ``` -gcc (GCC) 6.2.0 +gcc (GCC) 6.3.0 Copyright (C) 2016 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
@@ -24,19 +24,18 @@ warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. So how do we load the non-default copy of a software package? In this case, the only change we need to make is to be more specific about the module we are -loading. There are three GCC modules: `gcc/6.2.0`, `gcc/6.3.0` and `gcc/7.2.0` +loading. There are four GCC modules: `gcc/6.2.0`, `gcc/6.3.0`, `gcc/8.2.0` and `gcc/10.2.0` To load a non-default module, we need to add the version number after the `/` in our `module load` command ``` -{{ site.remote.prompt }} module load gcc/7.2.0 +{{ site.remote.prompt }} module load gcc/10.2.0 ``` {: .language-bash} ``` -gcc/7.2.0(17):ERROR:150: Module 'gcc/7.2.0' conflicts with the currently loaded -module(s) 'gcc/6.2.0' -gcc/7.2.0(17):ERROR:102: Tcl command execution failed: conflict gcc +WARNING: gcc/10.2.0 cannot be loaded due to a conflict. +HINT: Might try "module unload gcc" first. ``` {: .output} @@ -47,42 +46,42 @@ new version. ``` {{ site.remote.prompt }} module unload gcc -{{ site.remote.prompt }} module load gcc/7.2.0 +{{ site.remote.prompt }} module load gcc/10.2.0 {{ site.remote.prompt }} gcc --version ``` {: .language-bash} ``` -gcc (GCC) 7.2.0 -Copyright (C) 2017 Free Software Foundation, Inc. +gcc (GCC) 10.2.0 +Copyright (C) 2020 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. ``` {: .output} -We now have successfully switched from GCC 6.2.0 to GCC 7.2.0. +We now have successfully switched from GCC 6.3.0 to GCC 10.2.0. As switching between different versions of the same module is a common task, you can use `module swap` rather than unloading one version before loading another.
The equivalent of the steps above would be: ``` -{{ site.remote.prompt }} module purge +{{ site.remote.prompt }} module unload gcc/10.2.0 {{ site.remote.prompt }} module load gcc {{ site.remote.prompt }} gcc --version -{{ site.remote.prompt }} module swap gcc gcc/7.2.0 +{{ site.remote.prompt }} module swap gcc gcc/10.2.0 {{ site.remote.prompt }} gcc --version ``` {: .language-bash} ``` -gcc (GCC) 6.2.0 +gcc (GCC) 6.3.0 Copyright (C) 2016 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. -gcc (GCC) 7.2.0 -Copyright (C) 2017 Free Software Foundation, Inc. +gcc (GCC) 10.2.0 +Copyright (C) 2020 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. ``` diff --git a/_includes/snippets_library/EPCC_Cirrus_pbs/parallel/eight-tasks-jobscript.snip b/_includes/snippets_library/EPCC_Cirrus_slurm/parallel/eight-tasks-jobscript.snip similarity index 100% rename from _includes/snippets_library/EPCC_Cirrus_pbs/parallel/eight-tasks-jobscript.snip rename to _includes/snippets_library/EPCC_Cirrus_slurm/parallel/eight-tasks-jobscript.snip diff --git a/_includes/snippets_library/EPCC_Cirrus_slurm/parallel/four-tasks-jobscript.snip b/_includes/snippets_library/EPCC_Cirrus_slurm/parallel/four-tasks-jobscript.snip new file mode 100644 index 00000000..6b4aae76 --- /dev/null +++ b/_includes/snippets_library/EPCC_Cirrus_slurm/parallel/four-tasks-jobscript.snip @@ -0,0 +1,18 @@ +``` +{{ site.remote.bash_shebang }} +{{ site.sched.comment }} {{ site.sched.flag.name }} parallel-pi +{{ site.sched.comment }} {{ site.sched.flag.partition }} +{{ site.sched.comment }} {{ site.sched.flag.qos }} +{{ site.sched.comment }} --exclusive +{{ site.sched.comment }} --time=00:20:00 +{{ site.sched.comment }} --nodes=1 +{{ site.sched.comment }} 
--tasks-per-node=4 +{{ site.sched.comment }} --cpus-per-task=1 + +# Load the computing environment we need +module load mpi4py + +# Execute the task +srun --cpu-bind=cores python parallel-pi.py 100000000 +``` +{: .language-bash} diff --git a/_includes/snippets_library/EPCC_Cirrus_pbs/parallel/one-task-jobscript.snip b/_includes/snippets_library/EPCC_Cirrus_slurm/parallel/one-task-jobscript.snip similarity index 100% rename from _includes/snippets_library/EPCC_Cirrus_pbs/parallel/one-task-jobscript.snip rename to _includes/snippets_library/EPCC_Cirrus_slurm/parallel/one-task-jobscript.snip diff --git a/_includes/snippets_library/EPCC_Cirrus_slurm/parallel/one-task-with-memory-jobscript.snip b/_includes/snippets_library/EPCC_Cirrus_slurm/parallel/one-task-with-memory-jobscript.snip new file mode 100644 index 00000000..baefb638 --- /dev/null +++ b/_includes/snippets_library/EPCC_Cirrus_slurm/parallel/one-task-with-memory-jobscript.snip @@ -0,0 +1,16 @@ +``` +{{ site.remote.bash_shebang }} +{{ site.sched.comment }} {{ site.sched.flag.name }} serial-pi +{{ site.sched.comment }} {{ site.sched.flag.partition }} +{{ site.sched.comment }} {{ site.sched.flag.qos }} +{{ site.sched.comment }} --exclusive +{{ site.sched.comment }} --time=00:20:00 +{{ site.sched.comment }} --ntasks=1 + +# Load the computing environment we need +module load {{ site.remote.module_python3 }} + +# Execute the task +python serial-pi.py 100000000 +``` +{: .language-bash} diff --git a/_includes/snippets_library/EPCC_Cirrus_slurm/resources/account-history.snip b/_includes/snippets_library/EPCC_Cirrus_slurm/resources/account-history.snip new file mode 100644 index 00000000..8791e7f6 --- /dev/null +++ b/_includes/snippets_library/EPCC_Cirrus_slurm/resources/account-history.snip @@ -0,0 +1,12 @@ +``` +JobID JobName Partition Account AllocCPUS State ExitCode +-------------- ----------- --------- ------- --------- --------- -------- +2168130 serial-pi standard tc001 36 COMPLETED 0:0 +2168130.batch batch 
tc001 36 COMPLETED 0:0 +2168130.extern extern tc001 36 COMPLETED 0:0 +2168132 parallel-pi standard tc001 36 COMPLETED 0:0 +2168132.batch batch tc001 36 COMPLETED 0:0 +2168132.extern extern tc001 36 COMPLETED 0:0 +2168132.0 python tc001 4 COMPLETED 0:0 +``` +{: .output} diff --git a/_includes/snippets_library/EPCC_Cirrus_pbs/resources/cfd_bench.snip b/_includes/snippets_library/EPCC_Cirrus_slurm/resources/cfd-bench.snip similarity index 84% rename from _includes/snippets_library/EPCC_Cirrus_pbs/resources/cfd_bench.snip rename to _includes/snippets_library/EPCC_Cirrus_slurm/resources/cfd-bench.snip index 1a0443de..638e6112 100644 --- a/_includes/snippets_library/EPCC_Cirrus_pbs/resources/cfd_bench.snip +++ b/_includes/snippets_library/EPCC_Cirrus_slurm/resources/cfd-bench.snip @@ -4,14 +4,14 @@ using the following command: > > ``` -> {{ site.workshop_host.prompt }} wget {{ site.url }}{{ site.baseurl }}/files/cfd.tar.gz +> {{ site.remote.prompt }} wget {{ site.url }}{{ site.baseurl }}/files/cfd.tar.gz > ``` > {: .language-bash} > > Then unpack it using > > ``` -> {{ site.workshop_host.prompt }} tar -xvf cfd.tar.gz +> {{ site.remote.prompt }} tar -xvf cfd.tar.gz > ``` > {: .language-bash} > @@ -19,7 +19,7 @@ > `cfd.py` program. > > ``` -> module load anaconda/python2 +> module load {{ site.remote.module_python3 }} > python cfd.py 3 20000 > ``` > {: .language-bash} diff --git a/_includes/snippets_library/EPCC_Cirrus_slurm/resources/hist-fields.snip b/_includes/snippets_library/EPCC_Cirrus_slurm/resources/hist-fields.snip new file mode 100644 index 00000000..c0eac7f7 --- /dev/null +++ b/_includes/snippets_library/EPCC_Cirrus_slurm/resources/hist-fields.snip @@ -0,0 +1,6 @@ +* **NodeList**: The node(s) on which your job ran. +* **MaxRSS**: What was the maximum amount of memory used? +* **Elapsed**: How long did the job take? +* **State**: What is the job currently doing/what happened to it? +* **MaxDiskRead**: Amount of data read from disk. 
+* **MaxDiskWrite**: Amount of data written to disk. diff --git a/_includes/snippets_library/EPCC_Cirrus_pbs/resources/monitor-processes-top.snip b/_includes/snippets_library/EPCC_Cirrus_slurm/resources/monitor-processes-top.snip similarity index 79% rename from _includes/snippets_library/EPCC_Cirrus_pbs/resources/monitor-processes-top.snip rename to _includes/snippets_library/EPCC_Cirrus_slurm/resources/monitor-processes-top.snip index 5d4255c2..8961c52f 100644 --- a/_includes/snippets_library/EPCC_Cirrus_pbs/resources/monitor-processes-top.snip +++ b/_includes/snippets_library/EPCC_Cirrus_slurm/resources/monitor-processes-top.snip @@ -5,14 +5,13 @@ Tasks: 1526 total, 4 running, 1495 sleeping, 8 stopped, 19 zombie KiB Mem : 26377216+total, 11843416+free, 10668532 used, 13466947+buff/cache KiB Swap: 2097148 total, 105600 free, 1991548 used. 22326803+avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND -21917 root 20 0 374324 233452 6584 R 55.6 0.1 308:13.19 pbs_server. -30680 marius 20 0 152436 20772 5872 R 17.8 0.0 0:00.08 cc1 -27287 aturner 20 0 157312 3768 1600 R 8.9 0.0 0:00.59 top -30681 kfindlay 20 0 16744 2176 932 S 4.4 0.0 0:00.02 pbsnodes +30680 user8 20 0 152436 20772 5872 R 17.8 0.0 0:00.08 cc1 +27287 user2 20 0 157312 3768 1600 R 8.9 0.0 0:00.59 top +30681 user9 20 0 16744 2176 932 S 4.4 0.0 0:00.02 pbsnodes 2765 root 20 0 20940 32 0 S 2.2 0.0 5:59.78 aksusbd 7361 root 20 0 0 0 0 S 2.2 0.0 36:53.49 ptlrpcd_35 -26386 hallen 20 0 4321956 123520 6740 S 2.2 0.0 0:03.81 conda -30830 pcerro 20 0 117344 1656 1312 S 2.2 0.0 0:05.70 deployer_oo +26386 user3 20 0 4321956 123520 6740 S 2.2 0.0 0:03.81 conda +30830 user5 20 0 117344 1656 1312 S 2.2 0.0 0:05.70 deployer_oo 1 root 20 0 196108 3932 1644 S 0.0 0.0 82:49.29 systemd 2 root 20 0 0 0 0 S 0.0 0.0 6:14.69 kthreadd 3 root 20 0 0 0 0 S 0.0 0.0 0:06.40 ksoftirqd/0 diff --git a/_includes/snippets_library/EPCC_Cirrus_pbs/resources/system-memory-free.snip 
b/_includes/snippets_library/EPCC_Cirrus_slurm/resources/system-memory-free.snip similarity index 100% rename from _includes/snippets_library/EPCC_Cirrus_pbs/resources/system-memory-free.snip rename to _includes/snippets_library/EPCC_Cirrus_slurm/resources/system-memory-free.snip diff --git a/_includes/snippets_library/EPCC_Cirrus_slurm/scheduler/basic-job-script.snip b/_includes/snippets_library/EPCC_Cirrus_slurm/scheduler/basic-job-script.snip new file mode 100644 index 00000000..d19fd486 --- /dev/null +++ b/_includes/snippets_library/EPCC_Cirrus_slurm/scheduler/basic-job-script.snip @@ -0,0 +1,4 @@ +``` +Submitted batch job 2128732 +``` +{: .output} diff --git a/_includes/snippets_library/EPCC_Cirrus_slurm/scheduler/basic-job-status.snip b/_includes/snippets_library/EPCC_Cirrus_slurm/scheduler/basic-job-status.snip new file mode 100644 index 00000000..cbc0621c --- /dev/null +++ b/_includes/snippets_library/EPCC_Cirrus_slurm/scheduler/basic-job-status.snip @@ -0,0 +1,9 @@ +``` + JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) +2165985 standard example-job.sh auser R 0:24 1 r1i0n24 +``` +{: .output} + +We can see all the details of our job, most importantly that it is in the `R` +or `RUNNING` state. Sometimes our jobs might need to wait in a queue +(`PENDING`) or they might have failed (`F`) with some non-zero exit code. diff --git a/_includes/snippets_library/EPCC_Cirrus_slurm/scheduler/email-notifications.snip b/_includes/snippets_library/EPCC_Cirrus_slurm/scheduler/email-notifications.snip new file mode 100644 index 00000000..c72732a7 --- /dev/null +++ b/_includes/snippets_library/EPCC_Cirrus_slurm/scheduler/email-notifications.snip @@ -0,0 +1,5 @@ +> Jobs on an HPC system might run for days or even weeks. It is possible to configure +> the {{ site.sched.name }} scheduler such that an email notification is sent when a +> job starts running and/or when the job terminates. 
Unfortunately, {{ site.sched.name }} email +> notifications are not enabled on {{ site.remote.name }}. +{: .challenge} diff --git a/_includes/snippets_library/EPCC_Cirrus_slurm/scheduler/job-with-name-status.snip b/_includes/snippets_library/EPCC_Cirrus_slurm/scheduler/job-with-name-status.snip new file mode 100644 index 00000000..53261c17 --- /dev/null +++ b/_includes/snippets_library/EPCC_Cirrus_slurm/scheduler/job-with-name-status.snip @@ -0,0 +1,5 @@ +``` + JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) +2165985 standard new_name auser R 0:35 1 r1i0n24 +``` +{: .output} diff --git a/_includes/snippets_library/EPCC_Cirrus_slurm/scheduler/option-flags-list.snip b/_includes/snippets_library/EPCC_Cirrus_slurm/scheduler/option-flags-list.snip new file mode 100644 index 00000000..03ee5fe0 --- /dev/null +++ b/_includes/snippets_library/EPCC_Cirrus_slurm/scheduler/option-flags-list.snip @@ -0,0 +1,17 @@ +* `-J, --job-name=` — set a name for the job to help identify it in Slurm command output. + +* `-A, --account=` — your budget ID is usually something like tc01 or tc01-test. + +* `-p, --partition=` — the partition specifies the set of nodes you want to run on. + +* `-q, --qos=` — the Quality of Service (QoS) specifies the limits of your job (e.g., maximum number of nodes, maximum walltime). + +* `-t, --time=` — the maximum walltime for your job, e.g. for a 6.5 hour walltime, you would use `--time=06:30:00`. + +* `--exclusive` — setting this flag ensures that you have exclusive access to a compute node. + +* `-N, --nodes=` — the number of nodes to use for the job. + +* `--ntasks-per-node=` — the number of parallel processes (e.g. MPI ranks) per node. + +* `-c, --cpus-per-task=` — the number of threads per parallel process (e.g. the number of OpenMP threads per MPI task for hybrid MPI/OpenMP jobs). 
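Taken together, the flags above can be sketched as a complete batch script. This is illustrative only: the `tc001` budget code and the `standard` partition/QoS names follow the other Cirrus examples in this changeset, so substitute your own values. Note that `#SBATCH` lines are ordinary comments to bash, so the script body also runs outside the scheduler.

```shell
#!/bin/bash
#SBATCH --job-name=example-job
#SBATCH --account=tc001          # budget ID (assumed; use your own)
#SBATCH --partition=standard     # set of nodes to run on
#SBATCH --qos=standard           # QoS limits (assumed name)
#SBATCH --time=00:10:00          # maximum walltime
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4      # parallel processes per node
#SBATCH --cpus-per-task=1        # threads per process

echo -n "This script is running on "
hostname
```

Submitting this with `sbatch` returns a job ID in the form `Submitted batch job <id>`, as in the basic-job-script snippet above.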
diff --git a/_includes/snippets_library/EPCC_Cirrus_pbs/scheduler/print-sched-variables.snip b/_includes/snippets_library/EPCC_Cirrus_slurm/scheduler/print-sched-variables.snip similarity index 62% rename from _includes/snippets_library/EPCC_Cirrus_pbs/scheduler/print-sched-variables.snip rename to _includes/snippets_library/EPCC_Cirrus_slurm/scheduler/print-sched-variables.snip index bcda18c6..bca87ac4 100644 --- a/_includes/snippets_library/EPCC_Cirrus_pbs/scheduler/print-sched-variables.snip +++ b/_includes/snippets_library/EPCC_Cirrus_slurm/scheduler/print-sched-variables.snip @@ -2,10 +2,10 @@ > > When {{ site.sched.name }} runs a job, it sets a number of environment > variables for the job. One of these will let us check what directory our job -> script was submitted from. The `PBS_O_WORKDIR` variable is set to the +> script was submitted from. The `SLURM_SUBMIT_DIR` variable is set to the > directory from which our job was submitted. > -> Using the `PBS_O_WORKDIR` variable, modify your job so that it prints out the +> Using the `SLURM_SUBMIT_DIR` variable, modify your job so that it prints out the > location from which the job was submitted. 
> > > ## Solution @@ -18,13 +18,17 @@ > > > > ``` > > {{ site.remote.bash_shebang }} -> > #PBS -l 00:00:30 +> > {{ site.sched.comment }} {{ site.sched.flag.partition }} +> > {{ site.sched.comment }} {{ site.sched.flag.qos }} +> > {{ site.sched.comment }} {{ site.sched.flag.time }} 00:01:15 +> > +> > sleep 60 # time in seconds > > > > echo -n "This script is running on " > > hostname > > > > echo "This job was launched in the following directory:" -> > echo ${PBS_O_WORKDIR} +> > echo ${SLURM_SUBMIT_DIR} > > ``` > > {: .output} > {: .solution} diff --git a/_includes/snippets_library/EPCC_Cirrus_slurm/scheduler/runtime-exceeded-job.snip b/_includes/snippets_library/EPCC_Cirrus_slurm/scheduler/runtime-exceeded-job.snip new file mode 100644 index 00000000..6bca2938 --- /dev/null +++ b/_includes/snippets_library/EPCC_Cirrus_slurm/scheduler/runtime-exceeded-job.snip @@ -0,0 +1,4 @@ +``` +{{ site.remote.prompt }} cat slurm-2166477.out +``` +{: .language-bash} diff --git a/_includes/snippets_library/EPCC_Cirrus_slurm/scheduler/runtime-exceeded-output.snip b/_includes/snippets_library/EPCC_Cirrus_slurm/scheduler/runtime-exceeded-output.snip new file mode 100644 index 00000000..9cdd6366 --- /dev/null +++ b/_includes/snippets_library/EPCC_Cirrus_slurm/scheduler/runtime-exceeded-output.snip @@ -0,0 +1,4 @@ +``` +slurmstepd: error: *** JOB 2166477 ON r1i0n24 CANCELLED AT 2022-02-09T14:34:34 DUE TO TIME LIMIT *** +``` +{: .output} diff --git a/_includes/snippets_library/EPCC_Cirrus_slurm/scheduler/terminate-job-begin.snip b/_includes/snippets_library/EPCC_Cirrus_slurm/scheduler/terminate-job-begin.snip new file mode 100644 index 00000000..4fbbc9ae --- /dev/null +++ b/_includes/snippets_library/EPCC_Cirrus_slurm/scheduler/terminate-job-begin.snip @@ -0,0 +1,5 @@ +``` + JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) +2166487 standard overrun auser R 0:20 1 r1i0n24 +``` +{: .output} diff --git 
a/_includes/snippets_library/EPCC_Cirrus_slurm/scheduler/terminate-job-cancel.snip b/_includes/snippets_library/EPCC_Cirrus_slurm/scheduler/terminate-job-cancel.snip new file mode 100644 index 00000000..7f3ff115 --- /dev/null +++ b/_includes/snippets_library/EPCC_Cirrus_slurm/scheduler/terminate-job-cancel.snip @@ -0,0 +1,4 @@ +``` +...(no output from squeue when there are no jobs to display)... +``` +{: .output} diff --git a/_includes/snippets_library/EPCC_Cirrus_slurm/scheduler/terminate-multiple-jobs.snip b/_includes/snippets_library/EPCC_Cirrus_slurm/scheduler/terminate-multiple-jobs.snip new file mode 100644 index 00000000..e69de29b diff --git a/_includes/snippets_library/EPCC_Cirrus_pbs/scheduler/using-nodes-interactively.snip b/_includes/snippets_library/EPCC_Cirrus_slurm/scheduler/using-nodes-interactively.snip similarity index 87% rename from _includes/snippets_library/EPCC_Cirrus_pbs/scheduler/using-nodes-interactively.snip rename to _includes/snippets_library/EPCC_Cirrus_slurm/scheduler/using-nodes-interactively.snip index 3c031537..063fb1e4 100644 --- a/_includes/snippets_library/EPCC_Cirrus_pbs/scheduler/using-nodes-interactively.snip +++ b/_includes/snippets_library/EPCC_Cirrus_slurm/scheduler/using-nodes-interactively.snip @@ -9,7 +9,7 @@ uses a single core: {: .language-bash} You should be presented with a bash prompt. Note that the prompt will likely -change to reflect your new location, in this case the worker node we are logged +change to reflect your new location, in this case the compute node we are logged on. You can also verify this with `hostname`. When you are done with the interactive job, type `exit` to quit your session. 
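The interactive workflow above can be summarised in one annotated sketch. The `srun` line is commented out because it only works on the cluster, and its account/partition/QoS values are assumptions based on the other Cirrus examples in this changeset:

```shell
# On the login node, request an interactive session on a compute node:
#   srun --account=tc001 --partition=standard --qos=standard \
#        --time=00:10:00 --ntasks=1 --pty /bin/bash
# The new prompt belongs to the compute node; confirm where you are with:
hostname
# When finished, leave the session and return to the login node:
# exit
```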
diff --git a/_includes/snippets_library/HPCC_MagicCastle_slurm/_config_options.yml b/_includes/snippets_library/HPCC_MagicCastle_slurm/_config_options.yml index 5ef04ec5..c57051fa 100644 --- a/_includes/snippets_library/HPCC_MagicCastle_slurm/_config_options.yml +++ b/_includes/snippets_library/HPCC_MagicCastle_slurm/_config_options.yml @@ -51,6 +51,7 @@ sched: info: "sinfo" comment: "#SBATCH" hist: "sacct -u yourUsername" + hist_filter: "" episode_order: - 10-hpc-intro diff --git a/_includes/snippets_library/HPCC_MagicCastle_slurm/modules/default-modules.snip b/_includes/snippets_library/HPCC_MagicCastle_slurm/modules/default-modules.snip new file mode 100644 index 00000000..a448dd96 --- /dev/null +++ b/_includes/snippets_library/HPCC_MagicCastle_slurm/modules/default-modules.snip @@ -0,0 +1,4 @@ +``` +No Modulefiles Currently Loaded. +``` +{: .output} diff --git a/_includes/snippets_library/Magic_Castle_EESSI_slurm/_config_options.yml b/_includes/snippets_library/Magic_Castle_EESSI_slurm/_config_options.yml index f5a838c8..724d0868 100644 --- a/_includes/snippets_library/Magic_Castle_EESSI_slurm/_config_options.yml +++ b/_includes/snippets_library/Magic_Castle_EESSI_slurm/_config_options.yml @@ -70,7 +70,8 @@ sched: info: "sinfo" comment: "#SBATCH" hist: "sacct -u yourUsername" - + hist_filter: "" + episode_order: - 10-hpc-intro - 11-connecting diff --git a/_includes/snippets_library/Magic_Castle_EESSI_slurm/cluster/root-folders.snip b/_includes/snippets_library/Magic_Castle_EESSI_slurm/cluster/root-folders.snip new file mode 100644 index 00000000..715de741 --- /dev/null +++ b/_includes/snippets_library/Magic_Castle_EESSI_slurm/cluster/root-folders.snip @@ -0,0 +1,6 @@ +``` +bin etc lib64 proc sbin sys var +boot {{ site.remote.homedir | replace: "/", "" }} mnt root scratch tmp working +dev lib opt run srv usr +``` +{: .output} diff --git a/_includes/snippets_library/Magic_Castle_EESSI_slurm/modules/default-modules.snip 
b/_includes/snippets_library/Magic_Castle_EESSI_slurm/modules/default-modules.snip new file mode 100644 index 00000000..a448dd96 --- /dev/null +++ b/_includes/snippets_library/Magic_Castle_EESSI_slurm/modules/default-modules.snip @@ -0,0 +1,4 @@ +``` +No Modulefiles Currently Loaded. +``` +{: .output} diff --git a/_includes/snippets_library/Magic_Castle_EESSI_slurm/resources/hist-fields.snip b/_includes/snippets_library/Magic_Castle_EESSI_slurm/resources/hist-fields.snip new file mode 100644 index 00000000..f0e215ba --- /dev/null +++ b/_includes/snippets_library/Magic_Castle_EESSI_slurm/resources/hist-fields.snip @@ -0,0 +1,6 @@ +* **Hostname**: Where did your job run? +* **MaxRSS**: What was the maximum amount of memory used? +* **Elapsed**: How long did the job take? +* **State**: What is the job currently doing/what happened to it? +* **MaxDiskRead**: Amount of data read from disk. +* **MaxDiskWrite**: Amount of data written to disk. diff --git a/_includes/snippets_library/Magic_Castle_EESSI_slurm/scheduler/email-notifications.snip b/_includes/snippets_library/Magic_Castle_EESSI_slurm/scheduler/email-notifications.snip new file mode 100644 index 00000000..e681b3c0 --- /dev/null +++ b/_includes/snippets_library/Magic_Castle_EESSI_slurm/scheduler/email-notifications.snip @@ -0,0 +1,19 @@ +> Jobs on an HPC system might run for days or even weeks. We probably have +> better things to do than constantly check on the status of our job with +> `{{ site.sched.status }}`. Looking at the manual page for +> `{{ site.sched.submit.name }}`, can you set up our test job to send you an email +> when it finishes? +> +> > ## Hint +> > +> > You can use the *manual pages* for {{ site.sched.name }} utilities to find +> > more about their capabilities. On the command line, these are accessed +> > through the `man` utility: run `man <program-name>`. You can find the same +> > information online by searching "man <program-name>".
+> > +> > ``` +> > {{ site.remote.prompt }} man {{ site.sched.submit.name }} +> > ``` +> > {: .language-bash} +> {: .solution} +{: .challenge} diff --git a/_includes/snippets_library/NIST_CTCMS_slurm/_config_options.yml b/_includes/snippets_library/NIST_CTCMS_slurm/_config_options.yml index 114b0fa4..14948029 100644 --- a/_includes/snippets_library/NIST_CTCMS_slurm/_config_options.yml +++ b/_includes/snippets_library/NIST_CTCMS_slurm/_config_options.yml @@ -8,9 +8,9 @@ # `_includes/snippets_library`. To use one, replace options # below with those in `_config_options.yml` from the # library. E.g, to customise for Cirrus at EPCC, running -# PBS, we could replace the options below with those from +# Slurm, we could replace the options below with those from # -# _includes/snippets_library/EPCC_Cirrus_pbs/_config_options.yml +# _includes/snippets_library/EPCC_Cirrus_slurm/_config_options.yml # # If your cluster is not represented in the library, please # copy an existing folder, rename it, and customize for your @@ -50,11 +50,13 @@ sched: name: "-J" time: "-t" queue: "-p" + partition: "-p serial" del: "scancel" interactive: "srun" info: "sinfo" comment: "#SBATCH" hist: "sacct -u yourUsername" + hist_filter: "" episode_order: - 10-hpc-intro diff --git a/_includes/snippets_library/NIST_CTCMS_slurm/cluster/root-folders.snip b/_includes/snippets_library/NIST_CTCMS_slurm/cluster/root-folders.snip new file mode 100644 index 00000000..715de741 --- /dev/null +++ b/_includes/snippets_library/NIST_CTCMS_slurm/cluster/root-folders.snip @@ -0,0 +1,6 @@ +``` +bin etc lib64 proc sbin sys var +boot {{ site.remote.homedir | replace: "/", "" }} mnt root scratch tmp working +dev lib opt run srv usr +``` +{: .output} diff --git a/_includes/snippets_library/NIST_CTCMS_slurm/modules/default-modules.snip b/_includes/snippets_library/NIST_CTCMS_slurm/modules/default-modules.snip new file mode 100644 index 00000000..a448dd96 --- /dev/null +++ 
b/_includes/snippets_library/NIST_CTCMS_slurm/modules/default-modules.snip @@ -0,0 +1,4 @@ +``` +No Modulefiles Currently Loaded. +``` +{: .output} diff --git a/_includes/snippets_library/NIST_CTCMS_slurm/resources/hist-fields.snip b/_includes/snippets_library/NIST_CTCMS_slurm/resources/hist-fields.snip new file mode 100644 index 00000000..f0e215ba --- /dev/null +++ b/_includes/snippets_library/NIST_CTCMS_slurm/resources/hist-fields.snip @@ -0,0 +1,6 @@ +* **Hostname**: Where did your job run? +* **MaxRSS**: What was the maximum amount of memory used? +* **Elapsed**: How long did the job take? +* **State**: What is the job currently doing/what happened to it? +* **MaxDiskRead**: Amount of data read from disk. +* **MaxDiskWrite**: Amount of data written to disk. diff --git a/_includes/snippets_library/NIST_CTCMS_slurm/scheduler/email-notifications.snip b/_includes/snippets_library/NIST_CTCMS_slurm/scheduler/email-notifications.snip new file mode 100644 index 00000000..e681b3c0 --- /dev/null +++ b/_includes/snippets_library/NIST_CTCMS_slurm/scheduler/email-notifications.snip @@ -0,0 +1,19 @@ +> Jobs on an HPC system might run for days or even weeks. We probably have +> better things to do than constantly check on the status of our job with +> `{{ site.sched.status }}`. Looking at the manual page for +> `{{ site.sched.submit.name }}`, can you set up our test job to send you an email +> when it finishes? +> +> > ## Hint +> > +> > You can use the *manual pages* for {{ site.sched.name }} utilities to find +> > more about their capabilities. On the command line, these are accessed +> > through the `man` utility: run `man <program-name>`. You can find the same +> > information online by searching "man <program-name>".
+> > +> > ``` +> > {{ site.remote.prompt }} man {{ site.sched.submit.name }} +> > ``` +> > {: .language-bash} +> {: .solution} +{: .challenge} diff --git a/_includes/snippets_library/NIST_CTCMS_slurm/scheduler/print-sched-variables.snip b/_includes/snippets_library/NIST_CTCMS_slurm/scheduler/print-sched-variables.snip index 5234a4ed..90e7dbf8 100644 --- a/_includes/snippets_library/NIST_CTCMS_slurm/scheduler/print-sched-variables.snip +++ b/_includes/snippets_library/NIST_CTCMS_slurm/scheduler/print-sched-variables.snip @@ -17,7 +17,8 @@ > > > > ``` > > {{ site.remote.bash_shebang }} -> > #SBATCH -t 00:00:30 +> > {{ site.sched.comment }} {{ site.sched.flag.partition }} +> > {{ site.sched.comment }} {{ site.sched.flag.time }} 00:00:20 > > > > echo -n "This script is running on " > > hostname diff --git a/_includes/snippets_library/Norway_SIGMA2_SAGA_slurm/_config_options.yml b/_includes/snippets_library/Norway_SIGMA2_SAGA_slurm/_config_options.yml index 1b68b14d..647ce293 100644 --- a/_includes/snippets_library/Norway_SIGMA2_SAGA_slurm/_config_options.yml +++ b/_includes/snippets_library/Norway_SIGMA2_SAGA_slurm/_config_options.yml @@ -8,9 +8,9 @@ # `_includes/snippets_library`. To use one, replace options # below with those in `_config_options.yml` from the # library. 
E.g, to customise for Cirrus at EPCC, running -# PBS, we could replace the options below with those from +# Slurm, we could replace the options below with those from # -# _includes/snippets_library/EPCC_Cirrus_pbs/_config_options.yml +# _includes/snippets_library/EPCC_Cirrus_slurm/_config_options.yml # # If your cluster is not represented in the library, please # copy an existing folder, rename it, and customize for your @@ -55,6 +55,7 @@ sched: info: "sinfo" comment: "#SBATCH" hist: "sacct -u $USER" + hist_filter: "" episode_order: - 10-hpc-intro diff --git a/_includes/snippets_library/Norway_SIGMA2_SAGA_slurm/cluster/root-folders.snip b/_includes/snippets_library/Norway_SIGMA2_SAGA_slurm/cluster/root-folders.snip new file mode 100644 index 00000000..715de741 --- /dev/null +++ b/_includes/snippets_library/Norway_SIGMA2_SAGA_slurm/cluster/root-folders.snip @@ -0,0 +1,6 @@ +``` +bin etc lib64 proc sbin sys var +boot {{ site.remote.homedir | replace: "/", "" }} mnt root scratch tmp working +dev lib opt run srv usr +``` +{: .output} diff --git a/_includes/snippets_library/Norway_SIGMA2_SAGA_slurm/modules/default-modules.snip b/_includes/snippets_library/Norway_SIGMA2_SAGA_slurm/modules/default-modules.snip new file mode 100644 index 00000000..a448dd96 --- /dev/null +++ b/_includes/snippets_library/Norway_SIGMA2_SAGA_slurm/modules/default-modules.snip @@ -0,0 +1,4 @@ +``` +No Modulefiles Currently Loaded. +``` +{: .output} diff --git a/_includes/snippets_library/Norway_SIGMA2_SAGA_slurm/resources/hist-fields.snip b/_includes/snippets_library/Norway_SIGMA2_SAGA_slurm/resources/hist-fields.snip new file mode 100644 index 00000000..f0e215ba --- /dev/null +++ b/_includes/snippets_library/Norway_SIGMA2_SAGA_slurm/resources/hist-fields.snip @@ -0,0 +1,6 @@ +* **Hostname**: Where did your job run? +* **MaxRSS**: What was the maximum amount of memory used? +* **Elapsed**: How long did the job take? +* **State**: What is the job currently doing/what happened to it? 
+* **MaxDiskRead**: Amount of data read from disk.
+* **MaxDiskWrite**: Amount of data written to disk.
diff --git a/_includes/snippets_library/Norway_SIGMA2_SAGA_slurm/scheduler/email-notifications.snip b/_includes/snippets_library/Norway_SIGMA2_SAGA_slurm/scheduler/email-notifications.snip
new file mode 100644
index 00000000..e681b3c0
--- /dev/null
+++ b/_includes/snippets_library/Norway_SIGMA2_SAGA_slurm/scheduler/email-notifications.snip
@@ -0,0 +1,19 @@
+> Jobs on an HPC system might run for days or even weeks. We probably have
+> better things to do than constantly check on the status of our job with
+> `{{ site.sched.status }}`. Looking at the manual page for
+> `{{ site.sched.submit.name }}`, can you set up our test job to send you an email
+> when it finishes?
+>
+> > ## Hint
+> >
+> > You can use the *manual pages* for {{ site.sched.name }} utilities to find
+> > more about their capabilities. On the command line, these are accessed
+> > through the `man` utility: run `man <program-name>`. You can find the same
+> > information online by searching "man <program-name>".
+> > +> > ``` +> > {{ site.remote.prompt }} man {{ site.sched.submit.name }} +> > ``` +> > {: .language-bash} +> {: .solution} +{: .challenge} diff --git a/_includes/snippets_library/UCL_Myriad_sge/_config_options.yml b/_includes/snippets_library/UCL_Myriad_sge/_config_options.yml index b23d7cfc..149b3d97 100644 --- a/_includes/snippets_library/UCL_Myriad_sge/_config_options.yml +++ b/_includes/snippets_library/UCL_Myriad_sge/_config_options.yml @@ -33,6 +33,7 @@ sched: info: "qhost" comment: "#$ " hist: "jobhist" + hist_filter: "" bash_shebang: "#!/bin/bash -l" episode_order: diff --git a/_includes/snippets_library/UCL_Myriad_sge/cluster/root-folders.snip b/_includes/snippets_library/UCL_Myriad_sge/cluster/root-folders.snip new file mode 100644 index 00000000..715de741 --- /dev/null +++ b/_includes/snippets_library/UCL_Myriad_sge/cluster/root-folders.snip @@ -0,0 +1,6 @@ +``` +bin etc lib64 proc sbin sys var +boot {{ site.remote.homedir | replace: "/", "" }} mnt root scratch tmp working +dev lib opt run srv usr +``` +{: .output} diff --git a/_includes/snippets_library/UCL_Myriad_sge/modules/default-modules.snip b/_includes/snippets_library/UCL_Myriad_sge/modules/default-modules.snip new file mode 100644 index 00000000..a448dd96 --- /dev/null +++ b/_includes/snippets_library/UCL_Myriad_sge/modules/default-modules.snip @@ -0,0 +1,4 @@ +``` +No Modulefiles Currently Loaded. +``` +{: .output} diff --git a/_includes/snippets_library/UCL_Myriad_sge/resources/hist-fields.snip b/_includes/snippets_library/UCL_Myriad_sge/resources/hist-fields.snip new file mode 100644 index 00000000..f0e215ba --- /dev/null +++ b/_includes/snippets_library/UCL_Myriad_sge/resources/hist-fields.snip @@ -0,0 +1,6 @@ +* **Hostname**: Where did your job run? +* **MaxRSS**: What was the maximum amount of memory used? +* **Elapsed**: How long did the job take? +* **State**: What is the job currently doing/what happened to it? +* **MaxDiskRead**: Amount of data read from disk. 
+* **MaxDiskWrite**: Amount of data written to disk.
diff --git a/_includes/snippets_library/UCL_Myriad_sge/scheduler/email-notifications.snip b/_includes/snippets_library/UCL_Myriad_sge/scheduler/email-notifications.snip
new file mode 100644
index 00000000..e681b3c0
--- /dev/null
+++ b/_includes/snippets_library/UCL_Myriad_sge/scheduler/email-notifications.snip
@@ -0,0 +1,19 @@
+> Jobs on an HPC system might run for days or even weeks. We probably have
+> better things to do than constantly check on the status of our job with
+> `{{ site.sched.status }}`. Looking at the manual page for
+> `{{ site.sched.submit.name }}`, can you set up our test job to send you an email
+> when it finishes?
+>
+> > ## Hint
+> >
+> > You can use the *manual pages* for {{ site.sched.name }} utilities to find
+> > more about their capabilities. On the command line, these are accessed
+> > through the `man` utility: run `man <program-name>`. You can find the same
+> > information online by searching "man <program-name>".
+> >
+> > ```
+> > {{ site.remote.prompt }} man {{ site.sched.submit.name }}
+> > ```
+> > {: .language-bash}
+> {: .solution}
+{: .challenge}
diff --git a/setup.md b/setup.md
index 68bfdd15..35aa55dd 100644
--- a/setup.md
+++ b/setup.md
@@ -52,9 +52,9 @@ the Windows start menu.
 > you can run Bash commands on a remote computer or server that already has a
 > Unix Shell, from your Windows machine. This can usually be done through a
 > Secure Shell (SSH) client. One such client available for free for Windows
-> computers is PuTTY. See the reference below for information on installing and
-> using PuTTY, using the Windows 10 command-line tool, or installing and using
-> a Unix/Linux emulator.
+> computers is [PuTTY][putty]. See the reference below for information on
+> installing and using PuTTY, using the Windows 10 command-line tool, or
+> installing and using a Unix/Linux emulator.
 >
 > For advanced users, you may choose one of the following alternatives:
 >
@@ -137,5 +137,6 @@ anything.
[ms-wsl]: https://docs.microsoft.com/en-us/windows/wsl/install-win10 [ms-shell]: https://docs.microsoft.com/en-us/powershell/scripting/learn/remoting/ssh-remoting-in-powershell-core?view=powershell-7 [mobax-gen]: https://mobaxterm.mobatek.net/documentation.html -[unix-emulator]: https://faculty.smu.edu/reynolds/unixtut/windows.html +[putty]: https://www.chiark.greenend.org.uk/~sgtatham/putty/ +[unix-emulator]: https://www.cygwin.com/ [wsl]: https://docs.microsoft.com/en-us/windows/wsl/install-win10
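The email-notification exercise added for the Slurm snippet libraries above resolves to a pair of standard `sbatch` directives, `--mail-type` and `--mail-user`. A minimal sketch of a solution script (the time limit, notification types, and address are placeholders, not part of this changeset):

```shell
# Write a minimal Slurm batch script answering the exercise.
# --mail-type / --mail-user are standard sbatch options; the
# address and limits below are illustrative placeholders.
cat > example-job.sh <<'EOF'
#!/bin/bash
#SBATCH --time=00:00:30
#SBATCH --mail-type=END,FAIL        # email when the job finishes or fails
#SBATCH --mail-user=you@example.org
echo -n "This script is running on "
hostname
EOF

# On a cluster you would submit it with: sbatch example-job.sh
# The #SBATCH lines are ordinary comments to bash, so the script
# also runs locally, which is handy for a quick sanity check:
out="$(bash example-job.sh)"
echo "$out"
```

Because the scheduler directives are inert outside Slurm, learners can test the script body on any machine before submitting it.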