Merge pull request #6 from RuiApostolo/gh-pages
Added GPU exercise and info on KOKKOS package
RuiApostolo authored Oct 16, 2024
2 parents 3bbc561 + c642b5a commit 23b5b83
Showing 12 changed files with 159 additions and 6 deletions.
48 changes: 45 additions & 3 deletions _episodes/06-extra-software.md
@@ -15,16 +15,58 @@ keypoints:

## GPU acceleration

LAMMPS has the capability to use GPUs to accelerate the calculations needed to run a simulation,
but the program needs to be compiled with the correct parameters for this option to be available.
Furthermore, LAMMPS can exploit multiple GPUs on the same system, although the performance scaling depends heavily on the particular system.
As always, we recommend that each user run benchmarks for their particular use-case to ensure that they are getting performance benefits.
While not every LAMMPS force field or fix is available on GPUs, the vast majority are, and more are added with each new version.
Check the LAMMPS documentation for GPU compatibility with a specific command.
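A quick way to check what a particular binary supports (assuming `lmp` is the LAMMPS executable on your path; the exact wording of the help output varies between LAMMPS versions) is to inspect its help text:

```bash
# List the packages this LAMMPS binary was compiled with;
# GPU-capable builds will report GPU and/or KOKKOS here.
lmp -h | grep -A 3 -i "installed packages"

# Accelerated variants of a style carry a suffix in the style listing,
# e.g. lj/cut/gpu or lj/cut/kk alongside plain lj/cut.
lmp -h | grep "lj/cut"
```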

There are two main LAMMPS packages that add GPU acceleration: `GPU` and `KOKKOS`.
They differ in many ways, including which `fixes` and `computes` are implemented for each package (as always, consult the manual),
but the main difference is the underlying framework used by each.
The command-line flags needed for each also differ; we have examples of each below.

ARCHER2 has the `lammps-gpu` module, which has LAMMPS compiled with the `KOKKOS` package.
Due to the model of AMD GPUs and the compiler/driver versions available, it has not been possible to compile LAMMPS with the `GPU` package there.
Cirrus, a Tier-2 HPC system also hosted at EPCC, has a `lammps-gpu` module installed with the `GPU` package.
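On either machine, you can check which LAMMPS modules are available before writing a job script (a generic sketch; the module names and versions listed depend on the system):

```bash
# Show the LAMMPS modules installed on the current system
module avail lammps
```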

### KOKKOS package

In `exercises/4-gpu-simulation` you can find the Lennard-Jones input file used in exercise 2, with a larger number of atoms,
and a Slurm script that loads the `lammps-gpu` module and runs the simulation on a GPU.

The main differences are in some of the `#SBATCH` lines, where we now have to set the number of GPUs to request and use a different partition/QoS combination:

```bash
...
#SBATCH --nodes=1
#SBATCH --gpus=1
...
#SBATCH --partition=gpu
#SBATCH --qos=gpu-shd
```

and the `srun` line, where we have to set the number of tasks, the CPUs per task, the hints, and the distribution (Slurm won't allow these on the `#SBATCH` lines),
as well as adding the `KOKKOS`-specific flags `-k on g 1 -pk kokkos -sf kk`:

```bash
srun --ntasks=1 --cpus-per-task=1 --hint=nomultithread --distribution=block:block \
lmp -k on g 1 -pk kokkos -sf kk -i in.lj_exercise -l log_gpu.$SLURM_JOB_ID
```

The `-k on g 1` flag tells `KOKKOS` to use 1 GPU, and `-sf kk` appends the `/kk` suffix to all styles that support it.
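If you prefer to keep these settings with the simulation itself, the suffix can also be requested from inside the input script; a minimal sketch (note that `-k on g 1` must still be passed on the command line, as it configures `KOKKOS` before the input script is read):

```
# Equivalent of the -sf kk command-line flag:
# append the /kk suffix to every style that supports it
suffix kk

# Or request the KOKKOS variant of a single style explicitly
pair_style lj/cut/kk 3.5
```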


### GPU package

To use the GPU accelerated commands with this package, you need to pass an extra flag when calling the LAMMPS binary: `-pk gpu <number_of_gpus_to_use>`.
You will also need to add the `/gpu` suffix to every style you want accelerated or,
alternatively, you can use the `-sf gpu` flag to append the `/gpu` suffix to all styles that support it (though this is at your own risk).
For example, if you were to run exercise 4 on Cirrus, you could change the line:

```
srun lmp -i in.ethanol -l log.$SLURM_JOB_ID
```
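to something along the following lines, combining the `-pk gpu` and `-sf gpu` flags described above (a sketch for a single GPU; check the Cirrus documentation for the exact launch options):

```bash
# Run the same input, offloading supported styles to 1 GPU
srun lmp -pk gpu 1 -sf gpu -i in.ethanol -l log.$SLURM_JOB_ID
```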
2 changes: 1 addition & 1 deletion exercises/1-performance-exercise/sub.slurm
@@ -3,7 +3,7 @@
# Slurm job options (name, number of compute nodes, job time)
#SBATCH --job-name=lmp_ex1
#SBATCH --nodes=1
-#SBATCH --time=0:20:0
+#SBATCH --time=0:10:0
#SBATCH --hint=nomultithread
#SBATCH --distribution=block:block
#SBATCH --tasks-per-node=64
2 changes: 1 addition & 1 deletion exercises/2-lj-exercise/sub.slurm
@@ -3,7 +3,7 @@
# Slurm job options (name, number of compute nodes, job time)
#SBATCH --job-name=lmp_ex2
#SBATCH --nodes=1
-#SBATCH --time=0:20:0
+#SBATCH --time=0:10:0
#SBATCH --hint=nomultithread
#SBATCH --distribution=block:block
#SBATCH --tasks-per-node=128
2 changes: 1 addition & 1 deletion exercises/3-advanced-inputs-exercise/sub.slurm
@@ -3,7 +3,7 @@
# Slurm job options (name, number of compute nodes, job time)
#SBATCH --job-name=lmp_ex3
#SBATCH --nodes=1
-#SBATCH --time=0:20:0
+#SBATCH --time=0:10:0
#SBATCH --hint=nomultithread
#SBATCH --distribution=block:block
#SBATCH --tasks-per-node=128
85 changes: 85 additions & 0 deletions exercises/4-gpu-simulation/in.lj_exercise
@@ -0,0 +1,85 @@
####################################
# Example LAMMPS input script #
# for a simple Lennard Jones fluid #
####################################

####################################
# 0) Define variables
####################################

variable DENSITY equal 0.8

####################################
# 1) Set up simulation box
# - We set a 3D periodic box
# - Our box has 50x50x50 atom
# positions, evenly distributed
# - The atom starting sites are
# separated such that the box density
# is 0.8
####################################

units lj
atom_style atomic
dimension 3
boundary p p p

lattice sc ${DENSITY}
region box block 0 50 0 50 0 50
create_box 1 box
create_atoms 1 box

####################################
# 2) Define interparticle interactions
# - Here, we use truncated & shifted LJ
# - All atoms of type 1 (in this case, all atoms)
# have a mass of 1.0
####################################

pair_style lj/cut 3.5
pair_modify shift yes
pair_coeff 1 1 1.0 1.0
mass 1 1.0

####################################
# 3) Neighbour lists
# - Each atom will only consider neighbours
# within a distance of 3.8 of each other
# (pair cutoff 3.5 plus skin 0.3)
# - Neighbour list rebuilds are checked
# every timestep, but delayed until at
# least 10 timesteps after the last one
####################################

neighbor 0.3 bin
neigh_modify delay 10 every 1

####################################
# 4) Define simulation parameters
# - We fix the temperature and
# linear and angular momenta
# of the system
# - We run with fixed number (n),
# volume (v), temperature (t)
####################################

fix LinMom all momentum 50 linear 1 1 1 angular
fix 1 all nvt temp 1.00 1.00 5.0
#fix 1 all npt temp 1.0 1.0 25.0 iso 1.5150 1.5150 10.0

####################################
# 5) Final setup
# - Define starting particle velocity
# - Define timestep
# - Define output system properties (temp, energy, etc.)
# - Define simulation length
####################################

velocity all create 1.0 199085 mom no

timestep 0.005

thermo_style custom step temp etotal pe ke press vol density
thermo 500

run_style verlet

run 50000
26 changes: 26 additions & 0 deletions exercises/4-gpu-simulation/sub.slurm
@@ -0,0 +1,26 @@
#!/bin/bash

# Slurm job options (name, number of compute nodes, job time)
#SBATCH --job-name=lmp_ex4
#SBATCH --nodes=1
#SBATCH --gpus=1
#SBATCH --time=0:05:0

# The budget code of the project
#SBATCH --account=ta176
# GPU partition
#SBATCH --partition=gpu
# Shared GPU QoS, as we only need one of the node's GPUs
#SBATCH --qos=gpu-shd

# load the lammps module
module load lammps-gpu

# Set the number of threads to 1
# This prevents any threaded system libraries from automatically
# using threading.
export OMP_NUM_THREADS=1

# Launch the parallel job
srun --ntasks=1 --cpus-per-task=1 --hint=nomultithread --distribution=block:block \
lmp -k on g 1 -pk kokkos -sf kk -i in.lj_exercise -l log_gpu.$SLURM_JOB_ID
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
