diff --git a/content/_static/siesta/SIESTA-Analysis_tools.pdf b/content/_static/siesta/SIESTA-Analysis_tools.pdf
new file mode 100644
index 0000000..f7045c7
Binary files /dev/null and b/content/_static/siesta/SIESTA-Analysis_tools.pdf differ
diff --git a/content/_static/siesta/SIESTA-Convergence.pdf b/content/_static/siesta/SIESTA-Convergence.pdf
new file mode 100644
index 0000000..667b2fb
Binary files /dev/null and b/content/_static/siesta/SIESTA-Convergence.pdf differ
diff --git a/content/_static/siesta/SIESTA-Features.pdf b/content/_static/siesta/SIESTA-Features.pdf
new file mode 100644
index 0000000..272de28
Binary files /dev/null and b/content/_static/siesta/SIESTA-Features.pdf differ
diff --git a/content/_static/siesta/SIESTA-MD.pdf b/content/_static/siesta/SIESTA-MD.pdf
new file mode 100644
index 0000000..f7fed7f
Binary files /dev/null and b/content/_static/siesta/SIESTA-MD.pdf differ
diff --git a/content/_static/siesta/SIESTA-Solvers.pdf b/content/_static/siesta/SIESTA-Solvers.pdf
new file mode 100644
index 0000000..9430066
Binary files /dev/null and b/content/_static/siesta/SIESTA-Solvers.pdf differ
diff --git a/content/day5/yambo-tutorial.md b/content/day5/yambo-tutorial.md
index a61373d..b3b677c 100644
--- a/content/day5/yambo-tutorial.md
+++ b/content/day5/yambo-tutorial.md
@@ -892,7 +892,7 @@ What can we learn from this plot? In particular, try to answer the following que
 - How can we decide at which point adding more nodes to the calculation becomes a waste of resources?

 ```{callout} Note
-Keep in mind that the MPI scaling we are seeing here is not the true Yambo scaling, but depends on the small size of our tutorial system. In a realistic calculation for a large-sized system, __Yambo has been shown to scale well up to tens of thousands of MPI tasks__!
+Keep in mind that the MPI scaling we are seeing here is not the true Yambo scaling, but depends on the small size of our tutorial system. In a realistic calculation for a large-sized system, __Yambo has been shown to scale well up to tens of thousands of MPI tasks__! (See the next optional box for an example.)
 ```

 ````{solution} [OPTIONAL] Comparison with CPU calculation with hybrid parallelization strategy
@@ -901,30 +901,23 @@ We have run the same calculation using a version of Yambo compiled in order to r

 For a CPU calculation, we can use a hybrid parallel structure with threads. The OPENMP threads are controlled by modifying `cpus-per-task` and `OMP_NUM_THREADS` in the submission file. The product of the number of OpenMP threads and MPI tasks is equal to the total number of CPUs.

-We have adopted two strategies. First, run 4 MPI tasks per node like in the GPU case, while adding also 8 OPENMP threads (`ntasks*nthreads=ncpu=4*8=32`).
-Second, run 32 MPI tasks per node with no multiple threads (`ntasks*nthreads=ncpu=32*1=32`).
+For our test, we have used larger convergence parameters than in the previous run and selected a hybrid parallel scheme with 16 MPI tasks per node and 2 OpenMP threads per task (`ntasks*nthreads=ncpu=16*2=32`), since this gives the best scaling in this case.

-
-For example, in the first case we have:
-```bash=
-#!/bin/bash
-#SBATCH --nodes=4
-#SBATCH --ntasks-per-node=4
-#SBATCH --cpus-per-task=8
-...
-export OMP_NUM_THREADS=8
+```{callout} Note
+In general (for larger systems), we have found that the best CPU scaling on Leonardo is actually obtained with 4 MPI tasks times 8 OpenMP threads per node.
 ```
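+
+As a sketch only (these are not the settings used for the test below), the submission lines for such a 4-task, 8-thread layout would look roughly like this:
+```bash=
+#!/bin/bash
+#SBATCH --nodes=4
+# 4 MPI tasks per node, each running 8 OpenMP threads: 4*8 = 32 CPUs per node
+#SBATCH --ntasks-per-node=4
+#SBATCH --cpus-per-task=8
+...
+export OMP_NUM_THREADS=8
+```
+Either way, keeping `OMP_NUM_THREADS` equal to `cpus-per-task` avoids oversubscribing the cores.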
-while in the second case we have:
+
+Coming back to our test, the new CPU submission script therefore contains:
 ```bash=
 #!/bin/bash
 #SBATCH --nodes=4
-#SBATCH --ntasks-per-node=32
-#SBATCH --cpus-per-task=1
+#SBATCH --ntasks-per-node=16
+#SBATCH --cpus-per-task=2
 ...
-export OMP_NUM_THREADS=1
+export OMP_NUM_THREADS=2
 ```

-Actually, we don't need to change the related openMP variables for the yambo input, since the value `0` means "use the value of `OMP_NUM_THREADS`" and we have now set this environment variable to our liking via the submission script.
+Actually, we don't need to change the OpenMP-related variables appearing in the Yambo input, since the value `0` means "use the value of `OMP_NUM_THREADS`", and we have now set this environment variable to our liking via the submission script.
 Otherwise, any positive number can directly specify the number of threads to be used in each section of the code.

 ```
@@ -935,9 +928,18 @@ X_Threads= 0 # [OPENMP/X] Number of threads for response functions
 SE_Threads= 0 # [OPENMP/GW] Number of threads for self-energy
 ```

-You can try to run these calculations and compare the timings with the previous GPU-based runs.
+Actively searching for the best scaling on both GPU and CPU for our enlarged MoS2 system, we find:
+
+```{figure} img/CPU_scaling.jpeg
+:scale: 80%
+```
+
+We can see that, already for this reasonably small and half-converged system run on a few nodes, the GPU calculation easily reaches a speedup of 2x. The speedup increases vastly for larger systems, where the calculations are more demanding, as you can see from the scaling tests below (run on the Juwels Booster machine) on a graphene-cobalt interface supercell.

-**FIGURE AND EXPLANATION**
+```{figure} img/grCo_scaling.png
+:scale: 40%
+```
+_Scaling comparison of the graphene@Co(0001) interface on CPU (left, 48 CPUs per node) and GPU (right, 4 GPUs per node). Tests done by Nicola Spallanzani. Data available at: http://www.gitlab.com/max-centre/Benchmarks_

 ```{callout} Note
 - In real-life CPU-based calculations running on {math}`n_{cores} > 100`, as we have seen, it may be a good idea to adopt a hybrid approach.
@@ -1117,4 +1119,4 @@ _Dashed lines: DFT, thick lines: GW._

 As you can see, the general result is not too bad, but there are some differences both at the DFT and GW levels. The magnitude of the band gap is too large, and the relative energy of the two conduction band minima is not correct. One obvious issue is the lack of convergence of our tutorial calculations. As we know, we should include more vacuum space and many, many more k-points. Additionally, this is a transition metal dichalcogenide: for this class of systems, the details of the band structure can strongly depend on small variations in the lattice parameters and on the type of pseudopotential used. A great deal of care must be taken when performing these calculations!

-In order to learn more about Yambo, we suggest visiting the [Yambo website](https://www.yambo-code.eu/). For technical information and tutorials, you can check ou the [Yambo wiki](https://www.yambo-code.eu/wiki/index.php/Main_Page). If you have issues and questions about installing and running the code, you can write them on the [Yambo forum](https://www.yambo-code.eu/forum/index.php).
+In order to learn more about Yambo, we suggest visiting the [Yambo website](https://www.yambo-code.eu/). For technical information and tutorials, you can check out the [Yambo wiki](https://www.yambo-code.eu/wiki/index.php/Main_Page). If you have issues and questions about installing and running the code, you can write about them on the [Yambo forum](https://www.yambo-code.eu/forum/index.php).
diff --git a/content/days3+4/day3.rst b/content/days3+4/day3.rst
index 4dbe71b..8684d05 100644
--- a/content/days3+4/day3.rst
+++ b/content/days3+4/day3.rst
@@ -28,14 +28,14 @@ Tutorials covered:
    /leonardo_work/EUHPC_TD02_030/siesta-tutorials/day3-Wed/02-FirstEncounter_II


-Introductory slides available here: :download:`SIESTA-First_encounter.pdf `
+Introductory slides available here: :download:`SIESTA-First_encounter.pdf `.

 Basis sets
 ----------

 Lecture by Dr. Miguel Pruneda (CINN-CSIC).

-Slides available here: :download:`SIESTA-Basis_sets.pdf `
+Slides available here: :download:`SIESTA-Basis_sets.pdf `.


 Basis set optimization
@@ -73,5 +73,5 @@ Tutorials covered:
    /leonardo_work/EUHPC_TD02_030/siesta-tutorials/day3-Wed/04c-SCF


-Introductory slides available here: (TBA).
+Introductory slides available here: :download:`SIESTA-Convergence.pdf `.

diff --git a/content/days3+4/day4.rst b/content/days3+4/day4.rst
index a364fa3..ec01043 100644
--- a/content/days3+4/day4.rst
+++ b/content/days3+4/day4.rst
@@ -16,7 +16,7 @@ Tutorials covered:
    /leonardo_work/EUHPC_TD02_030/siesta-tutorials/day4-Thu/01-MolecularDynamics


-Introductory slides available here: (TBA).
+Introductory slides available here: :download:`SIESTA-MD.pdf `.


 Analysis tools
@@ -30,27 +30,30 @@ Tutorials covered:
    /leonardo_work/EUHPC_TD02_030/siesta-tutorials/day4-Thu/02-Analysis


-Introductory slides available here: (TBA).
+Introductory slides available here: :download:`SIESTA-Analysis_tools.pdf `.


 Features available in SIESTA: spin-orbit couplings, TranSIESTA, and others
 --------------------------------------------------------------------------

-Lecture by Dr. Nick Papior (Technical University of Denmark)
+Lecture by Dr. Nick Papior (Technical University of Denmark).

-Slides available here: (TBA).
+Slides available here: :download:`SIESTA-Features.pdf `.


 Pushing the boundaries of SIESTA: accelerated and massively parallel solvers
 ----------------------------------------------------------------------------

-Practical session led by Dr. Alberto García (ICMAB-CSIC)
+Practical session led by Dr. Alberto García (ICMAB-CSIC).

 Tutorials covered:

-- TBA. Files avilable at::
+- ELSI-ELPA.
+- ELSI-PEXSI.
+
+Files for the tutorial::

    /leonardo_work/EUHPC_TD02_030/siesta-tutorials/day4-Thu/03-SiestaSolvers


-Introductory slides available here: (TBA).
+Slides available here: :download:`SIESTA-Solvers.pdf `.