diff --git a/.buildinfo b/.buildinfo
new file mode 100644
index 0000000..03c62c8
--- /dev/null
+++ b/.buildinfo
@@ -0,0 +1,4 @@
+# Sphinx build info version 1
+# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
+config: be9b4b929f98b38a87c4babc79888bea
+tags: d77d1c0d9ca2f4c8421862c7c5a0d620
diff --git a/.nojekyll b/.nojekyll
new file mode 100644
index 0000000..e69de29
diff --git a/Day-1/ex0-run/README/index.html b/Day-1/ex0-run/README/index.html
new file mode 100644
index 0000000..55547bd
--- /dev/null
+++ b/Day-1/ex0-run/README/index.html
@@ -0,0 +1,153 @@
+
+
+
We will only use pw.x for this hands-on.
The CPU executable of QE v7.2 is compiled in the directory:
+
/leonardo_work/EUHPC_TD02_030/builds/qe7.2/bin
+
+
+
Check that the module works by submitting a quick serial test.
+Open the Slurm batch file, fill the dots with the right numbers/commands (dots at lines 3, 8, 9, 22, 25) and submit it:
+
sbatch ex0-run.slurm
+
+
+
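For reference, a generic serial pw.x submission script on a Slurm machine looks roughly like the sketch below; the account, partition and module names are placeholders, not the values expected in ex0-run.slurm.

#!/bin/bash
#SBATCH --job-name=qe-serial-test
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --partition=<partition>      # placeholder
#SBATCH --account=<account>          # placeholder

module load <qe-module>              # or prepend the build directory above to PATH
pw.x -input pw.scf.in > pw.scf.out   # serial run: no mpirun/srun needed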
To see the submission status of your job:
+
squeue -u YOUR_USERNAME
+
+
\ No newline at end of file
diff --git a/Day-1/ex1-pools/README/index.html b/Day-1/ex1-pools/README/index.html
new file mode 100644
index 0000000..4868b89
--- /dev/null
+++ b/Day-1/ex1-pools/README/index.html
@@ -0,0 +1,163 @@
EXERCISE 1 - Parallelization with pools
Try to predict the best value of npool and check it by performing a series of runs.
+
+
Open the batch file ‘ex1-pools.slurm’ and customize the user-related SLURM options like job-name and mail-user (not essential at all);
Replace the dots with a list of proper values for npool, e.g. (a schematic of the full loop is sketched after this list):
+
for ip in 1 2 3 4 5 6    # not necessarily the right values here!
do
+
+
+
+
Submit the job file:
+
sbatch ex1-pools.slurm
+
+
+
+
Look for total WALL time at the end of each output file:
+
PWSCF : 2m 0.69s CPU 2m 5.06s WALL
+
+
+
+
+
and plot TIME(npool). Which is the best npool value? Why?
+
+
You can try with different numbers of MPI tasks.
+
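For reference, a schematic of the kind of loop the batch file runs; the input/output names and the MPI task count are illustrative placeholders, not the actual contents of ex1-pools.slurm.

for ip in 1 2 4 8                    # candidate npool values (npool cannot exceed the number of k-points)
do
    mpirun -np 16 pw.x -npool $ip -input pw.scf.in > pw.scf.npool_${ip}.out
done
grep WALL pw.scf.npool_*.out         # compare the total WALL times of the runs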
\ No newline at end of file
diff --git a/Day-1/ex2-diag/README/index.html b/Day-1/ex2-diag/README/index.html
new file mode 100644
index 0000000..2bd6748
--- /dev/null
+++ b/Day-1/ex2-diag/README/index.html
@@ -0,0 +1,156 @@
EXERCISE 2 - Parallelization of the eigenvalue problem
+
Play with the ndiag parameter by performing a number of runs and seeing variations in the WALL time (if any).
+You can also change the fixed value of npools (the default value for this exercise is 4).
+
+
Replace the dots with a list of proper values for ndiag, e.g. (a schematic of the full loop is sketched after this list):
+
for id in 1 2 3 4 5 6    # not necessarily the right values here!
do
+
+
+
+
Submit the job file:
+
sbatch ./ex2-diag.slurm
+
+
+
+
Check the total WALL time at the end of the output file and plot TIME(ndiag).
+Which is the best ndiag value (if any)? Why?
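For reference, a schematic of the corresponding loop; names and task counts are illustrative, and npool is kept fixed at the default value of 4 mentioned above.

for id in 1 4 9 16                   # candidate ndiag values (the linear-algebra group is a square grid of tasks)
do
    mpirun -np 16 pw.x -npool 4 -ndiag $id -input pw.scf.in > pw.scf.ndiag_${id}.out
done
grep WALL pw.scf.ndiag_*.out         # compare the total WALL times of the runs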
Find out how to best exploit the available CPU resources, by playing with the MPI-related parameters (number of tasks, npools) together with the number of threads.
+Use the batch file ‘ex3-omp.slurm’ to submit your jobs (modify it at your convenience).
+Hints:
+
+
Know the size of your node, e.g. the amount of cores at your disposal (https://wiki.u-gov.it/confluence/display/SCAIUS/UG3.2%3A+LEONARDO+UserGuide);
+
See how the time scaling of your jobs changes just by varying the number of tasks (keep just 1 thread each at first).
+Adapt the npool parameter at each run.
+
Now you can start to explore the OpenMP parallelization by varying the number of threads (a schematic MPI x OpenMP snippet is sketched after these hints).
+
Do multiple WALL_TIME plots as a function of the number of MPI tasks and OpenMP threads.
+Which is the best configuration for this exercise?
+
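A minimal sketch of the MPI x OpenMP combination inside a Slurm script; the directive values are placeholders to vary between runs, not the actual contents of ex3-omp.slurm.

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8                     # MPI tasks: vary this...
#SBATCH --cpus-per-task=4                       # ...together with the OpenMP threads per task
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
srun pw.x -npool 4 -input pw.scf.in > pw.scf.${SLURM_NTASKS}tasks_${OMP_NUM_THREADS}threads.out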
\ No newline at end of file
diff --git a/Day-1/ex4-gpu/README/index.html b/Day-1/ex4-gpu/README/index.html
new file mode 100644
index 0000000..b8a514a
--- /dev/null
+++ b/Day-1/ex4-gpu/README/index.html
@@ -0,0 +1,149 @@
EXERCISE 4 - Hands on the GPUs
Know the size of your node. Look at: https://wiki.u-gov.it/confluence/display/SCAIUS/UG3.2%3A+LEONARDO+UserGuide
+How many cores? How many GPUs?
+
Use the batch file ‘ex4-gpu.slurm’ to submit a few MPI+GPU runs. You can also check what happens with more MPI tasks than GPUs.
Dots at lines 8, 9, 10, 24, 26, 28. (A generic sketch of the GPU-related Slurm directives follows this list.)
+
Enable openMP threading. Do you see any improvement?
+
Consider your best CPU run. How many GPUs were necessary to match the performance?
+If you don’t have your optimized CPU batch file from exercise 3, you can use the one in the reference folder.
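A generic sketch of the GPU-related Slurm directives for a pw.x run on a Leonardo Booster node (4 GPUs per node); account, partition and input names are placeholders, and the real ex4-gpu.slurm has its own structure.

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4                     # one MPI task per GPU is the usual starting point
#SBATCH --cpus-per-task=8
#SBATCH --gres=gpu:4                            # request all 4 GPUs of the node
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
srun pw.x -npool 4 -input pw.scf.in > pw.scf.gpu.out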
\ No newline at end of file
diff --git a/Day-2/exercise_eels/README/index.html b/Day-2/exercise_eels/README/index.html
new file mode 100644
index 0000000..5f31956
--- /dev/null
+++ b/Day-2/exercise_eels/README/index.html
@@ -0,0 +1,159 @@
DAY-2, EXERCISE - Electron energy-loss spectroscopy (EELS) for bulk silicon
+
+
Run the SCF ground-state calculation with pw.x
+
sbatch submit_pw.slurm
+
+
+
+
Perform Lanczos recursions with turbo_eels.x
+
sbatch submit_eels.slurm
+
+
+
+
Run the post-processing spectrum calculation with turbo_spectrum.x
+
sbatch submit_spectrum.slurm
+
+
+
+
+
The output files Sieels.plot_chi.dat and Sieels.plot_eps.dat contain, respectively,
+the dynamical charge susceptibility and the dynamical dielectric function computed for the
+momentum transfer q chosen for the turbo_eels calculation of step 2.
+
\ No newline at end of file
diff --git a/Day-2/exercise_magnon/README/index.html b/Day-2/exercise_magnon/README/index.html
new file mode 100644
index 0000000..7b93075
--- /dev/null
+++ b/Day-2/exercise_magnon/README/index.html
@@ -0,0 +1,163 @@
DAY-2, EXERCISE - Dynamical magnetic susceptibility of bulk iron
+
+
Run the SCF ground-state calculation with pw.x
+
sbatch submit_pw.slurm
+
+
+
+
Perform Lanczos recursions with turbo_magnon.x
+
sbatch submit_eels.slurm
+
+
+
+
Run the post-processing spectrum calculation with turbo_spectrum.x
+
sbatch submit_spectrum.slurm
+
+
+
+
+
The output file Femag.plot_chi.dat contains the column ipol of the dynamical magnetic susceptibility
+for a given momentum transfer q chosen during the turbo_magnon calculation of step 2.
+
In the case of a bulk iron sample polarized along the z direction, it can be shown that the inelastic neutron scattering cross section is
proportional to the imaginary part of the chi_yy component, which can be extracted, e.g., by doing
+
grep chi_2_2 Femag.plot_chi.dat > Chi_2_2.dat
+
+
\ No newline at end of file
diff --git a/Day-2/exercise_ph/solution/step1/README/index.html b/Day-2/exercise_ph/solution/step1/README/index.html
new file mode 100644
index 0000000..c8b6bae
--- /dev/null
+++ b/Day-2/exercise_ph/solution/step1/README/index.html
@@ -0,0 +1,156 @@
Yambo tutorial: Quasiparticles in the GW approximation
+
In this tutorial you will learn how to run a GW simulation using Yambo on an HPC machine.
+
You will compute the quasiparticle corrections to the band structure of a free-standing single layer of MoS\(_2\) while learning about convergence studies, parallel strategies, and GPU calculations.
+
In the end, you will obtain a quasiparticle band structure based on the simulations, the first step towards the reproduction of an ARPES spectrum. Beware: we will not use fully converged parameters, so the final result should not be considered very accurate.
+
+
MoS\(_2\) monolayer (top and side views). Gray: Mo atoms, yellow: S atoms.
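The quasiparticle energies are obtained from the Kohn-Sham states through the standard linearized quasiparticle equation (written here in its usual form, with \(Z_{nk}\) the renormalization factor):

\[ E^{QP}_{nk} = \epsilon_{nk} + Z_{nk}\,\langle \psi_{nk} | \Sigma(\epsilon_{nk}) - V_{xc} | \psi_{nk} \rangle \]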
Here \(\epsilon_{nk}\) and \(\psi_{nk}\) are the Kohn-Sham energies and wavefunctions, respectively, while \(V_{xc}\) is the DFT exchange-correlation potential.
+
For each electronic state \(nk\), the self-energy can be separated into two components: a static, gap-opening term called the exchange self-energy (\(\Sigma^x\)), and an energy-dependent, usually gap-closing term called the correlation self-energy (\(\Sigma^c\)). These contributions are tackled separately by the code:
The energy-dependent dynamical electronic screening \(\varepsilon^{-1}(\omega)\) is included in \(\Sigma^c\) and must therefore be calculated as well.
+
In this way, we can compute the “quasiparticle” corrections \(E^{QP}_{nk}\) to the single-particle Kohn-Sham eigenvalues \(\epsilon_{nk}\).
The typical workflow for a GW calculation is: (1) DFT ground-state and non-self-consistent calculations, (2) conversion of the DFT data to the Yambo format and initialisation, (3) calculation of the dynamical screening, and (4) construction of the self-energy and solution of the quasiparticle equation.
First of all, we need to convert some of the data produced in a previous non-self-consistent DFT calculation (in this case using Quantum ESPRESSO) into a convenient format for Yambo.
+
The QE save folder for MoS\(_2\) is already present at 00_QE-DFT. We should move inside it and then run the p2y executable to generate the uninitialised SAVE.
+
But first, we need to access a node interactively:
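A minimal sketch of these two steps, assuming hypothetical partition/account names and that the QE save folder is called mos2.save (adapt both to your cluster and directory layout):

srun --nodes=1 --ntasks=1 --gres=gpu:1 --partition=<partition> --account=<account> --time=01:00:00 --pty /bin/bash
cd 00_QE-DFT/mos2.save
p2y        # generates the uninitialised SAVE folder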
Now, we need to run the initialization step. Every Yambo run must start with this step. This will be automatically performed when you run Yambo on a SAVE directory for the first time. Just type
+
mpirun -np 1 yambo
+
+
+
and then check the yambo log called l_setup. The initialization step determines the \(G\)-vector shells and the \(k\)- and \(q\)-point grids based on the DFT calculation. If you check inside the SAVE you will see two types of databases. The static ones, starting with ns.*, are directly converted in the p2y step, while the dynamical ones, ndb.* are generated during the initialisation.
+
ls SAVE/
+
+
+
ndb.gops # info on G-vector shells, etc
ndb.kindx # info on k and q meshes
+ns.db1 # info on geometry and KS bands
+ns.kb_pp_pwscf # pseudopotential info
+ns.wf # wave functions info
+...
+
+
+
The databases are written in netCDF format.
Yambo has also produced a human-readable output, r_setup, reporting relevant information such as lattice parameters, symmetries, atomic positions, k-points, DFT eigenvalues and band gaps. We can have a quick look at the sections.
+
vim r_setup
+
+
+
[02.01] Unit cells # Lattice geometry info
+==================
+ ...
+[02.02] Symmetries # Symmetry ops. written explicitly
+==================
+ ...
+[02.03] Reciprocal space # Reciprocal lattice info
+========================
+...
+[02.04] K-grid lattice # k-point coords. written explicitly
+======================
+...
+[02.05] Energies & Occupations # Info on band gaps and occupations
+============================== # DFT eigenvalues
+...
+[03] Transferred momenta grid and indexing # q-points (momentum transfer grid)
+==========================================
+
+
+
Finally, let us move the SAVE and the report file to the directory where we will run the first GW calculation.
+
mv SAVE r_setup ../../01_GW_first_run/
+cd ../../01_GW_first_run/
+
In order to build our input, we need to use the options for a GW calculation. We want to use the plasmon pole approximation for the dynamical screening, solve the quasiparticle equation with the Newton method, and add a truncation of the Coulomb potential which is useful for 2D systems. In addition, we want to set up explicitly the parameters for parallel runs. Therefore we type:
+
yambo -gw0 p -g n -r -V par -F gw.in
+
+
+
Now we can exit the computing node, since all remaining calculations will be run by submitting jobs with the slurm scheduler.
+
exit
+
+
+
You can now inspect the input file gw.in and try to familiarize yourself with some of the parameters. The input will come with default values for many parameters that we might need to change.
The [R] keyword refers to the runlevels: these flags tell Yambo which parts of the code should be executed. Each runlevel enables its own set of input variables. In particular, here we have:
+
+
rim_cut: Coulomb potential random integration method and cutoff (enables [RIM] and [CUT] variables).
+
gw0: Yambo learns that it has to run a GW calculation (enables [GW] variables).
+
HF_and_locXC: calculation of exchange part of the self-energy \(\Sigma^x\) (i.e., Hartree-Fock approximation).
+
em1d: enables the calculation of the dynamical screening of the electrons, i.e. the dielectric matrix ([X] variables). In this way Yambo can go beyond Hartree-Fock and compute \(\Sigma^c\).
+
ppa: tells Yambo that the dynamical screening should be computed in the plasmon pole approximation ([Xp] variables).
+
dyson: Yambo will solve the Dyson-like quasiparticle equation.
EXXRLvcs controls the number of Reciprocal Lattice vectors (i.e., G-vectors) used to build the exchange self-energy, while VXCRLvcs does the same for the exchange-correlation potential reconstructed from DFT. Since these two quantities are to be subtracted, it is important to keep the same values here (and possibly not change the default maximum value).
+
+
+
Let us now have a look at the parameters for the calculation of the correlation part of the self-energy. Recall that we have:
(Here, the \(\rho\)-terms represent the screening matrix elements which are computed separately by yambo and stored in their own database.)
+
The calculation is divided in two steps. First, the response function in the plasmon pole approximation (em1dppa), under the keywords [X] and [Xp], i.e., \(\varepsilon^{-1}_{GG'}(q,\omega)\).
+
Chimod= "HARTREE"              # [X] IP/Hartree/ALDA/LRC/PF/BSfxc
%BndsRnXp
   1 | 300 |                   # [Xp] Polarization function bands
%
NGsBlkXp= 1 RL                 # [Xp] Response block size
%LongDrXp
 1.000000 | 0.000000 | 0.000000 |        # [Xp] [cc] Electric Field
%
PPAPntXp= 27.21138 eV          # [Xp] PPA imaginary energy
XTermKind= "none"              # [X] X terminator ("none","BG" Bruneval-Gonze)
+
+
+
+
Response function
+
+
Chimod="Hartree" indicates that we compute the response function in the Random Phase Approximation (RPA).
+
BndsRnXp represents the electronic states included in the response function \(\varepsilon\), and is a convergence parameter.
+
NGsBlkXp is the number of G-vectors used to calculate the RPA response function \(\varepsilon^{-1}_{GG'}\). It is a convergence parameter and can be expressed in number of reciprocal lattice vectors (RL) or energy (Ry, suggested).
+
LongDrXp represents the direction of the long-range auxiliary external electric field used to compute \(\varepsilon^{-1}_{GG'}(q)\) at \(q,G\rightarrow 0\). In general you have to be mindful of the system symmetries. In our case, we will put 1|1|1 to cover all directions.
+
PPAPntXp=27.21138eV is the energy of the plasmon pole. We don’t normally change this.
+
XTermKind is used to specify a “terminator”: this accelerates numerical convergence with respect to the number of bands BndsRnXp.
+
+
+
Next, we have the [GW] group of parameters controlling the next part of the correlation self-energy calculation:
GbndRnge is the number of bands used to build the correlation self-energy. It is a convergence parameter and can be accelerated with the terminator GTermKind.
+
DysSolver="n" specifies the method used to solve the linearised quasiparticle equation. In most cases, we use the Newton method "n".
+
QPkrange indicates the range of electronic (nk) states for which the GW correction \(\Sigma_{nk}\) is computed. The first pair of numbers represents the range of k-point indices, the second pair the range of band indices.
+
+
+
We now take a look at the parameters relative to the Coulomb interaction at small momenta and for 2D systems, which we should edit now once and for all.
+
RandQpts= 1000000              # [RIM] Number of random q-points in the BZ
RandGvec= 100 RL               # [RIM] Coulomb interaction RS components
CUTGeo= "slab z"               # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..
+
+
+
+
Coulomb potential
+
+
The [RIM] keyword refers to a Monte Carlo random integration method performed to avoid numerical instabilities close to \(q=0\) and \(G=0\) in the \(q\)-integration of the bare Coulomb interaction - i.e. \(4\pi/(q+G)^2\) - for 2D systems.
+
The [CUT] keyword refers to the truncation of the Coulomb interaction to avoid spurious interaction between periodically repeated copies of the simulation supercell along the \(z\)-direction (we are working with a plane-wave code). Keep in mind that the vacuum space between two copies of the system should be converged: here we are using 20 bohr but a value of 40 bohr would be more realistic.
+
+
+
Finally, we have the parallel parameters. We are going to discuss them at the end of the parallel section, we can skip them for now.
+
X_and_IO_CPU= ""               # [PARALLEL] CPUs for each role
X_and_IO_ROLEs= ""             # [PARALLEL] CPUs roles (q,g,k,c,v)
X_and_IO_nCPU_LinAlg_INV= -1   # [PARALLEL] CPUs for Linear Algebra (if -1 it is automatically set)
X_Threads= 0                   # [OPENMP/X] Number of threads for response functions
DIP_CPU= ""                    # [PARALLEL] CPUs for each role
DIP_ROLEs= ""                  # [PARALLEL] CPUs roles (k,c,v)
DIP_Threads= 0                 # [OPENMP/X] Number of threads for dipoles
SE_CPU= ""                     # [PARALLEL] CPUs for each role
SE_ROLEs= ""                   # [PARALLEL] CPUs roles (q,qp,b)
SE_Threads= 0                  # [OPENMP/GW] Number of threads for self-energy
+
+
+
In a GW calculation, the most important parameters to be numerically converged are the number of bands entering the response function (BndsRnXp) and the self-energy (GbndRnge), the number of G-vectors in the response function (NGsBlkXp), and the k-point grid.
From the above discussion you can easily guess that many-body perturbation theory calculations are much more numerically expensive than DFT calculations.
We will start by running a single GW calculation. Here we will focus on the magnitude of the quasiparticle gap. This means that we only need to calculate two quasi-particle corrections, i.e., valence and conduction bands at the k-point where the minimum gap occurs. This information can be found by inspecting the report file r_setup produced when the SAVE folder was initialised. Just search for the string ‘Direct Gap’ and you’ll see that the latter occurs at k-point 7 between bands 13 and 14:
In addition, we will set the number of bands in BndsRnXp and GbndRnge to a small value, just to have it run fast. Hence, we modify the input file accordingly (check BndsRnXp, GbndRnge, LongDrXp, QPkrange):
+
rim_cut                        # [R] Coulomb potential
gw0                            # [R] GW approximation
ppa                            # [R][Xp] Plasmon Pole Approximation for the Screened Interaction
dyson                          # [R] Dyson Equation solver
HF_and_locXC                   # [R] Hartree-Fock
em1d                           # [R][X] Dynamically Screened Interaction
X_Threads= 0                   # [OPENMP/X] Number of threads for response functions
DIP_Threads= 0                 # [OPENMP/X] Number of threads for dipoles
SE_Threads= 0                  # [OPENMP/GW] Number of threads for self-energy
RandQpts= 1000000              # [RIM] Number of random q-points in the BZ
RandGvec= 100 RL               # [RIM] Coulomb interaction RS components
CUTGeo= "slab z"               # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..
%CUTBox
 0.000000 | 0.000000 | 0.000000 |        # [CUT] [au] Box sides
%
CUTRadius= 0.000000            # [CUT] [au] Sphere/Cylinder radius
CUTCylLen= 0.000000            # [CUT] [au] Cylinder length
CUTwsGvec= 0.700000            # [CUT] WS cutoff: number of G to be modified
EXXRLvcs= 37965 RL             # [XX] Exchange RL components
VXCRLvcs= 37965 RL             # [XC] XCpotential RL components
Chimod= "HARTREE"              # [X] IP/Hartree/ALDA/LRC/PF/BSfxc
%BndsRnXp
   1 | 20 |                    # [Xp] Polarization function bands
%
NGsBlkXp= 1 RL                 # [Xp] Response block size
%LongDrXp
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field
%
PPAPntXp= 27.21138 eV          # [Xp] PPA imaginary energy
XTermKind= "none"              # [X] X terminator ("none","BG" Bruneval-Gonze)
%GbndRnge
   1 | 20 |                    # [GW] G[W] bands range
%
GTermKind= "none"              # [GW] GW terminator ("none","BG" Bruneval-Gonze,"BRS" Berger-Reining-Sottile)
DysSolver= "n"                 # [GW] Dyson Equation solver ("n","s","g")
%QPkrange                      # [GW] QP generalized Kpoint/Band indices
7|7|13|14|
%
+
+
+
We are now ready to run this calculation. Since you should never run a Yambo calculation on the login node, we will need a submission script to add our job to the queue. A submission script optimized for Leonardo Booster (running on GPUs) is provided as an example. Modify it to suit your specific machine.
We will ignore all details regarding parallelization, as it will be covered in the next section. Since there are no lowercase flags after yambo, it is not going to generate an input file but rather run the one specified by -F. Now, go ahead and submit this job
+
sbatch run_first_job.sh
+
+
+
The status of the jobs can be monitored via:
+
squeue -u $USER # to inspect the status of jobs
+ # (hint: make a unix alias, if you like)
+scancel <jobid> # to delete jobs in the queue
+
+
+
The newly generated databases will be stored in the job directory, as specified by -J, while the report, log and output files will be stored in the communications directory (-C). As this is your first yambo run, take a moment to inspect the report and log files, which you can find inside the -C directory. In these report and log files, you can see the steps performed by yambo. For instance, the code calculates the screening at every k-point and stores it in the PPA database called ndb.pp. By opening the report
+
vim out_00_first_run/r-job_00_first_run_HF_and_locXC_gw0_dyson_rim_cut_em1d_ppa
+
In this file, Eo is our starting point (DFT) while E-Eo shows the GW correction one should apply to obtain the quasi-particle energies. In order to calculate the gap (automatically from the command line), we'll use some simple commands. First, we get everything that is not a # symbol with grep -v '#' and we pass that to another command with a "pipe" |. Then, tail -n 1 (head -n 1) will retain the last (first) line, and awk '{print $3+$4}' will get us the sum of the third and fourth columns. Altogether, this would be as follows
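Putting it together, a possible pipeline is sketched below; the o-*.QP file name pattern is an assumption, so check what actually appears in your -C directory.

QPFILE=$(ls out_00_first_run/o-*.QP)                             # quasiparticle output written by the run
Ev=$(grep -v '#' "$QPFILE" | head -n 1 | awk '{print $3+$4}')    # Eo + (E-Eo) for the valence state (band 13)
Ec=$(grep -v '#' "$QPFILE" | tail -n 1 | awk '{print $3+$4}')    # the same for the conduction state (band 14)
awk -v c="$Ec" -v v="$Ev" 'BEGIN{printf "GW direct gap: %.3f eV\n", c-v}'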
In this part of the tutorial, we will study convergence with respect to some of the parameters mentioned above. In order to complete this tutorial within a single hands-on session, we will restrict ourselves to a very coarse \(k\)-point grid.
+Hence, we will perform our convergence studies on top of a DFT calculation done with a \(6 \times 6 \times 1\) k-point grid and without spin-orbit coupling: the SAVE we generated earlier.
+While this will speed up calculations and could be run even on a single GPU card, you should be aware that such coarse sampling of the BZ is significantly underconverged and should only be used for educational purposes. In addition, spin-orbit interaction is extremely relevant for the valley physics of MoS2 and should not be neglected in realistic calculations.
+Let’s move into the appropriate directory
+
cd ../02_GW_convergence
+
+
+
+
Response function \(\varepsilon^{-1}_{GG'}\) - Bands and G-vectors
+
We are now ready to start our convergence tests. We’ll begin with the variables controlling the polarization function, i.e., NGsBlkXp for the number of G-vectors and BndsRnXp for the number of bands. For this, we will keep GbndRnge constant at a reasonably high value - you can inspect the input i01-GW
+
vim i01-GW
+
+
+
and check that you have:
+
%GbndRnge
   1 | 80 |                    # [GW] G[W] bands range
%
+
+
+
Since we need to run yambo for several values of NGsBlkXp and BndsRnXp, it makes sense to use two nested loops. That is exactly what we did in the submission script run01_converge_pol.sh. Since this will take a few minutes, save time by submitting it straight away and we will have a look at it while it runs:
+
sbatch run01_converge_pol.sh
+
+
+
+
Monitoring
+
You can monitor that the job is running with the squeue command

squeue -u $USER
+
+
+
and also by checking the files created in your folder
Let's now have a look at the job we just submitted.

vim run01_converge_pol.sh
+
+
+
First, we define the double loop and initialize a summary file for each iteration of the outer loop by printing a header to it. The input file i01-GW is used as a template for every calculation in the loops, so we assign it to a variable.
Inside the loops, we generate some useful labels which will come in handy to distinguish between runs. Then, we pass the loop variables to the sed command in order to generate new input files in an automated way (sed replaces any matching string with whatever is provided by the loop variable). Next, we run yambo using the labels to specify different job (-J) and communication (-C) directories every time. Finally, we extract the quasiparticle energies with grep commands as shown before and append a new line to the summary file. So, inside each loop, we have the following structure.
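The snippet below is a schematic reconstruction of that structure; the variable names, loop values, and the strings matched by sed are illustrative, not the literal contents of run01_converge_pol.sh.

template=i01-GW
for blk in 2 4 6 8 10 ; do                           # outer loop: NGsBlkXp in Ry
  summary=summary_NGsBlk_${blk}Ry.txt
  echo "# BndsRnXp    gap [eV]" > $summary           # header of the summary file
  for bnd in 40 60 80 ; do                           # inner loop: BndsRnXp
    label=Xp_${bnd}bands_${blk}Ry
    sed "s/BNDSRNXP/${bnd}/ ; s/NGSBLKXP/${blk} Ry/" $template > i01-GW_${label}
    mpirun -np $SLURM_NTASKS yambo -F i01-GW_${label} -J job_${label} -C out_${label}
    gap=$(grep -v '#' out_${label}/o-*.QP | awk 'NR==1{v=$3+$4} END{print $3+$4-v}')
    echo "  ${bnd}    ${gap}" >> $summary
  done
done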
Finally, let us plot this data. First, check that the job has finished with
+
squeue -u $USER
+
+
+
[$USER@login01 02_GW_convergence]$ squeue -u $USER
+ JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
+[$USER@login01 02_GW_convergence]$
+
+
+
and verify that the energies were extracted correctly by inspecting the summary files. Remember to load the python module if you haven’t done so yet, and then plot:
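Assuming the plotting script for this first test follows the same naming convention as the plot-02.py script used later (the exact file name may differ in your folder):

module load anaconda3/2023.03
python plot-01.py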
The plot will produce a fig-01.png file.
+You can copy and open it in your local machine with something like
+
[Run this on another terminal in your local machine, fixing $USER, $LOGIN and $TUTORIALPATH]
+scp $USER@$LOGIN:$TUTORIALPATH/MoS2_HPC_tutorial_Leonardo/02_GW_convergence/fig-01.png ./
+
+
+
You should get:
+
+
For the purpose of the tutorial, we will choose 80 bands and 10 Ry as our converged parameters and move on. An error within 10 meV is usually acceptable. To retain the chosen variables, we’ll make a copy of the corresponding input file:
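One way to do it; the source file name below is a placeholder for whichever generated input corresponds to the 80-bands / 10 Ry run.

cp i01-GW_<label_of_the_80bands_10Ry_run> i02-GW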
We will now proceed to converge the number of bands for the correlation part of the self-energy, i.e., GbndRnge. This step is actually simpler, since it only involves one loop. This is coded in the provided script run02_converge_Gbnds.noBG.sh. You can look into it
+
vim run02_converge_Gbnds.noBG.sh
+
+
+
and go ahead and submit it.
+
sbatch run02_converge_Gbnds.noBG.sh
+
+
+
+
[OPTIONAL]: Use the terminator to accelerate convergence
+
While that runs, we’ll have a look at the so-called Bruneval-Gonze (BG) terminator, which is a method to accelerate convergence with respect to empty bands. The variable that controls this for the bands in the correlation self-energy is GTermKind. This is currently set to “none” in i02-GW, so create a new input file i02-GW_BG and set this variable to “BG”. We can do this in the command line by simply typing
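A possible one-liner (adjust the matched string to the exact formatting of GTermKind in your i02-GW):

sed 's/GTermKind="none"/GTermKind="BG"/' i02-GW > i02-GW_BG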
Note that XTermKind also offers the same terminator for the sum over bands of the polarization function (we just chose not to use it in the previous section of this exercise, and we will keep it as "none"). Now, copy the last submission script and edit it to run the same convergence test using the BG terminator.
Try and do this yourself first, and then continue reading to check your understanding.
+You will have to change the input file template, i.e., use i02-GW_BG where the terminator has been activated. Modify also the name of the newly generated input files in order to avoid overwriting. Change the name of the summary file for the same reason and, finally, modify the communications and job directories of yambo. Make sure you’ve done all the changes as outlined below.
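A sketch of the kind of edits meant here; the variable names are illustrative, so match them to the ones actually used in the script.

template=i02-GW_BG             # input template with the BG terminator enabled
summary=summary_03_BG.txt      # new summary file, so summary_02_noBG.txt is not overwritten
label=Gbnds_${bnd}_BG          # new label, giving distinct generated inputs and -J/-C directories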
While this runs, check if the previous job has finished, i.e., you should have a complete summary_02_noBG.txt file by now.
+For a visual result, proceed to plot them with
+
python plot-02.py
+
+
+
You should get
+
+
+
OPTIONAL
+
If you also did the optional step, you can compare summary_02_noBG.txt with summary_03_BG.txt once run03_converge_Gbnds.BG.sh has finished - you’ll see the effect of the terminator immediately.
Just open the plot-02.py script and uncomment the line #list_of_files=['summary_02_noBG.txt','summary_03_BG.txt'], then rerun it with python plot-02.py.
+You can see that the terminator does a great job at accelerating convergence, and it allows us to use 60 bands incurring an error of only 3 meV (while the latter would have been larger than 0.1 eV had we not used the terminator).
+
+
We will end the convergence part of the tutorial with an important consideration about k-points convergence. The latter is the most cumbersome and computationally intensive among the various convergence tests, and it involves re-running the DFT step. For this reason (and for this reason only) it was ignored in this tutorial. However, it absolutely cannot be overlooked since it is crucial for the accuracy of the calculated GW corrections. You can read about k-points convergence in GW and, importantly, a very efficient workaround for 2D systems in a recent publication (here). MoS\(_2\) was one of the materials studied there, and it shows that our result, obtained with a \(6 \times 6 \times 1\) k-grid, is simply off the chart (blue line).
+
+
Guandalini, D’Amico, Ferretti & Varsano. npj Comput Mater 9
+
+
[OPTIONAL]: Use the RIM-W accelerator
+
However you can try to get a reasonable correction via the RIM-W approach. Create a new input file copying a suitable one:
Note that we also added FFTGvecs to reduce the size of the Fourier transforms (the default corresponds to Quantum ESPRESSO ecutwfc, i.e. 60 Ry in this case).
+
In addition, we have deleted all the parallel parameters since we will be setting them via the submission script.
+
Actually we are now dealing with a heavier system than before: as you can see from the QPkrange values, we have switched to a 12x12x1 k-point grid - having 19 points in the irreducible Brillouin zone - and turned spin-orbit coupling on in the DFT calculation (now the top valence band is number 26 instead of 13 because the bands are spin-polarized).
+
For this part of the tutorial, we will be using the slurm submission script job_parallel.sh, which is available in the calculation directory.
+If you inspect it, you will see that the script adds additional variables to the yambo input file.
+These variables control the parallel execution of the code:
+
DIP_CPU= "1 $ngpu 1"           # [PARALLEL] CPUs for each role
DIP_ROLEs= "k c v"             # [PARALLEL] CPUs roles (k,c,v)
DIP_Threads= 0                 # [OPENMP/X] Number of threads for dipoles
X_and_IO_CPU= "1 1 1 $ngpu 1"  # [PARALLEL] CPUs for each role
X_and_IO_ROLEs= "q g k c v"    # [PARALLEL] CPUs roles (q,g,k,c,v)
X_and_IO_nCPU_LinAlg_INV= 1    # [PARALLEL] CPUs for Linear Algebra (if -1 it is automatically set)
X_Threads= 0                   # [OPENMP/X] Number of threads for response functions
SE_CPU= "1 $ngpu 1"            # [PARALLEL] CPUs for each role
SE_ROLEs= "q qp b"             # [PARALLEL] CPUs roles (q,qp,b)
SE_Threads= 0                  # [OPENMP/GW] Number of threads for self-energy
+
+
+
The keyword DIP refers to the calculations of the screening matrix elements (also called “dipoles”) needed for the screening function, X is the screening function itself (it stands for \(\chi\) since it is a response function), SE the self-energy.
+These three sections of the code can be parallelised independently.
+
We are running on GPUs. In particular, each node hosts four GPU cards. Yambo is coded in such a way that each MPI task is run on a single card, therefore ntasks=ngpu.
+
+
Note
+
+
In this subsection we are mainly concerned with the [PARALLEL] variables which refer to MPI tasks (distributed memory).
+
What about [OPENMP] parallelisation (i.e., adding threads with shared memory)? When Yambo is run on GPUs, the explicit threading that you can set in the input and submission script only applies to the very few sections of the code that are not run on GPU cards, but stay on the CPUs. Therefore, in a GPU calculation, CPU-only threads are not going to be a relevant factor in the performance of the code. We keep them fixed to 8 since on Leonardo Booster (32 CPUs and 4 GPUs per node) the best hybrid parallel setup for CPUs is 4 tasks times 8 threads. We will see an example of the impact of threads in a CPU-only calculation later.
+
+
+
We start by calculating the QP corrections using 4 MPI tasks / GPUs. We leave the number of openMP threads at 8, the optimized value for Yambo on Leonardo. Therefore, edit the submission script as:
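A sketch of the relevant lines; other directives and the account/partition settings are omitted here.

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4        # 4 MPI tasks = 4 GPUs
#SBATCH --cpus-per-task=8          # 8 OpenMP threads per task
#SBATCH --gres=gpu:4
export OMP_NUM_THREADS=8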
This will create a new input file and run it. The calculation databases and the human-readable files will be put in separate directories. Check the location of the report r-* file and the log l-* files, and inspect them while the calculation runs.
+For simplicity you can just type
+
tail -f run_MPI4_OMP8.out/LOG/l-*_CPU_1
+
+
+
to monitor the progress in the master thread (Ctrl+c to exit).
+As you can see, the run takes some time, even though we are using minimal parameters.
+
Meanwhile, we can run other jobs increasing the parallelisation. Let's employ 16 MPI tasks / GPUs (i.e., 4 nodes on Leonardo). To this end, modify the job_parallel.sh script by changing the number of nodes (and hence of MPI tasks and GPUs) accordingly.
This time the code should be much faster. Once the run is over, try to run the simulation also on 8 MPI tasks by changing nodes appropriately. Finally, you can try to produce a scaling plot.
+
The timings are contained in the r-* report files. You can already have a look at them typing
+
grep Time-Profile run_MPI*/r-*
+
+
+
The python script parse_ytiming.py is useful for the post-processing of report files. You can already find it in the directory, together with the reports for the longer calculations with 1 and 2 MPI tasks which have been provided.
+
If you didn’t do so already, load the python module
+
module load anaconda3/2023.03
+
+
+
Then, after your jobs have finished, run the script as
+
python parse_ytiming.py run_MPI
+
+
+
to look for a report file in each run_MPI*.out folder. Make sure you have only one report file per folder.
You can also play with the script to make it print detailed timing information; however, you should already see that it produced a png plot showing times-to-completion on the y axis against the number of MPI tasks (i.e., GPUs in this case) on the x axis.
+
+
What can we learn from this plot? In particular, try to answer the following questions:
+
+
Up to which number of MPI tasks does our system scale efficiently?
+
How can we decide at which point adding more nodes to the calculation becomes a waste of resources?
+
+
+
Note
+
Keep in mind that the MPI scaling we are seeing here is not the true Yambo scaling, but depends on the small size of our tutorial system. In a realistic calculation for a large-sized system, Yambo has been shown to scale well up to tens of thousands of MPI tasks! (See the next optional box for an example)
+
+
+
[OPTIONAL] Comparison with CPU calculation with hybrid parallelization strategy
+
We have run the same calculation using a version of Yambo compiled in order to run on CPUs. This is not the preferred approach in an accelerator-based machine like Leonardo, but it can be instructive.
+
For a CPU calculation, we can use a hybrid parallel structure with threads. The OPENMP threads are controlled by modifying cpus-per-task and OMP_NUM_THREADS in the submission file. The product of the number of OpenMP threads and MPI tasks is equal to the total number of CPUs.
+
For our test, we have used larger convergence parameters than in the previous run, and selected a hybrid parallel scheme with 16 MPI tasks per node, with 2 OPENMP threads (ntasks*nthreads=ncpu=16*2=32), since it gives the best scaling in this case.
+
+
Note
+
In general (for larger systems) we have tested that the best CPU scaling on Leonardo is actually 4 MPI tasks times 8 OPENMP threads.
+
+
Therefore, in the new CPU submission script we have:
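In Slurm terms this corresponds to something like the sketch below (other directives omitted):

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16       # 16 MPI tasks per node
#SBATCH --cpus-per-task=2          # 2 OpenMP threads per task
export OMP_NUM_THREADS=2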
Actually, we don’t need to change the openMP-related variables appearing in the yambo input, since the value 0 means “use the value of OMP_NUM_THREADS” and we have now set this environment variable to our liking via the submission script.
+Otherwise, any positive number can directly specify the number of threads to be used in each section of the code.
+
DIP_Threads= 0                 # [OPENMP/X] Number of threads for dipoles
...
X_Threads= 0                   # [OPENMP/X] Number of threads for response functions
...
SE_Threads= 0                  # [OPENMP/GW] Number of threads for self-energy
+
+
+
Actively looking for the best scaling on both GPU and CPU for our enlarged MoS2 system we find:
+
+
We can see that already for this reasonably small and half-converged system run on a few nodes the GPU calculation easily reaches a speedup of 2x. The speedup vastly increases in larger systems where the calculations are more demanding, as you can see from the scaling tests below (run on the Juwels Booster machine) on a graphene-cobalt interface supercell.
+
+
Scaling comparison of graphene@Co(0001) interface on CPU (left, 48 cpus per node) and GPU (right, 4 GPUs per node). Tests done by Nicola Spallanzani. Data available at: http://www.gitlab.com/max-centre/Benchmarks
+
+
Note
+
+
In real-life CPU-based calculations running on \(n_{cores} > 100\), as we have seen, it may be a good idea to adopt a hybrid approach.
+
The most efficient scaling can depend both on your system and on the HPC facility you’re running on. For a full CPU node on Leonardo (32 cores), using a large-scale system, we have found that 4 tasks times 8 threads gives the best performance.
+
OpenMP can help lower memory requirements within a node. You can try to increase the OpenMP share of threads if you are getting Out Of Memory errors.
+
+
+
+
+
+
[OPTIONAL]: Comparing different parallelisation schemes
+
Up to now we always parallelised over a single parameter, i.e. c or qp. However, Yambo allows for tuning the parallelisation scheme over several parameters broadly corresponding to “loops” (i.e., summations or discrete integrations) in the code.
+
To this end you can open again the run_parallel.sh script and modify the section where the yambo input variables are set.
+
For example, X_CPU sets how the MPI Tasks are distributed in the calculation of the response function. The possibilities are shown in the X_ROLEs. The same holds for SE_CPU and SE_ROLEs which control how MPI Tasks are distributed in the calculation of the self-energy.
+
X_CPU= "1 1 1 $ncpu 1" # [PARALLEL] CPUs for each role
+X_ROLEs= "q g k c v" # [PARALLEL] CPUs roles (q,g,k,c,v)
+X_nCPU_LinAlg_INV= $ncpu # [PARALLEL] CPUs for Linear Algebra
+X_Threads= 0 # [OPENMP/X] Number of threads for response functions
+DIP_Threads= 0 # [OPENMP/X] Number of threads for dipoles
+SE_CPU= "1 $ncpu 1" # [PARALLEL] CPUs for each role
+SE_ROLEs= "q qp b" # [PARALLEL] CPUs roles (q,qp,b)
+SE_Threads= 0 # [OPENMP/GW] Number of threads for self-energy
+
+
+
You can try different parallelization schemes and check the performances of Yambo. In doing so, you should also change the jobname label=MPI${ncpu}_OMP${nthreads} in the run_parallel.sh script to avoid confusion with previous calculations.
+
You may then check how speed, memory and load balance between the CPUs are affected. You could modify the script parse_ytiming.py to parse the new data, read and distinguish between more file names, new parallelisation options, etc.
+
+
Note
+
+
The product of the numbers entering each variable (i.e. X_CPU and SE_CPU) times the number of threads should always match the total number of cores (unless you want to overload the cores taking advantage of multi-threads).
+
Using the X_Threads and SE_Threads variables you can think about setting different hybrid schemes between the screening and the self-energy runlevels.
+
Memory scales better if you parallelize on bands (cvb).
+
Parallelization on k-points performs similarly to parallelization on bands, but requires more memory.
+
Parallelization on q-points requires much less communication between the MPI tasks. It may be useful if you run on more than one node and the inter-node connection is slow.
This is the final section of the tutorial, in which we want to compute the full correction to the band structure of single-layer MoS\(_2\).
+
This is a massive calculation, so run it right now and we’ll discuss it in the meantime:
+
cd ../04_GW_bands
sbatch gpu_job.sh
+
+
+
In order to get somewhat realistic results, we will use the larger values for the convergence parameters we have identified in the convergence section. In addition, we also increased the vacuum separation (to 30 au) and the k-point mesh (to 18x18x1) in the DFT calculation, and of course we consider spin-orbit coupling.
+
Now we have a heavier calculation, and we have to do it not just for the band gap, but the entire band structure which includes 37 kpoints in the irreducible Brillouin zone, two spin-orbit-split valence bands, and two spin-orbit-split conduction bands. Let us check the new input:
+
vim gw.in
+
+
+
%QPkrange                      # [GW] QP generalized Kpoint/Band indices
1|37|25|28|
%
+
+
+
After about 3 minutes the calculation should be over and the results collected in folder GW_bnds. The quasiparticle corrections are stored in human-readable form in the file o-GW_bnds.QP, and in netCDF format in the quasiparticle database ndb.QP.
+
In order to visualize the results in the form of a GW band structure, we will first interpolate the calculated points - recall that we have just 37 points, few of which lie on high-symmetry lines - with ypp, the yambo pre- and post-processing executable.
+
+
Note
+
We also have a python-based interface for advanced treatment of all the Yambo databases, called Yambopy. You can check it out here on the Yambo wiki.
We can review the options with ypp -h and generate an input file for band structure interpolation with
+
ypp -s b -F ypp_bands.in
+
+
+
Let us modify the resulting input file by selecting the 'boltztrap' approach to interpolation, the last two valence and first two conduction bands, and a path in the Brillouin zone along the points \(\Gamma-M-K-\Gamma\). We also set 100 points for each high-symmetry line.
+
electrons                      # [R] Electronic properties
bnds                           # [R] Bands
PROJECT_mode= "none"           # Instruct ypp how to project the DOS. ATOM, LINE, PLANE.
INTERP_mode= "BOLTZ"           # Interpolation mode (NN=nearest point, BOLTZ=boltztrap aproach)
INTERP_Shell_Fac= 20.00000     # The bigger it is a higher number of shells is used
INTERP_NofNN= 1                # Number of Nearest sites in the NN method
OutputAlat= 5.90008            # [a.u.] Lattice constant used for "alat" ouput format
cooIn= "rlu"                   # Points coordinates (in) cc/rlu/iku/alat
cooOut= "rlu"                  # Points coordinates (out) cc/rlu/iku/alat
%BANDS_bands
  25 | 28 |                    # Number of bands
%
CIRCUIT_E_DB_path= "none"      # SAVE obtained from the QE `bands` run (alternative to %BANDS_kpts)
BANDS_path= ""                 # High-Symmetry points labels (G,M,K,L...) also using composed positions (0.5xY+0.5xL).
BANDS_steps= 100               # Number of divisions
#BANDS_built_in                # Print the bands of the generating points of the circuit using the nearest internal point
%BANDS_kpts                    # K points of the bands circuit
 0.00000 | 0.00000 | 0.00000 |
 0.00000 | 0.50000 | 0.00000 |
 0.33333 | 0.33333 | 0.00000 |
 0.00000 | 0.00000 | 0.00000 |
%
+
+
+
Now, let’s run ypp:
+
mpirun -np 1 ypp -F ypp_bands.in
+
+
+
This run will produce the file o.bands_interpolated. You can inspect it and see that it contains a plottable band structure, but beware: these are the DFT eigenvalues! We didn't tell ypp where to look for the quasiparticle corrections, so it went into the SAVE folder and interpolated the DFT data.
+Let’s rename the output:
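A sketch of this step; the GW-interpolated file o.GW_bands used below comes from an analogous ypp run that also reads the quasiparticle database ndb.QP (for instance, by pointing ypp to the GW_bnds job directory), renamed in the same way.

mv o.bands_interpolated o.DFT_bands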
Now we are ready to visualize the band structures. In order to do so, you can use the script plt_bands.py that should be already available in the directory.
+
We load the python module
+
module load anaconda3/2023.03
+
+
+
and run the script as
+
python plt_bands.py o.DFT_bands o.GW_bands 4
+
+
+
Now we can also exit the computing node
+
exit
+
+
+
The python script should have produced a GW_bands.png file containing the following visualization, which you can copy and open on your local machine using scp as shown earlier.
+
+
You may compare this plot with a converged result from this paper (also done with Yambo):
+
+
Dashed lines: DFT, thick lines: GW.
+
As you can see, the general result is not too bad, but there are some differences both at the DFT and GW levels. The magnitude of the band gap is too large, and the relative energy of the two conduction band minima is not correct. One obvious issue is the lack of convergence of our tutorial calculations. As we know, we should include more vacuum space and many, many more k-points. Additionally, this is a transition metal dichalcogenide: for this class of systems, the details of the band structure can strongly depend on small variations in the lattice parameters and on the type of pseudopotential used. A great deal of care must be taken when performing these calculations!
+
In order to learn more about Yambo, we suggest visiting the Yambo website. For technical information and tutorials, you can check out the Yambo wiki. If you have issues and questions about installing and running the code, you can write about them on the Yambo forum.
\ No newline at end of file
diff --git a/branch/main/Day-1/ex1-pools/README/index.html b/branch/main/Day-1/ex1-pools/README/index.html
new file mode 100644
index 0000000..4868b89
--- /dev/null
+++ b/branch/main/Day-1/ex1-pools/README/index.html
@@ -0,0 +1,163 @@
+
+
+
+
+
+
+ EXERCISE 1 - Parallelization with pools — Efficient materials modelling on HPC with Quantum ESPRESSO, Siesta and Yambo documentation
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Try to predict the best value of npool and check it by performing a series of runs.
+
+
Open the batch file ‘ex1-pools.slurm’ and customize the user-related SLURM options like job-name and mail-user (not essential at all);
+Replace the dots with a list of proper values for npool, e.g:
+
for ip in 1 2 3 4 5 6 -not necessarily the right values here!
+ do
+
+
+
+
Submit the job file:
+
sbatch ex1-pools.slurm
+
+
+
+
Look for total WALL time at the end of each output file:
+
PWSCF : 2m 0.69s CPU 2m 5.06s WALL
+
+
+
+
+
and plot TIME(npool). Which is the best npool value? Why?
+
+
You can try with different numbers of MPI tasks.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/branch/main/Day-1/ex2-diag/README/index.html b/branch/main/Day-1/ex2-diag/README/index.html
new file mode 100644
index 0000000..2bd6748
--- /dev/null
+++ b/branch/main/Day-1/ex2-diag/README/index.html
@@ -0,0 +1,156 @@
+
+
+
+
+
+
+ EXERCISE 2 - Parallelization of the eigenvalue problem — Efficient materials modelling on HPC with Quantum ESPRESSO, Siesta and Yambo documentation
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
EXERCISE 2 - Parallelization of the eigenvalue problem
EXERCISE 2 - Parallelization of the eigenvalue problem
+
Play with the ndiag parameter by performing a number of runs and seeing variations in the WALL time (if any).
+You can also change the fixed value of npools (the default value for this exercise is 4).
+
+
Replace the dots with a list of proper values for ndiag, e.g:
+
for id in 1 2 3 4 5 6 -not necessarily the right values here!
+ do
+
+
+
+
Submit the job file:
+
sbatch ./ex2-diag.slurm
+
+
+
+
Check the total WALL time at the end of the output file and plot TIME(ndiag).
+Which is the best ndiag value (if any)? Why?
Find out how to best exploit the available CPU resources, by playing with the MPI-related parameters (number of tasks, npools) together with the number of threads.
+Use the batch file ‘ex3-omp.slurm’ to submit your jobs (modify it at your convenience).
+Hints:
+
+
Know the size of your node, e.g. the amount of cores at your disposal (https://wiki.u-gov.it/confluence/display/SCAIUS/UG3.2%3A+LEONARDO+UserGuide);
+
See how the time scaling of your jobs changes just by varying the number of tasks (keep just 1 thread each at first).
+Adapt the npool parameter at each run.
+
Now you can start to explore the OpenMP parallelization by varying the number of threads.
+
Do multiple WALL_TIME plots as a function of the number of MPI tasks and OpenMP threads.
+Which is the best configuration for this exercise?
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/branch/main/Day-1/ex4-gpu/README/index.html b/branch/main/Day-1/ex4-gpu/README/index.html
new file mode 100644
index 0000000..b8a514a
--- /dev/null
+++ b/branch/main/Day-1/ex4-gpu/README/index.html
@@ -0,0 +1,149 @@
+
+
+
+
+
+
+ EXERCISE 4 - Hands on the GPUs — Efficient materials modelling on HPC with Quantum ESPRESSO, Siesta and Yambo documentation
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Know the size of your node. Look at: https://wiki.u-gov.it/confluence/display/SCAIUS/UG3.2%3A+LEONARDO+UserGuide
+How many cores? How many GPUs?
+
Use the batch file ‘ex4-gpu.slurm’ to submit a few MPI+GPU runs. You can also check what happens with more MPI tasks than GPUs.
+Dots at lines 8, 9, 10, 24, 26, 28.
+
Enable openMP threading. Do you see any improvement?
+
Consider your best CPU run. How many GPUs were necessary to match the performance?
+If you don’t have your optimized CPU batch file from exercise 3, you can use the one in the reference folder.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/branch/main/Day-2/exercise_eels/README/index.html b/branch/main/Day-2/exercise_eels/README/index.html
new file mode 100644
index 0000000..5f31956
--- /dev/null
+++ b/branch/main/Day-2/exercise_eels/README/index.html
@@ -0,0 +1,159 @@
+
+
+
+
+
+
+ DAY-2, EXERCISE - Electron energy-loss spectroscopy (EELS) for bulk silicon — Efficient materials modelling on HPC with Quantum ESPRESSO, Siesta and Yambo documentation
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
DAY-2, EXERCISE - Electron energy-loss spectroscopy (EELS) for bulk silicon
DAY-2, EXERCISE - Electron energy-loss spectroscopy (EELS) for bulk silicon
+
+
Run the SCF ground-state calculation with pw.x
+
sbatchsubmit_pw.slurm
+
+
+
+
Perform Lanczos recursions with turbo_eels.x
+
sbatchsubmit_eels.slurm
+
+
+
+
Run the post-processing spectrum calculation with turbo_spectrum.x
+
sbatchsubmit_spectrum.slurm
+
+
+
+
+
The output files Sieels.plot_chi.dat and Sieels.plot_eps.dat contain, respectively,
+the dynamical charge susceptibility and the dynamical dielectric function computed for the
+momentum transfer q chosen for the turbo_eels calculation of step 2.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/branch/main/Day-2/exercise_magnon/README/index.html b/branch/main/Day-2/exercise_magnon/README/index.html
new file mode 100644
index 0000000..7b93075
--- /dev/null
+++ b/branch/main/Day-2/exercise_magnon/README/index.html
@@ -0,0 +1,163 @@
+
+
+
+
+
+
+ DAY-2, EXERCISE - Dynamical magnetic susceptibility of bulk iron — Efficient materials modelling on HPC with Quantum ESPRESSO, Siesta and Yambo documentation
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
DAY-2, EXERCISE - Dynamical magnetic susceptibility of bulk iron
DAY-2, EXERCISE - Dynamical magnetic susceptibility of bulk iron
+
+
Run the SCF ground-state calculation with pw.x
+
sbatchsubmit_pw.slurm
+
+
+
+
Perform Lanczos recursions with turbo_magnon.x
+
sbatchsubmit_eels.slurm
+
+
+
+
Run the post-processing spectrum calculation with turbo_spectrum.x
+
sbatchsubmit_spectrum.slurm
+
+
+
+
+
The output file Femag.plot_chi.dat contains the column ipol of the dynamical magnetic susceptibility
+for a given momentum transfer q chosen during the turbo_magnon calculation of step 2.
+
In the case of an iron bulk sample polarized along the z direction, it can be shown that the inelastic neutron scattering cross section is
+proportional to the imaginary part of the chi_yy component, which can be extracted, eg, by doing
+
grepchi_2_2Femag.plot_chi.dat>Chi_2_2.dat
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/branch/main/Day-2/exercise_ph/solution/step1/README/index.html b/branch/main/Day-2/exercise_ph/solution/step1/README/index.html
new file mode 100644
index 0000000..c8b6bae
--- /dev/null
+++ b/branch/main/Day-2/exercise_ph/solution/step1/README/index.html
@@ -0,0 +1,156 @@
+
+
+
+
+
+
+ DAY-2, EXERCISE - Phonon modes of CnSnI3 at Gamma — Efficient materials modelling on HPC with Quantum ESPRESSO, Siesta and Yambo documentation
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Yambo tutorial: Quasiparticles in the GW approximation
+
In this tutorial you will learn how to run a GW simulation using Yambo on a HPC machine.
+
You will compute the quasiparticle corrections to the band structure of a free-standing single layer of MoS\(_2\) while learning about convergence studies, parallel strategies, and GPU calculations.
+
In the end, you will obtain a quasiparticle band structure based on the simulations, the first step towards the reproduction of an ARPES spectrum. Beware: we will not use fully converged parameters, so the final result should not be considered very accurate.
+
+
MoS\(_2\) monolayer (top and side views). Gray: Mo atoms, yellow: S atoms.
Here \(\epsilon_{nk}\) and \(\psi_{nk}\) are the Kohn-Sham energies and wavefunctions, respectively, while \(V_{xc}\) is the DFT exchange-correlation potential.
+
For each electronic state \(nk\), the self-energy can be separated into two components: a static, gap-opening term called the exchange self-energy (\(\Sigma^x\)), and an energy-dependent, usually gap-closing term called the correlation self-energy (\(\Sigma^c\)). These contributions are tackled separately by the code:
The energy-dependent dynamical electronic screening \(\varepsilon^{-1}(\omega)\) is included in \(\Sigma^c\) and must therefore be calculated as well.
+
In this way, we can compute the “quasiparticle” corrections \(E^{QP}_{nk}\) to the single-particle Kohn-Sham eigenvalues \(\epsilon_{nk}\).
+The typical workflow for a GW calculation is:
First of all, we need to convert some of the data produced in a previous non-self-consistent DFT calculation (in this case using Quantum ESPRESSO) into a convenient format for Yambo.
+
The QE save folder for MoS\(_2\) is already present at 00_QE-DFT. We should move inside it and then run the p2y executable to generate the uninitialised SAVE.
+
But first, we need to access a node interactively:
Now, we need to run the initialization step. Every Yambo run must start with this step. This will be automatically performed when you run Yambo on a SAVE directory for the first time. Just type
+
mpirun -np 1 yambo
+
+
+
and then check the yambo log called l_setup. The initialization step determines the \(G\)-vector shells and the \(k\)- and \(q\)-point grids based on the DFT calculation. If you check inside the SAVE you will see two types of databases. The static ones, starting with ns.*, are directly converted in the p2y step, while the dynamical ones, ndb.* are generated during the initialisation.
+
ls SAVE/
+
+
+
ndb.gops # info on G-vector shells, etc
+ndb.kindx # info and k and q meshes
+ns.db1 # info on geometry and KS bands
+ns.kb_pp_pwscf # pseudopotential info
+ns.wf # wave functions info
+...
+
+
+
The databases are written in netCDF format.
+Yambo has produced also a human readable output, r_setup, reporting relevant information such as lattice parameters, symmetries, atomic positions, k-points, DFT eigenvalues and band gaps. We can have a quick look at the sections.
+
vim r_setup
+
+
+
[02.01] Unit cells # Lattice geometry info
+==================
+ ...
+[02.02] Symmetries # Symmetry ops. written explicitly
+==================
+ ...
+[02.03] Reciprocal space # Reciprocal lattice info
+========================
+...
+[02.04] K-grid lattice # k-point coords. written explicitly
+======================
+...
+[02.05] Energies & Occupations # Info on band gaps and occupations
+============================== # DFT eigenvalues
+...
+[03] Transferred momenta grid and indexing # q-points (momentum transfer grid)
+==========================================
+
+
+
Finally, let us move the SAVE and the report file to the directory where we will run the first GW calculation.
+
mv SAVE r_setup ../../01_GW_first_run/
+cd ../../01_GW_first_run/
+
In order to build our input, we need to use the options for a GW calculation. We want to use the plasmon pole approximation for the dynamical screening, solve the quasiparticle equation with the Newton method, and add a truncation of the Coulomb potential which is useful for 2D systems. In addition, we want to set up explicitly the parameters for parallel runs. Therefore we type:
+
yambo -gw0 p -g n -r -V par -F gw.in
+
+
+
Now we can exit the computing node, since all remaining calculations will be run by submitting jobs with the slurm scheduler.
+
exit
+
+
+
You can now inspect the input file gw.in and familiarize yourself with some of the parameters. The input comes with default values for many parameters that we might need to change.
The [R] keyword refers to the runlevels: these flags tell Yambo which parts of the code should be executed. Each runlevel enables its own set of input variables. In particular, here we have:
+
+
rim_cut: Coulomb potential random integration method and cutoff (enables [RIM] and [CUT] variables).
+
gw0: Yambo learns that it has to run a GW calculation (enables [GW] variables).
+
HF_and_locXC: calculation of exchange part of the self-energy \(\Sigma^x\) (i.e., Hartree-Fock approximation).
+
em1d: enables the calculation of the dynamical screening of the electrons, i.e. the dielectric matrix ([X] variables). In this way Yambo can go beyond Hartree-Fock and compute \(\Sigma^c\).
+
ppa: tells Yambo that the dynamical screening should be computed in the plasmon pole approximation ([Xp] variables).
+
dyson: Yambo will solve the Dyson-like quasiparticle equation.
EXXRLvcs controls the number of Reciprocal Lattice vectors (i.e., G-vectors) used to build the exchange self-energy, while VXCRLvcs does the same for the exchange-correlation potential reconstructed from DFT. Since these two quantities are to be subtracted, it is important to keep the same values here (and possibly not change the default maximum value).
+
+
+
Let us now have a look at the parameters for the calculation of the correlation part of the self-energy. Recall that we have:
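Schematically (omitting prefactors and the details of the frequency integration, which the code handles internally):

\[ \Sigma^c_{nk}(\omega) \sim \sum_{m} \sum_{q\, G G'} \rho_{nm}(k,q,G)\, \rho^{*}_{nm}(k,q,G')\, v(q+G') \int d\omega'\, \frac{\varepsilon^{-1}_{GG'}(q,\omega')}{\omega - \omega' - \epsilon_{m\, k-q} \pm i0^{+}} \]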
(Here, the \(\rho\)-terms represent the screening matrix elements which are computed separately by yambo and stored in their own database.)
+
The calculation is divided into two steps. First, the response function is computed in the plasmon pole approximation (runlevels em1d ppa), controlled by the keywords [X] and [Xp]; this yields \(\varepsilon^{-1}_{GG'}(q,\omega)\).
+
Chimod="HARTREE"# [X] IP/Hartree/ALDA/LRC/PF/BSfxc
+%BndsRnXp
+ 1|300|# [Xp] Polarization function bands
+%
+NGsBlkXp=1RL# [Xp] Response block size
+%LongDrXp
+ 1.000000|0.000000|0.000000|# [Xp] [cc] Electric Field
+%
+PPAPntXp=27.21138eV# [Xp] PPA imaginary energy
+XTermKind="none"# [X] X terminator ("none","BG" Bruneval-Gonze)
+
+
+
+
Response function
+
+
Chimod="Hartree" indicates that we compute the response function in the Random Phase Approximation (RPA).
+
BndsRnXp represents the electronic states included in the response function \(\varepsilon\), and is a convergence parameter.
+
NGsBlkXp is the number of G-vectors used to calculate the RPA response function \(\varepsilon^{-1}_{GG'}\). It is a convergence parameter and can be expressed in number of reciprocal lattice vectors (RL) or energy (Ry, suggested).
+
LongDrXp represents the direction of the long-range auxiliary external electric field used to compute \(\varepsilon^{-1}_{GG'}(q)\) at \(q,G\rightarrow 0\). In general you have to be mindful of the system symmetries. In our case, we will put 1|1|1 to cover all directions.
+
PPAPntXp=27.21138eV is the energy of the plasmon pole. We don’t normally change this.
+
XTermKind is used to specify a “terminator”: this accelerates numerical convergence with respect to the number of bands BndsRnXp.
+
+
+
Next, we have the [GW] group of parameters, controlling the second step of the correlation self-energy calculation:
GbndRnge is the number of bands used to build the correlation self-energy. It is a convergence parameter and can be accelerated with the terminator GTermKind.
+
DysSolver="n" specifies the method used to solve the linearised quasiparticle equation. In most cases, we use the Newton method "n".
+
QPkrange indicates the range of electronic (nk) states for which the GW correction \(\Sigma_{nk}\) is computed. The first pair of numbers represents the range of k-point indices, the second pair the range of band indices.
+
+
+
We now take a look at the parameters relative to the Coulomb interaction at small momenta and for 2D systems, which we should edit now once and for all.
+
RandQpts= 1000000                # [RIM] Number of random q-points in the BZ
+RandGvec= 100            RL     # [RIM] Coulomb interaction RS components
+CUTGeo= "slab z"                # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..
+
+
+
+
Coulomb potential
+
+
The [RIM] keyword refers to a Monte Carlo random integration method performed to avoid numerical instabilities close to \(q=0\) and \(G=0\) in the \(q\)-integration of the bare Coulomb interaction - i.e. \(4\pi/(q+G)^2\) - for 2D systems.
+
The [CUT] keyword refers to the truncation of the Coulomb interaction to avoid spurious interaction between periodically repeated copies of the simulation supercell along the \(z\)-direction (we are working with a plane-wave code). Keep in mind that the vacuum space between two copies of the system should be converged: here we are using 20 bohr but a value of 40 bohr would be more realistic.
+
+
+
Finally, we have the parallel parameters. We are going to discuss them at the end of the parallel section, so we can skip them for now.
+
X_and_IO_CPU=""# [PARALLEL] CPUs for each role
+X_and_IO_ROLEs=""# [PARALLEL] CPUs roles (q,g,k,c,v)
+X_and_IO_nCPU_LinAlg_INV=-1# [PARALLEL] CPUs for Linear Algebra (if -1 it is automatically set)
+X_Threads=0# [OPENMP/X] Number of threads for response functions
+DIP_CPU=""# [PARALLEL] CPUs for each role
+DIP_ROLEs=""# [PARALLEL] CPUs roles (k,c,v)
+DIP_Threads=0# [OPENMP/X] Number of threads for dipoles
+SE_CPU=""# [PARALLEL] CPUs for each role
+SE_ROLEs=""# [PARALLEL] CPUs roles (q,qp,b)
+SE_Threads=0# [OPENMP/GW] Number of threads for self-energy
+
+
+
In a GW calculation, the most important parameters to be numerically converged are: the number of bands and the G-vector cutoff entering the response function (BndsRnXp and NGsBlkXp), the number of bands used to build the correlation self-energy (GbndRnge), and the k-point grid used to sample the Brillouin zone.
From the above discussion you can easily guess that many-body perturbation theory calculations are much more numerically expensive than DFT calculations.
We will start by running a single GW calculation. Here we will focus on the magnitude of the quasiparticle gap. This means that we only need to calculate two quasi-particle corrections, i.e., valence and conduction bands at the k-point where the minimum gap occurs. This information can be found by inspecting the report file r_setup produced when the SAVE folder was initialised. Just search for the string ‘Direct Gap’ and you’ll see that the latter occurs at k-point 7 between bands 13 and 14:
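For example, a quick way to locate this information:

grep 'Direct Gap' r_setup

(The indices of the k-point and of the bands involved are reported in the same section of the report, [02.05] Energies & Occupations.)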
In addition, we will set the number of bands in BndsRnXp and GbndRnge to a small value, just to have it run fast. Hence, we modify the input file accordingly (check BndsRnXp, GbndRnge, LongDrXp, QPkrange):
+
rim_cut                          # [R] Coulomb potential
+gw0                             # [R] GW approximation
+ppa                             # [R][Xp] Plasmon Pole Approximation for the Screened Interaction
+dyson                           # [R] Dyson Equation solver
+HF_and_locXC                    # [R] Hartree-Fock
+em1d                            # [R][X] Dynamically Screened Interaction
+X_Threads= 0                    # [OPENMP/X] Number of threads for response functions
+DIP_Threads= 0                  # [OPENMP/X] Number of threads for dipoles
+SE_Threads= 0                   # [OPENMP/GW] Number of threads for self-energy
+RandQpts= 1000000               # [RIM] Number of random q-points in the BZ
+RandGvec= 100            RL     # [RIM] Coulomb interaction RS components
+CUTGeo= "slab z"                # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..
+% CUTBox
+ 0.000000 | 0.000000 | 0.000000 |       # [CUT] [au] Box sides
+%
+CUTRadius= 0.000000             # [CUT] [au] Sphere/Cylinder radius
+CUTCylLen= 0.000000             # [CUT] [au] Cylinder length
+CUTwsGvec= 0.700000             # [CUT] WS cutoff: number of G to be modified
+EXXRLvcs= 37965          RL     # [XX] Exchange RL components
+VXCRLvcs= 37965          RL     # [XC] XCpotential RL components
+Chimod= "HARTREE"               # [X] IP/Hartree/ALDA/LRC/PF/BSfxc
+% BndsRnXp
+   1 | 20 |                     # [Xp] Polarization function bands
+%
+NGsBlkXp= 1              RL     # [Xp] Response block size
+% LongDrXp
+ 1.000000 | 1.000000 | 1.000000 |       # [Xp] [cc] Electric Field
+%
+PPAPntXp= 27.21138       eV     # [Xp] PPA imaginary energy
+XTermKind= "none"               # [X] X terminator ("none","BG" Bruneval-Gonze)
+% GbndRnge
+   1 | 20 |                     # [GW] G[W] bands range
+%
+GTermKind= "none"               # [GW] GW terminator ("none","BG" Bruneval-Gonze,"BRS" Berger-Reining-Sottile)
+DysSolver= "n"                  # [GW] Dyson Equation solver ("n","s","g")
+%QPkrange                       # [GW] QP generalized Kpoint/Band indices
+ 7|7|13|14|
+%
+
+
+
We are now ready to run this calculation. Since you should never run a Yambo calculation on the login node, we need a submission script to add our job to the queue. A submission script optimized for Leonardo Booster (running on GPUs) is provided as an example; modify it to suit your specific machine.
+We will ignore all details regarding parallelization for now, as they are covered in the next section. Since there are no lowercase flags after yambo, the code will not generate an input file but rather run the one specified by -F (a sketch of the launch line is shown after the submission command below). Now, go ahead and submit this job
+
sbatch run_first_job.sh
+
+
+
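For reference, the launch line inside run_first_job.sh should look roughly like the following (the MPI launcher and the number of tasks are machine- and script-dependent; the -J and -C labels are the ones assumed by the paths used later in this tutorial):

mpirun -np 4 yambo -F gw.in -J job_00_first_run -C out_00_first_run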
The status of the jobs can be monitored via:
+
squeue -u $USER # to inspect the status of jobs
+ # (hint: make a unix alias, if you like)
+scancel <jobid> # to delete jobs in the queue
+
+
+
The newly generated databases will be stored in the job directory, as specified by -J, while the report, log and output files will be stored in the communications directory (-C). As this is your first yambo run, take a moment to inspect the report and log files, which you can find inside the -C directory. In these report and log files, you can see the steps performed by yambo. For instance, the code calculates the screening at every k-point and stores it in the PPA database called ndb.pp. By opening the report
+
vim out_00_first_run/r-job_00_first_run_HF_and_locXC_gw0_dyson_rim_cut_em1d_ppa
+
In this file, Eo is our starting point (DFT) while E-Eo is the GW correction one should apply to obtain the quasiparticle energies. In order to calculate the gap automatically from the command line, we can use a few simple commands. First, we keep everything that is not a # symbol with grep -v '#' and pass the result to another command with a “pipe” |. Then, tail -n 1 / head -n 1 will retain the last/first line, and awk '{print $3+$4}' will print the sum of the third and fourth columns. Altogether, this would be as follows
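A sketch of such a pipeline, assuming the quasiparticle output of the first run is out_00_first_run/o-job_00_first_run.QP (adapt the name to the file you actually find in the -C directory):

grep -v '#' out_00_first_run/o-job_00_first_run.QP | head -n 1 | awk '{print $3+$4}'   # QP energy of band 13 (valence)
grep -v '#' out_00_first_run/o-job_00_first_run.QP | tail -n 1 | awk '{print $3+$4}'   # QP energy of band 14 (conduction)

The difference between the two values is the GW direct gap.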
In this part of the tutorial, we will study convergence with respect to some of the parameters mentioned above. In order to complete this tutorial within a single hands-on session, we will restrict ourselves to a very coarse \(k\)-point grid.
+Hence, we will perform our convergence studies on top of a DFT calculation done with a \(6 \times 6 \times 1\) k-point grid and without spin-orbit coupling: the SAVE we generated earlier.
+While this will speed up calculations and could be run even on a single GPU card, you should be aware that such coarse sampling of the BZ is significantly underconverged and should only be used for educational purposes. In addition, spin-orbit interaction is extremely relevant for the valley physics of MoS2 and should not be neglected in realistic calculations.
+Let’s move into the appropriate directory
+
cd ../02_GW_convergence
+
+
+
+
Response function \(\varepsilon^{-1}_{GG'}\) - Bands and G-vectors
+
We are now ready to start our convergence tests. We’ll begin with the variables controlling the polarization function, i.e., NGsBlkXp for the number of G-vectors and BndsRnXp for the number of bands. For this, we will keep GbndRnge constant at a reasonably high value - you can inspect the input i01-GW
+
vim i01-GW
+
+
+
and check that you have:
+
%GbndRnge
+   1 | 80 |                     # [GW] G[W] bands range
+%
+
+
+
Since we need to run yambo for several values of NGsBlkXp and BndsRnXp, it makes sense to use two nested loops. That is exactly what we did in the submission script run01_converge_pol.sh. Since this will take a few minutes, save time by submitting it straight away and we will have a look at it while it runs:
+
sbatch run01_converge_pol.sh
+
+
+
+
Monitoring
+
You can monitor that the job is running with the squeue command
+
squeue -u $USER
+
+
+
and also by checking the files created in your folder
Let’s now have a look at the job we just submitted.
+
vim run01_converge_pol.sh
+
+
+
First, we define the double loop and initialize a summary file for each iteration of the outer loop by printing a header to it. The input file i01-GW is used as a template for every calculation in the loops, so we assign it to a variable.
Inside the loops, we generate some useful labels which will come in handy to distinguish between runs. Then, we pass the loop variables to the sed command in order to generate new input files in an automated way (sed replaces the matching template strings with the values provided by the loops). Next, we run yambo, using the labels to specify different job (-J) and communication (-C) directories every time. Finally, we extract the quasiparticle energies with grep commands as shown before and append a new line to the summary file. So, inside each loop, we have something like the following.
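A minimal sketch of such a script body, not the actual run01_converge_pol.sh: the placeholder strings BNDS and CUTOFF in the template, the value lists, and the file-name patterns are all assumptions.

file0=i01-GW                                   # template input file

for nbnds in 20 40 60 80 ; do                  # BndsRnXp values (example list)
  summary=summary_01_Xbnds_${nbnds}.txt
  echo "# NGsBlkXp(Ry)  E_v(eV)  E_c(eV)  gap(eV)" > $summary
  for cutoff in 2 4 6 8 10 ; do                # NGsBlkXp values (example list)
    label=Xbnds_${nbnds}_Xcut_${cutoff}
    # build a new input from the template (BNDS/CUTOFF are hypothetical placeholder strings)
    sed "s/BNDS/${nbnds}/ ; s/CUTOFF/${cutoff}/" $file0 > i01-GW_${label}
    mpirun -np ${SLURM_NTASKS} yambo -F i01-GW_${label} -J job_${label} -C out_${label}
    # extract the valence/conduction QP energies and append the gap to the summary
    E_v=$(grep -v '#' out_${label}/o-job_${label}.QP | head -n 1 | awk '{print $3+$4}')
    E_c=$(grep -v '#' out_${label}/o-job_${label}.QP | tail -n 1 | awk '{print $3+$4}')
    echo "${cutoff}  ${E_v}  ${E_c}" | awk '{print $0, $3-$2}' >> $summary
  done
done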
Finally, let us plot this data. First, check that the job has finished with
+
squeue -u $USER
+
+
+
[$USER@login01 02_GW_convergence]$ squeue -u $USER
+ JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
+[$USER@login01 02_GW_convergence]$
+
+
+
and verify that the energies were extracted correctly by inspecting the summary files. Remember to load the python module if you haven’t done so yet, and then plot:
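Assuming the plotting script provided for this step is called plot-01.py (the name is a guess, by analogy with the fig-01.png output and the plot-02.py script used later):

module load anaconda3/2023.03
python plot-01.py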
The plot will produce a fig-01.png file.
+You can copy and open it in your local machine with something like
+
[Run this on another terminal in your local machine, fixing $USER, $LOGIN and $TUTORIALPATH]
+scp $USER@$LOGIN:$TUTORIALPATH/MoS2_HPC_tutorial_Leonardo/02_GW_convergence/fig-01.png ./
+
+
+
You should get:
+
+
For the purpose of the tutorial, we will choose 80 bands and 10 Ry as our converged parameters and move on. An error within 10 meV is usually acceptable. To retain the chosen variables, we’ll make a copy of the corresponding input file:
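For example (the exact name of the input generated for 80 bands and 10 Ry depends on the labels used in the loop script; adapt it accordingly):

cp i01-GW_<label_80bands_10Ry> i02-GW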
We will now proceed to converge the number of bands for the correlation part of the self-energy, i.e., GbndRnge. This step is actually simpler, since it only involves one loop. This is coded in the provided script run02_converge_Gbnds.noBG.sh. You can look into it
+
vim run02_converge_Gbnds.noBG.sh
+
+
+
and go ahead and submit it.
+
sbatch run02_converge_Gbnds.noBG.sh
+
+
+
+
[OPTIONAL]: Use the terminator to accelerate convergence
+
While that runs, we’ll have a look at the so-called Bruneval-Gonze (BG) terminator, which is a method to accelerate convergence with respect to empty bands. The variable that controls this for the bands in the correlation self-energy is GTermKind. This is currently set to “none” in i02-GW, so create a new input file i02-GW_BG and set this variable to “BG”. We can do this in the command line by simply typing
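One possible way to do it (any equivalent editing works just as well):

sed '/GTermKind/s/"none"/"BG"/' i02-GW > i02-GW_BG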
Note that XTermKind also offers the same terminator for the sum over bands of the polarization function (we just chose not to use it in the previous section of this exercise, and we will keep it as “none”). Now, copy the last submission script and edit it to run the same convergence test using the BG terminator.
Try and do this yourself first, and then continue reading to check your understanding.
+You will have to change the input file template, i.e., use i02-GW_BG where the terminator has been activated. Modify also the name of the newly generated input files in order to avoid overwriting. Change the name of the summary file for the same reason and, finally, modify the communications and job directories of yambo. Make sure you’ve done all the changes as outlined below.
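A sketch of the lines to adapt in the copied script, here called run03_converge_Gbnds.BG.sh (the variable names follow the pattern of the previous scripts and are assumptions):

file0=i02-GW_BG                  # use the template where GTermKind has been set to "BG"
summary=summary_03_BG.txt        # new summary file, to be compared with summary_02_noBG.txt later
label=Gbnds_BG_${ib}             # hypothetical loop label; keeps generated inputs and -J/-C directories unique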
While this runs, check if the previous job has finished, i.e., you should have a complete summary_02_noBG.txt file by now.
+For a visual result, proceed to plot them with
+
python plot-02.py
+
+
+
You should get
+
+
+
OPTIONAL
+
If you also did the optional step, you can compare summary_02_noBG.txt with summary_03_BG.txt once run03_converge_Gbnds.BG.sh has finished - you’ll see the effect of the terminator immediately.
+Just open the plot-02.py script and uncomment the line #list_of_files=['summary_02_noBG.txt','summary_03_BG.txt'], then rerun it with python plot-02.py.
+You can see that the terminator does a great job at accelerating convergence, and it allows us to use 60 bands incurring an error of only 3 meV (while the latter would have been larger than 0.1 eV had we not used the terminator).
+
+
We will end the convergence part of the tutorial with an important consideration about k-point convergence. This is the most cumbersome and computationally intensive of the various convergence tests, since it involves re-running the DFT step. For this reason (and for this reason only) it was skipped in this tutorial. However, it absolutely cannot be overlooked, since it is crucial for the accuracy of the calculated GW corrections. You can read about k-point convergence in GW and, importantly, about a very efficient workaround for 2D systems in a recent publication (here). MoS\(_2\) was one of the materials studied there, and the comparison shows that our result, obtained with a \(6 \times 6 \times 1\) k-grid, is simply off the chart (blue line).
+
+
Guandalini, D’Amico, Ferretti & Varsano. npj Comput Mater 9
+
+
[OPTIONAL]: Use the RIM-W accelerator
+
However, you can try to get a reasonable correction via the RIM-W approach. Create a new input file by copying a suitable one:
Note that we also added FFTGvecs to reduce the size of the Fourier transforms (the default corresponds to Quantum ESPRESSO ecutwfc, i.e. 60 Ry in this case).
+
In addition, we have deleted all the parallel parameters since we will be setting them via the submission script.
+
+We are now actually dealing with a heavier system than before: as you can see from the QPkrange values, we have switched to a \(12 \times 12 \times 1\) k-point grid - with 19 points in the irreducible Brillouin zone - and turned on spin-orbit coupling in the DFT calculation (the top valence band is now number 26 instead of 13 because the bands are spin-polarized).
+
For this part of the tutorial, we will be using the slurm submission script job_parallel.sh, which is available in the calculation directory.
+If you inspect it, you will see that the script adds additional variables to the yambo input file.
+These variables control the parallel execution of the code:
+
DIP_CPU="1 $ngpu 1"# [PARALLEL] CPUs for each role
+DIP_ROLEs="k c v"# [PARALLEL] CPUs roles (k,c,v)
+DIP_Threads=0# [OPENMP/X] Number of threads for dipoles
+X_and_IO_CPU="1 1 1 $ngpu 1"# [PARALLEL] CPUs for each role
+X_and_IO_ROLEs="q g k c v"# [PARALLEL] CPUs roles (q,g,k,c,v)
+X_and_IO_nCPU_LinAlg_INV=1# [PARALLEL] CPUs for Linear Algebra (if -1 it is automatically set)
+X_Threads=0# [OPENMP/X] Number of threads for response functions
+SE_CPU="1 $ngpu 1"# [PARALLEL] CPUs for each role
+SE_ROLEs="q qp b"# [PARALLEL] CPUs roles (q,qp,b)
+SE_Threads=0# [OPENMP/GW] Number of threads for self-energy
+
+
+
The keyword DIP refers to the calculations of the screening matrix elements (also called “dipoles”) needed for the screening function, X is the screening function itself (it stands for \(\chi\) since it is a response function), SE the self-energy.
+These three sections of the code can be parallelised independently.
+
We are running on GPUs. In particular, each node hosts four GPU cards. Yambo is coded in such a way that each MPI task is run on a single card, therefore ntasks=ngpu.
+
+
Note
+
+
In this subsection we are mainly concerned with the [PARALLEL] variables which refer to MPI tasks (distributed memory).
+
What about [OPENMP] parallelisation (i.e., adding threads with shared memory)? When Yambo is run on GPUs, the explicit threading that you can set in the input and submission script only applies to the very few sections of the code that are not run on GPU cards, but stay on the CPUs. Therefore, in a GPU calculation, CPU-only threads are not going to be a relevant factor in the performance of the code. We keep them fixed to 8 since on Leonardo Booster (32 CPUs and 4 GPUs per node) the best hybrid parallel setup for CPUs is 4 tasks times 8 threads. We will see an example of the impact of threads in a CPU-only calculation later.
+
+
+
We start by calculating the QP corrections using 4 MPI tasks / GPUs. We leave the number of openMP threads at 8, the optimized value for Yambo on Leonardo. Therefore, edit the submission script as:
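The relevant lines would look something like this (a sketch; option names and values in the actual job_parallel.sh may differ slightly):

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4        # 4 MPI tasks = 4 GPUs on a Leonardo Booster node
#SBATCH --cpus-per-task=8          # 8 OpenMP threads per task
#SBATCH --gres=gpu:4
export OMP_NUM_THREADS=8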
This will create a new input file and run it. The calculation databases and the human-readable files will be put in separate directories. Check the location of the report r-* file and the log l-* files, and inspect them while the calculation runs.
+For simplicity you can just type
+
tail -f run_MPI4_OMP8.out/LOG/l-*_CPU_1
+
+
+
to monitor the progress in the master thread (Ctrl+c to exit).
+As you can see, the run takes some time, even though we are using minimal parameters.
+
+Meanwhile, we can run other jobs with increased parallelisation. Let's employ 16 MPI tasks / GPUs (i.e., 4 nodes on Leonardo). To this end, modify the job_parallel.sh script by changing the resource request, for example as sketched below.
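Again as a sketch (adapt to the option names actually used in the script):

#SBATCH --nodes=4
#SBATCH --ntasks-per-node=4        # 4 GPUs per node, i.e. 16 MPI tasks in total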
This time the code should be much faster. Once the run is over, try to run the simulation also on 8 MPI tasks by changing nodes appropriately. Finally, you can try to produce a scaling plot.
+
The timings are contained in the r-* report files. You can already have a look at them typing
+
grep Time-Profile run_MPI*/r-*
+
+
+
The python script parse_ytiming.py is useful for the post-processing of report files. You can already find it in the directory, together with the reports for the longer calculations with 1 and 2 MPI tasks which have been provided.
+
If you didn’t do so already, load the python module
+
module load anaconda3/2023.03
+
+
+
Then, after your jobs have finished, run the script as
+
python parse_ytiming.py run_MPI
+
+
+
to look for a report file in each run_MPI*.out folder. Make sure you have only one report file per folder.
+You can also play with the script to make it print detailed timing information; however, you should already see that it produces a png plot showing the time to completion on the y axis against the number of MPI tasks (i.e., GPUs in this case) on the x axis.
+
+
What can we learn from this plot? In particular, try to answer the following questions:
+
+
Up to which number of MPI tasks does our system scale efficiently?
+
How can we decide at which point adding more nodes to the calculation becomes a waste of resources?
+
+
+
Note
+
Keep in mind that the MPI scaling we are seeing here is not the true Yambo scaling, but depends on the small size of our tutorial system. In a realistic calculation for a large-sized system, Yambo has been shown to scale well up to tens of thousands of MPI tasks! (See the next optional box for an example)
+
+
+
[OPTIONAL] Comparison with CPU calculation with hybrid parallelization strategy
+
We have run the same calculation using a version of Yambo compiled in order to run on CPUs. This is not the preferred approach in an accelerator-based machine like Leonardo, but it can be instructive.
+
For a CPU calculation, we can use a hybrid parallel structure with threads. The OPENMP threads are controlled by modifying cpus-per-task and OMP_NUM_THREADS in the submission file. The product of the number of OpenMP threads and MPI tasks is equal to the total number of CPUs.
+
For our test, we have used larger convergence parameters than in the previous run, and selected a hybrid parallel scheme with 16 MPI tasks per node, with 2 OPENMP threads (ntasks*nthreads=ncpu=16*2=32), since it gives the best scaling in this case.
+
+
Note
+
In general (for larger systems) we have tested that the best CPU scaling on Leonardo is actually 4 MPI tasks times 8 OPENMP threads.
+
+
Therefore, in the new CPU submission script we have:
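A sketch of the relevant lines, assuming a 32-core node (the actual script may name its options slightly differently):

#SBATCH --ntasks-per-node=16
#SBATCH --cpus-per-task=2
export OMP_NUM_THREADS=2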
Actually, we don’t need to change the openMP-related variables appearing in the yambo input, since the value 0 means “use the value of OMP_NUM_THREADS” and we have now set this environment variable to our liking via the submission script.
+Otherwise, any positive number can directly specify the number of threads to be used in each section of the code.
+
DIP_Threads= 0                   # [OPENMP/X] Number of threads for dipoles
+...
+X_Threads= 0                    # [OPENMP/X] Number of threads for response functions
+...
+SE_Threads= 0                   # [OPENMP/GW] Number of threads for self-energy
+
+
+
Actively looking for the best scaling on both GPU and CPU for our enlarged MoS2 system we find:
+
+
We can see that, already for this reasonably small and half-converged system run on a few nodes, the GPU calculation easily reaches a 2x speedup. The speedup vastly increases for larger systems, where the calculations are more demanding, as you can see from the scaling tests below (run on the Juwels Booster machine) on a graphene-cobalt interface supercell.
+
+
Scaling comparison of graphene@Co(0001) interface on CPU (left, 48 cpus per node) and GPU (right, 4 GPUs per node). Tests done by Nicola Spallanzani. Data available at: http://www.gitlab.com/max-centre/Benchmarks
+
+
Note
+
+
In real-life CPU-based calculations running on \(n_{cores} > 100\), as we have seen, it may be a good idea to adopt a hybrid approach.
+
The most efficient scaling can depend both on your system and on the HPC facility you’re running on. For a full CPU node on Leonardo (32 cores), using a large-scale system, we have found that 4 tasks times 8 threads gives the best performance.
+
OpenMP can help lower memory requirements within a node. You can try to increase the OpenMP share of threads if you are getting Out Of Memory errors.
+
+
+
+
+
+
[OPTIONAL]: Comparing different parallelisation schemes
+
Up to now we always parallelised over a single parameter, i.e. c or qp. However, Yambo allows for tuning the parallelisation scheme over several parameters broadly corresponding to “loops” (i.e., summations or discrete integrations) in the code.
+
To this end you can open again the run_parallel.sh script and modify the section where the yambo input variables are set.
+
For example, X_CPU sets how the MPI Tasks are distributed in the calculation of the response function. The possibilities are shown in the X_ROLEs. The same holds for SE_CPU and SE_ROLEs which control how MPI Tasks are distributed in the calculation of the self-energy.
+
X_CPU= "1 1 1 $ncpu 1" # [PARALLEL] CPUs for each role
+X_ROLEs= "q g k c v" # [PARALLEL] CPUs roles (q,g,k,c,v)
+X_nCPU_LinAlg_INV= $ncpu # [PARALLEL] CPUs for Linear Algebra
+X_Threads= 0 # [OPENMP/X] Number of threads for response functions
+DIP_Threads= 0 # [OPENMP/X] Number of threads for dipoles
+SE_CPU= "1 $ncpu 1" # [PARALLEL] CPUs for each role
+SE_ROLEs= "q qp b" # [PARALLEL] CPUs roles (q,qp,b)
+SE_Threads= 0 # [OPENMP/GW] Number of threads for self-energy
+
+
+
You can try different parallelization schemes and check the performances of Yambo. In doing so, you should also change the jobname label=MPI${ncpu}_OMP${nthreads} in the run_parallel.sh script to avoid confusion with previous calculations.
+
You may then check how speed, memory and load balance between the CPUs are affected. You could modify the script parse_ytiming.py to parse the new data, read and distinguish between more file names, new parallelisation options, etc.
+
+
Note
+
+
The product of the numbers entering each variable (i.e. X_CPU and SE_CPU) times the number of threads should always match the total number of cores (unless you want to overload the cores taking advantage of multi-threads).
+
Using the X_Threads and SE_Threads variables you can think about setting different hybrid schemes between the screening and the self-energy runlevels.
+
Memory scales better if you parallelize on bands (cvb).
+
Parallelization on k-points performs similarly to parallelization on bands, but requires more memory.
+
Parallelization on q-points requires much less communication between the MPI tasks. It may be useful if you run on more than one node and the inter-node connection is slow.
This is the final section of the tutorial, in which we want to compute the full correction to the band structure of single-layer MoS\(_2\).
+
This is a massive calculation, so run it right now and we’ll discuss it in the meantime:
+
cd ../04_GW_bands
+sbatch gpu_job.sh
+
+
+
In order to get somewhat realistic results, we will use the larger values for the convergence parameters we have identified in the convergence section. In addition, we also increased the vacuum separation (to 30 au) and the k-point mesh (to 18x18x1) in the DFT calculation, and of course we consider spin-orbit coupling.
+
+We now have a heavier calculation, and we have to do it not just for the band gap but for the entire band structure, which includes 37 k-points in the irreducible Brillouin zone, two spin-orbit-split valence bands, and two spin-orbit-split conduction bands. Let us check the new input:
+
vim gw.in
+
+
+
%QPkrange                        # [GW] QP generalized Kpoint/Band indices
+ 1|37|25|28|
+%
+
+
+
After about 3 minutes the calculation should be over and the results collected in folder GW_bnds. The quasiparticle corrections are stored in human-readable form in the file o-GW_bnds.QP, and in netCDF format in the quasiparticle database ndb.QP.
+
In order to visualize the results in the form of a GW band structure, we will first interpolate the calculated points - recall that we have just 37 points, few of which lie on high-symmetry lines - with ypp, the yambo pre- and post-processing executable.
+
+
Note
+
We also have a python-based interface for advanced treatment of all the Yambo databases, called Yambopy. You can check it out here on the Yambo wiki.
We can review the options with ypp -h and generate an input file for band structure interpolation with
+
ypp -s b -F ypp_bands.in
+
+
+
Let us modify the resulting input file by selecting the ‘boltztrap’ approach to interpolation, the last two valence and first two conduction bands, and a path in the Brillouin zone along the points \(\Gamma-M-K-\Gamma\). We also set 100 points for each high-symmetry line.
+
electrons                        # [R] Electronic properties
+bnds                            # [R] Bands
+PROJECT_mode= "none"            # Instruct ypp how to project the DOS. ATOM, LINE, PLANE.
+INTERP_mode= "BOLTZ"            # Interpolation mode (NN=nearest point, BOLTZ=boltztrap aproach)
+INTERP_Shell_Fac= 20.00000      # The bigger it is a higher number of shells is used
+INTERP_NofNN= 1                 # Number of Nearest sites in the NN method
+OutputAlat= 5.90008             # [a.u.] Lattice constant used for "alat" ouput format
+cooIn= "rlu"                    # Points coordinates (in) cc/rlu/iku/alat
+cooOut= "rlu"                   # Points coordinates (out) cc/rlu/iku/alat
+% BANDS_bands
+  25 | 28 |                     # Number of bands
+%
+CIRCUIT_E_DB_path= "none"       # SAVE obtained from the QE `bands` run (alternative to %BANDS_kpts)
+BANDS_path= ""                  # High-Symmetry points labels (G,M,K,L...) also using composed positions (0.5xY+0.5xL).
+BANDS_steps= 100                # Number of divisions
+#BANDS_built_in                 # Print the bands of the generating points of the circuit using the nearest internal point
+% BANDS_kpts                    # K points of the bands circuit
+ 0.00000 | 0.00000 | 0.00000 |
+ 0.00000 | 0.50000 | 0.00000 |
+ 0.33333 | 0.33333 | 0.00000 |
+ 0.00000 | 0.00000 | 0.00000 |
+%
+
+
+
Now, let’s run ypp:
+
mpirun -np 1 ypp -F ypp_bands.in
+
+
+
This run will produce the file o.bands_interpolated. You can inspect it and see that it contains a plottable band structure, but beware: these are the DFT eigenvalues! We didn’t tell ypp where to look for the quasiparticle corrections, so it went into the SAVE folder and interpolated the DFT data.
+Let’s rename the output:
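For example (o.DFT_bands is the name expected by the plotting step below; an analogous interpolation including the quasiparticle corrections is assumed to produce the o.GW_bands file used there):

mv o.bands_interpolated o.DFT_bands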
Now we are ready to visualize the band structures. In order to do so, you can use the script plt_bands.py that should be already available in the directory.
+
We load the python module
+
module load anaconda3/2023.03
+
+
+
and run the script as
+
python plt_bands.py o.DFT_bands o.GW_bands 4
+
+
+
Now we can also exit the computing node
+
exit
+
+
+
The python script should have produced a GW_bands.png file containing the following visualization, which you can copy to your local machine and open using scp:
+
+
You may compare this plot with a converged result from this paper (also done with Yambo):
+
+
Dashed lines: DFT, thick lines: GW.
+
As you can see, the general result is not too bad, but there are some differences both at the DFT and GW levels. The magnitude of the band gap is too large, and the relative energy of the two conduction band minima is not correct. One obvious issue is the lack of convergence of our tutorial calculations. As we know, we should include more vacuum space and many, many more k-points. Additionally, this is a transition metal dichalcogenide: for this class of systems, the details of the band structure can strongly depend on small variations in the lattice parameters and on the type of pseudopotential used. A great deal of care must be taken when performing these calculations!
+
In order to learn more about Yambo, we suggest visiting the Yambo website. For technical information and tutorials, you can check out the Yambo wiki. If you have issues and questions about installing and running the code, you can write about them on the Yambo forum.
For every practical there is a folder in the shared directory
+/leonardo_work/EUHPC_TD02_030/siesta-tutorials that contains all the files
+you will need for said practical. At the start of every tutorial, please copy
+the required files from that shared directory to your scratch folder. For
+example, for the first tutorial:
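A possible command (the tutorial folder name and the scratch path are placeholders to adapt to your case):

cp -r /leonardo_work/EUHPC_TD02_030/siesta-tutorials/<tutorial-01-folder> $SCRATCH/
cd $SCRATCH/<tutorial-01-folder>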
You will find a sample script on how to run SIESTA in
+/leonardo_work/EUHPC_TD02_030/softwares/siesta-5.0beta1/runsample.sh:
+
#!/bin/bash
+#SBATCH -J tutorialXX
+#SBATCH -n 8
+#SBATCH -t 0:30:00
+#SBATCH -o %x-%j.out
+#SBATCH -e %x-%j.err
+#SBATCH --partition=boost_usr_prod
+#SBATCH -D .
+
+# DO NOT CHANGE THIS LINE
+source /leonardo_work/EUHPC_TD02_030/softwares/siesta-5.0beta1/siestarc.sh
+
+# EDIT THE CORRECT INPUT AND OUTPUT FILES.
+srun -n 8 siesta < input.fdf > output.out
+
+
+
Note that input.fdf and output.out are generic input and output file names; for
+each execution of siesta you will need to change them to the actual names of
+your input and output files. It is also convenient to change the name of the job
+(#SBATCH -J option) to something that allows you to identify it among your other
+submissions when checking the status of the queue.
Efficient materials modelling on HPC with Quantum ESPRESSO, Siesta and Yambo
+
In recent years, computing technologies underlying materials modelling and
+electronic structure calculation have evolved rapidly. High-performance
+computing (HPC) is transitioning from petascale to exascale, while individual
+compute nodes are increasingly based on heterogeneous architectures that every
+year become more diversified due to different vendor choices. In this environment,
+electronic structure codes also have to evolve fast in order to adapt to new
+hardware facilities. Nowadays, state-of-the-art electronic structure codes based
+on modern density functional theory (DFT) methods allow treating realistic
+molecular systems with a very high accuracy. However, due to the increased
+complexity of the codes, some extra skills are required from users in order to
+fully exploit their potential.
+
This training material gives a broad overview of important fundamental concepts
+for molecular and materials modelling on HPC, with a focus on three of the most
+modern codes for electronic structure calculations (QUANTUM ESPRESSO, SIESTA and Yambo).
+Theory sections are interleaved with practical demonstrations and hands-on exercises.
+
QUANTUM ESPRESSO is one of the most
+popular suites of computer codes for electronic-structure calculations and
+materials modelling at the nanoscale, based on density-functional theory, plane
+waves, and pseudopotentials. It is able to predict and give fundamental insights
+about many properties of materials, molecular systems, micro and nanodevices,
+biological systems, in many fields, providing a huge amount of data for
+data-driven science applications.
+
SIESTA is a pseudopotential-based
+DFT software whose strength lies in its use of atomic-like strictly-localised
+basis sets: the use of a “good first approximation” to the full problem
+decreases the number of basis functions needed to achieve a given accuracy, and
+the finite support of the orbitals leads to sparsity in the Hamiltonian and
+overlap matrices, thus enabling the use of reduced-scaling methods. The
+functionalities of SIESTA include, amongst others, the calculation of energies
+and forces, molecular-dynamics simulations, band structures, densities of
+states, spin-orbit couplings, van der Waals functionals, DFT+U for correlated
+systems, real-time TDDFT, and non-equilibrium calculations with TranSIESTA.
+
YAMBO is an open-source code implementing
+first-principles methods based on Green’s function (GF) theory to describe
+excited-state properties of realistic materials. These methods include the GW
+approximation, the Bethe Salpeter equation, nonequilibrium GF (NEGF) and TDDFT,
+allowing for the prediction of accurate quasiparticle energies (e.g. ARPES band
+structures), linear and non-linear optical properties, capturing the physics of
+excitons, plasmons, and magnons. It is also possible to calculate
+temperature-dependent electronic and optical properties via electron-phonon
+coupling and nonequilibrium and non-linear optical properties via NEGF real-time
+simulations (pump-probe experiments, etc).
+
MAX (MAterials design at the eXascale) is a
+European Centre of Excellence which enables materials modelling, simulations,
+discovery and design at the frontiers of the current and future High-Performance
+Computing (HPC), High Throughput Computing (HTC) and data analytics technologies.
+MaX’s challenge lies in bringing the most successful and widely used open-source,
+community codes in quantum simulations of materials towards exascale and extreme
+scaling performance and make them available for a large and growing base of
+researchers in the materials’ domain.
+
+
Prerequisites
+
+
Some familiarity with density functional theory (DFT), self-consistent
+field (SCF) calculations and plane wave basis sets is desirable as the
+workshop will not cover the fundamental theory of these topics.
+
Familiarity with working in a Linux environment and some experience with
+working on an HPC system is needed to participate in the hands-on exercises.
This workshop is aimed towards researchers and engineers who already have some
+previous experience with materials modelling and electronic structure calculations.
In this workshop, participants will learn how to launch the most common
+types of calculations (e.g. scf, phonons, quasi-particle energies,
+time-dependent properties) using QE, SIESTA and Yambo, how to prepare input
+files and how to read output files in order to extract the desired properties.
+
Best practices for efficient exploitation of HPC resources will be discussed.
+On the QE days, there will be particular emphasis on how to use the different schemes of data
+distribution (e.g. plane waves, pools, images) in combination with the different
+parallelization and acceleration schemes (MPI, OpenMP, GPU-offload) available.
+Regarding SIESTA, participants will learn about its algorithmic efficiency,
+and they will be able to explore its accelerated and massively parallel solvers.
Emilio Artacho (CIC NanoGUNE and University of Cambridge)
+
Catalina Coll (ICN2)
+
José Mª Escartín (ICN2)
+
Roberta Farris (ICN2)
+
Ernane de Freitas (ICN2)
+
Alberto García (ICMAB-CSIC)
+
Arnold Kole (Utrecht University)
+
Nick Papior (DTU)
+
Federico Pedron (ICN2)
+
Miguel Pruneda (CINN-CSIC)
+
José Ángel Silva Guillén (IMDEA Nanociencia)
+
+
Varsano (CNR-NANO)
+
Ferretti (CNR-NANO)
+
Sangalli (CNR-ISM)
+
Guandalini (Univ. of Rome, La Sapienza)
+
Paleari (CNR-NANO)
+
D’Alessio (Univ. Modena and Reggio Emilia, CNR-NANO)
+
Sesti (CNR-NANO)
+
Spallanzani (CNR-NANO)
+
+
+
+
The lesson file structure and browsing layout is inspired by and derived from
+work by CodeRefinery licensed under the MIT license. We have copied and adapted
+most of their license text.
share - copy and redistribute the material in any medium or format
+
adapt - remix, transform, and build upon the material for any purpose,
+even commercially.
+
+
The licensor cannot revoke these freedoms as long as you follow these license terms:
+
+
Attribution - You must give appropriate credit (mentioning that your work
+is derived from work that is Copyright (c) ENCCS and individual contributors and, where practical, linking
+to https://enccs.github.io/max-coe-workshop), provide a link to the license, and indicate if changes were
+made. You may do so in any reasonable manner, but not in any way that suggests
+the licensor endorses you or your use.
+
No additional restrictions - You may not apply legal terms or
+technological measures that legally restrict others from doing anything the
+license permits.
+
+
With the understanding that:
+
+
You do not have to comply with the license for elements of the material in
+the public domain or where your use is permitted by an applicable exception
+or limitation.
+
No warranties are given. The license may not give you all of the permissions
+necessary for your intended use. For example, other rights such as
+publicity, privacy, or moral rights may limit how you use the material.
Except where otherwise noted, the example programs and other software provided
+with this repository are made available under the OSI-approved
+MIT license.
Efficient materials modelling on HPC with Quantum ESPRESSO, Siesta and Yambo
+
In recent years, computing technologies underlying materials modelling and
+electronic structure calculation have evolved rapidly. High-performance
+computing (HPC) is transitioning from petascale to exascale, while individual
+compute nodes are increasingly based on heterogeneous architectures that every
+year become more diversified due to different vendor choices. In this environment,
+electronic structure codes also have to evolve fast in order to adapt to new
+hardware facilities. Nowadays, state-of-the-art electronic structure codes based
+on modern density functional theory (DFT) methods allow treating realistic
+molecular systems with a very high accuracy. However, due to the increased
+complexity of the codes, some extra skills are required from users in order to
+fully exploit their potential.
+
This training material gives a broad overview of important fundamental concepts
+for molecular and materials modelling on HPC, with a focus on three of the most
+modern codes for electronic structure calculations (QUANTUM ESPRESSO, SIESTA and Yambo).
+Theory sections are interleaved with practical demonstrations and hands-on exercises.
+
QUANTUM ESPRESSO is one of the most
+popular suites of computer codes for electronic-structure calculations and
+materials modelling at the nanoscale, based on density-functional theory, plane
+waves, and pseudopotentials. It is able to predict and give fundamental insights
+about many properties of materials, molecular systems, micro and nanodevices,
+biological systems, in many fields, providing a huge amount of data for
+data-driven science applications.
+
SIESTA is a pseudopotential-based
+DFT software whose strength lies in its use of atomic-like strictly-localised
+basis sets: the use of a “good first approximation” to the full problem
+decreases the number of basis functions needed to achieve a given accuracy, and
+the finite support of the orbitals leads to sparsity in the Hamiltonian and
+overlap matrices, thus enabling the use of reduced-scaling methods. The
+functionalities of SIESTA include, amongst others, the calculation of energies
+and forces, molecular-dynamics simulations, band structures, densities of
+states, spin-orbit couplings, van der Waals functionals, DFT+U for correlated
+systems, real-time TDDFT, and non-equilibrium calculations with TranSIESTA.
+
YAMBO is an open-source code implementing
+first-principles methods based on Green’s function (GF) theory to describe
+excited-state properties of realistic materials. These methods include the GW
+approximation, the Bethe Salpeter equation, nonequilibrium GF (NEGF) and TDDFT,
+allowing for the prediction of accurate quasiparticle energies (e.g. ARPES band
+structures), linear and non-linear optical properties, capturing the physics of
+excitons, plasmons, and magnons. It is also possible to calculate
+temperature-dependent electronic and optical properties via electron-phonon
+coupling and nonequilibrium and non-linear optical properties via NEGF real-time
+simulations (pump-probe experiments, etc).
+
MAX (MAterials design at the eXascale) is a
+European Centre of Excellence which enables materials modelling, simulations,
+discovery and design at the frontiers of the current and future High-Performance
+Computing (HPC), High Throughput Computing (HTC) and data analytics technologies.
+MaX’s challenge lies in bringing the most successful and widely used open-source,
+community codes in quantum simulations of materials towards exascale and extreme
+scaling performance and make them available for a large and growing base of
+researchers in the materials’ domain.
+
+
Prerequisites
+
+
Some familiarity with density functional theory (DFT), self-consistent
+field (SCF) calculations and plane wave basis sets is desirable as the
+workshop will not cover the fundamental theory of these topics.
+
Familiarity with working in a Linux environment and some experience with
+working on an HPC system is needed to participate in the hands-on exercises.
This workshop is aimed towards researchers and engineers who already have some
+previous experience with materials modelling and electronic structure calculations.
For every practical there is a folder in the shared directory
+/leonardo_work/EUHPC_TD02_030/siesta-tutorials that contains all the files
+you will need for said practical. At the start of every tutorial, please copy
+the required files from that shared directory to your scratch folder. For
+example, for the first tutorial:
You will find a sample script on how to run SIESTA in
+/leonardo_work/EUHPC_TD02_030/softwares/siesta-5.0beta1/runsample.sh:
+
#!/bin/bash
+#SBATCH -J tutorialXX
+#SBATCH -n 8
+#SBATCH -t 0:30:00
+#SBATCH -o %x-%j.out
+#SBATCH -e %x-%j.err
+#SBATCH --partition=boost_usr_prod
+#SBATCH -D .
+
+# DO NOT CHANGE THIS LINE
+source/leonardo_work/EUHPC_TD02_030/softwares/siesta-5.0beta1/siestarc.sh
+
+# EDIT THE CORRECT INPUT AND OUTPUT FILES.
+srun-n8siesta<input.fdf>output.out
+
+
+
Note that input.fdf and output.out are generic input and output file names, for
+each execution of siesta you will need to change them to the actual names of
+your input and output files. It is also convenient to change the name of the job
+(#SBATCH-Joption) to something that allows you to identify (vs. your other
+submissions) when checking the status of the queue.
In this workshop, participants will learn how to launch the most common
+types of calculations (e.g. scf, phonons, quasi-particle energies,
+time-dependent properties) using QE, SIESTA and Yambo, how to prepare input
+files and how to read output files in order to extract the desired properties.
+
Best practices for efficient exploitation of HPC resources will be discussed.
+On the QE days, there will be particular emphasis on how to use the different schemes of data
+distribution (e.g. plane waves, pools, images) in combination with the different
+parallelization and acceleration schemes (MPI, OpenMP, GPU-offload) available.
+Regarding SIESTA, participants will learn about its algorithmic efficiency,
+and they will be able to explore its accelerated and massively parallel solvers.
Emilio Artacho (CIC NanoGUNE and University of Cambridge)
+
Catalina Coll (ICN2)
+
José Mª Escartín (ICN2)
+
Roberta Farris (ICN2)
+
Ernane de Freitas (ICN2)
+
Alberto García (ICMAB-CSIC)
+
Arnold Kole (Utrecht University)
+
Nick Papior (DTU)
+
Federico Pedron (ICN2)
+
Miguel Pruneda (CINN-CSIC)
+
José Ángel Silva Guillén (IMDEA Nanociencia)
+
+
Varsano (CNR-NANO)
+
+
+
+
Ferretti (CNR-NANO)
+
+
+
+
Sangalli (CNR-ISM)
+
+
+
+
Guandalini (Univ. of Rome, La Sapienza)
+
+
+
+
Paleari (CNR-NANO)
+
+
+
+
D’Alessio (Univ. Modena and Reggio Emilia, CNR-NANO)
+
+
+
+
Sesti (CNR-NANO)
+
+
+
+
Spallanzani (CNR-NANO)
+
+
+
+
The lesson file structure and browsing layout is inspired by and derived from
+work by CodeRefinery licensed under the MIT license. We have copied and adapted
+most of their license text.
share - copy and redistribute the material in any medium or format
+
adapt - remix, transform, and build upon the material for any purpose,
+even commercially.
+
+
The licensor cannot revoke these freedoms as long as you follow these license terms:
+
+
Attribution - You must give appropriate credit (mentioning that your work
+is derived from work that is Copyright (c) ENCCS and individual contributors and, where practical, linking
+to https://enccs.github.io/max-coe-workshop), provide a link to the license, and indicate if changes were
+made. You may do so in any reasonable manner, but not in any way that suggests
+the licensor endorses you or your use.
+
No additional restrictions - You may not apply legal terms or
+technological measures that legally restrict others from doing anything the
+license permits.
+
+
With the understanding that:
+
+
You do not have to comply with the license for elements of the material in
+the public domain or where your use is permitted by an applicable exception
+or limitation.
+
No warranties are given. The license may not give you all of the permissions
+necessary for your intended use. For example, other rights such as
+publicity, privacy, or moral rights may limit how you use the material.
Except where otherwise noted, the example programs and other software provided
+with this repository are made available under the OSI-approved
+MIT license.
For every practical there is a folder in the shared directory
+/leonardo_work/EUHPC_TD02_030/siesta-tutorials that contains all the files
+you will need for said practical. At the start of every tutorial, please copy
+the required files from that shared directory to your scratch folder. For
+example, for the first tutorial:
You will find a sample script on how to run SIESTA in
+/leonardo_work/EUHPC_TD02_030/softwares/siesta-5.0beta1/runsample.sh:
+
#!/bin/bash
+#SBATCH -J tutorialXX
+#SBATCH -n 8
+#SBATCH -t 0:30:00
+#SBATCH -o %x-%j.out
+#SBATCH -e %x-%j.err
+#SBATCH --partition=boost_usr_prod
+#SBATCH -D .
+
+# DO NOT CHANGE THIS LINE
+source/leonardo_work/EUHPC_TD02_030/softwares/siesta-5.0beta1/siestarc.sh
+
+# EDIT THE CORRECT INPUT AND OUTPUT FILES.
+srun-n8siesta<input.fdf>output.out
+
+
+
Note that input.fdf and output.out are generic input and output file names, for
+each execution of siesta you will need to change them to the actual names of
+your input and output files. It is also convenient to change the name of the job
+(#SBATCH-Joption) to something that allows you to identify (vs. your other
+submissions) when checking the status of the queue.
Efficient materials modelling on HPC with Quantum ESPRESSO, Siesta and Yambo
+
In recent years, computing technologies underlying materials modelling and
+electronic structure calculation have evolved rapidly. High-performance
+computing (HPC) is transitioning from petascale to exascale, while individual
+compute nodes are increasingly based on heterogeneous architectures that every
+year become more diversified due to different vendor choices. In this environment,
+electronic structure codes also have to evolve fast in order to adapt to new
+hardware facilities. Nowadays, state-of-the-art electronic structure codes based
+on modern density functional theory (DFT) methods allow treating realistic
+molecular systems with a very high accuracy. However, due to the increased
+complexity of the codes, some extra skills are required from users in order to
+fully exploit their potential.
+
This training material gives a broad overview of important fundamental concepts
+for molecular and materials modelling on HPC, with a focus on three of the most
+modern codes for electronic structure calculations (QUANTUM ESPRESSO, SIESTA and Yambo).
+Theory sections are interleaved with practical demonstrations and hands-on exercises.
+
QUANTUM ESPRESSO is one of the most
+popular suites of computer codes for electronic-structure calculations and
+materials modelling at the nanoscale, based on density-functional theory, plane
+waves, and pseudopotentials. It is able to predict and give fundamental insights
+about many properties of materials, molecular systems, micro and nanodevices,
+biological systems, in many fields, providing a huge amount of data for
+data-driven science applications.
+
SIESTA is a pseudopotential-based
+DFT software whose strength lies in its use of atomic-like strictly-localised
+basis sets: the use of a “good first approximation” to the full problem
+decreases the number of basis functions needed to achieve a given accuracy, and
+the finite support of the orbitals leads to sparsity in the Hamiltonian and
+overlap matrices, thus enabling the use of reduced-scaling methods. The
+functionalities of SIESTA include, amongst others, the calculation of energies
+and forces, molecular-dynamics simulations, band structures, densities of
+states, spin-orbit couplings, van der Waals functionals, DFT+U for correlated
+systems, real-time TDDFT, and non-equilibrium calculations with TranSIESTA.
+
YAMBO is an open-source code implementing
+first-principles methods based on Green’s function (GF) theory to describe
+excited-state properties of realistic materials. These methods include the GW
+approximation, the Bethe Salpeter equation, nonequilibrium GF (NEGF) and TDDFT,
+allowing for the prediction of accurate quasiparticle energies (e.g. ARPES band
+structures), linear and non-linear optical properties, capturing the physics of
+excitons, plasmons, and magnons. It is also possible to calculate
+temperature-dependent electronic and optical properties via electron-phonon
+coupling and nonequilibrium and non-linear optical properties via NEGF real-time
+simulations (pump-probe experiments, etc).
+
MAX (MAterials design at the eXascale) is a
+European Centre of Excellence which enables materials modelling, simulations,
+discovery and design at the frontiers of the current and future High-Performance
+Computing (HPC), High Throughput Computing (HTC) and data analytics technologies.
+MaX’s challenge lies in bringing the most successful and widely used open-source,
+community codes in quantum simulations of materials towards exascale and extreme
+scaling performance and make them available for a large and growing base of
+researchers in the materials’ domain.
+
+
Prerequisites
+
+
Some familiarity with density functional theory (DFT), self-consistent
+field (SCF) calculations and plane wave basis sets is desirable as the
+workshop will not cover the fundamental theory of these topics.
+
Familiarity with working in a Linux environment and some experience with
+working on an HPC system is needed to participate in the hands-on exercises.
This workshop is aimed towards researchers and engineers who already have some
+previous experience with materials modelling and electronic structure calculations.
In this workshop, participants will learn how to launch the most common
+types of calculations (e.g. scf, phonons, quasi-particle energies,
+time-dependent properties) using QE, SIESTA and Yambo, how to prepare input
+files and how to read output files in order to extract the desired properties.
+
Best practices for efficient exploitation of HPC resources will be discussed.
+On the QE days, there will be particular emphasis on how to use the different schemes of data
+distribution (e.g. plane waves, pools, images) in combination with the different
+parallelization and acceleration schemes (MPI, OpenMP, GPU-offload) available.
+Regarding SIESTA, participants will learn about its algorithmic efficiency,
+and they will be able to explore its accelerated and massively parallel solvers.
Emilio Artacho (CIC NanoGUNE and University of Cambridge)
+Catalina Coll (ICN2)
+José Mª Escartín (ICN2)
+Roberta Farris (ICN2)
+Ernane de Freitas (ICN2)
+Alberto García (ICMAB-CSIC)
+Arnold Kole (Utrecht University)
+Nick Papior (DTU)
+Federico Pedron (ICN2)
+Miguel Pruneda (CINN-CSIC)
+José Ángel Silva Guillén (IMDEA Nanociencia)
+
+Varsano (CNR-NANO)
+Ferretti (CNR-NANO)
+Sangalli (CNR-ISM)
+Guandalini (Univ. of Rome, La Sapienza)
+Paleari (CNR-NANO)
+D’Alessio (Univ. Modena and Reggio Emilia, CNR-NANO)
+Sesti (CNR-NANO)
+Spallanzani (CNR-NANO)
+
+The lesson file structure and browsing layout are inspired by and derived from
+work by CodeRefinery, licensed under the MIT license. We have copied and adapted
+most of their license text.
+You are free to:
+
+share - copy and redistribute the material in any medium or format
+
+adapt - remix, transform, and build upon the material for any purpose,
+even commercially.
+
+
The licensor cannot revoke these freedoms as long as you follow these license terms:
+
+
Attribution - You must give appropriate credit (mentioning that your work
+is derived from work that is Copyright (c) ENCCS and individual contributors and, where practical, linking
+to https://enccs.github.io/max-coe-workshop), provide a link to the license, and indicate if changes were
+made. You may do so in any reasonable manner, but not in any way that suggests
+the licensor endorses you or your use.
+
No additional restrictions - You may not apply legal terms or
+technological measures that legally restrict others from doing anything the
+license permits.
+
+
With the understanding that:
+
+
You do not have to comply with the license for elements of the material in
+the public domain or where your use is permitted by an applicable exception
+or limitation.
+
No warranties are given. The license may not give you all of the permissions
+necessary for your intended use. For example, other rights such as
+publicity, privacy, or moral rights may limit how you use the material.
Except where otherwise noted, the example programs and other software provided
+with this repository are made available under the OSI-approved
+MIT license.
Efficient materials modelling on HPC with Quantum ESPRESSO, Siesta and Yambo
+
In recent years, computing technologies underlying materials modelling and
+electronic structure calculations have evolved rapidly. High-performance
+computing (HPC) is transitioning from petascale to exascale, while individual
+compute nodes are increasingly based on heterogeneous architectures that become
+more diversified every year due to different vendor choices. In this environment,
+electronic structure codes also have to evolve quickly in order to adapt to the
+new hardware. Nowadays, state-of-the-art electronic structure codes based
+on modern density functional theory (DFT) methods allow treating realistic
+molecular systems with very high accuracy. However, due to the increased
+complexity of these codes, users need some extra skills in order to fully
+exploit their potential.
+
This training material gives a broad overview of important fundamental concepts
+for molecular and materials modelling on HPC, with a focus on three of the most
+modern codes for electronic structure calculations (QUANTUM ESPRESSO, SIESTA and Yambo).
+Theory sections are interleaved with practical demonstrations and hands-on exercises.
+
QUANTUM ESPRESSO is one of the most
+popular suites of computer codes for electronic-structure calculations and
+materials modelling at the nanoscale, based on density-functional theory, plane
+waves, and pseudopotentials. It is able to predict and give fundamental insights
+into many properties of materials, molecular systems, micro- and nanodevices, and
+biological systems across many fields, providing a huge amount of data for
+data-driven science applications.
+
+For every practical there is a folder in the shared directory
+/leonardo_work/EUHPC_TD02_030/siesta-tutorials that contains all the files
+you will need for that practical. At the start of every tutorial, please copy
+the required files from that shared directory to your scratch folder. For
+example, for the first tutorial:
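+
+(A sketch of what this copy might look like; the folder name tutorial-01 and
+the use of $SCRATCH for your scratch area are assumptions, so adapt them to
+the names actually used in the course.)
+
+# copy the files for the first practical to your scratch area and enter it
+# (tutorial-01 is a placeholder folder name)
+cp -r /leonardo_work/EUHPC_TD02_030/siesta-tutorials/tutorial-01 $SCRATCH/
+cd $SCRATCH/tutorial-01
+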
+You will find a sample script showing how to run SIESTA in
+/leonardo_work/EUHPC_TD02_030/softwares/siesta-5.0beta1/runsample.sh:
+
#!/bin/bash
+#SBATCH -J tutorialXX
+#SBATCH -n 8
+#SBATCH -t 0:30:00
+#SBATCH -o %x-%j.out
+#SBATCH -e %x-%j.err
+#SBATCH --partition=boost_usr_prod
+#SBATCH -D .
+
+# DO NOT CHANGE THIS LINE
+source /leonardo_work/EUHPC_TD02_030/softwares/siesta-5.0beta1/siestarc.sh
+
+# EDIT THE CORRECT INPUT AND OUTPUT FILES.
+srun -n 8 siesta < input.fdf > output.out
+
+
+
+Note that input.fdf and output.out are generic input and output file names; for
+each execution of siesta you will need to change them to the actual names of
+your input and output files. It is also convenient to change the name of the job
+(the #SBATCH -J option) to something that allows you to identify it, among your
+other submissions, when checking the status of the queue.
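+
+For instance, if your input file for a given tutorial were called water.fdf (a
+hypothetical name, used here only for illustration), the lines you would edit
+might end up looking like this:
+
+# job name changed so the run is easy to spot in the queue
+#SBATCH -J water-scf
+
+# input/output redirection changed to match the actual file names (placeholders)
+srun -n 8 siesta < water.fdf > water.out
+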
At the moment, unless otherwise stated, the contents of the SIESTA tutorials are
+copyrighted - all rights reserved.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/singlehtml/objects.inv b/singlehtml/objects.inv
new file mode 100644
index 0000000..27bcf8a
Binary files /dev/null and b/singlehtml/objects.inv differ