
Problem with parallelization when running CP2K on quantum-mobile:20.11.2a #180

sphuber opened this issue Mar 29, 2021 · 11 comments

@sphuber
Collaborator

sphuber commented Mar 29, 2021

Taken from the discussion thread of PR #160 👍

When running a CP2K relax workflow on the quantum-mobile:20.11.2a docker container on an Ubuntu host OS, there seems to be a problem with the parallelization. Many more processes are launched than intended and multiple processes start to write independently to the output file.

@yakutovicha, who ran on a macOS host, could not reproduce this.

Here is a screenshot from htop once I launch a single CP2K relax workchain with the fast protocol for silicon:

[screenshot: htop output after launching the workchain, showing CP2K processes maxing out all cores]

It spawns 24 processes on my 12-core CPU and uses them to the max. Could there be a problem with the parallelization that causes it to run everything twice? In the output file, I see the following:


 SCF WAVEFUNCTION OPTIMIZATION

  Step     Update method      Time    Convergence         Total energy    Change
  ------------------------------------------------------------------------------
     1 NoMix/Diag. 0.10E+00  171.5     1.08006898        -7.9431608334 -7.94E+00
     1 NoMix/Diag. 0.10E+00  174.8     1.08006898        -7.9431608334 -7.94E+00
     2 Broy./Diag. 0.10E+00  177.1     0.00168440        -7.9277225536  1.54E-02
     2 Broy./Diag. 0.10E+00  181.4     0.00168440        -7.9277225536  1.54E-02
     3 Broy./Diag. 0.10E+00  171.9     0.02871103        -7.7806865131  1.47E-01
     3 Broy./Diag. 0.10E+00  180.7     0.02871103        -7.7806865131  1.47E-01
     4 Broy./Diag. 0.10E+00  181.0     0.00045755        -7.8172268412 -3.65E-02
     4 Broy./Diag. 0.10E+00  178.3     0.00045755        -7.8172268412 -3.65E-02
     5 Broy./Diag. 0.10E+00  172.8     0.00370116        -7.8395344612 -2.23E-02
     5 Broy./Diag. 0.10E+00  175.5     0.00370116        -7.8395344612 -2.23E-02
     6 Broy./Diag. 0.10E+00  173.2     0.00035810        -7.8469207606 -7.39E-03
     6 Broy./Diag. 0.10E+00  177.5     0.00035810        -7.8469207606 -7.39E-03
     7 Broy./Diag. 0.10E+00  175.3     0.00234601        -7.8636002502 -1.67E-02
     7 Broy./Diag. 0.10E+00  168.5     0.00234601        -7.8636002502 -1.67E-02
     8 Broy./Diag. 0.10E+00  173.3     0.00004036        -7.8654690663 -1.87E-03
     8 Broy./Diag. 0.10E+00  172.1     0.00004036        -7.8654690663 -1.87E-03
     9 Broy./Diag. 0.10E+00  172.9     0.00009081        -7.8667661112 -1.30E-03

Each step does seem to be printed twice, or is that normal? Maybe this is all due to the submission script:

#SBATCH --no-requeue
#SBATCH --job-name="aiida-201"
#SBATCH --get-user-env
#SBATCH --output=_scheduler-stdout.txt
#SBATCH --error=_scheduler-stderr.txt
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
#SBATCH --time=01:00:00


ulimit -s unlimited

'mpirun' '-np' '2' '/usr/local/bin/cp2k.ssmp' '-i' 'aiida.inp'  > 'aiida.out' 2>&1
@giovannipizzi
Member

I can reproduce this on a Mac host (running in VirtualBox), with QM 20.11.2a.
Even just running mpirun -np 2 cp2k.ssmp prints the "usage" message twice.

Do I understand correctly that the ssmp version (I guess downloaded from the GitHub releases page, looking at the ansible role) is supposed to be used with OpenMP/multithreading, and not with MPI?
Maybe @dev-zero or @oschuett can confirm?

In this case, what is the suggested/simplest way to get a compiled binary of CP2K working with MPI on Quantum Mobile (Ubuntu)?
@chrisjsewell and @yakutovicha, have you already tried following the compilation instructions to get CP2K running in Quantum Mobile?

@dev-zero
Contributor

Yes, ssmp (or any cp2k.s*) is strictly non-MPI.
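
For illustration, a minimal sketch of how the thread-parallel ssmp binary could be invoked instead of the mpirun line in the submission script above (the thread count of 2 is an assumption mirroring the original two ranks):

export OMP_NUM_THREADS=2   # OpenMP threads instead of MPI ranks
'/usr/local/bin/cp2k.ssmp' '-i' 'aiida.inp' > 'aiida.out' 2>&1

With a single process writing to aiida.out, the doubled SCF lines from the issue description should disappear.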

@dev-zero
Contributor

dev-zero commented Apr 13, 2021

You should also make sure that OMP_NUM_THREADS is set properly to avoid oversubscription (something like max(16, number_of_physical_cores//number_of_ranks_per_machine) may be a good start). Depending on your scheduler, it may be set via --ntasks-per-node, or some MPI runtimes may set it automatically (depending on mapping options).
Also, is it really correct to use mpirun inside an sbatch script (rather than srun)?
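
Regarding the OMP_NUM_THREADS point, a hedged sketch of how it could be set inside the sbatch script, assuming SLURM exports SLURM_CPUS_ON_NODE and SLURM_NTASKS_PER_NODE for this job (the 16-thread cap mentioned above is left out for simplicity):

# avoid oversubscription: split the node's cores over the MPI ranks
export OMP_NUM_THREADS=$(( SLURM_CPUS_ON_NODE / SLURM_NTASKS_PER_NODE ))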

@chrisjsewell
Member

Do I understand correctly that the ssmp version (I guess downloaded from the GitHub releases page, looking at the ansible role) is supposed to be used with OpenMP/multithreading, and not with MPI?

Yeah, basically this binary download has never worked on any Quantum Mobile, which is a little annoying to find out now (I had no part in writing it).
So basically it needs to be compiled from source each time. I have asked @yakutovicha to look into this.

The other route is to use https://github.com/conda-forge/cp2k-feedstock, which we eventually want to look into using for all simulation codes. But I think this may be too difficult to implement at this time (and also v8.1.0 is not yet released, as there are still some outstanding issues for it).

@sphuber
Collaborator Author

sphuber commented Apr 13, 2021

Also, is that really correct to use mpirun inside an sbatch script (rather than srun)?

I am not 100% sure that when running with SLURM you have to use srun. It is true that on QM we configure the localhost with SLURM but use mpirun. I just tried switching to srun -n {tot_num_mpiprocs} (which works on machines like Piz Daint, for example), but that now fails when running Quantum ESPRESSO. Using mpirun, however, works without problems: the calculation is run with two MPI processes, whereas with srun two individual parallel executions are launched that both write to the same file. Do you know when srun can or should be used and when not, @dev-zero?

I will try to look around a bit in the SLURM documentation to see if I can find anything.

@dev-zero
Contributor

@sphuber not really, sorry. My guess is that you have to configure SLURM so that srun becomes a wrapper around mpirun (or whatever command is needed to run MPI on a system). What srun usually does is take the env vars injected by sbatch and forward them to the MPI environment (and do some more mapping, etc.). So, in this simple setting it might just be right to use mpirun, and the only thing srun would add is avoiding the explicit -np parameter for mpirun.
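
As a sketch of that last point (for illustration only, not tested in this thread; cp2k.psmp stands in for a hypothetical MPI-enabled binary): srun picks up the task count requested via #SBATCH --ntasks-per-node from the job environment, whereas mpirun needs it passed explicitly, e.g. via SLURM_NTASKS.

srun '/usr/local/bin/cp2k.psmp' '-i' 'aiida.inp' > 'aiida.out' 2>&1
mpirun -np "${SLURM_NTASKS}" '/usr/local/bin/cp2k.psmp' '-i' 'aiida.inp' > 'aiida.out' 2>&1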

@oschuett

oschuett commented Apr 13, 2021

The installation of CP2K is indeed a major pain point. We now have 30+ dependencies and keep adding more.

The binary we provide with the releases is hand-rolled, statically linked, and stripped-down, e.g. without MPI.

While CP2K is included in Debian and Fedora, those distributions have long release cycles. Hence, I believe the way to go are indeed package managers like Conda or Spack. Unfortunately, maintaining those packages is a lot of work.

@dev-zero
Contributor

Probably stating the obvious, but the quickfix for now would be to limit the number of ranks to 1.
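
A sketch of what the submission from the issue description could look like with that quickfix, i.e. a single rank with the parallelism moved to OpenMP threads (the cpus-per-task value is an assumption, not taken from the thread):

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=12

export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK}"
'mpirun' '-np' '1' '/usr/local/bin/cp2k.ssmp' '-i' 'aiida.inp' > 'aiida.out' 2>&1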

@sphuber
Collaborator Author

sphuber commented Apr 13, 2021

That is something we are considering adding in the input generators of the common workflow project, for which this problem is most critical right now. But we cannot enforce this at the plugin level, so CP2K remains broken on QM for any other calculation where the user selects more than 1 rank. We will have to find a solution at some point if we want CP2K to run reliably on QM.

@ltalirz
Member

ltalirz commented Oct 10, 2022

Just mentioning this for anyone needing to run CP2K on Quantum Mobile:

The following, for me, uses 1 process and 12 threads on the Quantum Mobile 21.05.1 Docker container under Ubuntu:

aiida-common-workflows launch eos cp2k -S Fe -p precise -s collinear --codes cp2k-7.1@localhost  --magnetization-per-site -4 4 --daemon -n 1

The calculation seems to be running fine (except for being rather slow of course ;-) ).

@yakutovicha
Contributor

By the way, what is the status of this? I believe the issue could be solved by installing CP2K from conda-forge.
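
For reference, a minimal sketch of what that could look like inside the Quantum Mobile VM (the package name comes from the cp2k-feedstock linked above; the environment name and the exact binaries it ships, e.g. an MPI-enabled cp2k.psmp, are assumptions):

conda create -n cp2k -c conda-forge cp2k
conda activate cp2k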
