SWE performance exercise
Start by cloning a clean copy of the master branch of the SWE.git repository:
git clone --recursive https://github.com/fomics/SWE.git
... and apply a patch to the SConstruct file:
cd SWE
git cherry-pick 6624e61b9b82a8e6098bcf556d187a7fb9d7f492
module swap PrgEnv-cray PrgEnv-gnu
module load scons
module load python/2.7.2
scons copyenv=true compiler=cray parallelization=mpi solver=fwavevec openmp=yes
- Open the file src/blocks/SWE_WavePropagation.cpp
- Add the line
#define LOOP_OPENMP
before
- Add the line
#ifdef LOOP_OPENMP
#include <omp.h>
#endif
- Remove the line that starts with
solver::Hybrid<float>
cd src/tools
Edit the file help.hh. Remove lines 99 and 100, which initialize the elements of Float2D. The resulting constructor should look like this:
Float2D(int _cols, int _rows) : rows(_rows),cols(_cols)
{
elem = new float[rows*cols];
}
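The page does not say why the zero-fill is removed. One plausible reason is memory placement: on systems with first-touch page placement, a page is mapped near the thread that first writes to it, so a serial fill in the constructor would place the whole array near a single thread. The sketch below illustrates the idea with a hypothetical Grid2D class and a hypothetical fillParallel helper. Neither is part of SWE, and the column-wise layout is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for Float2D; names and layout are assumptions.
class Grid2D {
public:
    Grid2D(int cols_, int rows_)
        : rows(rows_), cols(cols_),
          elem(new float[static_cast<std::size_t>(rows_) * cols_]) {  // no zero-fill
        assert(rows_ > 0 && cols_ > 0);
    }
    ~Grid2D() { delete[] elem; }
    Grid2D(const Grid2D&) = delete;
    Grid2D& operator=(const Grid2D&) = delete;

    // Column i starts at offset i*rows, so g[i][j] addresses one cell.
    float* operator[](int i) { return elem + static_cast<std::size_t>(i) * rows; }

    int getRows() const { return rows; }
    int getCols() const { return cols; }

private:
    int rows, cols;
    float* elem;
};

// The first write to the array happens inside the parallel loop,
// so each thread touches the columns it was assigned.
inline void fillParallel(Grid2D& g, float value) {
    #pragma omp parallel for
    for (int i = 0; i < g.getCols(); i++)
        for (int j = 0; j < g.getRows(); j++)
            g[i][j] = value;
}
```

Without an OpenMP flag the pragma is ignored and the loop runs serially, which still gives the same values.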
Next, parallelize the initialization with OpenMP.
cd ../blocks
Edit the file SWE_Block.cpp. Add an OpenMP parallel for directive to the loops at lines 97 and 107, namely the scenario initialization of water height and bathymetry. The resulting code should look like this:
// initialize water height and discharge
#pragma omp parallel for
for(int i=1; i<=nx; i++) {
  for(int j=1; j<=ny; j++) {
    float x = offsetX + (i-0.5f)*dx;
    float y = offsetY + (j-0.5f)*dy;
    h[i][j] = i_scenario.getWaterHeight(x,y);
    hu[i][j] = i_scenario.getVeloc_u(x,y) * h[i][j];
    hv[i][j] = i_scenario.getVeloc_v(x,y) * h[i][j];
  }
}

// initialize bathymetry
#pragma omp parallel for
for(int i=0; i<=nx+1; i++) {
  for(int j=0; j<=ny+1; j++) {
    b[i][j] = i_scenario.getBathymetry( offsetX + (i-0.5f)*dx,
                                        offsetY + (j-0.5f)*dy );
  }
}
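The index arithmetic in these loops samples the scenario at cell centres: with one ghost layer at index 0, interior cell i spans offsetX + (i-1)*dx to offsetX + i*dx, so its centre lies at offsetX + (i-0.5)*dx. A minimal self-contained sketch of the same pattern follows. The helpers cellCentre and sampleHeights are hypothetical, and a flat std::vector stands in for Float2D.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Centre coordinate of cell i (1-based interior index) for cell width d.
inline float cellCentre(float offset, int i, float d) {
    return offset + (i - 0.5f) * d;
}

// Hypothetical helper: sample height(x, y) at the centres of the nx*ny
// interior cells of a grid with one ghost layer on each side.
template <typename HeightFn>
std::vector<float> sampleHeights(int nx, int ny, float offsetX, float offsetY,
                                 float dx, float dy, HeightFn height) {
    assert(nx > 0 && ny > 0);
    std::vector<float> h(static_cast<std::size_t>(nx + 2) * (ny + 2), 0.0f);
    // Iterations of the i loop write disjoint cells, so they can be
    // shared among threads without synchronization.
    #pragma omp parallel for
    for (int i = 1; i <= nx; i++) {
        for (int j = 1; j <= ny; j++) {
            float x = cellCentre(offsetX, i, dx);
            float y = cellCentre(offsetY, j, dy);
            h[static_cast<std::size_t>(i) * (ny + 2) + j] = height(x, y);
        }
    }
    return h;
}
```

Only the outer loop is distributed here, as in the exercise. Each thread then processes whole columns of the grid.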
This is essentially the result of Michael Bader's component on SWE (without the vectorization aspects, which were specific to the Intel compiler). Before anything else, run it with various numbers of MPI processes and OpenMP threads on one node to make sure that you see scalability. For example, from the SWE directory:
salloc -N 1 # get allocation of one node
OMP_NUM_THREADS=1 aprun -n 1 -d 1 build/SWE_cray_release_mpi_fwavevec -x 400 -y 400 -c 1 -o /dev/null
OMP_NUM_THREADS=16 aprun -n 1 -d 16 build/SWE_cray_release_mpi_fwavevec -x 400 -y 400 -c 1 -o /dev/null
OMP_NUM_THREADS=4 aprun -n 1 -d 4 build/SWE_cray_release_mpi_fwavevec -x 400 -y 400 -c 1 -o /dev/null
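One way to judge scalability from these runs is to compare wall-clock times against the single-thread run: speedup S(p) = T(1)/T(p) and parallel efficiency E(p) = S(p)/p. The helpers below are an illustrative sketch with hypothetical names. They are not part of SWE, and the run times must come from your own measurements.

```cpp
#include <cassert>

// Speedup of a p-thread run relative to the 1-thread run.
inline double speedup(double t1, double tp) {
    assert(t1 > 0.0 && tp > 0.0);
    return t1 / tp;
}

// Parallel efficiency: 1.0 corresponds to ideal scaling.
inline double efficiency(double t1, double tp, int threads) {
    assert(threads > 0);
    return speedup(t1, tp) / threads;
}
```

For example, with t1 measured at OMP_NUM_THREADS=1 and t4 at OMP_NUM_THREADS=4, a value of efficiency(t1, t4, 4) close to 1 would indicate good scaling within the node.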