
SWE performance exercise


Prepare your Git repository

Start by cloning a clean version of the master branch of the SWE.git repository:

git clone --recursive https://github.com/fomics/SWE.git

... and apply a patch to the SConstruct file:

cd SWE
git cherry-pick 6624e61b9b82a8e6098bcf556d187a7fb9d7f492
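
If the cherry-pick succeeds, the patch commit is now the tip of your local branch; a quick sanity check:

git log --oneline -1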

Compile the SWE MPI/OpenMP hybrid using the GNU compiler (via the Cray compiler wrapper)

module swap PrgEnv-cray PrgEnv-gnu
module load scons
module load python/2.7.2
scons copyenv=true compiler=cray parallelization=mpi solver=fwavevec openmp=yes
  • Open the file src/blocks/SWE_WavePropagation.cpp
    • Add the line #define LOOP_OPENMP immediately before the following block (the resulting top of the file is sketched after this list):
#ifdef LOOP_OPENMP
#include <omp.h>
#endif
  • Remove the line that starts with solver::Hybrid<float>
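
After adding the define, the top of SWE_WavePropagation.cpp should look roughly like this (a sketch; the surrounding lines are omitted):

#define LOOP_OPENMP   // newly added: enables the OpenMP code paths in this file

#ifdef LOOP_OPENMP
#include <omp.h>
#endif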

Perform OpenMP parallelization in the initialization

cd src/tools

Edit the file help.hh. Remove lines 99 and 100, the element initialization in the Float2D constructor. In other words, the resulting code should look like this:

Float2D(int _cols, int _rows) : rows(_rows), cols(_cols)
{
  elem = new float[rows*cols];
}
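
For reference, the two removed lines are the serial element initialization, which most likely reads as follows in the unpatched file (reconstructed from context, not quoted verbatim):

for (int i = 0; i < rows*cols; i++)
  elem[i] = 0;   // removed: a single thread touches the whole array here

Removing it avoids redundant work and, on NUMA nodes, lets the OpenMP-parallel scenario initialization in the next step perform the first touch of the memory instead.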

Next, parallelize the initialization with OpenMP.

cd src/blocks

Edit the file SWE_Block.cpp. Parallelize the for loops at lines 97 and 107, namely the scenario initialization of water height and bathymetry. The resulting code should look like this:

  // initialize water height and discharge
#pragma omp parallel for
  for(int i=1; i<=nx; i++)
    for(int j=1; j<=ny; j++) {
      float x = offsetX + (i-0.5f)*dx;
      float y = offsetY + (j-0.5f)*dy;
      h[i][j] = i_scenario.getWaterHeight(x,y);
      hu[i][j] = i_scenario.getVeloc_u(x,y) * h[i][j];
      hv[i][j] = i_scenario.getVeloc_v(x,y) * h[i][j];
    }

  // initialize bathymetry
#pragma omp parallel for
  for(int i=0; i<=nx+1; i++) {
    for(int j=0; j<=ny+1; j++) {
      b[i][j] = i_scenario.getBathymetry( offsetX + (i-0.5f)*dx,
                                          offsetY + (j-0.5f)*dy );
    }
  }
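
After saving these edits, rebuild from the SWE root directory before re-running; the configuration is the same as above:

scons copyenv=true compiler=cray parallelization=mpi solver=fwavevec openmp=yes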

This is essentially the result of Michael Bader's component on SWE (without the vectorization aspects, which were specific to the Intel compiler). Before anything else, try this with various numbers of MPI processes and OpenMP threads on one node to make sure that you see scalability. E.g., from the SWE directory:

salloc -N 1   # get an allocation of one node
OMP_NUM_THREADS=1 aprun -n 1 -d 1 build/SWE_cray_release_mpi_fwavevec -x 400 -y 400 -c 1 -o /dev/null
OMP_NUM_THREADS=4 aprun -n 1 -d 4 build/SWE_cray_release_mpi_fwavevec -x 400 -y 400 -c 1 -o /dev/null
OMP_NUM_THREADS=16 aprun -n 1 -d 16 build/SWE_cray_release_mpi_fwavevec -x 400 -y 400 -c 1 -o /dev/null
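
Once single-node scaling looks reasonable, the same pattern extends to several MPI ranks. The example below assumes 16-core nodes, with -n giving the total number of ranks, -N the ranks per node, and -d the threads per rank; adjust the numbers to your machine:

salloc -N 2   # get an allocation of two nodes
OMP_NUM_THREADS=4 aprun -n 8 -N 4 -d 4 build/SWE_cray_release_mpi_fwavevec -x 400 -y 400 -c 1 -o /dev/null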