
SWE performance exercise


Prepare your Git repository

Start by cloning a clean version of the master branch of the SWE.git repository:

git clone --recursive https://github.com/fomics/SWE.git

... and apply a patch to the SConstruct file:

cd SWE
git cherry-pick 6624e61b9b82a8e6098bcf556d187a7fb9d7f492
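
If the cherry-pick succeeds, the patch commit is now the tip of your local branch; a quick sanity check:

git log --oneline -1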

Compile the SWE MPI/OpenMP hybrid using the GNU compiler (via the Cray compiler wrapper)

module swap PrgEnv-cray PrgEnv-gnu
module load scons
module load python/2.7.2
scons copyenv=true compiler=cray parallelization=mpi solver=fwavevec openmp=yes
  • Open the file src/blocks/SWE_WavePropagation.cpp
    • Add the line #define LOOP_OPENMP immediately before the following block (the resulting top of the file is sketched after this list):
#ifdef LOOP_OPENMP
#include <omp.h>
#endif
  • Remove the line that starts with solver::Hybrid<float>
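
After adding the define, the top of SWE_WavePropagation.cpp should look roughly like this (a sketch; the surrounding lines are omitted):

#define LOOP_OPENMP   // newly added: enables the OpenMP code paths in this file

#ifdef LOOP_OPENMP
#include <omp.h>
#endif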

Perform OpenMP parallelization in the initialization

cd src/tools

Edit the file help.hh. Remove lines 99 and 100, the element initialization in the Float2D constructor. In other words, the resulting code should look like this:

Float2D(int _cols, int _rows) : rows(_rows), cols(_cols)
{
  elem = new float[rows*cols];
}
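
For reference, the two removed lines are the serial element initialization, which most likely reads as follows in the unpatched file (reconstructed from context, not quoted verbatim):

for (int i = 0; i < rows*cols; i++)
  elem[i] = 0;   // removed: a single thread touches the whole array here

Removing it avoids redundant work and, on NUMA nodes, lets the OpenMP-parallel scenario initialization in the next step perform the first touch of the memory instead.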

Next, parallelize the initialization with OpenMP.

cd src/blocks

Edit the file SWE_Block.cpp. Parallelize the for loops at lines 97 and 107, namely the scenario initialization of water height and bathymetry. The resulting code should look like this:

  // initialize water height and discharge
#pragma omp parallel for
  for(int i=1; i<=nx; i++)
    for(int j=1; j<=ny; j++) {
      float x = offsetX + (i-0.5f)*dx;
      float y = offsetY + (j-0.5f)*dy;
      h[i][j] = i_scenario.getWaterHeight(x,y);
      hu[i][j] = i_scenario.getVeloc_u(x,y) * h[i][j];
      hv[i][j] = i_scenario.getVeloc_v(x,y) * h[i][j];
    }

  // initialize bathymetry
#pragma omp parallel for
  for(int i=0; i<=nx+1; i++) {
    for(int j=0; j<=ny+1; j++) {
      b[i][j] = i_scenario.getBathymetry( offsetX + (i-0.5f)*dx,
                                          offsetY + (j-0.5f)*dy );
    }
  }
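
After saving these edits, rebuild from the SWE root directory before re-running; the configuration is the same as above:

scons copyenv=true compiler=cray parallelization=mpi solver=fwavevec openmp=yes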

This is essentially the result of Michael Bader's component on SWE (without the vectorization aspects, which were specific to the Intel compiler). Before anything else, try this with various numbers of MPI processes and OpenMP threads on one node to make sure that you see scalability. E.g., from the SWE directory:

salloc -N 1   # get an allocation of one node
OMP_NUM_THREADS=1 aprun -n 1 -d 1 build/SWE_cray_release_mpi_fwavevec -x 400 -y 400 -c 1 -o /dev/null
OMP_NUM_THREADS=4 aprun -n 1 -d 4 build/SWE_cray_release_mpi_fwavevec -x 400 -y 400 -c 1 -o /dev/null
OMP_NUM_THREADS=16 aprun -n 1 -d 16 build/SWE_cray_release_mpi_fwavevec -x 400 -y 400 -c 1 -o /dev/null
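
Once single-node scaling looks reasonable, the same pattern extends to several MPI ranks. The example below assumes 16-core nodes, with -n giving the total number of ranks, -N the ranks per node, and -d the threads per rank; adjust the numbers to your machine:

salloc -N 2   # get an allocation of two nodes
OMP_NUM_THREADS=4 aprun -n 8 -N 4 -d 4 build/SWE_cray_release_mpi_fwavevec -x 400 -y 400 -c 1 -o /dev/null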