
Merge pull request #79 from LLNL/develop
Merge develop into master
rhornung67 authored Dec 19, 2019
2 parents c81b50d + e409059 commit 21e476f
Showing 292 changed files with 14,761 additions and 7,828 deletions.
111 changes: 64 additions & 47 deletions README.md
@@ -12,8 +12,8 @@ RAJA Performance Suite

[![Build Status](https://travis-ci.org/LLNL/RAJAPerf.svg?branch=develop)](https://travis-ci.org/LLNL/RAJAPerf)

The RAJA performance suite is developed to explore performance of loop-based
computational kernels found in HPC applications. Specifically, it
is used to assess, monitor, and compare runtime performance of kernels
implemented using RAJA and variants implemented using standard or
vendor-supported parallel programming models directly. Each kernel in the
@@ -66,15 +66,18 @@ submodules. For example,
```
> cd RAJAPerf
> git checkout <some branch name>
> git submodule init
> git submodule update --recursive
```

Note that the `--recursive` option will update submodules within submodules,
similar to its usage with `git clone` as described above.

RAJA and the Performance Suite are built together using the same CMake
configuration. For convenience, we include scripts in the `scripts`
directory that invoke corresponding configuration files (CMake cache files)
in the RAJA submodule. For example, the `scripts/lc-builds` directory
contains scripts that show how we build code for testing on platforms in
the Lawrence Livermore Computing Center. Each build script creates a
descriptively-named build space directory in the top-level Performance Suite
directory and runs CMake with a configuration appropriate for the platform and
compilers used. After CMake completes, enter the build directory and type
@@ -255,14 +258,22 @@ consistent.
Each kernel in the suite is implemented in a class whose header and
implementation files live in the directory named for the group
in which the kernel lives. The kernel class is responsible for implementing
all operations needed to manage data, execute and record execution timing and
result checksum information for each variant of the kernel.
To properly plug in to the Perf Suite framework, the kernel class must
inherit from the `KernelBase` base class that defines the interface for
a kernel in the suite.

Continuing with our example, we add a 'Foo' class header file 'Foo.hpp',
and multiple implementation files described in the following sections:
* 'Foo.cpp' contains the methods to set up and tear down the memory for the
'Foo' kernel, and compute and record a checksum on the result after it
executes;
* 'Foo-Seq.cpp' contains CPU variants of the kernel;
* 'Foo-OMP.cpp' contains OpenMP CPU multithreading variants of the kernel;
* 'Foo-Cuda.cpp' contains CUDA GPU variants of the kernel; and
* 'Foo-OMPTarget.cpp' contains OpenMP target offload variants of the kernel.


#### Kernel class header

@@ -298,10 +309,11 @@ public:
~Foo();

void setUp(VariantID vid);
void updateChecksum(VariantID vid);
void tearDown(VariantID vid);

void runSeqVariant(VariantID vid);
void runOpenMPVariant(VariantID vid);
void runCudaVariant(VariantID vid);
void runOpenMPTargetVariant(VariantID vid);

@@ -319,21 +331,21 @@ The kernel object header has a uniquely-named header file include guard and
the class is nested within the 'rajaperf' and 'bar' namespaces. The
constructor takes a reference to a 'RunParams' object, which contains the
input parameters for running the suite -- we'll say more about this later.
The seven methods that take a variant ID argument must be provided as they are
pure virtual in the KernelBase class. Their names are descriptive of what they
do and we'll provide more details when we describe the class implementation
next.

#### Kernel class implementation

All kernels in the suite follow a similar implementation pattern for
consistency and ease of analysis and understanding. Here, we describe several
steps and conventions that must be followed to ensure that all kernels
interact with the performance suite machinery in the same way:

1. Initialize the 'KernelBase' class object with KernelID, default size, and default repetition count in the `class constructor`.
2. Implement data allocation and initialization operations for each kernel variant in the `setUp` method.
3. Implement kernel execution for the associated variants in the `run` methods.
4. Compute the checksum for each variant in the `updateChecksum` method.
5. Deallocate and reset any data that will be allocated and/or initialized in subsequent kernel executions in the `tearDown` method.

@@ -388,15 +400,17 @@ utility methods to allocate, initialize, deallocate, and copy data, and compute
checksums defined in the `DataUtils.hpp`, `CudaDataUtils.hpp`, and
`OpenMPTargetDataUtils.hpp` header files in the 'common' directory.

##### run methods

Which files contain which 'run' methods and associated variant implementations
is described above. Each method takes a variant ID argument, which identifies
the variant to be run for each programming model back-end. Each method is also
responsible for calling base class methods to start and stop execution timers
when a loop variant is run. A typical kernel execution code section may look
like:
```cpp
void Foo::runSeqVariant(VariantID vid)
{
  const Index_type run_reps = getRunReps();
  // ...

  switch ( vid ) {

    case Base_Seq : {
      // Declare data for baseline sequential variant of kernel...

      startTimer();
      for (RepIndex_type irep = 0; irep < run_reps; ++irep) {
        // Implementation of Base_Seq kernel variant...
      }
      stopTimer();

      break;
    }

#if defined(RUN_RAJA_SEQ)
    case Lambda_Seq : {
      startTimer();
      for (RepIndex_type irep = 0; irep < run_reps; ++irep) {
        // Implementation of Lambda_Seq kernel variant...
      }
      stopTimer();

      break;
    }

    case RAJA_Seq : {
      startTimer();
      for (RepIndex_type irep = 0; irep < run_reps; ++irep) {
        // Implementation of RAJA_Seq kernel variant...
      }
      stopTimer();

      break;
    }
#endif // RUN_RAJA_SEQ

    default : {
      std::cout << "\n  <kernel-name> : Unknown variant id = " << vid << std::endl;
    }

  }
}
```

All kernel implementation files are organized in this way. So following this
All kernel implementation files are organized in this way. So following this
pattern will keep all new additions consistent.

Note: As described earlier, there are five source files for each kernel.
The reason for this is that it makes it easier to apply unique compiler flags
to different variants and to manage compilation and linking issues that arise
when some kernel variants are combined in the same translation unit.

Note: for convenience, we make heavy use of macros to define data
declarations and kernel bodies in the suite. This significantly reduces
the amount of redundant code required to implement multiple variants
of each kernel and helps keep the variant implementations as consistent as
possible. The kernel class implementation files in the suite provide many
examples of the basic pattern we use.

##### updateChecksum() method

58 changes: 57 additions & 1 deletion src/CMakeLists.txt
@@ -32,87 +32,143 @@ blt_add_executable(
SOURCES RAJAPerfSuiteDriver.cpp
apps/AppsData.cpp
apps/DEL_DOT_VEC_2D.cpp
apps/DEL_DOT_VEC_2D-Seq.cpp
apps/DEL_DOT_VEC_2D-OMPTarget.cpp
apps/ENERGY.cpp
apps/ENERGY-Seq.cpp
apps/ENERGY-OMPTarget.cpp
apps/FIR.cpp
apps/FIR-Seq.cpp
apps/FIR-OMPTarget.cpp
apps/PRESSURE.cpp
apps/PRESSURE-Seq.cpp
apps/PRESSURE-OMPTarget.cpp
apps/LTIMES.cpp
apps/LTIMES-Seq.cpp
apps/LTIMES-OMPTarget.cpp
apps/LTIMES_NOVIEW.cpp
apps/LTIMES_NOVIEW-Seq.cpp
apps/LTIMES_NOVIEW-OMPTarget.cpp
apps/VOL3D.cpp
apps/VOL3D-Seq.cpp
apps/VOL3D-OMPTarget.cpp
apps/WIP-COUPLE.cpp
basic/ATOMIC_PI.cpp
basic/ATOMIC_PI-Seq.cpp
basic/ATOMIC_PI-OMPTarget.cpp
basic/DAXPY.cpp
basic/DAXPY-Seq.cpp
basic/DAXPY-OMPTarget.cpp
basic/IF_QUAD.cpp
basic/IF_QUAD-Seq.cpp
basic/IF_QUAD-OMPTarget.cpp
basic/INIT3.cpp
basic/INIT3-Seq.cpp
basic/INIT3-OMPTarget.cpp
basic/INIT_VIEW1D.cpp
basic/INIT_VIEW1D-Seq.cpp
basic/INIT_VIEW1D-OMPTarget.cpp
basic/INIT_VIEW1D_OFFSET.cpp
basic/INIT_VIEW1D_OFFSET-Seq.cpp
basic/INIT_VIEW1D_OFFSET-OMPTarget.cpp
basic/MULADDSUB.cpp
basic/MULADDSUB-Seq.cpp
basic/MULADDSUB-OMPTarget.cpp
basic/NESTED_INIT.cpp
basic/NESTED_INIT-Seq.cpp
basic/NESTED_INIT-OMPTarget.cpp
basic/REDUCE3_INT.cpp
basic/REDUCE3_INT-Seq.cpp
basic/REDUCE3_INT-OMPTarget.cpp
basic/TRAP_INT.cpp
basic/TRAP_INT-Seq.cpp
basic/TRAP_INT-OMPTarget.cpp
lcals/DIFF_PREDICT.cpp
lcals/DIFF_PREDICT-Seq.cpp
lcals/DIFF_PREDICT-OMPTarget.cpp
lcals/EOS.cpp
lcals/EOS-Seq.cpp
lcals/EOS-OMPTarget.cpp
lcals/FIRST_DIFF.cpp
lcals/FIRST_DIFF-Seq.cpp
lcals/FIRST_DIFF-OMPTarget.cpp
lcals/FIRST_MIN.cpp
lcals/FIRST_MIN-Seq.cpp
lcals/FIRST_MIN-OMPTarget.cpp
lcals/FIRST_SUM.cpp
lcals/FIRST_SUM-Seq.cpp
lcals/FIRST_SUM-OMPTarget.cpp
lcals/GEN_LIN_RECUR.cpp
lcals/GEN_LIN_RECUR-Seq.cpp
lcals/GEN_LIN_RECUR-OMPTarget.cpp
lcals/HYDRO_1D.cpp
lcals/HYDRO_1D-Seq.cpp
lcals/HYDRO_1D-OMPTarget.cpp
lcals/HYDRO_2D.cpp
lcals/HYDRO_2D-Seq.cpp
lcals/HYDRO_2D-OMPTarget.cpp
lcals/INT_PREDICT.cpp
lcals/INT_PREDICT-Seq.cpp
lcals/INT_PREDICT-OMPTarget.cpp
lcals/PLANCKIAN.cpp
lcals/PLANCKIAN-Seq.cpp
lcals/PLANCKIAN-OMPTarget.cpp
lcals/TRIDIAG_ELIM.cpp
lcals/TRIDIAG_ELIM-Seq.cpp
lcals/TRIDIAG_ELIM-OMPTarget.cpp
polybench/POLYBENCH_2MM.cpp
polybench/POLYBENCH_2MM-Seq.cpp
polybench/POLYBENCH_2MM-OMPTarget.cpp
polybench/POLYBENCH_3MM.cpp
polybench/POLYBENCH_3MM-Seq.cpp
polybench/POLYBENCH_3MM-OMPTarget.cpp
polybench/POLYBENCH_ADI.cpp
polybench/POLYBENCH_ADI-Seq.cpp
polybench/POLYBENCH_ADI-OMPTarget.cpp
polybench/POLYBENCH_ATAX.cpp
polybench/POLYBENCH_ATAX-Seq.cpp
polybench/POLYBENCH_ATAX-OMPTarget.cpp
polybench/POLYBENCH_FDTD_2D.cpp
polybench/POLYBENCH_FDTD_2D-Seq.cpp
polybench/POLYBENCH_FDTD_2D-OMPTarget.cpp
polybench/POLYBENCH_FLOYD_WARSHALL.cpp
polybench/POLYBENCH_FLOYD_WARSHALL-Seq.cpp
polybench/POLYBENCH_FLOYD_WARSHALL-OMPTarget.cpp
polybench/POLYBENCH_GEMM.cpp
polybench/POLYBENCH_GEMM-Seq.cpp
polybench/POLYBENCH_GEMM-OMPTarget.cpp
polybench/POLYBENCH_GEMVER.cpp
polybench/POLYBENCH_GEMVER-Seq.cpp
polybench/POLYBENCH_GEMVER-OMPTarget.cpp
polybench/POLYBENCH_GESUMMV.cpp
polybench/POLYBENCH_GESUMMV-Seq.cpp
polybench/POLYBENCH_GESUMMV-OMPTarget.cpp
polybench/POLYBENCH_HEAT_3D.cpp
polybench/POLYBENCH_HEAT_3D-Seq.cpp
polybench/POLYBENCH_HEAT_3D-OMPTarget.cpp
polybench/POLYBENCH_JACOBI_1D.cpp
polybench/POLYBENCH_JACOBI_1D-Seq.cpp
polybench/POLYBENCH_JACOBI_1D-OMPTarget.cpp
polybench/POLYBENCH_JACOBI_2D.cpp
polybench/POLYBENCH_JACOBI_2D-Seq.cpp
polybench/POLYBENCH_JACOBI_2D-OMPTarget.cpp
polybench/POLYBENCH_MVT.cpp
polybench/POLYBENCH_MVT-Seq.cpp
polybench/POLYBENCH_MVT-OMPTarget.cpp
stream/ADD.cpp
stream/ADD-Seq.cpp
stream/ADD-OMPTarget.cpp
stream/COPY.cpp
stream/COPY-Seq.cpp
stream/COPY-OMPTarget.cpp
stream/DOT.cpp
stream/DOT-Seq.cpp
stream/DOT-OMPTarget.cpp
stream/MUL.cpp
stream/MUL-Seq.cpp
stream/MUL-OMPTarget.cpp
stream/TRIAD.cpp
stream/TRIAD-Seq.cpp
stream/TRIAD-OMPTarget.cpp
common/DataUtils.cpp
common/Executor.cpp
Expand Down
22 changes: 18 additions & 4 deletions src/apps/CMakeLists.txt
@@ -9,27 +9,41 @@
blt_add_library(
NAME apps
SOURCES AppsData.cpp
DEL_DOT_VEC_2D.cpp
DEL_DOT_VEC_2D-Seq.cpp
DEL_DOT_VEC_2D-Cuda.cpp
DEL_DOT_VEC_2D-OMP.cpp
DEL_DOT_VEC_2D-OMPTarget.cpp
ENERGY.cpp
ENERGY-Seq.cpp
ENERGY-Cuda.cpp
ENERGY-OMP.cpp
ENERGY-OMPTarget.cpp
FIR.cpp
FIR-Seq.cpp
FIR-Cuda.cpp
FIR-OMP.cpp
FIR-OMPTarget.cpp
LTIMES.cpp
LTIMES-Seq.cpp
LTIMES-Cuda.cpp
LTIMES-OMP.cpp
LTIMES-OMPTarget.cpp
LTIMES_NOVIEW.cpp
LTIMES_NOVIEW-Seq.cpp
LTIMES_NOVIEW-Cuda.cpp
LTIMES_NOVIEW-OMP.cpp
LTIMES_NOVIEW-OMPTarget.cpp
PRESSURE.cpp
PRESSURE-Seq.cpp
PRESSURE-Cuda.cpp
PRESSURE-OMP.cpp
PRESSURE-OMPTarget.cpp
VOL3D.cpp
VOL3D-Seq.cpp
VOL3D-Cuda.cpp
VOL3D-OMP.cpp
VOL3D-OMPTarget.cpp
WIP-COUPLE.cpp
DEPENDS_ON common ${RAJA_PERFSUITE_DEPENDS}
)