v0.11.0
This release adds new kernels and features and resolves some issues. The new kernels exercise RAJA features that are not used in pre-existing kernels.
Please download the RAJAPerf-v0.11.0.tar.gz file below. The others will not work due to the way RAJAPerf uses git submodules.
Notable changes include:
- Update RAJA submodule to v0.14.0 release.
- Update BLT submodule to v0.4.1 release (the same one used in RAJA v0.14.0).
- New kernels added:
  - 'Basic' group: MAT_MAT_SHARED, PI_ATOMIC, PI_REDUCE
  - 'Apps' group: HALOEXCHANGE, HALOEXCHANGE_FUSED, MASS3DPA
  - New 'Algorithm' group added, with kernels: SORT, SORTPAIRS
- New Lambda_CUDA and Lambda_HIP variants added to various kernels to help isolate performance issues when observed.
- Default problem size for all kernels is now ~1M so that it is consistent across all kernels. Please refer to the Suite documentation on the main GitHub page for a discussion of problem size definitions.
- Execution of all GPU kernel variants has been modified (RAJA execution policies, base variant launches) to allow arbitrary problem sizes to be run.
- New runtime options:
  - Option to run kernels with a specified size. This makes it easier to run scaling studies with the Suite.
  - Option to filter kernels to run based on which RAJA features they use.
- More kernel information output added, such as features, iterations per rep, kernels per rep, bytes per rep, and FLOPs per rep. This and other information is printed to the screen before the Suite is run and is also written to a new CSV report file. Please see the Suite documentation on the main GitHub page for details.
- Additional warmup kernels enabled to initialize internal RAJA data structures so that initial kernel execution timings are more realistic.
- Error checking for base GPU variants added to catch launch failures where they occur.
- Compilation of RAJA exercises, examples, and tests is disabled by default. This substantially reduces compilation time for users who do not want to build those parts of RAJA. They can be enabled, if desired, with a CMake option.
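
For readers wondering what running GPU variants at arbitrary problem sizes usually involves, the sketch below shows one common pattern: round the number of thread blocks up with a ceiling division, then guard each thread with a bounds check so the extra threads in the last block do nothing. This is an illustration only, not the Suite's actual code. The names `blocks_for` and `daxpy_emulated` are hypothetical, and the "launch" is emulated with host loops in plain C++ so it can be compiled without a GPU toolchain.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of thread blocks needed to cover n
// iterations when each block has block_size threads (ceiling division).
inline std::size_t blocks_for(std::size_t n, std::size_t block_size)
{
  assert(block_size > 0);
  return (n + block_size - 1) / block_size;
}

// Hypothetical host-side emulation of a guarded GPU launch of
// y[i] += a * x[i]. The two loops stand in for the block and thread
// indices of a real launch. Returns the total number of emulated threads.
inline std::size_t daxpy_emulated(std::vector<double>& y,
                                  const std::vector<double>& x,
                                  double a,
                                  std::size_t block_size)
{
  const std::size_t n = y.size();
  assert(x.size() == n);

  const std::size_t grid = blocks_for(n, block_size);
  for (std::size_t block = 0; block < grid; ++block) {
    for (std::size_t thread = 0; thread < block_size; ++thread) {
      const std::size_t i = block * block_size + thread;
      if (i < n) {  // bounds guard: surplus threads in the last block skip work
        y[i] += a * x[i];
      }
    }
  }
  return grid * block_size;
}
```

With a problem size of 1000 and a block size of 256, `blocks_for` yields 4 blocks, so 1024 emulated threads run and the last 24 are filtered out by the guard. Without the guard, the problem size would have to be a multiple of the block size.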