This repository contains the Random Access Benchmark for FPGA and its OpenCL kernels. Currently only the Intel FPGA SDK for OpenCL utility is supported.
It is a modified implementation of the
Random Access Benchmark
provided in the HPC Challenge Benchmark suite.
The implementation follows the Python reference implementation given in
Introduction to the HPCChallenge Benchmark Suite available
here.
The Makefile will generate the device code using the code generator given in a submodule. So to make use of the code generation make sure to check out the repository recursively.
The code has the following dependencies:
- C++ compiler with C++11 support
- Intel FPGA SDK for OpenCL or Xilinx Vitis
- CMake 3.15 for building
CMake is used as the build system. The targets below can be used to build the benchmark and its kernels:
Target | Description |
---|---|
RandomAccess_VENDOR |
Builds the host application linking with the Intel SDK |
RandomAccess_test_VENDOR |
Compile the tests and its dependencies linking with the Intel SDK |
More over there are additional targets to generate kernel reports and bitstreams. The kernel targets are:
Target | Description |
---|---|
random_access_kernels_single_VENDOR |
Synthesizes the kernel (takes several hours!) |
random_access_kernels_single_report_VENDOR |
Just compile kernel and create logs and reports |
random_access_kernels_single_emulate_VENDOR |
Create a n emulation kernel |
For the host code as well as the kernels VENDOR
can be intel
or xilinx
.
The report target for Xilinx is missing but reports will be generated when the kernel is synthesized.
You can build for example the host application by running
mkdir build && cd build
cmake ..
make RandomAccess_intel
You will find all executables and kernel files in the bin
folder of your build directory.
Next to the common configuration options given in the README of the benchmark suite you might want to specify the following additional options before build:
Name | Default | Description |
---|---|---|
DEFAULT_ARRAY_LENGTH |
29 | Length of each input array in log2 (4GB) |
HPCC_FPGA_RA_NUM_REPLICATIONS |
4 | Replicates the kernels the given number of times |
HPCC_FPGA_RA_DEVICE_BUFFER_SIZE_LOG |
0 | Number of values that are stored in the local memory in the single kernel approach |
HPCC_FPGA_RA_INTEL_USE_PRAGMA_IVDEP |
No | Use the ivdep pragma in the main loop to remove the data dependency between reads and writes. This might lead to an error larger than 1%, but might also increase performance! |
HPCC_FPGA_RA_RNG_COUNT_LOG |
5 | Log2 of the number of random number generators that will be used concurrently |
HPCC_FPGA_RA_RNG_DISTANCE |
5 | Distance between RNGs in shift register. Used to relax data dependencies and increase clock frequency |
Moreover the environment variable INTELFPGAOCLSDKROOT
has to be set to the root
of the Intel FPGA SDK installation.
Additionally it is possible to set the used compiler and other build tools
in the CMakeCache.txt
located in the build directory after running cmake.
For execution of the benchmark run:
./RandomAccess_intel -f path_to_kernel.aocx
For more information on available input parameters run
./RandomAccess_intel -h
Implementation of the random access benchmark proposed in the HPCC benchmark suite for FPGA.
Version: 2.5
MPI Version: 3.1
Config. Time: Thu Dec 08 10:42:40 UTC 2022
Git Commit: 86e0064-dirty
Usage:
./bin/RandomAccess_intel [OPTION...]
-f, --file arg Kernel file name
-n, arg Number of repetitions (default: 10)
-i, Use memory Interleaving
--skip-validation Skip the validation of the output data. This will
speed up execution and helps when working with
special data types.
--device arg Index of the device that has to be used. If not
given you will be asked which device to use if there
are multiple devices available. (default: 0)
--platform arg Index of the platform that has to be used. If not
given you will be asked which platform to use if
there are multiple platforms available. (default: 0)
--platform_str arg Name of the platform that has to be used (default:
)
-r, arg Number of used kernel replications (default: 4)
--dump-json arg dump benchmark configuration and results to this
file in json format (default: )
--test Only test given configuration and skip execution
and validation
-h, --help Print this help
-d, arg Log2 of the size of the data array (default: 29)
-g, arg Log2 of the number of random number generators
(default: 5)
To execute the unit and integration tests for Intel devices run
CL_CONTEXT_EMULATOR_DEVICE=1 ./RandomAccess_test_intel -f KERNEL_FILE_NAME
in the bin
folder within the build directory.
It will run an emulation of the kernel and execute some functionality tests.
The host code will print the results of the execution to the standard output. The result summary looks similar to this:
Error: 3.90625e-03
best mean GUOPS
5.04258e-04 s 7.85656e-04 s 2.03071e-03 GUOP/s
best
andmean
are the fastest and the mean kernel execution time. The pure kernel execution time is measured without transferring the buffer back and forth to the host.GUPS
contains the calculated metric Giga Updates per Second. It takes the fastest kernel execution time. The formula is .Error
contains the percentage of memory positions with wrong values after the updates where made. The maximal allowed error rate of the random access benchmark is 1% according to the rules given in the HPCChallenge specification.
Benchmark results can be found in the results
folder in this
repository.
The json output looks like the following.
{
"config_time": "Wed Dec 14 08:43:07 UTC 2022",
"device": "Intel(R) FPGA Emulation Device",
"environment": {
"LD_LIBRARY_PATH": "/opt/software/pc2/EB-SW/software/Python/3.9.5-GCCcore-10.3.0/lib:/opt/software/pc2/EB-SW/software/libffi/3.3-GCCcore-10.3.0/lib64:/opt/software/pc2/EB-SW/software/GMP/6.2.1-GCCcore-10.3.0/lib:/opt/software/pc2/EB-SW/software/SQLite/3.35.4-GCCcore-10.3.0/lib:/opt/software/pc2/EB-SW/software/Tcl/8.6.11-GCCcore-10.3.0/lib:/opt/software/pc2/EB-SW/software/libreadline/8.1-GCCcore-10.3.0/lib:/opt/software/pc2/EB-SW/software/libarchive/3.5.1-GCCcore-10.3.0/lib:/opt/software/pc2/EB-SW/software/cURL/7.76.0-GCCcore-10.3.0/lib:/opt/software/pc2/EB-SW/software/bzip2/1.0.8-GCCcore-10.3.0/lib:/opt/software/pc2/EB-SW/software/ncurses/6.2-GCCcore-10.3.0/lib:/opt/software/pc2/EB-SW/software/ScaLAPACK/2.1.0-gompi-2021a-fb/lib:/opt/software/pc2/EB-SW/software/FFTW/3.3.9-gompi-2021a/lib:/opt/software/pc2/EB-SW/software/FlexiBLAS/3.0.4-GCC-10.3.0/lib:/opt/software/pc2/EB-SW/software/OpenBLAS/0.3.15-GCC-10.3.0/lib:/opt/software/pc2/EB-SW/software/OpenMPI/4.1.1-GCC-10.3.0/lib:/opt/software/pc2/EB-SW/software/PMIx/3.2.3-GCCcore-10.3.0/lib:/opt/software/pc2/EB-SW/software/libfabric/1.12.1-GCCcore-10.3.0/lib:/opt/software/pc2/EB-SW/software/UCX/1.10.0-GCCcore-10.3.0/lib:/opt/software/pc2/EB-SW/software/libevent/2.1.12-GCCcore-10.3.0/lib:/opt/software/pc2/EB-SW/software/OpenSSL/1.1/lib:/opt/software/pc2/EB-SW/software/hwloc/2.4.1-GCCcore-10.3.0/lib:/opt/software/pc2/EB-SW/software/libpciaccess/0.16-GCCcore-10.3.0/lib:/opt/software/pc2/EB-SW/software/libxml2/2.9.10-GCCcore-10.3.0/lib:/opt/software/pc2/EB-SW/software/XZ/5.2.5-GCCcore-10.3.0/lib:/opt/software/pc2/EB-SW/software/numactl/2.0.14-GCCcore-10.3.0/lib:/opt/software/pc2/EB-SW/software/binutils/2.36.1-GCCcore-10.3.0/lib:/opt/software/pc2/EB-SW/software/zlib/1.2.11-GCCcore-10.3.0/lib:/opt/software/pc2/EB-SW/software/GCCcore/10.3.0/lib64:/opt/software/slurm/21.08.6/lib:/opt/software/FPGA/IntelFPGA/opencl_sdk/21.2.0/hld/host/linux64/lib:/opt/software/FPGA/IntelFPGA/opencl_sdk/20.4.0/hld/board/bittware_pcie/s10/linux64/lib"
},
"errors": {
"ratio": 0.00390625
},
"execution_time": "Wed Dec 14 09:54:47 UTC 2022",
"git_commit": "be1a4e9-dirty",
"mpi": {
"subversion": 1,
"version": 3
},
"name": "random access",
"results": {
"guops": {
"unit": "GUOP/s",
"value": 0.0021329867229908477
},
"t_mean": {
"unit": "s",
"value": 0.0005428726000000001
},
"t_min": {
"unit": "s",
"value": 0.000480078
}
},
"settings": {
"#RNGs": 32,
"Array Size": 256,
"Communication Type": false,
"Kernel File": false,
"Kernel Replications": 4,
"MPI Ranks": 1,
"Repetitions": 10,
"Test Mode": false
},
"timings": {
"execution": [
{
"unit": "s",
"value": 0.000643471
},
{
"unit": "s",
"value": 0.000516849
},
{
"unit": "s",
"value": 0.000606361
},
{
"unit": "s",
"value": 0.00058182
},
{
"unit": "s",
"value": 0.00060401
},
{
"unit": "s",
"value": 0.000485259
},
{
"unit": "s",
"value": 0.000484699
},
{
"unit": "s",
"value": 0.00053713
},
{
"unit": "s",
"value": 0.000489049
},
{
"unit": "s",
"value": 0.000480078
}
]
},
"validated": true,
"version": "2.5"
}