OpenCL-FPGA-examples

These examples are used and discussed in the tutorial "M02 OpenCL design flows for Intel and Xilinx FPGAs - common optimization strategies, design patterns and vendor-specific differences" at the DATE 19 Conference.

Target compilers

The Makefile targets allow easy generation of reports and FPGA binaries using

make reportIntel-<design_name>
make reportXilinx-<design_name>
make buildIntel-<design_name>
make buildXilinx-<design_name>

Common header file to enable portable use of pipes and channels

vscale1_vec.cl scaling an input vector in chunks of 16 elements using the OpenCL float16 data type
vscale2_u.cl applying automatic unrolling to achieve 16x parallelism; requires a loop epilogue, which xocc does not generate
vscale3_u16.cl applying automatic unrolling to achieve 16x parallelism without requiring a loop epilogue - functionally identical only if the size is a multiple of 16
vscale4_u16_epi.cl applying automatic unrolling to achieve 16x parallelism, with a manually formulated loop epilogue
vscale5_short.cl scaling an input vector in chunks of 16 elements using the OpenCL short16 data type - demonstrates that short multiplications fit into a single DSP on Xilinx Kintex UltraScale
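
The loop-epilogue idea behind vscale2 to vscale4 can be sketched in plain C. This is a host-side model for illustration only, not the kernel code from this repository: the function name `vscale_epilogue` and the constant `UNROLL` are hypothetical, and the inner loop merely stands in for what an unroll pragma would produce on the FPGA.

```c
#include <stddef.h>

#define UNROLL 16  /* assumed unroll factor, matching the 16x parallelism above */

/* Hypothetical model: scale x by a into y for an arbitrary length n. */
void vscale_epilogue(const float *x, float *y, float a, size_t n)
{
    /* Largest multiple of UNROLL that does not exceed n. */
    size_t n_main = n - (n % UNROLL);

    /* Main loop: every iteration handles exactly UNROLL elements, so the
       inner loop has a fixed trip count and can be fully unrolled. */
    for (size_t i = 0; i < n_main; i += UNROLL) {
        for (size_t j = 0; j < UNROLL; j++) {
            y[i + j] = a * x[i + j];
        }
    }

    /* Epilogue: the remaining n % UNROLL elements, one at a time. */
    for (size_t i = n_main; i < n; i++) {
        y[i] = a * x[i];
    }
}
```

Dropping the epilogue loop gives the vscale3 situation: the result is then only correct when n is a multiple of 16.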

SAXPY1.cl direct implementation of BLAS 1 routine, requires one global write and two global read interfaces
SAXPY2_block.cl processing of routine in blocks of 1024, two read loops, one compute/write back loop
SAXPY3_ivdep.cl added ivdep pragma to blockwise processing to demonstrate formation of outer loop pipelining by aoc
SAXPY4_dataflow.cl added dataflow attribute, xocc design still suffers from lack of gmem ports
SAXPY5_streaming.cl separation into two kernels connected by pipes, proper pipelining for both compilers possible
SAXPY6_streaming16.cl separation into two kernels connected by pipes, using float16 datatype, asymptotic throughput of 16 elements per cycle with both compilers
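
The blockwise structure described for SAXPY2_block.cl (two read loops, then one compute/write-back loop per block) can be modelled in plain C as follows. This is a sketch under assumptions, not the kernel from this repository: the name `saxpy_blocked`, the in-place update of `y`, and the handling of a partial last block are illustrative choices.

```c
#include <stddef.h>

#define BLOCK 1024  /* block size named in the SAXPY2_block.cl description */

/* Hypothetical model of blockwise SAXPY: y = a * x + y. */
void saxpy_blocked(size_t n, float a, const float *x, float *y)
{
    float xb[BLOCK];  /* stands in for on-chip buffers of the FPGA design */
    float yb[BLOCK];

    for (size_t base = 0; base < n; base += BLOCK) {
        /* Assumption: a shorter final block is allowed. */
        size_t len = (n - base < BLOCK) ? (n - base) : BLOCK;

        /* Read loop 1: fetch a block of x. */
        for (size_t i = 0; i < len; i++) {
            xb[i] = x[base + i];
        }
        /* Read loop 2: fetch a block of y. */
        for (size_t i = 0; i < len; i++) {
            yb[i] = y[base + i];
        }
        /* Compute and write-back loop. */
        for (size_t i = 0; i < len; i++) {
            y[base + i] = a * xb[i] + yb[i];
        }
    }
}
```

Separating the loops this way means each loop touches only one global array, which is the property the later variants build on when they pipeline or stream the blocks.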

Individual licenses in files apply

mmult.cl Xilinx design for systolic array matrix multiplication, source https://github.com/Xilinx/SDAccel_Examples/tree/master/getting_started/kernel_opt/systolic_array_ocl
matrix_mult.cl Intel FPGA design for ND range matrix multiplication, source https://www.intel.com/content/www/us/en/programmable/support/support-resources/design-examples/design-software/opencl/matrix-multiplication.html