These examples are used and discussed in the tutorial "M02 OpenCL design flows for Intel and Xilinx FPGAs - common optimization strategies, design patterns and vendor-specific differences" (https://www.date-conference.com/conference/tutorial-m02) at the DATE 19 conference.
- Intel FPGA SDK for OpenCL 18.1.1, aocx
- Xilinx SDx 18.3, SDAccel feature with OpenCL, xocc
Reports and FPGA binaries can be generated easily using:
make reportIntel-<design_name>
make reportXilinx-<design_name>
make buildIntel-<design_name>
make buildXilinx-<design_name>
Common header file to enable portable use of pipes and channels
- vscale1_vec.cl: scaling an input vector in chunks of 16 elements using the OpenCL float16 data type
- vscale2_u.cl: applying automatic unrolling to achieve 16x parallelism; requires a loop epilogue, which is not generated by xocc
- vscale3_u16.cl: applying automatic unrolling to achieve 16x parallelism without requiring a loop epilogue; functionally identical only if size is a multiple of 16
- vscale4_u16_epi.cl: applying automatic unrolling to achieve 16x parallelism, with manual formulation of the loop epilogue
- vscale5_short.cl: scaling an input vector in chunks of 16 elements using the OpenCL short16 data type; demonstrates that short multiplications fit into a single DSP on Xilinx Kintex UltraScale
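The difference between unrolling with and without a loop epilogue can be sketched in plain C. This is only an illustrative host-side analogue under stated assumptions, not the kernel code from the files above: the function name `vscale_unrolled`, its signature, and the separate input and output arrays are hypothetical.

```c
#include <stddef.h>

#define UNROLL 16  /* assumed unroll factor, matching the 16x parallelism above */

/* Hypothetical plain-C analogue of a vector scaling kernel (not the repo's code).
   The main loop covers full chunks of 16; the epilogue covers the remainder. */
void vscale_unrolled(const float *x, float *y, float a, size_t size)
{
    size_t full = size - (size % UNROLL);  /* largest multiple of 16 <= size */

    /* Main loop: the inner loop has a fixed trip count of 16,
       so a compiler can fully unroll it. */
    for (size_t i = 0; i < full; i += UNROLL) {
        for (size_t j = 0; j < UNROLL; j++) {
            y[i + j] = a * x[i + j];
        }
    }

    /* Manual epilogue: at most 15 leftover elements. */
    for (size_t i = full; i < size; i++) {
        y[i] = a * x[i];
    }
}
```

Dropping the epilogue loop gives the behaviour described for the variant without epilogue: the result is then only complete when `size` is a multiple of 16.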
- SAXPY1.cl: direct implementation of BLAS 1 routine, requires one global write and two global read interfaces
- SAXPY2_block.cl: processing of routine in blocks of 1024, two read loops, one compute/write back loop
- SAXPY3_ivdep.cl: added ivdep pragma to blockwise processing to demonstrate formation of outer loop pipelining by aocx
- SAXPY4_dataflow.cl: added dataflow attribute; the xocc design still suffers from a lack of gmem ports
- SAXPY5_streaming.cl: separation into two kernels connected by pipes, proper pipelining possible for both compilers
- SAXPY6_streaming16.cl: separation into two kernels connected by pipes, using the float16 data type, asymptotic throughput of 16 elements per cycle with both compilers
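The blockwise structure described for the SAXPY variants (two read loops followed by one compute/write back loop per block of 1024) can be sketched in plain C. This is a minimal sketch under assumptions, not the kernel source: the function name `saxpy_blocked`, its signature, and the handling of a partial last block are hypothetical.

```c
#include <stddef.h>

#define BLOCK 1024  /* block size taken from the description above */

/* Hypothetical plain-C analogue of blockwise SAXPY: y = a * x + y. */
void saxpy_blocked(float a, const float *x, float *y, size_t n)
{
    float xbuf[BLOCK];  /* local copy of the current block of x */
    float ybuf[BLOCK];  /* local copy of the current block of y */

    for (size_t base = 0; base < n; base += BLOCK) {
        /* Assumption: a partial last block is processed with a shorter length. */
        size_t len = (n - base < BLOCK) ? (n - base) : BLOCK;

        /* Read loop 1: load a block of x. */
        for (size_t i = 0; i < len; i++) {
            xbuf[i] = x[base + i];
        }
        /* Read loop 2: load a block of y. */
        for (size_t i = 0; i < len; i++) {
            ybuf[i] = y[base + i];
        }
        /* Compute/write back loop. */
        for (size_t i = 0; i < len; i++) {
            y[base + i] = a * xbuf[i] + ybuf[i];
        }
    }
}
```

Separating the reads from the compute/write back loop means each inner loop touches only one global array at a time, which is the property the later variants build on when they pipeline the outer loop or split the work into kernels connected by pipes.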
Individual licenses in files apply
- mmult.cl: Xilinx design for systolic array matrix multiplication, source: https://github.com/Xilinx/SDAccel_Examples/tree/master/getting_started/kernel_opt/systolic_array_ocl
- matrix_mult.cl: Intel FPGA design for ND range matrix multiplication, source: https://www.intel.com/content/www/us/en/programmable/support/support-resources/design-examples/design-software/opencl/matrix-multiplication.html