OpenCL-FPGA-examples

These examples are used and discussed in the tutorial "M02 OpenCL design flows for Intel and Xilinx FPGAs - common optimization strategies, design patterns and vendor-specific differences" (https://www.date-conference.com/conference/tutorial-m02) at the DATE 19 Conference.

Target compilers

Intel FPGA SDK for OpenCL 18.1.1, aocx
Xilinx SDx 18.3, SDAccel feature with OpenCL, xocc

Makefile

The Makefile allows easy generation of reports and FPGA binaries using the following targets:

make reportIntel-<design_name>
make reportXilinx-<design_name>
make buildIntel-<design_name>
make buildXilinx-<design_name>

Design files

macros.h

Common header file to enable portable use of pipes and channels

Example 1: vector scale

vscale1_vec.cl: scales an input vector in chunks of 16 elements using the OpenCL float16 data type
vscale2_u.cl: applies automatic unrolling to achieve 16x parallelism; this requires a loop epilogue, which xocc does not generate
vscale3_u16.cl: applies automatic unrolling to achieve 16x parallelism without requiring a loop epilogue; functionally identical only if size is a multiple of 16
vscale4_u16_epi.cl: applies automatic unrolling to achieve 16x parallelism with a manually formulated loop epilogue
vscale5_short.cl: scales an input vector in chunks of 16 elements using the OpenCL short16 data type; demonstrates that short multiplications fit into a single DSP on Xilinx Kintex UltraScale
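
The role of the loop epilogue can be illustrated with a plain C model of the loop structure. This is only a sketch and not the repository's OpenCL source: the function names, the UNROLL constant and the assumption that the epilogue-free variant simply runs size / 16 full chunks are all hypothetical.

```c
#include <stddef.h>

#define UNROLL 16 /* assumed parallelism factor, matching the 16x in the text */

/* Hypothetical model of the epilogue-free variant: only full chunks of
 * UNROLL elements are processed, so the last size % UNROLL elements are
 * left untouched. */
void vscale_no_epilogue(const float *x, float *y, float a, size_t size)
{
    for (size_t i = 0; i < size / UNROLL; i++) {
        /* The inner loop has a constant trip count; an HLS compiler can
         * unroll it fully into UNROLL parallel multipliers. */
        for (size_t j = 0; j < UNROLL; j++) {
            y[i * UNROLL + j] = a * x[i * UNROLL + j];
        }
    }
}

/* Hypothetical model of the variant with a manually written epilogue:
 * the same unrolled main loop, followed by a scalar loop that handles
 * the remaining size % UNROLL elements. */
void vscale_with_epilogue(const float *x, float *y, float a, size_t size)
{
    size_t done = (size / UNROLL) * UNROLL;

    vscale_no_epilogue(x, y, a, size);
    for (size_t i = done; i < size; i++) {
        y[i] = a * x[i];
    }
}
```

With size = 20, for example, the first function only writes elements 0 to 15, while the second one also covers elements 16 to 19. For sizes that are multiples of 16 both produce the same result.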

Example 2: SAXPY

SAXPY1.cl: direct implementation of the BLAS level 1 routine; requires one global write and two global read interfaces
SAXPY2_block.cl: processes the routine in blocks of 1024 elements with two read loops and one compute/write-back loop
SAXPY3_ivdep.cl: adds the ivdep pragma to the blockwise processing to demonstrate outer-loop pipelining by aocx
SAXPY4_dataflow.cl: adds the dataflow attribute; the xocc design still suffers from a lack of gmem ports
SAXPY5_streaming.cl: separates the design into two kernels connected by pipes; proper pipelining is possible with both compilers
SAXPY6_streaming16.cl: separates the design into two kernels connected by pipes and uses the float16 data type; asymptotic throughput of 16 elements per cycle with both compilers
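
The difference between the direct and the blockwise formulation can be sketched in plain C. This is a model of the loop structure only, not the repository's kernels: the function names and local buffer names are hypothetical, the result is assumed to overwrite y as in the standard BLAS definition, and the handling of a partial last block is an addition for the sake of a self-contained example.

```c
#include <stddef.h>

#define BLOCK 1024 /* block size named in the text */

/* Direct form: every iteration reads x[i] and y[i] and writes y[i],
 * which corresponds to two global read streams and one global write. */
void saxpy_direct(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}

/* Blockwise form: two read loops copy a block of x and y into local
 * buffers, then one loop computes and writes the block back. */
void saxpy_blocked(size_t n, float a, const float *x, float *y)
{
    float x_local[BLOCK]; /* models on-chip local memory */
    float y_local[BLOCK];

    for (size_t base = 0; base < n; base += BLOCK) {
        /* Assumption: a shorter last block is allowed. */
        size_t len = (n - base < BLOCK) ? (n - base) : BLOCK;

        for (size_t i = 0; i < len; i++) { /* read loop 1 */
            x_local[i] = x[base + i];
        }
        for (size_t i = 0; i < len; i++) { /* read loop 2 */
            y_local[i] = y[base + i];
        }
        for (size_t i = 0; i < len; i++) { /* compute and write back */
            y[base + i] = a * x_local[i] + y_local[i];
        }
    }
}
```

Both functions compute the same result. The blockwise version only separates the memory accesses into distinct loops, which is the structure the later ivdep, dataflow and streaming variants build on.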

Vendor Matrix Multiplication designs

The individual licenses stated in these files apply.

mmult.cl: Xilinx design for systolic array matrix multiplication, source: https://github.com/Xilinx/SDAccel_Examples/tree/master/getting_started/kernel_opt/systolic_array_ocl
matrix_mult.cl: Intel FPGA design for NDRange matrix multiplication, source: https://www.intel.com/content/www/us/en/programmable/support/support-resources/design-examples/design-software/opencl/matrix-multiplication.html
