kenter/OpenCL-FPGA-examples

OpenCL-FPGA-examples

These examples are used and discussed in the tutorial "M02 OpenCL design flows for Intel and Xilinx FPGAs - common optimization strategies, design patterns and vendor-specific differences" (https://www.date-conference.com/conference/tutorial-m02) at the DATE 19 Conference.

Target compilers

  • Intel FPGA SDK for OpenCL 18.1.1 (aoc compiler, referred to below as aocx after its binary format)
  • Xilinx SDx 18.3, SDAccel feature with OpenCL, xocc

Makefile

The Makefile allows easy generation of reports and FPGA binaries using:

make reportIntel-<design_name>
make reportXilinx-<design_name>
make buildIntel-<design_name>
make buildXilinx-<design_name>

Design files

macros.h

Common header file to enable portable use of pipes and channels

Example 1: vector scale

  • vscale1_vec.cl: scales an input vector in chunks of 16 elements using the OpenCL float16 data type
  • vscale2_u.cl: applies automatic unrolling to achieve 16x parallelism; this requires a loop epilogue, which xocc does not generate
  • vscale3_u16.cl: applies automatic unrolling to achieve 16x parallelism without requiring a loop epilogue; functionally identical only if the size is a multiple of 16
  • vscale4_u16_epi.cl: applies automatic unrolling to achieve 16x parallelism, with a manually formulated loop epilogue
  • vscale5_short.cl: scales an input vector in chunks of 16 elements using the OpenCL short16 data type; demonstrates that short multiplications fit into a single DSP on Xilinx Kintex UltraScale
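The difference between the unrolled variants comes down to how the last, incomplete chunk is handled. The following is a minimal host-side sketch in plain C, not the kernel code from this repository: the function name vscale_unrolled, the UNROLL constant and the argument order are assumptions made for illustration. It models a main loop over full chunks of 16 followed by a manually written epilogue, the structure described for vscale4_u16_epi.cl.

```c
#include <stddef.h>

#define UNROLL 16  /* assumed unroll factor, matching the 16x parallelism above */

/* Hypothetical reference model of "unrolled main loop + manual epilogue".
   It is not taken from the repository's kernels. */
void vscale_unrolled(const float *x, float *y, float a, size_t n)
{
    /* Largest multiple of UNROLL that is <= n. */
    size_t main_end = n - (n % UNROLL);

    /* Main loop: each outer iteration covers one full chunk of 16 elements.
       In an OpenCL kernel, the inner loop is the part that would be fully
       unrolled so that all 16 multiplications happen in parallel. */
    for (size_t i = 0; i < main_end; i += UNROLL) {
        for (size_t j = 0; j < UNROLL; j++) {
            y[i + j] = a * x[i + j];
        }
    }

    /* Epilogue: the remaining n % 16 elements, processed one at a time.
       Dropping this loop gives the behaviour described for vscale3_u16.cl:
       correct only when n is a multiple of 16. */
    for (size_t i = main_end; i < n; i++) {
        y[i] = a * x[i];
    }
}
```

Calling vscale_unrolled(x, y, 2.0f, 37) would run two full chunks in the main loop and leave five elements to the epilogue.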

Example 2: SAXPY

  • SAXPY1.cl: direct implementation of the BLAS Level 1 routine; requires one global write interface and two global read interfaces
  • SAXPY2_block.cl: processes the routine in blocks of 1024, with two read loops and one compute/write-back loop
  • SAXPY3_ivdep.cl: adds the ivdep pragma to the blockwise processing to demonstrate how aocx forms outer-loop pipelining
  • SAXPY4_dataflow.cl: adds the dataflow attribute; the xocc design still suffers from a lack of gmem ports
  • SAXPY5_streaming.cl: separates the design into two kernels connected by pipes; proper pipelining is possible with both compilers
  • SAXPY6_streaming16.cl: separates the design into two kernels connected by pipes and uses the float16 data type; asymptotic throughput of 16 elements per cycle with both compilers
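The blockwise structure can be illustrated with a minimal host-side sketch in plain C. This is not the kernel from SAXPY2_block.cl: the function name saxpy_blocked, the BLOCK constant, the local buffer names and the handling of a partial last block are all assumptions made for illustration. It shows two read loops that fill local buffers, followed by one loop that computes and writes back.

```c
#include <stddef.h>

#define BLOCK 1024  /* assumed block size in elements */

/* Hypothetical reference model of blockwise SAXPY: y = a * x + y.
   It is not taken from the repository's kernels. */
void saxpy_blocked(float a, const float *x, float *y, size_t n)
{
    float xbuf[BLOCK];  /* stands in for on-chip local memory */
    float ybuf[BLOCK];

    for (size_t base = 0; base < n; base += BLOCK) {
        /* Length of this block; the last block may be shorter (assumption:
           the original kernel may instead require n to be a multiple of 1024). */
        size_t len = (n - base < BLOCK) ? (n - base) : BLOCK;

        /* Read loop 1: only x is read from global memory. */
        for (size_t i = 0; i < len; i++) {
            xbuf[i] = x[base + i];
        }

        /* Read loop 2: only y is read from global memory. */
        for (size_t i = 0; i < len; i++) {
            ybuf[i] = y[base + i];
        }

        /* Compute/write-back loop: only y is written to global memory. */
        for (size_t i = 0; i < len; i++) {
            y[base + i] = a * xbuf[i] + ybuf[i];
        }
    }
}
```

Each inner loop touches a single global array, so every loop has one memory access pattern instead of the two reads and one write mixed together in the direct implementation.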

Vendor Matrix Multiplication designs

Individual licenses in files apply
