Interfacing OpenACC with cuFFT, BLAS, MKL, FFTW
Many of the EuroHack projects require basic numerical operations such as FFTs or level-1/2/3 BLAS. For these it is highly advantageous to use the NVIDIA CUDA libraries such as cuFFT and cuBLAS. The corresponding user manuals are part of NVIDIA's CUDA Toolkit documentation.
These libraries are heavily optimized for GPUs, so there is little reason to try to beat their performance with hand-written code. The problem then becomes one of interfacing the user's OpenACC code to these libraries. Fortunately, a number of comprehensive examples are available:
OpenACC Interoperability Tricks by Jeff Larkin
Interfacing an OpenACC program to cuFFT
OpenACC/cuFFT interoperability by Adam Simpson
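The pattern these examples share can be sketched roughly as follows: OpenACC manages the data, and a `host_data use_device` region hands the device addresses to the library call. This is only a minimal sketch, not code from any of the linked examples. The function name `saxpy_on_device` and the macro `USE_CUBLAS` are hypothetical, and stream handling between the OpenACC runtime and cuBLAS is left out for brevity. Without `USE_CUBLAS` the function falls back to a plain OpenACC loop, so it also builds where the CUDA toolkit is not installed.

```c
#include <assert.h>
#include <stddef.h>
#ifdef USE_CUBLAS            /* hypothetical build flag, e.g. -DUSE_CUBLAS */
#include <cublas_v2.h>
#endif

/* Computes y = alpha*x + y with the arrays resident on the device.
 * Returns 0 on success, non-zero if the library reports an error. */
int saxpy_on_device(int n, float alpha, const float *x, float *y)
{
    assert(n > 0 && x != NULL && y != NULL);
    int status = 0;

#ifdef USE_CUBLAS
    cublasHandle_t handle;
    if (cublasCreate(&handle) != CUBLAS_STATUS_SUCCESS)
        return 1;
#endif

    /* OpenACC allocates device copies of x and y and moves the data. */
    #pragma acc data copyin(x[0:n]) copy(y[0:n])
    {
#ifdef USE_CUBLAS
        /* Inside host_data, x and y evaluate to their device addresses,
         * which is what cuBLAS expects. alpha stays a host pointer
         * (the default cuBLAS pointer mode). */
        #pragma acc host_data use_device(x, y)
        {
            if (cublasSaxpy(handle, n, &alpha, x, 1, y, 1)
                    != CUBLAS_STATUS_SUCCESS)
                status = 1;
        }
#else
        /* Fallback: the same operation as an ordinary OpenACC loop. */
        #pragma acc parallel loop present(x[0:n], y[0:n])
        for (int i = 0; i < n; ++i)
            y[i] = alpha * x[i] + y[i];
#endif
    }

#ifdef USE_CUBLAS
    cublasDestroy(handle);
#endif
    return status;
}
```

A caller passes ordinary host arrays, for example `saxpy_on_device(n, 2.0f, x, y)`. If the arrays are already present on the device from an enclosing data region, the `copyin` and `copy` clauses should reuse the existing device copies rather than transfer them again.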
Implementing FFTs in a way that is both performant and portable (able to run fast on Tesla GPUs, MICs, and CPUs) has come up as an issue. The FFTW interface has been discussed; however, it is a host-only interface, so it does not work well if your data is already on the device. The Intel Math Kernel Library (MKL) has FFT functionality built in, which seems to be the equivalent of cuFFT on MIC [1]. It should be possible to build a common interface to both cuFFT and the MKL FFT that supports device pointers. This still leaves open how to fall back on non-Intel x86 systems: on the CPU there should be a software fallback option, potentially using FFTW. A common interface could be built using preprocessor macros.