Releases: amd/amd-fftw

AOCL-FFTW 5.0

11 Oct 06:03

Highlights of this release

  • Added support for using the wisdom feature by default under the --enable-amd-app-opt option
  • Minor bug fixes

AOCL-FFTW 4.2

28 Feb 08:41
Merge 4.2 release branch into amd-fftw

AOCL-FFTW 4.1

07 Aug 07:40

Highlights of this release

  • Dynamic dispatch support added for AOCC build of the library on Linux
  • Minor bug fixes

AOCL FFTW version 4.0

11 Nov 12:46

Highlights of improvements on AMD EPYC™ processor family CPUs

  • AVX-512 enablement of DFT kernels
  • AVX-512 optimization of copy and transpose routines

AOCL FFTW version 3.2

08 Jul 17:06

Highlights of improvements on AMD EPYC™ processor family CPUs

  • Dynamic dispatcher for AOCL-FFTW
  • Upgraded AOCL-FFTW to align with the reference FFTW 3.3.10 from MIT
  • Windows FFTW features aligned with Linux FFTW

AMD Optimized FFTW version 3.1

13 Dec 10:21

Highlights of improvements on AMD EPYC™ processor family CPUs

  • Feature ‘AMD application optimization layer’ that uplifts the performance of HPC and scientific applications
  • Feature ‘Fast MPI transpose algorithm’ to speed up the distributed MPI FFT computations
  • Feature ‘Top N planner’ that minimizes single-threaded run-to-run variations
  • Support for building AMD FFTW library on Windows
  • GCC compilation support for AMD processors based on the AMD “Zen3” core architecture

AMD Optimized FFTW version 3.0.1

06 Jul 08:42

Highlights of improvements on AMD EPYC™ processor family CPUs

  • New planner feature, the Top N planner, that minimizes single-threaded run-to-run variations.
  • New parallel MPI transpose algorithm, enabled via the configure option "--enable-amd-mpi-vader-limit"
    • When using this configure option, the user needs to set --mca btl_vader_eager_limit appropriately in the mpirun command (the currently preferred value is 65536).
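The two settings above go together, so a dry-run sketch may help. It only prints the commands. The install prefix, rank count, and application name are hypothetical, and pairing the option with --enable-mpi and --enable-amd-opt is an assumption about a typical build rather than something these notes state.

```shell
# Assumption: run from the top of an amd-fftw source tree with Open MPI installed.
# Hypothetical values: install prefix, rank count, and application name.
PREFIX="$HOME/amd-fftw"
NP=4
APP=./my_mpi_fft_app
EAGER_LIMIT=65536   # the value these notes currently prefer

# Build-time: enable the MPI transpose algorithm from this release.
CONFIGURE_CMD="./configure --enable-mpi --enable-amd-opt --enable-amd-mpi-vader-limit --prefix=$PREFIX"

# Run-time: pass the matching eager limit to the vader BTL.
MPIRUN_CMD="mpirun --mca btl_vader_eager_limit $EAGER_LIMIT -np $NP $APP"

# Dry run: print both commands so they can be reviewed before use.
echo "$CONFIGURE_CMD"
echo "$MPIRUN_CMD"
```

Once the printed lines look right for the target system, they can be run directly.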

AMD Optimized FFTW version 3.0

15 Mar 18:07

Highlights of improvements on AMD EPYC™ processor family CPUs

  • New fast planner that reduces planning time for the various planning modes in general and for OPATIENT mode in particular. It can be enabled through the configure option “--enable-amd-fast-planner”
  • Support for the configure option “AMD_ARCH” to help cross compilation. It accepts values such as auto, znver1, znver2, and znver3 for AMD EPYC processors
  • Quad precision support is now included for the AOCC clang compiler from version 10 onwards
  • Improved handling of the --enable-debug and “CC” options by ‘configure’ when --enable-amd-opt is used
  • Fixed incorrect behavior of the OWISDOM feature when no wisdom file is present
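As a rough illustration of how the two new configure options might be combined, the sketch below prints a configure line for a cross-compiled build. Passing AMD_ARCH as a NAME=value argument to configure is an assumption, as is combining both options with --enable-amd-opt; the target znver3 is just one of the values listed above.

```shell
# Assumption: run from the top of an amd-fftw source tree.
# ARCH may be any value the notes list: auto, znver1, znver2, or znver3.
ARCH=znver3

# Assumption: AMD_ARCH is passed to configure as a NAME=value argument.
CONFIGURE_CMD="./configure --enable-amd-opt --enable-amd-fast-planner AMD_ARCH=$ARCH"

# Dry run: print the command so it can be reviewed before use.
echo "$CONFIGURE_CMD"
```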

AMD Optimized FFTW version 2.2

30 Jun 09:07

Highlights of improvements on AMD EPYC™ processor family CPUs

  • Improved performance of in-place MPI FFT by employing a faster in-place MPI transpose routine.
  • Improved performance of copy function cpy2d_pair used for rank-0 transform and buffering plans.
  • Added DFT kernels of higher radix sizes for q1fv, t1fv and q1fv FFT codelets.

AMD Optimized FFTW Version 2.1

14 Jan 05:20

Highlights of improvements on AMD EPYC™ processor family CPUs

  • Improved performance of the FFT kernels for AVX and AVX2
  • Improved performance of copy function used in rank-0 transform and buffering plans.
  • Several build configuration updates that work with the --enable-amd-opt option, including long double and quad precision support, CFLAGS, and AOCC/clang compiler support
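For the precision support mentioned in the last item, a dry-run sketch could look like the following. It relies on the standard FFTW configure options --enable-long-double and --enable-quad-precision, and on FFTW building one precision per configure run. Combining each with --enable-amd-opt follows the note above, but the exact set of other flags a real build needs is not shown here.

```shell
# Assumption: run from the top of an amd-fftw source tree.
# FFTW builds a single precision per configure run, so each precision
# gets its own command line.
for PRECISION_FLAG in --enable-long-double --enable-quad-precision; do
  CMD="./configure --enable-amd-opt $PRECISION_FLAG"
  # Dry run: print the command so it can be reviewed before use.
  echo "$CMD"
done
```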