Releases · amd/amd-fftw

Feature ‘AMD application optimization layer’ that uplifts the performance of HPC and scientific applications
Feature ‘Fast MPI transpose algorithm’ to speed up the distributed MPI FFT computations
Feature ‘Top N planner’ that minimizes single-threaded run-to-run variations
Support for building AMD FFTW library on Windows
GCC compilation support for AMD processors based on the AMD “Zen3” core architecture

06 Jul 08:42

pradeeptrgit

3.0.1

764197a

AMD Optimized FFTW version 3.0.1

Highlights of improvements on AMD EPYC™ processor family CPUs

A new planner feature, the Top N planner, minimizes single-threaded run-to-run variations.
A new parallel MPI transpose algorithm, enabled via the configure option "--enable-amd-mpi-vader-limit".
- When using this configure option, set --mca btl_vader_eager_limit appropriately (the current preference is 65536) in the mpirun command.
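
The two settings above can be sketched as a dry-run script that only prints the commands. The configure flag, the MCA parameter and the value 65536 come from the notes; combining the flag with --enable-mpi and --enable-amd-opt, the rank count and the application name `./my_mpi_fft` are assumptions made for illustration.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of executing them.
EAGER_LIMIT=65536     # value the release notes give as the current preference
NPROCS=4              # hypothetical number of MPI ranks
APP=./my_mpi_fft      # hypothetical application linked against AMD FFTW

# Build-time: enable the new MPI transpose (other flags are assumed companions).
CONFIGURE_CMD="./configure --enable-mpi --enable-amd-opt --enable-amd-mpi-vader-limit"

# Run-time: pass the eager limit to Open MPI's shared-memory (vader) BTL.
MPIRUN_CMD="mpirun -np $NPROCS --mca btl_vader_eager_limit $EAGER_LIMIT $APP"

echo "$CONFIGURE_CMD"
echo "$MPIRUN_CMD"
```

Printing the commands first makes it easy to check that the eager limit reaches mpirun before committing to a full build and run.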

15 Mar 18:07

pradeeptrgit

3.0

94199c3

AMD Optimized FFTW version 3.0

Highlights of improvements on AMD EPYC™ processor family CPUs

New fast planner that reduces planning time across the planning modes in general and OPATIENT mode in particular. It can be enabled through the configure option "--enable-amd-fast-planner".
Support for the configure option "AMD_ARCH" to help cross-compilation. It accepts values such as auto, znver1, znver2 and znver3 for AMD EPYC processors.
Quad-precision support is now included for the AOCC clang compiler from version 10 onwards.
Improved handling of the --enable-debug and "CC" options by 'configure' when --enable-amd-opt is used.
Fixed the incorrect behavior of the OWISDOM feature when the wisdom file is absent.
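
As a rough illustration of how the options in this release might be combined, the dry-run script below assembles one configure invocation and prints it. The option names come from the notes. Passing AMD_ARCH and CC as `NAME=value` arguments follows the usual autoconf convention and is an assumption here, as is the choice of znver3 and gcc.

```shell
#!/bin/sh
# Dry-run sketch: assembles a configure command line and prints it.
AMD_ARCH_VALUE=znver3   # one of auto/znver1/znver2/znver3 listed in the notes
COMPILER=gcc            # hypothetical compiler choice, passed through CC

# AMD_ARCH=... and CC=... as trailing NAME=value arguments is assumed syntax.
CONFIGURE_CMD="./configure --enable-amd-opt --enable-amd-fast-planner AMD_ARCH=$AMD_ARCH_VALUE CC=$COMPILER"

echo "$CONFIGURE_CMD"
```

Setting AMD_ARCH explicitly rather than to auto is the case the notes point at for cross compilation, where the build host is not the target CPU.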

30 Jun 09:07

pradeeptrgit

2.2

2a05028

AMD Optimized FFTW version 2.2

Highlights of improvements on AMD EPYC™ processor family CPUs

Improved performance of in-place MPI FFT by employing a faster in-place MPI transpose routine.
Improved performance of copy function cpy2d_pair used for rank-0 transform and buffering plans.
Added DFT kernels of higher radix sizes for q1fv, t1fv and q1fv FFT codelets.

14 Jan 05:20

pradeeptrgit

2.1

f8a904c

AMD Optimized FFTW Version 2.1

Highlights of improvements on AMD EPYC™ processor family CPUs

Improved performance of the FFT kernels for AVX and AVX2
Improved performance of copy function used in rank-0 transform and buffering plans.
Several build configuration updates that work with the --enable-amd-opt option, including long double and quad precision support, CFLAGS, and AOCC/clang compiler support.

