Releases: amd/amd-fftw
AOCL-FFTW 5.0
Highlights of this release
- Support added for using the wisdom feature by default under the --enable-amd-app-opt option
- Minor bug fixes
AOCL-FFTW 4.2
Merge 4.2 release branch into amd-fftw
AOCL-FFTW 4.1
Highlights of this release
- Dynamic dispatch support added for AOCC build of the library on Linux
- Minor bug fixes
AOCL FFTW version 4.0
Highlights of improvements on AMD EPYC™ processor family CPUs
- AVX-512 enablement of DFT kernels
- AVX-512 optimization of copy and transpose routines
AOCL FFTW version 3.2
Highlights of improvements on AMD EPYC™ processor family CPUs
- Dynamic dispatcher for AOCL-FFTW
- Upgraded AOCL-FFTW to align with the reference FFTW 3.3.10 from MIT
- Windows FFTW features aligned with Linux FFTW
AMD Optimized FFTW version 3.1
Highlights of improvements on AMD EPYC™ processor family CPUs
- Feature ‘AMD application optimization layer’ that uplifts the performance of HPC and scientific applications
- Feature ‘Fast MPI transpose algorithm’ to speed up the distributed MPI FFT computations
- Feature ‘Top N planner’ that minimizes single-threaded run-to-run variations
- Support for building AMD FFTW library on Windows
- GCC compilation support for AMD processors based on the AMD “Zen3” core architecture
AMD Optimized FFTW version 3.0.1
Highlights of improvements on AMD EPYC™ processor family CPUs
- A new planner feature called Top N planner is introduced that minimizes single-threaded run-to-run variations.
- New parallel MPI transpose algorithm, enabled via the configure option "--enable-amd-mpi-vader-limit"
  - When this configure option is used, set --mca btl_vader_eager_limit appropriately (the current recommendation is 65536) on the mpirun command line.
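The build-time option and the run-time MCA parameter above are meant to be used together. The following is a minimal dry-run sketch that only composes and prints the two commands rather than executing them. The rank count, the binary name `./my_mpi_fft_app`, and the accompanying `--enable-mpi` and `--enable-amd-opt` flags are assumptions for illustration and are not prescribed by these notes.

```shell
#!/bin/sh
# Dry run: compose the commands and print them instead of running them.
# Assumptions (not from the release notes): 4 ranks, the hypothetical
# binary ./my_mpi_fft_app, and the extra --enable-mpi / --enable-amd-opt flags.

EAGER_LIMIT=65536   # value the release notes currently prefer

# Build step: enable the new MPI transpose algorithm at configure time.
CONFIGURE_CMD="./configure --enable-mpi --enable-amd-opt --enable-amd-mpi-vader-limit"

# Run step: pass the matching eager limit to mpirun as an MCA parameter.
RUN_CMD="mpirun --mca btl_vader_eager_limit ${EAGER_LIMIT} -np 4 ./my_mpi_fft_app"

echo "$CONFIGURE_CMD"
echo "$RUN_CMD"
```

Keeping the limit in one variable makes it easy to adjust if a different value suits a given system better.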
AMD Optimized FFTW version 3.0
Highlights of improvements on AMD EPYC™ processor family CPUs
- New fast planner that reduces planning time across the planning modes in general and in OPATIENT mode in particular. It can be enabled through the configure option “--enable-amd-fast-planner”
- Support for the configure option “AMD_ARCH” to help with cross compilation. It accepts values such as auto, znver1, znver2, or znver3 for AMD EPYC processors
- Quad precision support is now included for AOCC clang compiler from version 10 onwards
- Improved handling of the --enable-debug and “CC” options by ‘configure’ when --enable-amd-opt is used
- Fixed incorrect behavior of the OWISDOM feature when no wisdom file is present
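As a rough illustration of how the configure options in this release might be combined, the sketch below prints a configure line for a chosen target architecture without running it. It assumes that AMD_ARCH is passed to configure as a `VAR=value` argument and that it can be combined with `--enable-amd-opt` and `--enable-amd-fast-planner`; the helper name `configure_line` is hypothetical.

```shell
#!/bin/sh
# Dry run: print a configure line for a given AMD_ARCH value.
# Assumption: AMD_ARCH is passed as a VAR=value argument to configure.
# configure_line is a hypothetical helper, not part of AOCL-FFTW.

configure_line() {
  arch="${1:-auto}"   # default to auto-detection
  case "$arch" in
    auto|znver1|znver2|znver3) ;;   # values listed in the release notes
    *) echo "unsupported AMD_ARCH: $arch" >&2; return 1 ;;
  esac
  echo "./configure AMD_ARCH=${arch} --enable-amd-opt --enable-amd-fast-planner"
}

# Example: cross-compiling for a "Zen3" target from another machine.
configure_line znver3
```

Validating the value before printing keeps a typo in the architecture name from reaching the build.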
AMD Optimized FFTW version 2.2
Highlights of improvements on AMD EPYC™ processor family CPUs
- Improved performance of in-place MPI FFT by employing a faster in-place MPI transpose routine.
- Improved performance of copy function cpy2d_pair used for rank-0 transform and buffering plans.
- Added DFT kernels of higher radix sizes for q1fv, t1fv and q1fv FFT codelets.
AMD Optimized FFTW Version 2.1
Highlights of improvements on AMD EPYC™ processor family CPUs
- Improved performance of the FFT kernels for AVX and AVX2
- Improved performance of copy function used in rank-0 transform and buffering plans.
- Several build configuration updates that work with the --enable-amd-opt option, including long double and quad precision support, CFLAGS, and AOCC/clang compiler support