This change contains fixes for the following issues:
1) When AMD-optimized FFTW (--enable-amd-opt) was configured with --enable-amd-trans, the hybrid OpenMP+MPI tests failed.
   With this fix, AMD_OPT_TRANS is undefined whenever MPI or OpenMP is enabled, so the AMD-optimized transpose is not used in the
   hybrid OpenMP+MPI configuration even when --enable-amd-trans is given. For single-threaded FFTW, AMD_OPT_TRANS remains enabled.
2) The long double and quad precision tests failed with --enable-amd-opt. The cpy2d routine now uses the plain C version for
   long double and quad precision. For single and double precision, the AMD-optimized cpy2d routine remains enabled.

Change-Id: I30cdb461bd6d24f5563faba9f4c85b17f1c08006
BiplabRaut committed Sep 25, 2019
1 parent 46b504e commit e987a4b
Showing 2 changed files with 5 additions and 1 deletion.
2 changes: 1 addition & 1 deletion kernel/cpy2d.c
@@ -38,7 +38,7 @@
# define WIDE_TYPE double
#endif

-#ifdef AMD_OPT_ALL //AMD optimized routines
+#if defined(AMD_OPT_ALL) && (!defined(FFTW_LDOUBLE) && !defined(FFTW_QUAD)) //AMD optimized routines

#ifdef FFTW_SINGLE//SINGLE PRECISION CPY2d starts
#ifdef AMD_OPT_IN_PLACE_1D_CPY2D_STABLE_INTRIN//SIMD optimized function
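
The effect of the new guard in kernel/cpy2d.c can be sketched in isolation. The following is a minimal illustration, not FFTW's actual code: the macro names AMD_OPT_ALL, FFTW_LDOUBLE and FFTW_QUAD come from the diff, while `cpy2d_uses_amd_path`, `copy2d_generic` and the fixed `R` typedef are hypothetical stand-ins chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: FFTW derives its real type from the configured precision;
   this sketch simply fixes it to double. */
typedef double R;

/* Same condition as the patched guard: the optimized path is compiled in
   only when AMD_OPT_ALL is set and the build is neither long double nor quad. */
#if defined(AMD_OPT_ALL) && (!defined(FFTW_LDOUBLE) && !defined(FFTW_QUAD))
#define CPY2D_USES_AMD_PATH 1
#else
#define CPY2D_USES_AMD_PATH 0
#endif

/* Hypothetical helper reporting which path the preprocessor selected. */
int cpy2d_uses_amd_path(void)
{
    return CPY2D_USES_AMD_PATH;
}

/* Hypothetical plain C fallback: copy an n0 x n1 array with
   independent input strides (is0, is1) and output strides (os0, os1). */
void copy2d_generic(const R *in, R *out,
                    size_t n0, ptrdiff_t is0, ptrdiff_t os0,
                    size_t n1, ptrdiff_t is1, ptrdiff_t os1)
{
    assert(in != NULL && out != NULL);
    for (size_t i0 = 0; i0 < n0; ++i0) {
        for (size_t i1 = 0; i1 < n1; ++i1) {
            out[(ptrdiff_t)i0 * os0 + (ptrdiff_t)i1 * os1] =
                in[(ptrdiff_t)i0 * is0 + (ptrdiff_t)i1 * is1];
        }
    }
}
```

Compiling this sketch with `-DAMD_OPT_ALL -DFFTW_QUAD` would make `cpy2d_uses_amd_path()` return 0, which mirrors the behaviour the commit message describes for quad precision.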
4 changes: 4 additions & 0 deletions kernel/ifftw.h
@@ -96,6 +96,10 @@ extern "C"
//--------------------------------
//In-place Transpose related optimization switches :-
//The below switches are defined through config.h using configure script run-time feature arg --enable-amd-trans
+//AMD_OPT_TRANS is currently tested and supported only for single-threaded, so undefining when MPI or openMP used
+#if defined(HAVE_MPI) || defined(HAVE_OPENMP)
+#undef AMD_OPT_TRANS
+#endif
#ifdef AMD_OPT_TRANS
#define AMD_OPT_AUTO_TUNED_TRANS_BLK_SIZE
#define AMD_OPT_AUTO_TUNED_RASTER_TILED_TRANS_METHOD
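
The interaction between the new `#undef` and the switches that depend on AMD_OPT_TRANS can be checked with a small standalone sketch. It assumes a build where config.h would have defined both AMD_OPT_TRANS and HAVE_OPENMP; here the two macros are defined by hand to simulate that, and the two query functions are hypothetical helpers that do not exist in FFTW.

```c
/* Simulated config.h output for --enable-amd-trans with OpenMP enabled
   (an assumption made for this sketch). */
#define AMD_OPT_TRANS 1
#define HAVE_OPENMP 1

/* Guard added by the commit: drop the transpose optimization
   whenever MPI or OpenMP support is compiled in. */
#if defined(HAVE_MPI) || defined(HAVE_OPENMP)
#undef AMD_OPT_TRANS
#endif

/* Dependent switch, enabled only if AMD_OPT_TRANS survived the guard. */
#ifdef AMD_OPT_TRANS
#define AMD_OPT_AUTO_TUNED_TRANS_BLK_SIZE
#endif

/* Hypothetical helper: 1 if the optimized transpose is still enabled. */
int amd_transpose_enabled(void)
{
#ifdef AMD_OPT_TRANS
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: 1 if the dependent block-size switch got defined. */
int amd_blk_size_tuning_enabled(void)
{
#ifdef AMD_OPT_AUTO_TUNED_TRANS_BLK_SIZE
    return 1;
#else
    return 0;
#endif
}
```

Because the `#undef` sits before the `#ifdef AMD_OPT_TRANS` block, the dependent switches are never defined in MPI or OpenMP builds. Removing the `#define HAVE_OPENMP 1` line from the sketch would leave both helpers returning 1, which corresponds to the single-threaded case.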