This version was optimized by the SCC team from Nanyang Technological University for the ISC17 Student Cluster Competition.
Thanks to Shao Yiyang and Lu Shengliang for helping me fix bugs while optimizing and porting the code.
Make sure you have the following dependencies:
- MAGMA (without OpenMP)
- Intel compilers and MPI
- CUDA (with Fortran thunking cuBLAS interface)
- Nvidia MPS server
Go into the src folder:
$ module load CUDA OpenMPI
$ source /opt/intel/compilers_and_libraries/linux/bin/compilervars.sh intel64
$ cp ../fortran_thunking.o .
$ make comp=intel # make sure you have fortran_thunking.o
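The build step above depends on fortran_thunking.o being present and on the toolchains being on PATH. A minimal pre-flight sketch follows; the function name preflight is hypothetical, and the tool names in the usage line (ifort, mpirun, nvcc) are assumptions about what the Intel and CUDA environments provide, so adjust them to your installation.

```shell
# Pre-flight check to run before `make comp=intel` (a sketch, not part of the build system).
# Usage: preflight BUILD_DIR [COMMAND...]
# Prints one line per missing item and returns the number of missing items.
preflight() {
    missing=0
    # The make step expects the thunking object in the build directory.
    if [ ! -f "$1/fortran_thunking.o" ]; then
        echo "missing file: $1/fortran_thunking.o"
        missing=$((missing + 1))
    fi
    shift
    # Remaining arguments are commands that should be on PATH.
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing command: $tool"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}
```

From the src folder it could be used as `preflight . ifort mpirun nvcc && make comp=intel`, so that make only starts when nothing is reported missing.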
Optimization macros
- __CUDA: enables CUDA based optimization
- __MAGMA: enables MAGMA to solve diagonalization
- __CUBLAS: enables cuBLAS to solve ZGEMM
- __NONBLOCKING_FFT: enables non-blocking fft_scatter
- __ZHEGVD: enables MAGMA call to magmaf_zhegvd
Note: OpenMP seems to be problematic; please disable it.
Go into the benchmark folder:
$ sudo nvidia-smi -c 3
$ sudo nvidia-cuda-mps-control -d
$ mpirun -np 88 -ppn 44 -hosts compute0,compute1 bash run.sh
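In the command above, -np 88 is the number of hosts (2) times the ranks per node (44). The sketch below derives that value and wraps the run in MPS setup and teardown. The DRY_RUN switch and the run helper are hypothetical additions, not part of run.sh; the mpirun flags are copied from the command above. The script only touches the node it runs on, so on a multi-node run the MPS steps would presumably have to be repeated on each compute node.

```shell
#!/bin/sh
# Sketch: compute -np from the host list, then set up MPS, run, and tear down.
hosts="compute0,compute1"   # host list from the command above
ppn=44                      # ranks per node from the command above
DRY_RUN="${DRY_RUN:-1}"     # hypothetical switch: 1 prints commands, 0 executes them

# Count the comma-separated hosts and derive the total rank count.
nhosts=$(printf '%s\n' "$hosts" | tr ',' '\n' | wc -l)
np=$((nhosts * ppn))

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

run sudo nvidia-smi -c 3                # set EXCLUSIVE_PROCESS compute mode
run sudo nvidia-cuda-mps-control -d     # start the MPS control daemon
run mpirun -np "$np" -ppn "$ppn" -hosts "$hosts" bash run.sh

# Teardown: stop the MPS daemon and restore the default compute mode.
run sh -c 'echo quit | sudo nvidia-cuda-mps-control'
run sudo nvidia-smi -c 0
```

With the default DRY_RUN=1 the script only prints the commands it would run, which makes it easy to check the derived -np value before launching on the cluster.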