BoomerAMG segmentation fault with >= 64 nodes on Frontier #1106

Open
BenWibking opened this issue Jul 30, 2024 · 0 comments

From LLNL/AMG2023#13:

I can run AMG2023 problem 1 successfully on Frontier with < 64 nodes, but I get a segmentation fault with >= 64 nodes:

Running with these driver parameters:
  Problem ID    = 1

=============================================
Hypre init times:
=============================================
Hypre init:
  wall clock time = 0.000006 seconds
  Laplacian_27pt:
    (Nx, Ny, Nz) = (1600, 1600, 1600)
    (Px, Py, Pz) = (8, 8, 8)

srun: error: frontier04522: tasks 282-287: Segmentation fault
srun: Terminating StepId=2131722.0

with Segmentation fault errors reported for all of the other MPI ranks as well.

I built Hypre v2.31.0 with:

./configure --with-hip --with-gpu-arch=gfx90a --with-MPI-lib-dirs="${MPICH_DIR}/lib" --with-MPI-libs="mpi" --with-MPI-include="${MPICH_DIR}/include" --enable-mixedint

with cce/17.0.0, rocm/5.7.1, and cray-mpich/8.1.28.

I'm running the problem with:

#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=7
#SBATCH --gpus-per-task=1
#SBATCH --gpu-bind=closest
#SBATCH -N 64

export LD_LIBRARY_PATH=${CRAY_LD_LIBRARY_PATH}:${LD_LIBRARY_PATH}
export MPICH_GPU_SUPPORT_ENABLED=1

srun ./amg -problem 1 -n 200 200 200 -P 8 8 8
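
As a sanity check on the sizes involved, here is a small Python sketch (not part of AMG2023 or hypre) that recomputes the global problem size from the run parameters above. It assumes 8 ranks per node, as in the batch script, and the same 200 x 200 x 200 local grid per rank at every node count, which is not stated above for the smaller runs. The helper name `rows_for_nodes` is hypothetical. The comparison against the signed 32-bit limit is only an observation about where the sizes fall, not a confirmed cause of the crash.

```python
# Local grid per MPI rank (-n) and process grid (-P), from the srun line.
local_n = (200, 200, 200)
procs = (8, 8, 8)
ranks_per_node = 8  # from --ntasks-per-node=8

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold

# Global grid is the element-wise product; compare with (Nx, Ny, Nz) in the log.
global_n = tuple(n * p for n, p in zip(local_n, procs))
local_rows = local_n[0] * local_n[1] * local_n[2]
global_rows = global_n[0] * global_n[1] * global_n[2]


def rows_for_nodes(nodes):
    """Global row count, ASSUMING the same local grid on every rank."""
    return nodes * ranks_per_node * local_rows


print("global grid:", global_n)
print("global rows:", global_rows, "> INT32_MAX:", global_rows > INT32_MAX)
for nodes in (16, 32, 64):
    rows = rows_for_nodes(nodes)
    print(nodes, "nodes:", rows, "rows, > INT32_MAX:", rows > INT32_MAX)
```

Under those assumptions, the 64-node run is the first power-of-two node count at which the global row count no longer fits in a signed 32-bit integer, while the per-rank row count stays far below that limit.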