MPI_Comm_dup
New day, new issues. I just tried the latest AMD software stack on Frontier:
module load cpe/23.12
module load PrgEnv-amd
module load amd/5.7.1
module load craype-accel-amd-gfx90a cmake cray-hdf5-parallel cray-python ninja
export MPICH_GPU_SUPPORT_ENABLED=1
and this results in non-functional code (e.g., the advection example):
Assertion failed in file ../src/mpid/common/cray/cray_gpu_ops.c at line 188: mpi_errno == MPI_SUCCESS
/opt/cray/pe/lib64/libmpi_amd.so.12(MPL_backtrace_show+0x26) [0x7fffebab367b]
/opt/cray/pe/lib64/libmpi_amd.so.12(+0x22bf374) [0x7fffeb4d9374]
/opt/cray/pe/lib64/libmpi_amd.so.12(+0x2725368) [0x7fffeb93f368]
/opt/cray/pe/lib64/libmpi_amd.so.12(+0x2168420) [0x7fffeb382420]
/opt/cray/pe/lib64/libmpi_amd.so.12(+0x1fa237c) [0x7fffeb1bc37c]
/opt/cray/pe/lib64/libmpi_amd.so.12(+0x1fa028c) [0x7fffeb1ba28c]
/opt/cray/pe/lib64/libmpi_amd.so.12(+0x6d4cf1) [0x7fffe98eecf1]
/opt/cray/pe/lib64/libmpi_amd.so.12(PMPI_Comm_dup+0x174) [0x7fffe98eef34]
/sw/frontier/spack-envs/base/opt/cray-sles15-zen3/cce-15.0.0/darshan-runtime-3.4.0-t6el25xrwgfg5j65rdrhrs3qjp4ojssp/lib/libdarshan.so.0(darshan_core_initialize+0xa8) [0x7fffebbd3f68]
/sw/frontier/spack-envs/base/opt/cray-sles15-zen3/cce-15.0.0/darshan-runtime-3.4.0-t6el25xrwgfg5j65rdrhrs3qjp4ojssp/lib/libdarshan.so.0(MPI_Init+0x7d) [0x7fffebbd3d0d]
/ccs/proj/ast146/pgrete/src/athenapk/external/parthenon/build-bisect-def-atomics-benfix-cpe2312/example/advection/advection-example() [0x335280a]
/ccs/proj/ast146/pgrete/src/athenapk/external/parthenon/build-bisect-def-atomics-benfix-cpe2312/example/advection/advection-example() [0x3050e40]
/lib64/libc.so.6(__libc_start_main+0xef) [0x7fffe89f924d]
/ccs/proj/ast146/pgrete/src/athenapk/external/parthenon/build-bisect-def-atomics-benfix-cpe2312/example/advection/advection-example() [0x2f4ce6a]
MPICH ERROR [Rank 0] [job id 2015481.11] [Tue Jun 11 08:41:29 2024] [frontier00491] - Abort(1): Internal error
srun: error: frontier00491: task 0: Exited with exit code 1
srun: Terminating StepId=2015481.11
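For anyone skimming the trace: most frames are anonymous offsets, so here is a small sketch that pulls out only the frames carrying a symbol name. The helper `named_frames` and its regular expression are my own illustration (not part of Parthenon, MPICH, or Darshan), and the sample input is a five-frame excerpt copied from the backtrace above.

```python
import re

# Excerpt of the backtrace above (five frames, copied verbatim).
TRACE = """\
/opt/cray/pe/lib64/libmpi_amd.so.12(+0x6d4cf1) [0x7fffe98eecf1]
/opt/cray/pe/lib64/libmpi_amd.so.12(PMPI_Comm_dup+0x174) [0x7fffe98eef34]
/sw/frontier/spack-envs/base/opt/cray-sles15-zen3/cce-15.0.0/darshan-runtime-3.4.0-t6el25xrwgfg5j65rdrhrs3qjp4ojssp/lib/libdarshan.so.0(darshan_core_initialize+0xa8) [0x7fffebbd3f68]
/sw/frontier/spack-envs/base/opt/cray-sles15-zen3/cce-15.0.0/darshan-runtime-3.4.0-t6el25xrwgfg5j65rdrhrs3qjp4ojssp/lib/libdarshan.so.0(MPI_Init+0x7d) [0x7fffebbd3d0d]
/ccs/proj/ast146/pgrete/src/athenapk/external/parthenon/build-bisect-def-atomics-benfix-cpe2312/example/advection/advection-example() [0x335280a]
"""

# Hypothetical pattern for frames of the form: path(symbol+0xOFFSET) [0xADDRESS].
# Frames without a symbol, such as "(+0x6d4cf1)" or "()", do not match.
FRAME_RE = re.compile(
    r"^(?P<lib>[^\s(]+)\((?P<sym>[A-Za-z_]\w*)\+0x[0-9a-fA-F]+\)\s+\[0x[0-9a-fA-F]+\]$"
)


def named_frames(trace):
    """Return (library file name, symbol) pairs for frames that name a symbol."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            lib_name = match.group("lib").rsplit("/", 1)[-1]
            frames.append((lib_name, match.group("sym")))
    return frames


# Innermost frame first, same order as the backtrace.
for lib_name, symbol in named_frames(TRACE):
    print(f"{lib_name}: {symbol}")
```

On this excerpt the named frames are `PMPI_Comm_dup` in `libmpi_amd.so.12`, followed by `darshan_core_initialize` and `MPI_Init` in `libdarshan.so.0`, i.e. the `MPI_Comm_dup` call that hits the assertion is issued from the `MPI_Init` wrapper in `libdarshan.so.0` rather than directly from the example code.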
Same issue with PrgEnv-cray