Hi all,
I am running Commander3 on a cluster. I compiled the current master branch with the Intel compilers. When I try to run a tutorial parameter file, the job terminates with the error below.
Abort(1090959) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(176)........:
MPID_Init(1548)..............:
MPIDI_OFI_mpi_init_hook(1554):
(unknown)(): Other MPI error
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1090959
:
system msg for write_line failure : Bad file descriptor
Abort(1090959) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(176)........:
MPID_Init(1548)..............:
MPIDI_OFI_mpi_init_hook(1554):
(unknown)(): Other MPI error
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1090959
:
system msg for write_line failure : Bad file descriptor
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
libpthread-2.31.s 0000152B82E37420 Unknown Unknown Unknown
libmpi.so.12.0.0 0000152B81233BE1 MPIR_Err_return_c Unknown Unknown
libmpi.so.12.0.0 0000152B813D9ED0 MPI_Init Unknown Unknown
libmpifort.so.12. 0000152B829D748B PMPI_INIT Unknown Unknown
commander3 000000000049276A MAIN__ 77 commander.f90
commander3 00000000004923BD Unknown Unknown Unknown
libc-2.31.so 0000152B806DD083 __libc_start_main Unknown Unknown
commander3 00000000004922DE Unknown Unknown Unknown
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[28425,1],0]
Exit code: 174
--------------------------------------------------------------------------
The parameter file is from the BP10 branch, and the MPI version I am using is mpirun (Open MPI) 4.1.5. I changed very little in the file, only the output path and the data path. Is this error related to MPI, or is my cluster running out of memory? Any help would be appreciated.
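One hedged reading of the log, not a confirmed diagnosis: names such as `MPIDI_OFI_mpi_init_hook` and `MPIR_Init_thread` are internal to MPICH-derived libraries (Intel MPI is one of these), while the closing "mpirun detected that one or more processes exited" banner is printed by Open MPI's launcher. Both aborting processes also report themselves as rank 0. That pattern would be consistent with an executable linked against one MPI being started by another MPI's `mpirun`, rather than with running out of memory. The sketch below only scans log text for those hints. The function names and the hint lists are illustrative assumptions, not part of Commander3 or of any MPI tool.

```python
import re

# Substrings used as hints. These lists are illustrative assumptions, not exhaustive.
MPICH_FAMILY_HINTS = ("MPIDI_OFI_", "MPIR_Init_thread", "MPID_Init")
OPEN_MPI_LAUNCHER_HINTS = (
    "mpirun detected that one or more processes exited",
    "Per user-direction, the job has been aborted",
)


def summarize_mpi_log(log_text):
    """Collect simple hints about which MPI pieces appear in a job log."""
    # Rank numbers taken from lines like "Abort(...) on node 0 (rank 0 in comm 0)".
    ranks = re.findall(r"\(rank (\d+) in comm \d+\)", log_text)
    return {
        "mpich_family_library": any(h in log_text for h in MPICH_FAMILY_HINTS),
        "open_mpi_launcher": any(h in log_text for h in OPEN_MPI_LAUNCHER_HINTS),
        "ranks_reporting_abort": ranks,
    }


def looks_like_launcher_mismatch(log_text):
    """True when library-side and launcher-side hints point at different MPIs."""
    summary = summarize_mpi_log(log_text)
    return summary["mpich_family_library"] and summary["open_mpi_launcher"]


# A few lines copied from the log in this issue.
sample = "\n".join([
    "Abort(1090959) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:",
    "MPIDI_OFI_mpi_init_hook(1554):",
    "Abort(1090959) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:",
    "MPIDI_OFI_mpi_init_hook(1554):",
    "mpirun detected that one or more processes exited with non-zero status, thus causing",
])

print(summarize_mpi_log(sample))
print(looks_like_launcher_mismatch(sample))
```

If the check flags a mismatch, a reasonable next step would be to run `ldd` on the `commander3` executable to see which `libmpi` it loads, and then launch the job with the `mpirun` that belongs to that same MPI installation.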