We see very strange behavior when running the Maia unit test suite with pytest_parallel and Intel MPI:
/[...]/maia/maia/my_debug/transfer/test/test_te_utils.py::test_create_all_elt_distribution[2] Assertion failed in file ../../src/mpid/ch4/src/intel/ch4_shm_coll.c at line 2279: comm->shm_numa_layout[my_numa_node].base_addr
It is difficult to isolate a specific test case: it seems that a lot of test cases (~500) are needed for the failure to occur.
It may be a bug in one or several test cases, but it seems to come from an MPI problem instead.
No problem on other machines or with other MPI versions, except maybe on our dev cluster (same MPI version, but only triggered when launched through a non-exclusive SLURM job).
Steps to reproduce :
Versions :
Notes :
Adding `time.sleep(0.1)` in `pytest_pyfunc_call` to slightly change the concurrency does not change anything.
Calling `gc.collect()` in `pytest_runtest_protocol`, or on the contrary `gc.disable()` at the beginning, does not change anything.
Complete error message :
@maugarsb @couletj
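For anyone who wants to repeat the timing and garbage-collection experiments mentioned in the notes, here is a minimal sketch. It assumes the hooks are added in a `conftest.py`; the original experiments may instead have been done directly inside pytest_parallel's own hook implementations. The constant `SLEEP_SECONDS` is a hypothetical name, and both hooks return `None` so that the regular implementations still run the test.

```python
# conftest.py (sketch; the file placement is an assumption, not the original patch)
import gc
import time

SLEEP_SECONDS = 0.1  # hypothetical name; 0.1 is the delay quoted in the notes


def pytest_pyfunc_call(pyfuncitem):
    # Delay each test function slightly to perturb the concurrency.
    time.sleep(SLEEP_SECONDS)
    # Returning None lets the next implementation actually call the test.
    return None


def pytest_runtest_protocol(item, nextitem):
    # Force a garbage collection before each test's run protocol.
    gc.collect()
    # Returning None lets the default protocol run the test.
    return None
```

To try the opposite experiment, replace the `gc.collect()` call with a single `gc.disable()` at module import time.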