I'm seeing some strange performance issues with the GROMACS test on our system: occasionally, it just runs 10 times slower. Looking at htop, I see individual cores sitting idle - even though I would have expected each core to be running a single process (the GROMACS test is pure MPI).
The generated job script looks like this for a 2-node test:
I checked the binding of each process. To my surprise, the processes were bound to NUMA domains. I would never have expected that: according to the mpirun man page (https://www.open-mpi.org/doc/current/man1/mpirun.1.php), when the number of processes is greater than 2, the default binding should be to socket.
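For anyone who wants to reproduce the check: Open MPI can report bindings from the launcher side, and any process can print its own affinity mask from procfs. A minimal sketch (the mpirun line assumes Open MPI and a gmx_mpi binary, shown purely as an illustration):

```shell
# Launcher-side check (assumes Open MPI; prints one binding line per rank):
#   mpirun --report-bindings -np 4 gmx_mpi mdrun ...
#
# Process-side check: print this process's own allowed-CPU list from procfs
grep Cpus_allowed_list /proc/self/status
```

If the reported list spans a whole NUMA domain or socket rather than a single core, the ranks are free to migrate within it, which is exactly the variability described above.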
Note that binding to either NUMA domain or socket is potentially bad for the reproducibility of test performance: to make the performance predictable, I would just like to bind to core. I'm wondering if we shouldn't just call the set_compact_process_binding hook for this test... I'm not sure this is the cause of my performance variation, but it seems like a good idea to enforce binding to core (which is essentially what set_compact_process_binding does) for the GROMACS test (and potentially others).
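Not the hook itself, but a minimal sketch of what binding to core amounts to at the OS level, using Linux affinity calls; the rank variable is a placeholder for what would be the MPI rank in a real job:

```python
import os

# Sketch of per-core ("compact") binding: each rank pins itself to
# exactly one core, so the scheduler can no longer migrate it within
# a NUMA domain or socket. 'rank' is a placeholder; in a real MPI job
# it would come from the communicator.
rank = 0
core = rank % os.cpu_count()
os.sched_setaffinity(0, {core})          # bind this process to one core
print(sorted(os.sched_getaffinity(0)))   # -> [0] for rank 0
```

In practice the launcher (mpirun/srun) does this for every rank at once; the point is simply that a one-core affinity mask removes the migration freedom that NUMA- or socket-level binding leaves open.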
Right now, set_compact_process_binding is only used in the TensorFlow test, where it is quite essential (since that is a hybrid test).
See #139 . I seem to get both better and more consistent performance with binding. Since reproducibility of the performance is important, I'd be in favor of enabling it (I'd probably even be in favor if the performance was worse, as long as it is more consistent :P).