Remove mpi4py as dependency #101
base: develop
Conversation
I don't have the time to carefully check the MPI dependency right now. I'm still a bit afraid that we might overlook something if we merge this PR, but having MPI only as an optional dependency would still be nice. I suggest keeping this issue open. If somebody finds time to work on it, or the MPI dependency really becomes a problem (for example, because it cannot easily be satisfied on a particular system), we can look into this again.
precice seems to be at the root of our issue with not being able to fork mpirun within gymprecice. The main reason is that, within the preCICE Python bindings, the developers added the line "from mpi4py import MPI" as a workaround for a dependency compile error, which is still an open issue: precice/python-bindings#8 (comment). The PRs that have not been merged into the release branch: precice/python-bindings#101 and precice/precice#299.

Based on https://stackoverflow.com/questions/64581965/run-mpi-program-using-subprocess-popen, the added line "from mpi4py import MPI" makes the Python script run in singleton mode, and therefore: subprocess.run(["mpirun"], ...) fails, and subprocess.Popen(["mpirun"], ...) is zombified.

This fix is a workaround: it removes the MPI-related variables from the parent environment and passes the "cleaned" environment to the subprocess, rather than letting the subprocess inherit the parent environment as-is. The fix is inspired by the discussion in https://stackoverflow.com/questions/30384649/python-with-embedded-call-to-mpirun.
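The env-cleaning workaround described above could be sketched roughly as follows. Note this is a hypothetical illustration, not the actual gymprecice code: the helper name `clean_mpi_env` and the exact set of variable prefixes stripped (OpenMPI, PMIx, PMI, Intel MPI, Hydra) are assumptions; which prefixes matter depends on the MPI implementation in use.

```python
import os
import subprocess

# Assumed set of MPI launcher-related prefixes (OpenMPI, PMIx/PMI,
# Intel MPI, Hydra); adjust for the MPI implementation actually used.
MPI_PREFIXES = ("OMPI_", "PMIX_", "PMI_", "I_MPI_", "HYDRA_")

def clean_mpi_env(env=None):
    """Return a copy of the environment with MPI-related variables removed.

    Hypothetical helper: strips variables whose names start with any of
    MPI_PREFIXES so a child mpirun does not think it is already inside
    an MPI job (the singleton-mode problem described above).
    """
    env = dict(os.environ if env is None else env)
    return {k: v for k, v in env.items() if not k.startswith(MPI_PREFIXES)}

if __name__ == "__main__":
    example = {"PATH": "/usr/bin", "OMPI_COMM_WORLD_RANK": "0", "PMI_RANK": "0"}
    print(sorted(clean_mpi_env(example)))
    # The cleaned environment would then be passed explicitly, e.g.:
    # subprocess.Popen(["mpirun", "-n", "2", "solver"], env=clean_mpi_env())
```

The key design point is passing `env=` explicitly to `subprocess.Popen` instead of letting the child inherit the parent environment, so the mpi4py-initialized parent's MPI state is not visible to the forked mpirun.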
This PR removes mpi4py (again) as a dependency.
There is some history and some other issues involved. I just added a test in #100 to be able to reproduce possible failures. Before merging this PR I would like to investigate the setup described in #8 (comment) again and make sure that the error is really gone (ideally also find out why it is now gone).