mpi4py.MPI crashes on import in fedora:38 #112

Open
spossann opened this issue Oct 9, 2023 · 4 comments


spossann commented Oct 9, 2023

I have been using mpi4py with the fedora:37 image for a while without issues. However, with fedora:38 (and later), importing MPI from mpi4py fails with the error shown at the end of the steps below.

I build the following image via docker build -t fedora_test -f docker/fedora.dockerfile .:

# docker/fedora.dockerfile
FROM fedora:38

RUN dnf install -y python3-pip \
    && dnf install -y gcc \
    && dnf install -y gfortran \
    && dnf install -y blas-devel lapack-devel \
    && dnf install -y openmpi openmpi-devel \
    && dnf install -y libgomp \
    && dnf install -y git \
    && dnf install -y environment-modules \
    && dnf install -y python3-mpi4py-openmpi \
    && dnf install -y python3-devel 

I then run a container with docker run -it fedora_test and load the Open MPI module:

module load mpi/openmpi-x86_64

Next, I create and activate a Python virtual environment

python3 -m venv env
source env/bin/activate

and install mpi4py:

pip install mpi4py

Upon launching python3 and importing MPI from mpi4py, I get the following error:

Python 3.11.5 (main, Aug 28 2023, 00:00:00) [GCC 13.2.1 20230728 (Red Hat 13.2.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from mpi4py import MPI
64ad81b28a4c:pid259.python3: Failed to get eth0 (unit 0) cpu set
64ad81b28a4c:pid259: PSM3 can't open nic unit: 0 (err=23)
PMIx Log Report:[259]: (nic/PSM)[259]: PSM3 can't open nic unit: 0 (err=23)
64ad81b28a4c:pid259.python3: Failed to get eth0 (unit 0) cpu set
64ad81b28a4c:pid259: PSM3 can't open nic unit: 0 (err=23)
PMIx Log Report:[259]: (nic/PSM)[259]: PSM3 can't open nic unit: 0 (err=23)

The interpreter hangs from that point on. All of the above steps work fine with the fedora:37 image.

Here is some information about my host OS:

$ uname -a
Linux ######### 6.2.0-34-generic #34~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Sep  7 13:12:03 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
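
Because the symptom reported above is a hang rather than an exception, it can help to run the import in a child interpreter with a timeout, so the check itself never gets stuck. The following is a minimal sketch of that idea; the helper name statement_completes and the 30-second default are assumptions made for illustration, not something from the original report.

```python
import subprocess
import sys


def statement_completes(statement, timeout_s=30.0):
    """Run `statement` in a fresh Python interpreter.

    Returns True if the child process exits within `timeout_s` seconds
    (regardless of its exit code) and False if it had to be killed
    because it ran past the timeout.
    """
    try:
        subprocess.run(
            [sys.executable, "-c", statement],
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return False
    return True
```

Inside the container, calling statement_completes("from mpi4py import MPI") would be expected to return False if the import hangs as described, and True on a setup where the import finishes.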

mattdm commented Oct 9, 2023

I just tried this on Fedora Linux 38 (with podman instead of docker) and the basic import works fine with no errors.

I notice you are installing both the Fedora-provided python3-mpi4py-openmpi and mpi4py via pip. I tried both, and both seem fine.

The errors you see seem to be something to do with accessing the NIC — or is that a red herring?


spossann commented Oct 10, 2023

Thanks for the quick reply. To be clear: import mpi4py works for me. What does not work is

from mpi4py import MPI

I clarified this in the issue title. The namespace of mpi4py looks like this:

Python 3.11.5 (main, Aug 28 2023, 00:00:00) [GCC 13.2.1 20230728 (Red Hat 13.2.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import mpi4py
>>> dir(mpi4py)
['Rc', '__all__', '__author__', '__builtins__', '__cached__', '__credits__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', 'get_config', 'get_include', 'profile', 'rc']

and

>>> dir(mpi4py.__all__)
['__add__', '__class__', '__class_getitem__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getstate__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']

I also tried removing python3-mpi4py-openmpi but the error persists.
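
Since both the Fedora package python3-mpi4py-openmpi and a pip-installed mpi4py were present at some point, one way to see which copy Python actually picks up is to inspect the top-level package, which (as noted above) imports without problems. This is only a diagnostic sketch; the function name describe_mpi4py is hypothetical, while __version__, __file__ and get_config are the attributes that appear in the dir(mpi4py) listing above.

```python
import importlib.util


def describe_mpi4py():
    """Return basic facts about the mpi4py package Python resolves, or None.

    Only the top-level package is imported, so MPI itself is not
    initialized by this check.
    """
    if importlib.util.find_spec("mpi4py") is None:
        return None  # mpi4py is not installed in this environment
    import mpi4py

    return {
        "version": mpi4py.__version__,
        "location": mpi4py.__file__,  # venv site-packages vs. system path
        "build_config": mpi4py.get_config(),  # build-time configuration
    }
```

If "location" points into the virtual environment, the pip-built copy is in use; a path under the system Python directories would suggest the Fedora package instead.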

spossann changed the title from "mpi4py crashes on import in fedora:38" to "mpi4py.MPI crashes on import in fedora:38" on Oct 10, 2023

mattdm commented Oct 10, 2023

Running from mpi4py import MPI, as in your example above, is exactly what I did when trying to reproduce your issue. Sorry that wasn't clear.

richardvanderburgh commented

Adding these environment variables seemed to resolve this issue for me.
OMPI_MCA_pml=ob1
OMPI_MCA_btl=tcp,self
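
For anyone who prefers to apply this from Python rather than the shell, here is a small sketch that sets the two variables before mpi4py.MPI is imported. In Open MPI, OMPI_MCA_pml=ob1 selects the ob1 point-to-point layer and OMPI_MCA_btl=tcp,self limits the byte-transfer layers to TCP and loopback; that this keeps the PSM3 provider out of the picture in every setup is an assumption based on the comment above. The names OPENMPI_TCP_SETTINGS, apply_openmpi_tcp_settings and import_mpi are hypothetical.

```python
import os

# Values copied from the comment above. Whether they are sufficient on
# other systems is an assumption, not something verified here.
OPENMPI_TCP_SETTINGS = {
    "OMPI_MCA_pml": "ob1",
    "OMPI_MCA_btl": "tcp,self",
}


def apply_openmpi_tcp_settings(environ=os.environ):
    """Set the variables unless they are already defined.

    Returns a dict of the variables that were actually added, so that
    values exported by the user beforehand are left untouched.
    """
    added = {}
    for name, value in OPENMPI_TCP_SETTINGS.items():
        if name not in environ:
            environ[name] = value
            added[name] = value
    return added


def import_mpi():
    """Apply the settings, then import mpi4py.MPI (requires mpi4py)."""
    apply_openmpi_tcp_settings()
    # By default MPI is initialized during this import, so the
    # variables have to be in place before it runs.
    from mpi4py import MPI

    return MPI
```

Calling import_mpi() at the top of a script takes the place of a plain from mpi4py import MPI. The order matters: setting the variables after the import has already run would have no effect on that process.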
