RDMA failed to open device #19
Hello @NicholasRasi,
Hello @Madeeks,
the execution gets stuck (while it does not without ...). On the other hand, I tried to run the following bash script:
The execution completed with the following result:
As far as I understand, the workers do not communicate. If I launch the application with
I get a similar result. I also ran a batch script with MVAPICH2 and the Sarus MPI hook:
I did not get any errors, but the workers run separately, as in the previous result. My cluster has MVAPICH2 2.3.4, while the guide recommends MVAPICH2 2.2. Do you think this could be a problem? Thank you!
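MVAPICH2, like MPICH and Intel MPI, takes part in the MPICH ABI Compatibility Initiative, so its releases ship a `libmpi.so.12` library; whether a specific host/container pair is accepted then comes down to the version check the MPI hook performs. The sketch below assumes the rule "same major version, and the host's minor version at least the container's". That rule is an assumption for illustration, not taken from the Sarus source, and `parse_libmpi_version` / `abi_compatible` are hypothetical helpers.

```python
import re


def parse_libmpi_version(filename):
    """Extract (major, minor, patch) from a name like 'libmpi.so.12.1.1'.

    Missing components (e.g. plain 'libmpi.so.12') are treated as 0.
    """
    match = re.search(r"libmpi\.so\.(\d+)(?:\.(\d+))?(?:\.(\d+))?$", filename)
    if match is None:
        raise ValueError(f"not a versioned libmpi file: {filename}")
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def abi_compatible(host_lib, container_lib):
    """Assumed rule: majors must match, host minor must be >= container minor."""
    host = parse_libmpi_version(host_lib)
    container = parse_libmpi_version(container_lib)
    return host[0] == container[0] and host[1] >= container[1]
```

To apply it, compare the fully versioned file names that the `libmpi.so.12` symlinks point to on the host and inside the image (for example via `ls -l` in each library directory).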
Hello,
I am trying to run some MPI benchmarks with Sarus containers. In particular, I am using OpenMPI 4.
The nodes are RDMA-capable and have InfiniBand. Everything works fine without the container, and if I run ibv_devinfo on the host, I get:
But if I run it inside a container, I get "Failed to open device". So I tried to bind-mount the device, but it does not work without sudo:
On the other hand, it works with sudo, and the device is recognized inside the container.
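`ibv_devinfo` prints "Failed to open device" when libibverbs cannot open a verbs device, and on Linux those are character devices under `/dev/infiniband` (typically `uverbs0`, `uverbs1`, ...), opened read/write. A minimal diagnostic sketch, assuming that layout, checks from inside the container whether the device files are visible and accessible to the current (non-root) user. `check_verbs_devices` is a hypothetical helper, not part of Sarus or libibverbs.

```python
import glob
import os


def check_verbs_devices(dev_dir="/dev/infiniband"):
    """Map each uverbs device path to its read/write accessibility.

    Assumes the usual Linux layout /dev/infiniband/uverbsN; libibverbs
    opens these files read/write, so both permissions are checked.
    """
    results = {}
    for path in sorted(glob.glob(os.path.join(dev_dir, "uverbs*"))):
        results[path] = {
            "read": os.access(path, os.R_OK),
            "write": os.access(path, os.W_OK),
        }
    return results


if __name__ == "__main__":
    devices = check_verbs_devices()
    if not devices:
        # Nothing matched: the directory is likely not mounted in the container.
        print("No uverbs devices visible under /dev/infiniband")
    for path, status in devices.items():
        ok = status["read"] and status["write"]
        print(f"{path}: {'accessible' if ok else 'NOT accessible'} {status}")
```

Running it with and without the bind mount separates the two failure modes: no devices listed means the mount is missing, while listed but inaccessible devices point to permissions.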
1. Is there any other way to mount the device without sudo?
The guide states that I need to use the SSH hook in order to run OpenMPI.
But if I launch Sarus with sudo, the device mount, and srun:
I get:
2. If I use OpenMPI I need the SSH hook, am I right?
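When Open MPI's `mpirun` starts processes on other nodes without a resource-manager integration (as is the case inside a container, where `srun` is not available), it launches its remote daemons over SSH, which is why an SSH setup is needed between containers. `mpirun` then needs a list of hosts. On Slurm, `scontrol show hostnames` expands the compressed `SLURM_JOB_NODELIST`. The sketch below does the same expansion in Python, assuming only simple patterns such as `nid[001-003,007]` (one bracket group per name, no suffix after it), and writes Open MPI hostfile lines with a hypothetical per-node `slots` count.

```python
import re


def expand_nodelist(nodelist):
    """Expand a simple Slurm-style node list, e.g. 'nid[001-003,007],login1'."""
    hosts = []
    # Each item is a name, optionally followed by one bracketed range group,
    # so commas inside brackets do not split items.
    for item in re.findall(r"[^,\[]+(?:\[[^\]]*\])?", nodelist):
        match = re.fullmatch(r"([^\[]+)\[([^\]]+)\]", item)
        if match is None:
            hosts.append(item)
            continue
        prefix, ranges = match.groups()
        for part in ranges.split(","):
            if "-" in part:
                start, end = part.split("-")
                width = len(start)  # preserve zero padding, e.g. 001
                for n in range(int(start), int(end) + 1):
                    hosts.append(f"{prefix}{n:0{width}d}")
            else:
                hosts.append(prefix + part)
    return hosts


def hostfile_lines(hosts, slots=1):
    """Format hosts as Open MPI hostfile entries ('host slots=N')."""
    return [f"{host} slots={slots}" for host in hosts]
```

Writing `"\n".join(hostfile_lines(expand_nodelist(os.environ["SLURM_JOB_NODELIST"])))` to a file produces something that can be passed to `mpirun --hostfile`.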
I have created the container with the following Dockerfile:
I am new to Sarus and the HPC world. Thank you for your support!