Error setting up slurm-in-docker #28

Open
adityakavalur opened this issue Sep 15, 2020 · 4 comments


adityakavalur commented Sep 15, 2020

Thanks for sharing this project.
I am trying to build slurm-in-docker, following the instructions, to test the Slurm CLI features. However, the controller container keeps restarting every few seconds. I started the containers without the -d flag to get more output, which is provided below. Any idea what might be wrong?

```
controller    | 2020-09-15 17:38:51 Spawning 1 thread for encoding
controller    | 2020-09-15 17:38:51 Processing credentials for 1 second
controller    | 2020-09-15 17:38:52 Processed 23635 credentials in 1.000s (23632 creds/sec)
controller    | cheking for slurmdbd.conf
controller    | ### generate slurm.conf ###
controller    | sacctmgr: error: Malformed RPC of type PERSIST_RC(1433) received
controller    | sacctmgr: error: slurm_persist_conn_open: Failed to unpack persistent connection init resp message from database:6819
controller    | sacctmgr: error: slurmdbd: Sending PersistInit msg: No error
controller    | sacctmgr: error: Malformed RPC of type PERSIST_RC(1433) received
controller    | sacctmgr: error: slurm_persist_conn_open: Failed to unpack persistent connection init resp message from database:6819
controller    | sacctmgr: error: slurmdbd: Sending PersistInit msg: No error
controller    | sacctmgr: error: Malformed RPC of type PERSIST_RC(1433) received
controller    | sacctmgr: error: slurm_persist_conn_open: Failed to unpack persistent connection init resp message from database:6819
controller    | sacctmgr: error: slurmdbd: Sending PersistInit msg: No error
controller    | sacctmgr: error: slurmdbd: DBD_GET_CLUSTERS failure: No error
controller    |  Problem getting clusters from database.  Contact your admin.
controller exited with code 1
```


sethidden commented May 11, 2021

I'm very late to the party here.
There are only two Google results for this problem and, by a Hail Mary, one of them solves the issue.

Go to https://www.ni-sp.com/slurm-build-script-and-container-commercial-support/ (or its archived version, in case the link goes down) and scroll to the very bottom.

There it says:

> In case the controller constantly restarts with messages like `sacctmgr: error: Malformed RPC of type PERSIST_RC(1433) received` and `sacctmgr: error: slurm_persist_conn_open: Failed to unpack persistent connection init resp message from database:6819`, run:

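```shell
# Run the project's teardown script
sh teardown.sh
# Clear the worker user's SSH files
rm -rf home/worker/.ssh/*
# Clear everything under secret/ (sudo, since these files may be root-owned)
sudo rm -rf secret/*
# Recreate the containers in detached mode
docker-compose up -d
```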
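Since the reset deletes state, it may help to first confirm that the controller is failing with this particular error. Below is a minimal sketch of a hypothetical helper, `needs_reset` (not part of the project), that reads log text on stdin and succeeds only if it contains one of the two messages quoted above. The usage line assumes the Compose service is named `controller`, as the log prefix suggests.

```shell
#!/bin/sh
# needs_reset (hypothetical helper): read log text on stdin and exit 0 if it
# contains either of the persistent-connection errors quoted in this issue.
needs_reset() {
  grep -q -e 'Malformed RPC of type PERSIST_RC' \
          -e 'Failed to unpack persistent connection init resp'
}

# Example usage (assumes the Compose service is named "controller"):
#   docker-compose logs controller | needs_reset && echo "stale state: run the reset steps"
```

If the helper finds nothing, the controller is probably failing for a different reason, and wiping `secret/` is unlikely to help.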


@adityakavalur (author)

Thanks for the reply, @3nuc!
I ended up using a different repo for this (https://github.com/giovtorres/slurm-docker-cluster).

@sethidden

Yeah, I also tried to set that one up. The one you posted doesn't have OpenMPI preinstalled, though, right?

@adityakavalur (author)

Correct, it does not.
Technically speaking, though, MPI is not necessary to get a Slurm Docker cluster running, so I'd consider it an add-on. In my fork I simply install mpich in the Dockerfile using yum (https://github.com/adityakavalur/slurm-docker-cluster/blob/d54703ddcab9d456be4743dae0f51daf3d549df5/Dockerfile#L53).
If you want OpenMPI, I think you can try the openmpi or openmpi-devel packages.
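One caveat with the OpenMPI route: on CentOS/RHEL-style images the distro MPI packages usually install their binaries outside the default PATH (they are normally exposed through environment modules), so `mpirun` may not be found right after `yum install`. Below is a minimal sketch of a hypothetical helper that prepends a directory to PATH only if it exists and is not already listed; `/usr/lib64/openmpi/bin` is an assumed location that may differ between distribution versions.

```shell
#!/bin/sh
# prepend_path_if_dir DIR (hypothetical helper)
# Prepend DIR to PATH when DIR exists and is not already one of PATH's
# entries; otherwise leave PATH unchanged.
prepend_path_if_dir() {
  [ -d "$1" ] || return 0
  case ":$PATH:" in
    *":$1:"*) ;;                        # already on PATH: nothing to do
    *) PATH="$1:$PATH"; export PATH ;;  # add it in front of existing entries
  esac
}

# Assumed install location of the openmpi package's binaries (check your image).
prepend_path_if_dir /usr/lib64/openmpi/bin
```

In a Dockerfile, an `ENV PATH=...` line would have the same effect for all later layers; the function form is mainly useful in entrypoint or interactive scripts.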
