Help: slurm cluster example #12
@shikokuchuo, what's the most direct way of launching mirai workers on a set of hosts over SSH when we have a vector of local hostnames? The gist is that with Slurm you can submit a job requesting, say, 50 tasks (= "workers") that Slurm may reserve slots for across multiple hosts, e.g.

```sh
sbatch --ntasks=50 my_script.sh
```

This will result in `parallelly::availableWorkers()` returning one hostname per reserved task:

```r
hostnames <- parallelly::availableWorkers()
```

Can we then launch the workers like this?

```r
library(mirai)
daemons(
  url    = host_url(),
  remote = ssh_config(remotes = paste0("ssh://", hostnames))
)
```

If that works, then

```r
plan(future.mirai::mirai_cluster)
```

should make Futureverse resolve futures via that cluster of mirai workers.
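For concreteness, a minimal end-to-end sketch of what `my_script.sh` could run, assuming passwordless SSH from the submitting node to the allocated nodes and that mirai, future, future.mirai and parallelly are installed on all of them; the worker count and the final check are illustrative only, not part of the original question:

```r
# Submitted via something like: sbatch --ntasks=50 my_script.sh
library(mirai)
library(future)

# One hostname per slot that Slurm reserved for this job
hostnames <- parallelly::availableWorkers()

# Launch a mirai daemon on each reserved slot over SSH, dialling back to a
# URL on this (the submitting) node
daemons(
  url    = host_url(),
  remote = ssh_config(remotes = paste0("ssh://", hostnames))
)

# Resolve futures via the mirai daemons
plan(future.mirai::mirai_cluster)

# Illustrative check: which nodes did the futures actually run on?
fs <- lapply(1:4, function(i) future(Sys.info()[["nodename"]]))
unique(unlist(lapply(fs, value)))
```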
The below code is using the …

Couple of points/questions: …
Making a bit more progress... while the above statements around … I have been experimenting with removing … So, using the patch in the …
Maybe this is something that @shikokuchuo could integrate into mirai? I have to admit I really don't like the idea of having temporary files created, but it seems that both the size and number of files are very small and hence the execution speed is practically not affected at all.
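For illustration only, a minimal sketch of the kind of temporary-file approach described above; the actual patch may differ. The idea is to write the launch expression to a small .R file and run it by path, so nothing on the remote command line needs nested quoting. For SSH/srun launches the file would have to sit on a filesystem the worker nodes can see, hence the `dir` argument; the helper name and the URL are hypothetical.

```r
# Hypothetical helper, not the actual patch: write the daemon launch call to a
# small script file and return the command a launcher could hand to ssh/srun.
write_launch_script <- function(url, dir = tempdir()) {
  script <- tempfile("mirai_daemon_", tmpdir = dir, fileext = ".R")
  writeLines(sprintf('mirai::daemon("%s")', url), script)
  # No nested quotes left for the remote shell to mangle
  paste("Rscript", script)
}

# For example (hypothetical host URL):
write_launch_script("tcp://10.0.0.1:5555")
```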
@michaelmayer2 thanks for investigating. I'll take a closer look at the shell quoting behaviour of …
Michael, in build 9001 (39ce672) the shell quoting is updated so the argument passed to Rscript is wrapped in single rather than double quotes. This used to be the case in …

You may test with the R-Universe dev build:

```r
install.packages("mirai", repos = "https://shikokuchuo.r-universe.dev")
```

I hope this helps with SLURM, but even if not, I believe it is safer to shell quote in this way - it may avoid other corner cases. If it doesn't solve the SLURM issue, I have a couple of other ideas, although from the man page for …
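To illustrate what the quoting change is about (this is not the mirai launcher code, just the general issue, with a made-up host and port):

```r
# The expression a launcher passes to Rscript -e on the remote host
expr <- 'mirai::daemon("tcp://host:5555")'

# Single-quoted (the new behaviour): the remote shell passes it through
# verbatim, so the embedded double quotes survive as-is
cat("Rscript -e", shQuote(expr), "\n")

# Double-quoted (the previous behaviour): the embedded quotes need escaping,
# and $, backticks and backslashes stay subject to shell expansion, the kind
# of corner case that can differ between launch methods
cat(sprintf('Rscript -e "%s"\n', gsub('"', '\\\\"', expr)))
```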
@shikokuchuo - thanks so much for looking into this, Charlie! I tried with the latest changes, but I am sorry to report it is still not working... The crucial bit really seems to be the … In order to better demonstrate what is going on, I have replaced this line with a …
I then checked the following use cases
See detailed results below; the gist is that 1a and 2b work while 1b and 2a fail. This is caused by a different behaviour of … While …
I am not sure how to go from here. Happy to supply more information as needed. Maybe we can make the problematic …

Case 1a - classic
Case 1b - classic
Case 2a
Case 2b
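The detailed logs for the four cases were collapsed when this thread was captured. Purely for orientation, the four combinations under discussion (launch command crossed with quoting style) look roughly like the sketch below; which combination maps to which case label, as well as the hostname, port and srun flags, are assumptions rather than the exact commands from the logs.

```r
# Assumed 2x2 matrix: launcher (ssh vs srun) crossed with how the Rscript -e
# argument is quoted. Hostname, port and flags are placeholders.
expr   <- 'mirai::daemon("tcp://login-node:5555")'
single <- shQuote(expr)
double <- sprintf('"%s"', gsub('"', '\\\\"', expr))

for (launcher in c("ssh compute-node", "srun --ntasks=1")) {
  for (quoted in c(single, double)) {
    cat(launcher, "Rscript -e", quoted, "\n")
  }
}
```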
Thank you both. I've gone through quite a few of these quote-or-not-to-quote and nested-quoting issues in the parallelly package. It grew out of different needs to launch parallel R workers locally, remotely, in Linux containers, over SSH, over PS. @michaelmayer2, the canonical way to get the location of the current …
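The end of that sentence was lost in this capture. For context, one common way to point at the Rscript binary belonging to the running R installation is shown below; whether this is the approach referred to above is an assumption.

```r
# One common way to locate the Rscript that belongs to the running R
# installation; whether this is what was meant above is an assumption.
rscript <- file.path(R.home("bin"), "Rscript")
rscript
```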
I just checked the parallelly code; it suffers from the same problem. I'll see if there's a workaround/hack or if I have to update the package.
A little more progress on comparing future.mirai to the cluster backend... https://pub.current.posit.team/public/future_mirai/. Really amazing how much more scalable mirai is compared to the good old (monolithic) PSOCK cluster.
Hello,
{future.mirai} is a dream :) Any chance we could get a minimal working example for getting this to work on a slurm cluster? I am struggling to connect the dots.
Do I need to set up the daemons manually (https://shikokuchuo.net/mirai/reference/daemons.html)?
Thanks to all authors for making this happen :)