[Feature Idea] Fake worker #50
Comments
Hi @jmmshn, thanks for the interest in the code and for mentioning this use case. I have a few comments:
Thanks for the info!
OK, I understand things better now. From my POV, whether a SLURM job can access the DB is not super critical as long as the RUNNER and WORKER are on the same side of the SSH login (I really don't know how common a problem this is, since many clusters are moving to heavy 2FA configurations). The code change I was suggesting was just some simple options for the type of WORKER and runners so that the config and check functionality works, but it should not influence how the code functions at all.
This brings up another usage scenario question: I have another cluster that is not able to talk to my main DB, so this seems like the exact situation you had in mind. But the atomate2 jobs often require access to the DB to pass data between each other. The atomate2 config will just call a [...] I've skimmed the docs and code but cannot figure out how this is supposed to work with atomate2. Is there a basic configuration somewhere that I can look at?
I am not entirely sure I understand correctly what you mean. If you are referring to the fact that some atomate2 Jobs have references to the outputs of other Jobs that have already completed (e.g. an NSCF calculation needs the outputs of a previous SCF calculation), this is already handled automatically. All the references are resolved by the Runner before uploading the files to the worker. In the same way, the outputs are written to a file, retrieved by the Runner and inserted into the database. In this case there is nothing to set up. Does this answer your question?
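The flow described above (resolve references, upload inputs, run, retrieve outputs, insert into the database) can be pictured with a minimal sketch. This is not the jobflow-remote implementation: `FAKE_DB`, `resolve_references`, `run_on_worker` and `runner_cycle` are hypothetical names, a plain dict stands in for the database, and a temporary directory stands in for the file transfer to the worker.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical in-memory stand-in for the output store that only the Runner can reach.
FAKE_DB = {"scf-job": {"energy": -10.5, "dir_name": "/scratch/scf"}}


def resolve_references(job_inputs, db):
    """Runner side: replace {"ref": <job id>} placeholders with stored outputs."""
    resolved = {}
    for key, value in job_inputs.items():
        if isinstance(value, dict) and "ref" in value:
            resolved[key] = db[value["ref"]]
        else:
            resolved[key] = value
    return resolved


def run_on_worker(workdir):
    """Worker side: reads and writes local files only, never the database."""
    inputs = json.loads((workdir / "inputs.json").read_text())
    output = {"previous_energy": inputs["prev"]["energy"], "kpoints": inputs["kpoints"]}
    (workdir / "outputs.json").write_text(json.dumps(output))


def runner_cycle(job_id, job_inputs, db):
    """Runner side: resolve, "upload", run the job, "download", insert."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        resolved = resolve_references(job_inputs, db)
        (workdir / "inputs.json").write_text(json.dumps(resolved))
        run_on_worker(workdir)  # stands in for the queued job on the cluster
        db[job_id] = json.loads((workdir / "outputs.json").read_text())
    return db[job_id]


result = runner_cycle("nscf-job", {"prev": {"ref": "scf-job"}, "kpoints": 64}, FAKE_DB)
```

In this sketch only `runner_cycle` ever touches `FAKE_DB`; the worker function sees nothing but the files in its working directory.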
OK I think I'm almost at the eureka moment :)
While most of the base Vasp wfs use something like
Since some of my workflows rely heavily on the S3Store I wanna make sure I understand this really well so I can figure out if everything is feasible.
Thanks for clarifying the issue. Your questions highlighted that these details really need to be addressed in the documentation.
So basically, whichever maggma Store you use in your configuration, it is never accessed from the running Job, but only from the Runner. I hope this better clarifies how the code works. Let me know if you have more specific questions. To better understand your use case, when you refer to
Great I think I get it now!
Correct, using the charge density for subsequent steps is a kind of niche application, but many other workflows store the charge density, so the need for copying the serialized charge density either to or from the WORKER should be universal. From what I gather, as long as the data serializes properly in a JSON file then everything should be fine. I will test it and see.
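A quick way to test the "serializes properly in a JSON file" condition is a round-trip check. The sketch below uses only the standard `json` module, and `survives_json_round_trip` and `fake_chgcar` are hypothetical names. Real atomate2 outputs may go through a richer serialization layer, so treat this as a first sanity check rather than a guarantee.

```python
import json


def survives_json_round_trip(data):
    """Return True if `data` is unchanged after a JSON dump and load."""
    try:
        return json.loads(json.dumps(data)) == data
    except TypeError:
        # json.dumps raises TypeError for objects it cannot encode
        return False


# Hypothetical stand-in for a serialized charge density: a 2x2x2 grid as nested lists.
fake_chgcar = {
    "grid": [[[0.1, 0.2], [0.3, 0.4]], [[0.5, 0.6], [0.7, 0.8]]],
    "dims": [2, 2, 2],
}

ok_plain = survives_json_round_trip(fake_chgcar)
ok_bytes = survives_json_round_trip({"grid": b"\x00\x01"})
```

Nested lists of floats pass the check, while raw `bytes` do not, which is the kind of payload that would need converting before it is written to the output file.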
To add a bit more information, we have used this together with maggma's
Hi @gpetretto et al,
Thanks so much for open-sourcing this code, this is fantastic!
I have a couple of questions / (too specific) problems.
Implement support for ControlMaster / ControlPath / etc paramiko/paramiko#852
In the current setup, I don't see anything about setting up just a USER on the local computer and having RUNNER + WORKER on the cluster.
Having something like this will effectively solve both problems for me since the local worker just has to talk to DBs without needing to access anything via SSH. Also, the cluster can now just keep submitting a job that just says "get a job and run it."
I poked around the code and it looks like this should be possible, but it would be nice to have a "dummy WORKER" that just acts as a placeholder, so that USER and RUNNER talk to each other on local while REMOTE and WORKER talk inside each SLURM job. I guess you can have USER+RUNNER+REMOTE on the cluster and USER+RUNNER+(fake worker) on local. Then you can perform all the JFR operations on the cluster but still get to manipulate data and submit jobs on local.
Let me know if this makes sense. I'm willing to start working on this as a PR.
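The "get a job and run it" idea is a pull model: each batch job asks a queue for work instead of having work pushed to it over SSH. A minimal sketch follows, assuming a hypothetical `JobQueue` class as a stand-in for a job database reachable from the cluster. None of these names come from jobflow-remote.

```python
from collections import deque


class JobQueue:
    """Hypothetical stand-in for a job database reachable from the cluster."""

    def __init__(self, jobs):
        self._pending = deque(jobs)
        self.results = {}

    def checkout(self):
        """Hand out the next pending job, or None when the queue is empty."""
        return self._pending.popleft() if self._pending else None

    def complete(self, job_id, output):
        """Record the output of a finished job."""
        self.results[job_id] = output


def pull_and_run(queue, max_jobs=1):
    """What one batch job would do: fetch up to `max_jobs` jobs and run them."""
    done = 0
    while done < max_jobs:
        job = queue.checkout()
        if job is None:
            break
        job_id, func, args = job
        queue.complete(job_id, func(*args))
        done += 1
    return done


queue = JobQueue([("a", pow, (2, 3)), ("b", sum, ([1, 2, 3],))])
first = pull_and_run(queue)  # one batch job picks up job "a"
second = pull_and_run(queue, max_jobs=5)  # a later batch job drains the rest
```

With this shape the submitted script is identical every time, which is why the cluster can keep resubmitting it without knowing in advance which job it will run.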