-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PwBandsWorkchain.get_builder_from_protocol() not working on SGE machines #1010
Comments
Hi @stevenbos123 I've never used
The provided values are only used to demonstrate the idea, you would need to adapt them accordingly, based on your system. Hope that helps! |
Hi @t-reents, Thank you for the reply. It seems I did not provide enough information on my problem. Just like you suggested I specified the variables This poses a problem for SGE based systems as you do not provide the number of machines for them. Kind regards, Steven
|
Thanks for the additional details, @stevenbos123! You're right, the problem comes from the default protocols, that specify aiida-quantumespresso/src/aiida_quantumespresso/workflows/protocols/pw/base.yaml Lines 13 to 14 in d645069
If you provide the Important to note: Apparently, the current protocols are designed to work with Slurm out of the box, so you wouldn't necessarily need to provide the options. If you comment out the lines above, you would also always need to provide the corresponding slurm resources via the options (in case you want to run a slurm job). Another option would be to define scheduler dependent protocols, e.g.
You would also need to add the new protocols to the protocol files for I couldn't test the
even without specifying the options (as I moved them to the protocol in this example). |
@mbercx Without thinking too much about it, might it make sense to introduce some logic in the Maybe I'm also overlooking something. |
@stevenbos123 I realized that I made a small mistake in the first version of my comment, I just edited it. To make it easier, I adjusted the protocols on a local branch, in case you would like to use the second option with the new protocols. You can check it out here: https://github.com/t-reents/aiida-quantumespresso/tree/fix/sge_protocol If you clone it, you can change to the directory (remember to |
Thanks for the analysis @t-reents . I would personally be hesitant to put scheduler based logic in the protocols or even make scheduler specific protocols. You would have to start to do this everywhere and what happens if there is yet another scheduler plugin that you didn't cover yet. If the scheduler would implement some method to provide default resources that the workflow/protocol could just call blindly, that could work. But I don't think that exists yet in the scheduler interface. It is true that the current implementation is already scheduler specific due to our own SLURM background, but I think we should rather remove that and make it completely agnostic than start adding more scheduler specific logic. The protocols automate almost everything already. I think it is more than fair to ask the user to think at least about the resources they want to use and specify those. As long as that is properly documented of course. |
Thank you for joining @sphuber.
Yes, this is definitely true. The examples were mainly intended to show how @stevenbos123 could adapt the protocols, in case he prefers that, and to get his setup running. In general, I would also not vote for creating scheduler-specific protocols as one would also need to duplicate them for the different "levels", e.g. precision and fast.
This is what I was thinking about in my comment to Marnik, but I also don't think that it'd be worth to implement this/make things more complicated, for the single purpose of providing initial resources.
I think this is the most important point of this issue. So far, I've also expected that the protocols work just out of the box, and I've never thought about any scheduler dependency (as I think it's not explicitly mentioned anywhere, especially for new users). So to sum up, we could think of these two (simple) options, right?
|
Fully agree with the two options. Since they are not really mutually exclusive, we could already start with the first and simply improve the docs. We can then add a deprecation warning id no explicit resources are.provided saying this will start to error in the future. And then at some point we can remove the resource defaults to make the protocols scheduler agnostic |
Great! If you don't mind, I can open a PR in the upcoming days. |
Hi! Any update on this? We are also having this problem, I commented out the offending line in the protocol, but I wanted to know if there is a more long term solution in the pipeline. |
The PwBandsWorkchain.get_builder_from_protocol() does not support SGE scheduler. This makes it cumbersome to do calculations SGE based machines. See error message below. This happens when providing a code that is run on a SGE computer.
The text was updated successfully, but these errors were encountered: