RelaxWorkChain should use restart_mode restart when max_wall #968
Comments
Thanks @AndresOrtegaGuerrero! Could you explain why you think the aiida-quantumespresso/src/aiida_quantumespresso/workflows/pw/base.py Lines 432 to 449 in 5db3b28
Which does update the input structure for the next calculation, but restarts "from scratch", i.e. sets the following inputs: aiida-quantumespresso/src/aiida_quantumespresso/workflows/pw/base.py Lines 281 to 285 in 5db3b28
I suppose it might be more efficient to restart from the charge density/wave functions, but at the time I wasn't sure why the restart was set up this way, and hence felt it was better to sacrifice some efficiency instead of potentially breaking things or (even worse) producing incorrect results.
@mbercx So in case you are doing a long geometry optimization, and QE did a max_walltime save, then QE can do a proper restart and continue the geometry optimization or even the SCF. However, this only works if QE did the save correctly (so it is important to check in the output that this was indeed done). I have been doing optimizations with DFT+U (using the workchain) and I noticed that without a proper restart the simulations can consume more time and resources. It can also turn a simulation that was converging reasonably well into a whole new one that takes time to reach convergence. If the system has more than 100 atoms, a proper restart is quite important, especially if the simulation was converging.
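As a rough illustration of the restart strategies discussed here, the sketch below adjusts a `pw.x` parameters dictionary for each case. The helper name `restart_parameters` and the mode labels `'full'` and `'charge_density'` are hypothetical and are not taken from aiida-quantumespresso; only `restart_mode`, `startingpot` and `startingwfc` are actual `pw.x` input keywords.

```python
from copy import deepcopy


def restart_parameters(parameters: dict, mode: str) -> dict:
    """Return a copy of `parameters` adjusted for the requested restart mode.

    Hypothetical helper for illustration only. `mode` is one of
    'from_scratch', 'full' or 'charge_density'.
    """
    result = deepcopy(parameters)
    control = result.setdefault('CONTROL', {})
    electrons = result.setdefault('ELECTRONS', {})

    if mode == 'from_scratch':
        # Ignore anything written by a previous run.
        control['restart_mode'] = 'from_scratch'
        electrons.pop('startingpot', None)
        electrons.pop('startingwfc', None)
    elif mode == 'full':
        # Let QE continue from the state it saved on a clean shutdown.
        control['restart_mode'] = 'restart'
        electrons.pop('startingpot', None)
        electrons.pop('startingwfc', None)
    elif mode == 'charge_density':
        # Start a new run, but read the charge density from file.
        control['restart_mode'] = 'from_scratch'
        electrons['startingpot'] = 'file'
        electrons.pop('startingwfc', None)
    else:
        raise ValueError(f'unknown restart mode: {mode!r}')

    return result
```

In an actual run, the output directory of the previous calculation also has to be made available to the new one, which this sketch does not cover.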
Yeah, I can see how for larger structures it can make a significant difference.
The error handler above only handles cases where
So in principle this should indicate a clean shutdown. I'd be inclined to accept switching to a full restart here with |
@mbercx, it is important to guarantee that QE indeed did the clean shutdown. Sometimes this shutdown doesn't take place (maybe because there is not enough time, or because it is an expensive calculation such as a hybrid).
@AndresOrtegaGuerrero, do you think more discussion is required? I didn't see big problem with set |
I will try to use this restart only for the Relax workchain, and only when there is a MAX_WALLTIME |
If QE did a clean shutdown, I'm also fine with doing a full restart. @AndresOrtegaGuerrero I can quickly open a PR for this if you're not already doing so?
The |
I want to highlight this. It is possible, and I have seen it often in practice, where QE is initializing a soft-shutdown because the walltime was exceeded (and so it prints |
Thanks @sphuber, I see your point. But we already distinguish between aiida-quantumespresso/src/aiida_quantumespresso/parsers/pw.py Lines 159 to 162 in b517686
I suppose it's safe to assume that in case |
That depends, if |
It would be nice to include a check for whether "JOB DONE" is printed. In case you have a big system and a hybrid functional, you can then guarantee you have the wfn to do a restart (even in a |
This is already done, and in case it is missing aiida-quantumespresso/src/aiida_quantumespresso/parsers/parse_raw/pw.py Lines 319 to 325 in b517686
I now notice that this is not done in case there is a
I'm not 100% sure either, but it seems pretty reasonable, since the following line is printed before this:
So if the calculation would be interrupted during writing, I think the final line in the |
Indeed. Otherwise, if there is a recoverable error (e.g. Cholesky), the incomplete-stdout error would be exposed instead, making it unrecoverable. Nevertheless, if the run is interrupted while writing, there will be no CRASH file, as QE only writes it when it encounters an error.
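The distinction being discussed can be sketched as a small classifier over the stdout content and the presence of a CRASH file. This is only an illustration and not the plugin's parser: the function name and return labels are hypothetical, and the marker string `Maximum CPU time exceeded` is an assumption about what `pw.x` prints when it stops on the walltime limit.

```python
def classify_shutdown(stdout: str, crash_file_present: bool = False) -> str:
    """Classify how a pw.x run ended, based on simple stdout markers.

    Hypothetical helper; the marker strings are assumptions and may need
    adjusting to the QE version in use.
    """
    walltime_hit = 'Maximum CPU time exceeded' in stdout  # assumed marker
    job_done = 'JOB DONE' in stdout

    if crash_file_present:
        # QE itself detected an error and wrote a CRASH file.
        return 'crashed'
    if walltime_hit and job_done:
        # Soft shutdown started and ran to completion.
        return 'clean_walltime_stop'
    if walltime_hit:
        # Soft shutdown started, but the run was cut off before finishing.
        return 'interrupted_walltime_stop'
    if job_done:
        return 'finished'
    return 'incomplete'
```

Under these assumptions, only `'clean_walltime_stop'` would be a candidate for a full restart; `'interrupted_walltime_stop'` is the case where the saved files may be incomplete.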
Indeed, so it would be unrecoverable. One might still try to restart using, e.g., only the charge density as a compromise, as it is probably the first and fastest thing that QE dumps. But wouldn't it be worth trying to restart either way? Or would bad things happen?
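One way to picture the "try restarting either way" idea is a fallback ladder that moves to a more conservative strategy each time an attempt fails. The sketch below is purely illustrative: the labels and the ordering are assumptions, and it does not answer whether a restart from partially written files is safe.

```python
from typing import Optional

# Hypothetical labels, ordered from most to least reliant on saved files.
FALLBACK_ORDER = ('full', 'charge_density', 'from_scratch')


def next_restart_mode(failed_mode: Optional[str]) -> str:
    """Return the restart mode to try next.

    `failed_mode` is the mode that was just attempted and failed, or None
    if no restart has been attempted yet.
    """
    if failed_mode is None:
        return FALLBACK_ORDER[0]
    index = FALLBACK_ORDER.index(failed_mode)  # ValueError for unknown labels
    # Stay on the last (safest) option once it has been reached.
    return FALLBACK_ORDER[min(index + 1, len(FALLBACK_ORDER) - 1)]
```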
This of course depends on how the
I was also wondering about this. If a file is corrupted, will QE crash when trying to read it, or simply ignore the file?
That's true. I think in order to avoid issues we decided to opt for this solution, so that we were certain that it wouldn't return the |
The RelaxWorkChain should use restart_mode = 'restart' when the max walltime is reached. Instead, it starts the simulation from scratch.