[Proposal]: Better handling of MD-type jobs where a termination may not be a failure #2408

Open
tomdemeyere opened this issue Aug 10, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@tomdemeyere
Contributor

tomdemeyere commented Aug 10, 2024

What new feature would you like to see?

This proposal aims to extend the terminate() function by having it call a new failed_schema() function. Ideally, this function would attempt to fetch whatever results are currently available. For example, in the case of a timed-out MD run, the traj and log files should be read and put in a dictionary, in the same way that this is already done.

The problem, as mentioned by @Andrew-S-Rosen is that:

To do what you're suggesting, we would need to try to prepare a schema, write it to disk, and then terminate. Output files will also not always be able to be parsed (e.g. if the calculator crashes instantly), and this would cause the schema generation to crash.

Indeed, such a function would need to be full of try/except blocks, since no assumption can be made about the current state of the calculation. My interpretation is that such a function should not attempt to summarise results (no call to pymatgen, etc.) but should merely read what is available, since the calculation is not done. In the case of the logfile and trajfile that is easy: the files are known. In the case of software-specific files, it would be nice to come up with a solution to attempt to read them, for example by using #2407.
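To make the "read what is available, wrapped in try/except" idea concrete, here is a minimal sketch. It is not quacc code: `read_available_results` and the `readers` mapping are hypothetical names, and the file names `md.log` and `md.traj` are placeholders. Each reader is run independently, so one unreadable file does not prevent the others from being collected.

```python
import tempfile
from pathlib import Path
from typing import Any, Callable


def read_available_results(readers: dict[str, Callable[[], Any]]) -> dict[str, Any]:
    """Hypothetical helper (not part of quacc): try each reader, never raise."""
    results: dict[str, Any] = {}
    errors: dict[str, str] = {}
    for name, reader in readers.items():
        try:
            results[name] = reader()
        except Exception as exc:  # the file may be missing, truncated, or unparsable
            errors[name] = f"{type(exc).__name__}: {exc}"
    return {"results": results, "errors": errors}


# Demo: a timed-out run that wrote a log file but no trajectory yet.
with tempfile.TemporaryDirectory() as tmpdir:
    logfile = Path(tmpdir) / "md.log"
    trajfile = Path(tmpdir) / "md.traj"
    logfile.write_text("Time[ps] Etot[eV]\n0.0000 -1.234\n")
    partial = read_available_results(
        {"log": logfile.read_text, "traj": trajfile.read_bytes}
    )

print(sorted(partial["results"]))
print(sorted(partial["errors"]))
```

In a real run the trajectory reader would more likely be something like `lambda: ase.io.read(trajfile, index=":")`. Plain file reads are used here only to keep the sketch free of dependencies.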

From the discussions in #2399

@tomdemeyere tomdemeyere added the enhancement New feature or request label Aug 10, 2024
@tomdemeyere tomdemeyere changed the title [Proposal]: create a "failed schema" that is written in case of job failure. [Proposal]: create a "failed_schema" summarising error and available results in case of job failure Aug 10, 2024
@Andrew-S-Rosen
Member

Andrew-S-Rosen commented Aug 10, 2024

This is certainly doable. That said, I would propose that this behavior be toggleable via the global settings, such as SETTINGS.STORE_FAILED_JOBS: bool = False. There are two reasons for this: 1) cloud databases like MongoDB are often limited in space, and storing failed calculations may not be desirable; 2) storing the failed jobs to disk and/or database would be a fairly notable breaking change, since anyone querying their database for calculation results would now have to add an additional query that only selects "successful" calculations. Silently storing failed outputs in the database will cause downstream problems for people, so this would be an opt-in feature.

It does not seem terribly difficult to implement. The idea would basically be to use a more flexible version of quacc.schemas.ase.Summarize.run to parse the code's main log file along with the input Atoms and calculator parameters. We would also need to store the job state for all jobs (success or failure) so this can be queried. If the parse is unsuccessful (say it fails immediately due to some weird input parameters), then there is not much to store other than the input Atoms and the calculator parameters along with the job state.
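As a rough illustration of the flow described above, the sketch below stores a job state on every record, skips failed jobs unless an opt-in flag is set, and falls back to the inputs alone when parsing fails. Everything here is an assumption made for illustration: `build_job_record`, the `job_state` and `parse_error` keys, and the `store_failed_jobs` argument (standing in for the proposed SETTINGS.STORE_FAILED_JOBS) are hypothetical and do not exist in quacc.

```python
from typing import Any, Callable, Optional


def build_job_record(
    input_atoms: Any,
    parameters: dict[str, Any],
    parse_log: Callable[[], dict[str, Any]],
    job_succeeded: bool,
    store_failed_jobs: bool = False,  # stand-in for the proposed opt-in setting
) -> Optional[dict[str, Any]]:
    """Hypothetical sketch: return a record to store, or None if nothing is stored."""
    if not job_succeeded and not store_failed_jobs:
        return None  # default behaviour: failed jobs are not stored
    record: dict[str, Any] = {
        "job_state": "success" if job_succeeded else "failed",
        "input_atoms": input_atoms,
        "parameters": parameters,
    }
    if job_succeeded:
        record["results"] = parse_log()
        return record
    try:
        record["results"] = parse_log()
    except Exception as exc:
        # Parsing failed too: keep only the inputs, the state, and the reason.
        record["parse_error"] = f"{type(exc).__name__}: {exc}"
    return record


def broken_parser() -> dict[str, Any]:
    raise ValueError("log file is empty")


skipped = build_job_record("H2", {"calc": "emt"}, broken_parser, job_succeeded=False)
stored = build_job_record(
    "H2", {"calc": "emt"}, broken_parser, job_succeeded=False, store_failed_jobs=True
)
```

With a `job_state` field on every record, a downstream query can select only the "success" entries, which is the extra filtering step mentioned in the comment.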

@Andrew-S-Rosen Andrew-S-Rosen changed the title [Proposal]: create a "failed_schema" summarising error and available results in case of job failure [Proposal]: Better handling of MD-type jobs where a termination may not be a failure Aug 12, 2024