Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add error handlers to check for chgcar, aeccar integrity #93

Open
mkhorton opened this issue Mar 25, 2019 · 12 comments
Open

Add error handlers to check for chgcar, aeccar integrity #93

mkhorton opened this issue Mar 25, 2019 · 12 comments

Comments

@mkhorton
Copy link
Member

mkhorton commented Mar 25, 2019

See atomate PR hackingmaterials/atomate#276

Needs further investigation; @jmmshn could you attach some sample bad outputs? Suspect it may be an mpi-related issue where lines are getting transposed.

@jmmshn
Copy link
Contributor

jmmshn commented Mar 25, 2019

I will add some kind of simple smoothness check for the CHGCAR and AECCAR.
I have only seen the AECCAR exhibit this problem (picture below)

Screen Shot 2019-03-25 at 12 52 31 PM

@mkhorton
Copy link
Member Author

What iso-surface level is that screenshot taken at? Almost looks like the grid points are off by one... Is there any correlation between grid dimension and this issue occurring, or is it random?

@shyuep
Copy link
Member

shyuep commented Mar 25, 2019

Just to be clear - the role of custodian is not to catch things which cannot be fixed. I am not sure what is causing this smoothness issue, but unless there is a recipe to be taken, this is not something for cusotdian.

@jmmshn
Copy link
Contributor

jmmshn commented Mar 25, 2019

I haven't collected statistics on the success rates but rerunning the calculations seemed to fix some of these.

@mkhorton
Copy link
Member Author

mkhorton commented Mar 25, 2019

@shyuep Surely it needs to trigger a NonRecoverableError? In my mind it would be bad for Custodian to raise no errors for a calculation that has produced bad output.

@shyuep
Copy link
Member

shyuep commented Mar 25, 2019

@mkhorton It depends on the definition of "bad output". CHGCARs are somewhat optional in a lot of the analysis that we do. E.g., if you are just relaxing a structure or just want an energy. If this is a desired check, I would certainly put it as a ChgcarValidator - not a Handler, which implies real-time checking. And this validator would be one where only people who care about CHGCARs would bother using.

@mkhorton
Copy link
Member Author

Ok, that sounds fair.

@jmmshn
Copy link
Contributor

jmmshn commented Mar 25, 2019

OK sound good I will add an example "bad" output and the validator.
I would still like to keep the checks on the atomate side as a fail-safe.

@shyuep
Copy link
Member

shyuep commented Mar 25, 2019

@jmmshn @computron It is up to you since I don't want to impose my will on atomate, but my general advice is that software checks should be performed in ONE place. I generally do not like duplicating checks within or across codes. That duplication merely serves to create headaches - some new student fiddles with the atomate checks and forgets to make the same changes to the custodian validators. It is fine if you want atomate to check a different set of things that are complementary to custodian checks. But the same check should not be in two places.

As a general principle, MP-related stuff should be on the basis on "one source of truth". For input sets, that should be pymatgen. For run-time error handling, that should be custodian. For workflows, that would be atomate.

@mkhorton
Copy link
Member Author

One solution here might be to write a ChgcarValidator in custodian, but to explicitly import that and use it in the atomate drone to validate when chgcar ingestion is attempted. That way code implementation is not duplicated.

@shyuep
Copy link
Member

shyuep commented Mar 25, 2019

That is fine. Though I would have thought the ChgcarValidator would simply be passed into the custodian call in atomate?

@jmmshn
Copy link
Contributor

jmmshn commented Mar 25, 2019

I think the concern was about ingestion of old tasks into MP releases. For that, I thank what Matt suggested makes sense.

I have a simple test in the atomate PR that just checks for positive definiteness of the AECCAR which I will replace with a more sophisticated check after I update custodian.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants