
Test on blake failed for a long time #832

Open
bartgol opened this issue Sep 12, 2022 · 5 comments

bartgol (Collaborator) commented Sep 12, 2022

Following up on Irina's email, I noticed that some blake tests have failed for ages. This one, for instance, started failing on June 8th, and has failed ever since.

I looked at our PR history, but no PR was merged around that day. However, since direct pushes to master are allowed, someone might have pushed a change straight to master. I also don't recall whether there was a system upgrade/change around then.

The failure is in the response check:

Response 0: Solution Average

                    -6.995108075095e+00
Response Test 0: -6.995108075095e+00 != -7.005509894455e+00 (rel 1.000000000000e-05 abs 1.000000000000e-03)

and it's a relative change of 1.48e-3.
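The relative change quoted above can be reproduced from the two printed values. Below is a minimal sketch of the kind of check involved; it is not Albany's actual test harness, and whether the rel/abs tolerances combine with OR or AND is an assumption (here it makes no difference, since both tolerances are exceeded):

```python
def response_check(actual, expected, rel_tol=1.0e-5, abs_tol=1.0e-3):
    """Pass if the values agree within the relative OR absolute tolerance
    (hypothetical helper mirroring the tolerances printed in the log)."""
    diff = abs(actual - expected)
    return diff <= abs_tol or diff <= rel_tol * abs(expected)

actual   = -6.995108075095e+00   # value computed on blake
expected = -7.005509894455e+00   # stored test value

rel_diff = abs(actual - expected) / abs(expected)
print(f"relative change: {rel_diff:.2e}")   # ~1.48e-03
print("pass" if response_check(actual, expected) else "fail")  # fail
```

With these values the absolute difference is about 1.04e-2, which exceeds both the 1e-3 absolute tolerance and the 1e-5 relative tolerance, so the check fails either way.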

@mperego what are your thoughts?

ikalash (Collaborator) commented Sep 12, 2022

Are those the performance tests (I can't see CDash while overseas)? I believe @jewatkins was monitoring these for some time but I'm not sure what happened. I agree about changing / deactivating the tests if they are going to fail.

jewatkins (Collaborator) commented

These are the same failing tests discussed in #712 and I think the same issue remains. We'd probably have to increase the tolerance to 1e-2 because the GPU tests might still give the same result.

bartgol (Collaborator, Author) commented Sep 12, 2022

To be more general, I suspect at some point we should use test values that are machine-specific. In general, we can't expect the solution to be the same across architectures. Yes, we are using some tolerance, but unless nonlinear tolerances are ridiculously low, a tiny residual might still mean not-so-tiny solution diffs (depending on problem conditioning).

OTOH, a machine-specific baseline/test value is supposed to always give us the same result (unless the rank count changes, the Trilinos implementation changes, or some part of the code uses randomized stuff).
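One way to picture machine-specific baselines is a per-machine lookup of the expected value. This is a hypothetical sketch, not Albany's test infrastructure; the machine names come from this thread, but which value belongs to which machine is illustrative only:

```python
# Hypothetical per-machine baselines (illustrative values, not real test data).
# Each test machine stores its own expected value, so tolerances can be tight.
BASELINES = {
    "blake":  -6.995108075095e+00,
    "weaver": -7.005509894455e+00,
}

def expected_value(machine):
    """Look up the baseline for a machine; fail loudly if none is recorded."""
    if machine not in BASELINES:
        raise KeyError(f"no baseline recorded for '{machine}'")
    return BASELINES[machine]

# With a baseline generated on the same machine, a much tighter relative
# tolerance becomes feasible (value here is an assumption, not a recommendation).
TIGHT_REL_TOL = 1.0e-10
```

A missing baseline raising an error (rather than silently falling back to a generic value) makes it obvious when a new machine needs its baseline regenerated.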

jewatkins (Collaborator) commented

In this particular case, it's odd to me that the result was the same on both blake and weaver and then suddenly differed on blake-only. But I'd be okay with machine specific tests since that is what E3SM does. At which point, we could tighten tolerances. We should decide what we would like to do for E3SM integration and follow suit.

bartgol (Collaborator, Author) commented Sep 12, 2022

Right, with machine-specific expected values, we can be more strict, and be more robust against answer-changing mods.
