
Test on blake failed for a long time #832

Open
bartgol opened this issue Sep 12, 2022 · 5 comments

bartgol (Collaborator) commented Sep 12, 2022

Following up on Irina's email, I noticed that some blake tests have failed for ages. This one, for instance, started failing on June 8th, and has failed ever since.

I looked at our PR history, but no PR was merged around that day. However, since direct pushes to master are allowed, someone might have pushed a change straight to master. I also don't recall whether there was a system upgrade/change around then.

The failure is in the response check:

Response 0: Solution Average

                    -6.995108075095e+00
Response Test 0: -6.995108075095e+00 != -7.005509894455e+00 (rel 1.000000000000e-05 abs 1.000000000000e-03)

and it's a relative change of 1.48e-3.
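The relative change quoted above can be reproduced from the two printed values. Below is a minimal sketch of the kind of check involved; it is not Albany's actual test harness, and whether the rel/abs tolerances combine with OR or AND is an assumption (here it makes no difference, since both tolerances are exceeded):

```python
def response_check(actual, expected, rel_tol=1.0e-5, abs_tol=1.0e-3):
    """Pass if the values agree within the relative OR absolute tolerance
    (hypothetical helper mirroring the tolerances printed in the log)."""
    diff = abs(actual - expected)
    return diff <= abs_tol or diff <= rel_tol * abs(expected)

actual   = -6.995108075095e+00   # value computed on blake
expected = -7.005509894455e+00   # stored test value

rel_diff = abs(actual - expected) / abs(expected)
print(f"relative change: {rel_diff:.2e}")   # ~1.48e-03
print("pass" if response_check(actual, expected) else "fail")  # fail
```

With these values the absolute difference is about 1.04e-2, which exceeds both the 1e-3 absolute tolerance and the 1e-5 relative tolerance, so the check fails either way.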

@mperego what are your thoughts?

ikalash (Collaborator) commented Sep 12, 2022

Are those the performance tests (I can't see CDash while overseas)? I believe @jewatkins was monitoring these for some time but I'm not sure what happened. I agree about changing / deactivating the tests if they are going to fail.

jewatkins (Collaborator) commented

These are the same failing tests discussed in #712 and I think the same issue remains. We'd probably have to increase the tolerance to 1e-2 because the GPU tests might still give the same result.

bartgol (Collaborator, Author) commented Sep 12, 2022

To be more general, I suspect at some point we should use test values that are machine-specific. In general, we can't expect the solution to be the same across architectures. Yes, we are using some tolerance, but unless nonlinear tolerances are ridiculously low, a tiny residual might still mean not-so-tiny solution diffs (depending on problem conditioning).

OTOH, a machine-specific baseline/test value is supposed to always give us the same result (unless the rank count changes, the Trilinos implementation changes, or some part of the code uses randomized stuff).
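One way to picture machine-specific baselines is a per-machine lookup of the expected value. This is a hypothetical sketch, not Albany's test infrastructure; the machine names come from this thread, but which value belongs to which machine is illustrative only:

```python
# Hypothetical per-machine baselines (illustrative values, not real test data).
# Each test machine stores its own expected value, so tolerances can be tight.
BASELINES = {
    "blake":  -6.995108075095e+00,
    "weaver": -7.005509894455e+00,
}

def expected_value(machine):
    """Look up the baseline for a machine; fail loudly if none is recorded."""
    if machine not in BASELINES:
        raise KeyError(f"no baseline recorded for '{machine}'")
    return BASELINES[machine]

# With a baseline generated on the same machine, a much tighter relative
# tolerance becomes feasible (value here is an assumption, not a recommendation).
TIGHT_REL_TOL = 1.0e-10
```

A missing baseline raising an error (rather than silently falling back to a generic value) makes it obvious when a new machine needs its baseline regenerated.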

jewatkins (Collaborator) commented

In this particular case, it's odd to me that the result was the same on both blake and weaver and then suddenly differed on blake-only. But I'd be okay with machine specific tests since that is what E3SM does. At which point, we could tighten tolerances. We should decide what we would like to do for E3SM integration and follow suit.

bartgol (Collaborator, Author) commented Sep 12, 2022

Right, with machine-specific expected values, we can be more strict, and be more robust against answer-changing mods.
