
GNU version of FATES SMS test resulting in differences to baseline with ctsm5.1.dev049 #1464

Closed
glemieux opened this issue Aug 19, 2021 · 4 comments
Labels
closed: wontfix (We won't fix this issue, because it would be too difficult and/or isn't important enough to fix)
investigation (Needs to be verified and more investigation into what's going on.)

Comments

@glemieux
Collaborator

glemieux commented Aug 19, 2021

Brief summary of bug

Differences are being seen for SMS_Lm13.1x1_brazil.I2000Clm50FatesCru.cheyenne_gnu.clm-FatesColdDef on Cheyenne between ctsm5.1.dev048 and ctsm5.1.dev049. The Intel version of the same test passes bit-for-bit (B4B).

General bug information

CTSM version you are using: ctsm5.1.dev049

Does this bug cause significantly incorrect results in the model's science? Possibly yes (uncertain at the time of reporting).

Configurations affected: FATES single-point tests with the GNU compiler.

Details of bug

The differences appear around the 9th time step of the run and affect a number of variables, in particular the AREA_-prefixed history outputs. This suggests that patch%area may be the primary affected variable, with the differences then propagating into downstream calculations.
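To pin down where two runs start to diverge (here, around the 9th time step), one can compare a history variable step by step and report the first step whose values are not bitwise identical. The following is a minimal, hypothetical sketch in plain Python: `first_diverging_step`, `bits`, and the sample series are illustrative assumptions, not part of CTSM, FATES, or their test tooling. It compares raw IEEE 754 bit patterns rather than using `==`, because a B4B check must also distinguish `0.0` from `-0.0`.

```python
import struct
from typing import Optional, Sequence


def bits(x: float) -> int:
    """Return the raw 64-bit IEEE 754 bit pattern of a float."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def first_diverging_step(base: Sequence[float], test: Sequence[float]) -> Optional[int]:
    """Return the 1-based time step at which two series first differ bitwise.

    Only the common length of the two series is compared.
    Returns None if they are bit-for-bit identical over that length.
    """
    for step, (a, b) in enumerate(zip(base, test), start=1):
        if bits(a) != bits(b):
            return step
    return None


# Hypothetical example: the 9th value differs by one unit in the last place.
baseline = [1.0] * 12
candidate = [1.0] * 8 + [1.0 + 2**-52] + [1.0] * 3
print(first_diverging_step(baseline, candidate))
```

In practice the two series would be read from the baseline and test history files; that I/O is left out here so the sketch stays self-contained.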

Important details of your setup / configuration so we can reproduce the bug

Note that this tag also introduced MOSART issue 45. Adding frivinp_rtm = '/dev/null' to user_nl_mosart and rebuilding the test is necessary to get it to run. I don't think this is the (direct) cause of the issue, since other FATES tests on the 1x1_brazil grid with this same modification pass B4B (although those are Intel-based tests).

The comparison between the tags uses FATES tag sci.1.46.2_api.16.1.0, which is the default when using checkout_externals.

glemieux changed the title from "GNU version of FATES SMS test resulting in differences to baseline" to "GNU version of FATES SMS test resulting in differences to baseline with ctsm5.1.dev049" Aug 19, 2021
ekluzek self-assigned this Aug 19, 2021
ekluzek added the investigation label Aug 19, 2021
@ekluzek
Collaborator

ekluzek commented Aug 19, 2021

I've verified that I see this change and now have a baseline version of this test for both ctsm5.1.dev048 and ctsm5.1.dev049.

Comparing the namelists, I only see the frivinp_rtm change that @glemieux mentions above and an explicit setting of pio_netcdf_format = "64bit_offset". There are also changes in module versions and a change in Macros.make.

@ekluzek
Collaborator

ekluzek commented Aug 19, 2021

OK, I ran this test in ctsm5.1.dev049 with the same env_mach_specific.xml, Macros.cmake, and Macros.make as ctsm5.1.dev048, and I get identical answers again. So this change is due solely to a compiler version update between the two tags. As such, I don't think we need to worry about it, especially since GNU isn't our main compiler. As long as the difference comes only from roundoff differences introduced by optimization, it doesn't matter. The only thing that would matter is if those optimization differences led to substantially different answers. Because this shows up in only this one test, it's reasonable to conclude that isn't the case here.
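Why a compiler update alone can break B4B agreement: floating-point addition is not associative, so if a new compiler version (or different optimization) evaluates the same sum in a different order, the result can change in the last bits. The minimal Python illustration below uses arbitrary values chosen for the example; it does not reproduce any actual CTSM or FATES computation, only the general mechanism. `sum_left` and `sum_right` are hypothetical helpers standing in for two evaluation orders.

```python
import math


def sum_left(xs):
    """Sum left to right: ((x0 + x1) + x2) + ..."""
    total = 0.0
    for x in xs:
        total += x
    return total


def sum_right(xs):
    """Sum right to left: x0 + (x1 + (x2 + ...))."""
    total = 0.0
    for x in reversed(xs):
        total += x
    return total


values = [0.1, 0.2, 0.3]
a = sum_left(values)   # (0.1 + 0.2) + 0.3
b = sum_right(values)  # 0.1 + (0.2 + 0.3)

print(a, b, a == b)        # prints: 0.6000000000000001 0.6 False
print(math.isclose(a, b))  # prints: True (equal within the default relative tolerance)
```

The two results are not bitwise identical, so a B4B comparison fails, yet they agree to roundoff. That is the situation described above: a difference only matters if such roundoff grows into substantially different answers.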

@ekluzek
Collaborator

ekluzek commented Aug 19, 2021

Looks like the compiler update came in with cime6.0.0, and this PR

ESMCI/cime#3985

@glemieux
Collaborator Author

I ran a new ctsm5.1.dev046 baseline of the same test mod on a different grid and compared it to the same test setup using ctsm5.1.dev049. That comparison was B4B, so the problem seems to be specific to the 1x1_brazil grid with GNU. I also compared a dev046 baseline against a dev049 case using a non-MOSART compset on the 1x1_brazil grid, to make sure the difference had nothing to do with the issue 45 workaround. That comparison also failed, with the same DIFF results.

I think we can probably close this and label it wontfix, given your comments, Erik, and the fact that gridded GNU cases don't seem to suffer from the same issue.

glemieux added the closed: wontfix label Aug 21, 2021