
CAM-diagnostics interpretes regridded SE output files as original SE grid. #133

Open
oyvindseland opened this issue Feb 9, 2024 · 18 comments
Labels: Next release (To be included in next release)


@oyvindseland

Issue Type

Other (please describe below)

Issue Description

I have tried to run the CAM diagnostics on the simulation archived on Betzy at /cluster/work/users/mvertens/archive/NB1850proto01
The simulation uses the SE dynamical core, with output regridded to the FV 0.9x1.25 degree grid.

The diagnostics run fails with the error message "(0) unstructured_to_ESMF: latitude and longitude must have the same number of elements" (log: /cluster/work/users/oyvinds/diagnostics/out/CAM_DIAG/config/NB1850proto01/logs/out_240208_153621.log).
Before the failure, the averaged files are given an SE name, e.g. /cluster/work/users/oyvinds/diagnostics/out/CAM_DIAG/climo/NB1850proto01/sav_se/NB1850proto01_01_000201_001101_climo_SE.nc
Another possible point of failure is that the latitude-weight variable is named w, not gw, which used to be the standard name in FV output.
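The error message is consistent with the file being handled as native SE output: unstructured_to_ESMF apparently expects 1-D latitude and longitude arrays of equal length (one entry per SE column), whereas a file regridded to the 0.9x1.25 grid has 192 latitudes and 288 longitudes. How CAM_DIAG decides between the two code paths is not shown in this thread. Below is a minimal sketch, assuming the dimension sizes and variable names have already been read into plain Python containers, of a check that keys on the ncol dimension rather than on the file name, and that accepts either gw or w as the weight variable. classify_grid and weight_variable are hypothetical helpers, not part of the package.

```python
def classify_grid(dims):
    """Classify a CAM history file's horizontal layout from its dimensions.

    dims: mapping of dimension name -> size (hypothetical input format).
    Native SE output stores all columns along one unstructured 'ncol'
    dimension, with 1-D lat(ncol) and lon(ncol) of equal length.
    Output regridded to a lat/lon grid (e.g. 0.9x1.25) instead has
    separate 'lat' and 'lon' dimensions whose lengths generally differ.
    """
    if "ncol" in dims:
        return "unstructured"
    if "lat" in dims and "lon" in dims:
        return "latlon"
    raise ValueError(f"unrecognised horizontal dimensions: {sorted(dims)}")


def weight_variable(var_names):
    """Return the latitude-weight variable name, preferring 'gw' over 'w'."""
    for name in ("gw", "w"):
        if name in var_names:
            return name
    raise KeyError("no latitude-weight variable ('gw' or 'w') found")
```

For example, a native ne30 file with dimensions time, lev and ncol classifies as "unstructured", while regridded output such as NB1850proto01, with lat and lon dimensions, classifies as "latlon" and would need the lat/lon code path.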

Possible test: run an SE simulation of 14 months or more with native SE-grid output to see whether the diagnostics tool can handle it.

Will this change answers?

No

Will you be implementing this yourself?

No

@oyvindseland oyvindseland added the Next release To be included in next release label Feb 9, 2024
@oyvindseland oyvindseland changed the title CAM-diagnostics interprete regridded SE output files as original SE grid. CAM-diagnostics interpretes regridded SE output files as original SE grid. Feb 9, 2024
@mvertens

@oyvindseland - can you tell me the exact version of the diagnostics package you were using? I plan to contact the NCAR folks to see how they handle this.

@oyvindseland
Author

I did not create the setup, but as far as I can see it is Script Version 140804.
I checked the SVN site, and it looks like the most recent SVN release (revision 231): https://svn-ccsm-release.cgd.ucar.edu/model_diagnostics/atm/cam/

@oyvindseland
Author

@mvertens - did you also run a simulation with native SE-grid output?

@oyvindseland
Author

I checked the version on Nird, and it is the same as on Betzy.

@mvertens

@oyvindseland - I have not yet run a simulation with native SE-grid output only. We are still moving and everything is totally chaotic today. I'll start one tomorrow.

@oyvindseland
Author

No worries, I do not sit around waiting for it.

@mvertens

@gold2718 - could you please help with this as well?

@oyvindseland
Author

Information about the diagnostics can be found at https://noresm-docs.readthedocs.io/en/noresm2/diagnostics/diagnostics.html

On Betzy the command is /cluster/shared/noresm/diagnostics/noresm/bin/diag_srun

@oyvindseland
Author

The default AMWG script is at /cluster/shared/noresm/diagnostics/noresm/packages/CAM_DIAG
Before the scripts are actually run, they are by default copied to
/cluster/work/users/$user/diagnostics/out/CAM_DIAG/config/$CASENAME/run_scripts
The path can be changed via the script, which can also create the run scripts without running them.

@mvertens

@oyvindseland - @gold2718 has forked the repository, and I have downloaded it to /cluster/shared/noresm/diagnostics/noresm_dev on Betzy. I would first like to reproduce your error. What diag_srun command did you run that resulted in this failure?

@oyvindseland
Author

Command that failed

/cluster/shared/noresm/diagnostics/noresm/bin/diag_srun -m cam -i /cluster/work/users/mvertens/archive -c NB1850proto01 -s 2 -e 11

@mvertens

So I renamed the variable w -> gw in all the CAM history files. Now it is dying with the following error:
nco_err_exit(): ERROR Short NCO-generated message (usually name of function that triggered error): nco_get_var1()
nco_err_exit(): ERROR Error code is 12. Translation into English with nc_strerror(12) is "Cannot allocate memory"
ERROR: nco_get_var1() failed to nc_get_var1() variable "time_bnds"
nco_err_exit(): ERROR NCO will now exit with system call exit(EXIT_FAILURE)
I believe the version of the CAM diagnostics package we are using is no longer compatible with the CAM history output from the development code.

@mvertens

So I scrubbed everything and tried again, and got completely different errors. See /cluster/work/users/mvertens/diagnostics/logs/-diagsrun-240213-194000.log. @oyvindseland - can you try running the script again and see whether you get anything different?

@oyvindseland
Author

I reran the script and also got an out-of-memory (OOM) error.
I do not think I have seen an out-of-memory issue in the diagnostics before, so I do not understand why this happens.
Do we just need to request more memory in the script?
I should add, though, that I rarely use the script on Betzy; I normally run it on Nird.

I copied years 2 and 3 of your output files, renamed w to gw, and ran the AMWG script without the wrapper.

In this case the script runs but produces relatively limited output. The log reports that the variable hyam is missing:
fatal:["Execute.c":6394]:variable (hyam) is not in file (inptr)
I also did the same with your setup and got the same result: some plots and the "hyam" error.
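The diagnostics presumably need hyam and hybm because CAM uses a hybrid sigma-pressure vertical coordinate: the pressure at mid-level k of a column is p(k) = hyam(k) * P0 + hybm(k) * PS, where P0 is the reference pressure (normally 100000 Pa) and PS is the surface pressure. Without these coefficients the package cannot compute the pressures it would need to interpolate fields onto standard pressure levels. A minimal sketch of that formula for a single column; hybrid_pressure is a hypothetical helper, not a CAM_DIAG function, and the coefficients in the usage example are made-up values.

```python
P0_DEFAULT = 100000.0  # reference pressure in Pa (CAM's usual P0)


def hybrid_pressure(hyam, hybm, ps, p0=P0_DEFAULT):
    """Mid-level pressures (Pa) for one column.

    Implements p(k) = hyam(k) * p0 + hybm(k) * ps for every level k.
    hyam, hybm: per-level hybrid coefficients; ps: surface pressure (Pa).
    """
    if len(hyam) != len(hybm):
        raise ValueError("hyam and hybm must have the same number of levels")
    return [a * p0 + b * ps for a, b in zip(hyam, hybm)]
```

For example, hybrid_pressure([0.0, 0.5], [1.0, 0.0], 100000.0) returns [100000.0, 50000.0]: the first level follows the surface pressure (pure sigma), the second is a fixed pressure level.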

Plots: https://ns2345k.web.sigma2.no/diagnostics/noresm/oyvinds/NB1850proto01/
For comparison, 20 years of the CMIP6 piControl simulation:
https://ns2345k.web.sigma2.no/diagnostics/noresm/oyvinds/N1850frc2_f09_tn14_20191001/CAM_DIAG/

@mvertens

@oyvindseland - I think the problem is that on Betzy the wrapper is submitted to the preproc queue, which runs on a shared-memory batch node, so the memory available depends on who else is using the node at the time. I think this explains why the OOM error appeared in a different place each time the wrapper was submitted on Betzy. When you run the script itself interactively, you are using the shared memory of the login node. I think running on Nird is probably better. BTW - I have renamed the variable w -> gw in all of the files.
The fact that the variable is reported as missing from the input file is problematic.
@gold2718 - where are the latest version(s) of the CAM diagnostics packages? Is anything available on GitHub at this point?

@oyvindseland
Author

On Nird the script runs without an OOM error, but the hyam problem remains.

@oyvindseland
Author

A test with native-grid output produced the same plots as the coupled simulation. The vertical-level definitions, hyam and hybm, are still missing from the averaged files.
https://ns2345k.web.sigma2.no/diagnostics/noresm/oyvinds/NF2000proto01/
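Because hyam and hybm are reported missing from the averaged files, checking each climatology file (and the history files it was built from) for the vertical-coordinate variables would show at which step they disappear. A minimal sketch, assuming the variable names of a file have already been read (for example with a netCDF library or ncdump -h); the list of required names is an assumption based on the inputs of the hybrid-pressure formula, not taken from the package, and missing_vertical_vars is a hypothetical helper.

```python
# Assumed list: the variables needed to turn hybrid levels into pressures.
VERTICAL_COORD_VARS = ("hyam", "hybm", "P0", "PS")


def missing_vertical_vars(var_names):
    """Return the vertical-coordinate variables absent from a file.

    var_names: iterable of variable names found in one netCDF file.
    The result keeps the order of VERTICAL_COORD_VARS.
    """
    present = set(var_names)
    return [name for name in VERTICAL_COORD_VARS if name not in present]
```

Running this on a history file and on the climo file derived from it would indicate whether the averaging step drops hyam and hybm.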

The diagnostics' interpolation from the SE grid onto a lat-lon grid fails; see e.g.
https://ns2345k.web.sigma2.no/diagnostics/noresm/oyvinds/NB1850proto01/yrs2to3-obs/set5_6/set5_ANN_LWCF_ERBE_obsc.png
vs
https://ns2345k.web.sigma2.no/diagnostics/noresm/oyvinds/NF2000proto01/yrs1to1-obs/set5_6/set5_ANN_LWCF_ERBE_obsc.png

@oyvindseland
Author

I looked around the AMWG website and found some diagnostics plots for SE with 48 levels, so it should be possible if we need to use the NCL diagnostics.
The simulations are relatively old (2021):
https://webext.cgd.ucar.edu/FWscHIST/f.e21.FWscHIST_BGC.ne30_ne30_mg17_L48_revert-J.001/atm/

The table that linked to the simulations did not say who created the plots or ran the simulations:
https://docs.google.com/spreadsheets/d/1nSTQ9tscsqeLhy3fhytW_ko1wLjydYqa5ZRGThLP2K8/edit#gid=1338712341

Projects
Status: Todo
Development

No branches or pull requests

2 participants