[production/AQM.v7] concatenate_nexus_post_split.py - unstable output file size #775
Comments
@chan-hoo Do you remember when we merged the updated concatenate_nexus_post_split.py into the workflow?
@JianpingHuang-NOAA I think it had to be around December of last year.
I just ran a test using the community code with 72-hour forecasts at 6-hourly cycles and didn't see an issue. Files on wcoss:
-rw-r--r-- 1 barry.baker emc 4.0G May 4 15:03 2023040100/INPUT/aqm.t00z.NEXUS_Expt.nc
-rw-r--r-- 1 barry.baker emc 4.0G May 4 15:03 2023040106/INPUT/aqm.t06z.NEXUS_Expt.nc
-rw-r--r-- 1 barry.baker emc 4.0G May 4 15:03 2023040112/INPUT/aqm.t12z.NEXUS_Expt.nc
-rw-r--r-- 1 barry.baker emc 4.0G May 4 15:03 2023040118/INPUT/aqm.t18z.NEXUS_Expt.nc
-rw-r--r-- 1 barry.baker emc 4.0G May 4 15:02 2023040200/INPUT/aqm.t00z.NEXUS_Expt.nc
-rw-r--r-- 1 barry.baker emc 4.0G May 4 15:02 2023040206/INPUT/aqm.t06z.NEXUS_Expt.nc
-rw-r--r-- 1 barry.baker emc 4.0G May 4 15:02 2023040212/INPUT/aqm.t12z.NEXUS_Expt.nc
-rw-r--r-- 1 barry.baker emc 4.0G May 4 15:03 2023040218/INPUT/aqm.t18z.NEXUS_Expt.nc
-rw-r--r-- 1 barry.baker emc 4.0G May 4 15:03 2023040300/INPUT/aqm.t00z.NEXUS_Expt.nc
-rw-r--r-- 1 barry.baker emc 4.0G May 4 15:03 2023040306/INPUT/aqm.t06z.NEXUS_Expt.nc
-rw-r--r-- 1 barry.baker emc 4.0G May 4 15:02 2023040312/INPUT/aqm.t12z.NEXUS_Expt.nc
-rw-r--r-- 1 barry.baker emc 4.0G May 4 15:03 2023040318/INPUT/aqm.t18z.NEXUS_Expt.nc
-rw-r--r-- 1 barry.baker emc 4.0G May 4 15:03 2023040400/INPUT/aqm.t00z.NEXUS_Expt.nc
-rw-r--r-- 1 barry.baker emc 4.0G May 4 15:03 2023040406/INPUT/aqm.t06z.NEXUS_Expt.nc
-rw-r--r-- 1 barry.baker emc 4.0G May 4 15:03 2023040412/INPUT/aqm.t12z.NEXUS_Expt.nc
-rw-r--r-- 1 barry.baker emc 4.0G May 4 15:03 2023040418/INPUT/aqm.t18z.NEXUS_Expt.nc
-rw-r--r-- 1 barry.baker emc 4.0G May 4 15:03 2023040500/INPUT/aqm.t00z.NEXUS_Expt.nc
-rw-r--r-- 1 barry.baker emc 4.0G May 4 15:03 2023040506/INPUT/aqm.t06z.NEXUS_Expt.nc
-rw-r--r-- 1 barry.baker emc 4.0G May 4 15:02 2023040512/INPUT/aqm.t12z.NEXUS_Expt.nc
-rw-r--r-- 1 barry.baker emc 4.0G May 4 15:03 2023040518/INPUT/aqm.t18z.NEXUS_Expt.nc
-rw-r--r-- 1 barry.baker emc 4.0G May 4 15:03 2023040600/INPUT/aqm.t00z.NEXUS_Expt.nc
-rw-r--r-- 1 barry.baker emc 4.0G May 4 15:03 2023040606/INPUT/aqm.t06z.NEXUS_Expt.nc
-rw-r--r-- 1 barry.baker emc 4.0G May 4 15:03 2023040612/INPUT/aqm.t12z.NEXUS_Expt.nc
-rw-r--r-- 1 barry.baker emc 4.0G May 4 15:03 2023040618/INPUT/aqm.t18z.NEXUS_Expt.nc
-rw-r--r-- 1 barry.baker emc 4.0G May 4 15:03 2023040700/INPUT/aqm.t00z.NEXUS_Expt.nc
-rw-r--r-- 1 barry.baker emc 4.0G May 4 15:03 2023040706/INPUT/aqm.t06z.NEXUS_Expt.nc
-rw-r--r-- 1 barry.baker emc 4.0G May 4 15:02 2023040712/INPUT/aqm.t12z.NEXUS_Expt.nc
-rw-r--r-- 1 barry.baker emc 4.0G May 4 15:03 2023040718/INPUT/aqm.t18z.NEXUS_Expt.nc
-rw-r--r-- 1 barry.baker emc 4.0G May 4 15:02 2023040800/INPUT/aqm.t00z.NEXUS_Expt.nc
-rw-r--r-- 1 barry.baker emc 4.0G May 4 15:03 2023040806/INPUT/aqm.t06z.NEXUS_Expt.nc
-rw-r--r-- 1 barry.baker emc 4.0G May 4 15:03 2023040812/INPUT/aqm.t12z.NEXUS_Expt.nc
-rw-r--r-- 1 barry.baker emc 4.0G May 4 15:02 2023040818/INPUT/aqm.t18z.NEXUS_Expt.nc
-rw-r--r-- 1 barry.baker emc 4.0G May 4 15:03 2023040900/INPUT/aqm.t00z.NEXUS_Expt.nc
-rw-r--r-- 1 barry.baker emc 4.0G May 4 15:03 2023040906/INPUT/aqm.t06z.NEXUS_Expt.nc
-rw-r--r-- 1 barry.baker emc 4.0G May 4 15:03 2023040912/INPUT/aqm.t12z.NEXUS_Expt.nc
-rw-r--r-- 1 barry.baker emc 4.0G May 4 15:03 2023040918/INPUT/aqm.t18z.NEXUS_Expt.nc
-rw-r--r-- 1 barry.baker emc 4.0G May 4 15:03 2023041000/INPUT/aqm.t00z.NEXUS_Expt.nc
-rw-r--r-- 1 barry.baker emc 4.0G May 4 15:03 2023041006/INPUT/aqm.t06z.NEXUS_Expt.nc
The error check has already been added to the command line:
This issue has not been tested yet. We need to merge the latest package and test it with both AQM realtime parallel and ecflow.
Yes, I agree.
On Tue, May 16, 2023 at 9:34 AM lgannoaa wrote:
This issue has not been tested yet. We need to merge the latest package and test it with both AQM realtime parallel and ecflow.
Please keep it open.
The utility $HOMEaqm/sorc/arl_nexus/utils/python/concatenate_nexus_post_split.py has been found to generate a corrupted NEXUS_Expt_combined.nc file. This file is normally around 858MB, but in some tests the corrupted file was only 572MB.
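A size check of this kind can be sketched as follows. This is a minimal illustration, not part of the workflow: the helper names `within_tolerance` and `file_size_looks_plausible`, the 10% tolerance, and the reading of "MB" as decimal megabytes are all assumptions. Only the 858MB and 572MB figures come from the report above.

```python
import os

MB = 1_000_000  # assumption: the sizes quoted in the report are decimal megabytes


def within_tolerance(actual_bytes, expected_bytes, tolerance=0.10):
    """Return True if actual_bytes is within `tolerance` (a fraction) of expected_bytes."""
    return abs(actual_bytes - expected_bytes) <= tolerance * expected_bytes


def file_size_looks_plausible(path, expected_bytes, tolerance=0.10):
    """Return True if the file exists and its size is close to the expected size."""
    if not os.path.isfile(path):
        return False
    return within_tolerance(os.path.getsize(path), expected_bytes, tolerance)
```

With these assumptions, `within_tolerance(572 * MB, 858 * MB)` is false, so the corrupted file from the report would be flagged. A plausible size does not prove the file is valid; the check only catches obviously truncated output.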
The nexus_post_split job uses concatenate_nexus_post_split.py to create NEXUS_Expt_combined.nc, which $HOMEaqm/sorc/arl_nexus/utils/python/make_nexus_output_pretty.py then reads within the same job. make_nexus_output_pretty.py failed with an AssertionError and did not generate the output file NEXUS_Expt_pretty.nc. However, nexus_post_split still completed because the job lacks the necessary exception handling.
Because neither the corrupted output nor the AssertionError was caught, the failure surfaced only later: the forecast job failed when it could not find the NEXUS_Expt_pretty.nc file. Error:
not exist or is not a file:
Recommend that the developers run a stability test on the concatenate_nexus_post_split.py utility.
Recommend that the developers patch the exception handling in the ex-script of the nexus_post_split job, exregional_nexus_post_split.sh.
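The exception handling recommended above amounts to stopping the job as soon as a step exits with an error or fails to produce its expected output. The sketch below shows that pattern in Python. The actual ex-script is a shell script, so a real patch would look different; the function name `run_step` and its arguments are hypothetical and not taken from the workflow.

```python
import os
import subprocess


def run_step(cmd, expected_output):
    """Run one step and raise if it exits non-zero or leaves no usable output file.

    cmd is an argument list for subprocess.run; expected_output is the path
    of the file the step is supposed to create.
    """
    result = subprocess.run(cmd)
    if result.returncode != 0:
        raise RuntimeError(f"step exited with code {result.returncode}: {cmd}")
    # A zero exit code alone is not enough: also confirm the output exists.
    if not os.path.isfile(expected_output) or os.path.getsize(expected_output) == 0:
        raise RuntimeError(f"expected output missing or empty: {expected_output}")
```

An uncaught AssertionError makes the Python interpreter exit with a non-zero status, so a caller that checks the return code of each script would stop at the failing step instead of letting the job complete and the forecast job fail later.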
Machines affected
wcoss
Debug output is saved in /lfs/h2/emc/global/noscrub/lin.gan/canned/concatenate_nexus_post_split_debug
aqm_nexus_post_split_00.o56988419-BAD (nexus_post_split job log from the run that should have failed due to the bad output file and AssertionError)
NEXUS_Expt_combined.nc-BAD (the corrupted file)
aqm_nexus_post_split_00.o57049044-GOOD (job log from a rerun of nexus_post_split that created the same file with the correct file size)
NEXUS_Expt_combined.nc-GOOD (the good output generated by the rerun)