Clone, build, and run C48_ATM and C48_S2SW on Gaea C5 and C6 #3106

Merged
merged 41 commits into NOAA-EMC:develop from gw_c5OSc6
Jan 22, 2025

Conversation

DavidBurrows-NCO
Contributor

@DavidBurrows-NCO DavidBurrows-NCO commented Nov 15, 2024

Description

What:
Correct build/run for C48_ATM and C48_S2SW on Gaea C5. Add build and run capability for C48_ATM, C48_S2SW, and C96_atm3DVar on Gaea C6.
Why:
After the C5 OS upgrade, submodules no longer built in the global-workflow. This PR corrects that and adds build/run capability on C6.

Resolves #3011
Depends on:
ufs-community/ufs-weather-model#2448
ufs-community/UFS_UTILS#995
NOAA-EMC/gfs-utils#87
NOAA-EMC/UPP#1070
NOAA-EMC/GSI#800
NOAA-EMC/GSI-utils#55
NOAA-EMC/GSI-Monitor#146
NOAA-EMC/GDASApp#1361

Type of change

  • Bug fix (fixes something broken)
  • New feature (adds functionality)

Change characteristics

How has this been tested?

C5 and C6: cloned, built, and ran C48_ATM and C48_S2SW successfully.
C96_atm3DVar is hanging in sfcanl jobs.

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have documented my code, including function, input, and output descriptions
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • This change is covered by an existing CI test or a new one has been added
  • Any new scripts have been added to the .github/CODEOWNERS file with owners
  • I have made corresponding changes to the system documentation if necessary

@DavidBurrows-NCO
Contributor Author

Hi @aerorahul @WalterKolczynski-NOAA We're still waiting on build merges for some submodules, so I've left this PR in draft. From our conversation Tuesday, I've pointed the submodules that were merged to their respective head of develop and the others to my commit for now. Should I be pointing to my submodule commits instead to limit the number of changes coming into GW? Thanks

@jswhit
Contributor

jswhit commented Nov 15, 2024

sorc/build_all.sh needs the following update:

--- sorc/build_all.sh
+++ sorc/build_all.sh
@@ -149,7 +149,7 @@ build_opts["ww3prepost"]="${_wave_opt} ${_verbose_opt} ${_build_ufs_opt} ${_buil

 # Optional DA builds
 if [[ "${_build_ufsda}" == "YES" ]]; then
-   if [[ "${MACHINE_ID}" != "orion" && "${MACHINE_ID}" != "hera" && "${MACHINE_ID}" != "hercules" && "${MACHINE_ID}" != "wcoss2" && "${MACHINE_ID}" != "noaacloud" && "${MACHINE_ID}" != "gaea" ]]; then
+   if [[ "${MACHINE_ID}" != "orion" && "${MACHINE_ID}" != "hera" && "${MACHINE_ID}" != "hercules" && "${MACHINE_ID}" != "wcoss2" && "${MACHINE_ID}" != "noaacloud" && "${MACHINE_ID}" != "gaeac5" && "${MACHINE_ID}" != "gaeac6" ]]; then
       echo "NOTE: The GDAS App is not supported on ${MACHINE_ID}.  Disabling build."
    else
       build_jobs["gdas"]=8
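The chain of `!=` tests in the diff above grows with every new platform. A minimal sketch of the same membership check written as a `case` statement, assuming the supported list is exactly the machines named in the diff; `gdas_supported` is a hypothetical helper name, not something that exists in sorc/build_all.sh:

```shell
# Hypothetical helper: succeed when the GDAS App build is enabled for a machine ID.
# The list mirrors the machines named in the diff above.
gdas_supported() {
  case "$1" in
    orion | hera | hercules | wcoss2 | noaacloud | gaeac5 | gaeac6) return 0 ;;
    *) return 1 ;;
  esac
}

# The default value here is only for the demonstration.
MACHINE_ID="${MACHINE_ID:-gaeac6}"
if gdas_supported "${MACHINE_ID}"; then
  echo "GDAS App build enabled on ${MACHINE_ID}"
else
  echo "NOTE: The GDAS App is not supported on ${MACHINE_ID}.  Disabling build."
fi
```

With this shape, adding a platform is a one-word change to the pattern list, and the old "gaea" ID falls through to the unsupported branch.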

@jswhit
Contributor

jswhit commented Nov 15, 2024

also ush/load_ufsda_modules.sh needs

--- a/ush/load_ufsda_modules.sh
+++ b/ush/load_ufsda_modules.sh
@@ -34,13 +34,13 @@ source "${HOMEgfs}/ush/module-setup.sh"
 module use "${HOMEgfs}/sorc/gdas.cd/modulefiles"

 case "${MACHINE_ID}" in
-  ("hera" | "orion" | "hercules" | "wcoss2")
+  ("hera" | "orion" | "hercules" | "gaeac5" | "gaeac6" | "wcoss2")
     module load "${MODS}/${MACHINE_ID}"
     ncdump=$( command -v ncdump )
     NETCDF=$( echo "${ncdump}" | cut -d " " -f 3 )
     export NETCDF
     ;;
-  ("jet" | "gaea" | "s4" | "acorn")
+  ("jet" | "s4" | "acorn")
     echo WARNING: UFSDA NOT SUPPORTED ON THIS PLATFORM
     ;;
   *)

@DavidBurrows-NCO
Contributor Author

also ush/load_ufsda_modules.sh needs

Thanks @jswhit I pushed changes to ush/load_ufsda_modules.sh and sorc/build_all.sh

@jswhit
Contributor

jswhit commented Nov 15, 2024

also...

workflow/hosts/gaeac6.yaml and gaeac5.yaml:

-QUEUE_SERVICE: normal
+QUEUE_SERVICE: hpss
 PARTITION_BATCH: batch
-PARTITION_SERVICE: batch
+PARTITION_SERVICE: dtn_f5_f6

and modulefiles/module_gwsetup.gaeac6.lua:

-prepend_path("MODULEPATH", "/ncrc/proj/epic/spack-stack/spack-stack-1.6.0/envs/unified-env/install/modulefiles/Core")
+prepend_path("MODULEPATH", "/ncrc/proj/epic/spack-stack/c6/spack-stack-1.6.0/envs/unified-env/install/modulefiles/Core")
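One way to confirm the host settings after editing might be a small lookup like the one below. This is a sketch only: `yaml_get` is a hypothetical helper, it handles flat `KEY: value` lines and not nested YAML, and the file path is taken from the comment above.

```shell
# Hypothetical helper: print the value of a flat "KEY: value" entry.
# Only handles top-level scalar keys; quoted values keep their quotes.
yaml_get() {
  awk -F': *' -v key="$1" '$1 == key { print $2; exit }' "$2"
}

# Example: show the service settings if the host file is present.
host_file="workflow/hosts/gaeac6.yaml"
if [ -f "${host_file}" ]; then
  echo "QUEUE_SERVICE=$(yaml_get QUEUE_SERVICE "${host_file}")"
  echo "PARTITION_SERVICE=$(yaml_get PARTITION_SERVICE "${host_file}")"
fi
```

Run from the top of a global-workflow clone, this should report `hpss` and `dtn_f5_f6` once the change above is applied.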

@jswhit
Contributor

jswhit commented Nov 15, 2024

env/GAEAC5.env and GAEAC6.env seem to be missing a bunch of stuff. I just copied HERCULES.env for both, and made some minor mods (see https://github.com/jswhit2/global-workflow/blob/develop/env/GAEAC5.env)

@jswhit
Contributor

jswhit commented Nov 15, 2024

build_ww3prepost is failing for me on both c5 and c6 (using ufs-wx-model 2448)

@DavidBurrows-NCO
Contributor Author

missing a bunch of stuff

@jswhit It's not really missing; it was intentionally minimized at EMC's request for porting to a new machine. We started from a nearly blank canvas and have been building up. Currently, the C5 and C6 .env files are set up for C48_ATM, C48_S2SW, and C96_atm3DVar jobs. The 3DVarAOWCDA configuration you're running will definitely have some additional jobs. If you send those particular job names (or "step" in the env file), I will add them to the files.

@JessicaMeixner-NOAA
Contributor

build_ww3prepost is failing for me on both c5 and c6 (using ufs-wx-model 2448)

@jswhit - can you point me to a log file? Maybe I can look and see if something is easy to fix with this.

@jswhit
Contributor

jswhit commented Nov 18, 2024

build_ww3prepost is failing for me on both c5 and c6 (using ufs-wx-model 2448)

@jswhit - can you point me to a log file? Maybe I can look and see if something is easy to fix with this.

@JessicaMeixner-NOAA here is the error:

gpfs/f6/ira-da/proj-shared/Jeffrey.S.Whitaker/global-workflow-jswhit2/sorc/ufs_model.fd/WW3/model/src/w3initmd.F90(451): error #7002: Error in opening the compiled module file.  Check INCLUDE paths.   [WAV_RESTART_MOD]
    use wav_restart_mod, only : read_restart
--------^
/gpfs/f6/ira-da/proj-shared/Jeffrey.S.Whitaker/global-workflow-jswhit2/sorc/ufs_model.fd/WW3/model/src/w3initmd.F90(975): error #6632: Keyword arguments are invalid without an explicit interface.   [VA]
            call read_restart(trim(fname), va=va, mapsta=mapsta, mapst2=mapst2)
-------------------------------------------^
/gpfs/f6/ira-da/proj-shared/Jeffrey.S.Whitaker/global-workflow-jswhit2/sorc/ufs_model.fd/WW3/model/src/w3initmd.F90(975): error #6632: Keyword arguments are invalid without an explicit interface.   [MAPSTA]
            call read_restart(trim(fname), va=va, mapsta=mapsta, mapst2=mapst2)
--------------------------------------------------^
/gpfs/f6/ira-da/proj-shared/Jeffrey.S.Whitaker/global-workflow-jswhit2/sorc/ufs_model.fd/WW3/model/src/w3initmd.F90(975): error #6632: Keyword arguments are invalid without an explicit interface.   [MAPST2]
            call read_restart(trim(fname), va=va, mapsta=mapsta, mapst2=mapst2)
-----------------------------------------------------------------^
/gpfs/f6/ira-da/proj-shared/Jeffrey.S.Whitaker/global-workflow-jswhit2/sorc/ufs_model.fd/WW3/model/src/w3initmd.F90(451): error #6580: Name in only-list does not exist or is not accessible.   [READ_RESTART]
    use wav_restart_mod, only : read_restart
--------------------------------^
compilation aborted for /gpfs/f6/ira-da/proj-shared/Jeffrey.S.Whitaker/global-workflow-jswhit2/sorc/ufs_model.fd/WW3/model/src/w3initmd.F90 (code 1)

@JessicaMeixner-NOAA
Contributor

@jswhit - Okay, I know what the issue is, but it'll take a minute to get it fixed. The issue crept in with ufs-community/ufs-weather-model#2445 and we didn't catch it. If you go back one commit of ufs-weather-model, hopefully things will run. We'll get a fix in as soon as possible.
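Stepping the submodule back could look roughly like this. It is a sketch under assumptions: `rollback_submodule` is a hypothetical helper, the submodule path is the one shown in the build log, and it presumes the checked-out commit is the one that introduced the problem.

```shell
# Hypothetical helper: move a submodule checkout back N commits (default 1)
# and bring its own nested submodules in line with that commit.
rollback_submodule() {
  rs_path="$1"
  rs_count="${2:-1}"
  git -C "${rs_path}" checkout --quiet --detach "HEAD~${rs_count}" &&
    git -C "${rs_path}" submodule update --quiet --init --recursive
}

# Example, run from the top of the global-workflow clone:
#   rollback_submodule sorc/ufs_model.fd 1
```

The checkout is left detached on purpose, so the parent repository simply shows the submodule as modified until the pointer is committed or reset.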

@jswhit
Contributor

jswhit commented Nov 19, 2024

@JessicaMeixner-NOAA I'm seeing this error in the gdas_fcst step on c6 when I run with ufs-wx-model 2448

424:  (abort_ice)ABORTED:
424:  (abort_ice) error =
424:  (construct_filename) ERROR: history filename already used for another history s
424:  tream iceh_inst.2021-03-24-10800.nc

and the traceback looks like this

473: ufs_model.x        0000000005E9CD8B  ice_broadcast_mp_         252  ice_broadcast.F90
473: ufs_model.x        0000000005F055E3  ice_history_write         169  ice_history_write.F90
473: ufs_model.x        0000000005C2A4E2  ice_history_mp_ac        4134  ice_history.F90
473: ufs_model.x        0000000005EE77FC  cice_runmod_mp_ci         367  CICE_RunMod.F90
473: ufs_model.x        0000000005B7DA06  ice_comp_nuopc_mp        1204  ice_comp_nuopc.F90
473: ufs_model.x        0000000000D05438  Unknown               Unknown  Unknown

Do you know of any recent CICE changes that could cause this?

@JessicaMeixner-NOAA
Contributor

I don't know, and I'm not as caught up on all the recent ufs-weather-model changes as I normally am, but a quick look at ufs-weather-model shows CICE hasn't been updated in 2 months.

@jswhit
Contributor

jswhit commented Nov 19, 2024

For some more context on the cice error, from ice_diag.d:

(ice_comp_nuopc):(ModelAdvance) cice istep, nextsw_cday =         15      0.83111111111111D+02
 (ice_pio_init) create file ./CICE_OUTPUT/iceh_inst.2021-03-24-09600.nc

 Finished writing ./CICE_OUTPUT/iceh_inst.2021-03-24-09600.nc
(ice_comp_nuopc):(ModelAdvance) cice istep, nextsw_cday =         16      0.83118055555556D+02
 (ice_pio_init) create file ./CICE_OUTPUT/iceh_inst.2021-03-24-10200.nc

 Finished writing ./CICE_OUTPUT/iceh_inst.2021-03-24-10200.nc
(ice_comp_nuopc):(ModelAdvance) cice istep, nextsw_cday =         17      0.83125000000000D+02
 (ice_pio_init) create file ./CICE_OUTPUT/iceh_inst.2021-03-24-10800.nc

 Finished writing ./CICE_OUTPUT/iceh_inst.2021-03-24-10800.nc
 (construct_filename) history stream =            4
 (construct_filename) history filename = iceh_inst.2021-03-24-10800.nc
 (construct_filename) filename in use for stream            3
 (construct_filename) filename for stream iceh_inst.2021-03-24-10800.nc
 (construct_filename) Use namelist hist_suffix so history filenames are unique
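The log shows stream 4 resolving to the same file name that stream 3 already wrote. A minimal sketch of that collision check, assuming one candidate file name per stream is available as a line of text; `find_duplicate_names` is a hypothetical helper, and the third name in the example is made up to show a stream that does not collide:

```shell
# Hypothetical helper: print any history file name that more than one
# stream would write. Reads one file name per line on stdin.
find_duplicate_names() {
  sort | uniq -d
}

# Names modelled on the log above: two streams resolve to the same
# iceh_inst file, and a third (made-up) name does not collide.
printf '%s\n' \
  'iceh_inst.2021-03-24-10800.nc' \
  'iceh_inst.2021-03-24-10800.nc' \
  'iceh.2021-03-24.nc' | find_duplicate_names
# prints: iceh_inst.2021-03-24-10800.nc
```

Giving each stream a distinct suffix, as the last log line suggests with `hist_suffix`, would make every name unique and the check print nothing.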

@jswhit2 jswhit2 mentioned this pull request Nov 21, 2024
10 tasks
@jswhit2
Contributor

jswhit2 commented Nov 21, 2024

The problem with the ice model (and a potential fix) are documented in PR #3121

@JessicaMeixner-NOAA
Contributor

@DavidBurrows-NCO - If you will merge in develop, it will use a ufs-weather-model hash beyond what you need for Gaea and should take care of the model related issues with RESTART_FH.

@aerorahul
Contributor

@DavidBurrows-NCO
Can you merge in develop and resolve the conflicts? With that, we should be able to move forward on this PR.

@aerorahul aerorahul removed the CI-Hera-Failed and CI-Hercules-Failed labels Jan 21, 2025
@DavidBurrows-NCO
Contributor Author

Thanks @JessicaMeixner-NOAA for reaching out about this. @aerorahul Yes, I'm working to update my branch to develop today and will retest C48_S2SW.

@DavidBurrows-NCO
Contributor Author

@aerorahul I aligned my branch with develop today and tested C48_S2SW on C6 successfully. F5 is unmounted from C5 today, so I will have to test C5 tomorrow.

@DavidBurrows-NCO
Contributor Author

Morning @aerorahul. I was able to clone, build, and run C48_S2SW on C5 this morning. The only issue I see is that when create_experiment runs on C5 and C6, memory is set in the XML file for the gfs_wavepostsbs job, which causes job submission to fail. I'm not sure where that is being set, though.
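To see what was written for that job, something like the following could be run against the generated XML. It is a sketch under assumptions: `task_memory` is a hypothetical helper, and it presumes the task opens with `<task name="NAME" ...>` on one line, closes with `</task>`, and carries its request in a Rocoto `<memory>` element.

```shell
# Hypothetical helper: print the <memory> line(s) inside the named task
# of a Rocoto XML file, with leading whitespace removed.
task_memory() {
  awk -v name="$1" '
    index($0, "<task name=\"" name "\"") { in_task = 1 }
    in_task && /<memory>/ { sub(/^[ \t]+/, ""); print }
    in_task && /<\/task>/ { in_task = 0 }
  ' "$2"
}

# Example (the XML file name is a placeholder):
#   task_memory gfs_wavepostsbs my_experiment.xml
```

No output would mean the task carries no memory request, which is a quick way to compare XML generated on Gaea with XML from another machine.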

aerorahul
aerorahul previously approved these changes Jan 22, 2025
Contributor

@aerorahul aerorahul left a comment

looks good to me.
Since this updates a lot of submodules, it should be tested across all machines.

aerorahul
aerorahul previously approved these changes Jan 22, 2025
Contributor

@aerorahul aerorahul left a comment

looks good.
Since the PR does not update any submodules or impact running on any of the machines WCOSS2, Hera, Hercules, or Orion, there is no need to run CI on this PR.

Member

@KateFriedman-NOAA KateFriedman-NOAA left a comment

Looks good, thanks @DavidBurrows-NCO !

Contributor

@aerorahul aerorahul left a comment

lgtm.

@aerorahul aerorahul merged commit 01f9c35 into NOAA-EMC:develop Jan 22, 2025
5 checks passed
@DavidBurrows-NCO DavidBurrows-NCO deleted the gw_c5OSc6 branch January 23, 2025 13:39
tsga added a commit to tsga/global-workflow that referenced this pull request Jan 23, 2025
* develop:
  Only run METplus in the 3Dvar tests (NOAA-EMC#3245)
  Clone, build, and run C48_ATM and C48_S2SW on Gaea C5 and C6 (NOAA-EMC#3106)
Development

Successfully merging this pull request may close these issues.

GW submodules no longer building on Gaea-C5 after OS upgrade; Also add Gaea-C6 build
8 participants