coastal_ian_atlantic_datm2sch2ww3 #124

Open · Tracked by #92
yunfangsun opened this issue Aug 1, 2024 · 60 comments
@yunfangsun (Collaborator)

No description provided.

@yunfangsun (Collaborator Author)

Hi @uturuncoglu @saeed-moghimi-noaa @janahaddad @pvelissariou1

For the high-resolution subset mesh, I have finished the ATM+WW3 run at /work2/noaa/nosofs/yunfangs/stmp/yunfangs/FV3_RT/rt_3388781_atmww3_052024/coastal_ian_subset_atm2ww3_intel
and the ATM+SCH run at /work2/noaa/nosofs/yunfangs/stmp/yunfangs/FV3_RT/rt_2446480_atmsch_052024/coastal_ian_atlantic_atm2sch_intel_subset
The results are correct.

However, the ATM+SCH+WW3 case (/work2/noaa/nosofs/yunfangs/stmp/yunfangs/FV3_RT/rt_324728_atmschww3_06132024/coastal_ian_subset_atm2sch2ww3_intel_5_4), which uses the same input files, can't start, and ESMF complains about the mesh (which is the same as in the two cases above).

The error message is as follows:

20240731 223248.119 ERROR            PET0122 /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.1/cache/build_stage/spack-stage-esmf-8.5.0-ffgsxo7vntsk3fr5lwbjnjrrviof6dz5/spack-src/src/Infrastructure/Mesh/src/ESMCI_Mesh_Glue.C:6233 ESMCI_meshcreatedual() Internal error: Bad condition  - /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.1/cache/build_stage/spack-stage-esmf-8.5.0-ffgsxo7vntsk3fr5lwbjnjrrviof6dz5/spack-src/src/Infrastructure/Mesh/src/ESMCI_MeshDual.C, line:950: - there was a problem with triangulation (e.g. repeated points, clockwise poly, etc.)
20240731 223248.119 ERROR            PET0122 ESMCI_MeshCap.C:258 MeshCap::meshcreatedual() Internal error: Bad condition  - Internal subroutine call returned Error
20240731 223248.119 ERROR            PET0122 ESMF_Mesh.F90:3706 ESMF_MeshCreateDual() Internal error: Bad condition  - Internal subroutine call returned Error
20240731 223248.119 ERROR            PET0122 ESMF_FieldRegrid.F90:1375 ESMF_FieldRegridStoreNX Internal error: Bad condition  - Internal subroutine call returned Error
20240731 223248.119 ERROR            PET0122 ESMF_FieldRegrid.F90:976 ESMF_FieldRegridStoreNX Internal error: Bad condition  - Internal subroutine call returned Error
20240731 223248.119 ERROR            PET0122 med_map_mod.F90:503 Internal error: Bad condition  - Passing error in return code
20240731 223248.119 ERROR            PET0122 med_map_mod.F90:179 Internal error: Bad condition  - Passing error in return code
20240731 223248.119 ERROR            PET0122 med.F90:1871 Internal error: Bad condition  - Passing error in return code
20240731 223248.119 ERROR            PET0122 MED:src/addon/NUOPC/src/NUOPC_ModelBase.F90:1631 Internal error: Bad condition  - Passing error in return code
20240731 223248.119 ERROR            PET0122 /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.1/cache/build_stage/spack-stage-esmf-8.5.0-ffgsxo7vntsk3fr5lwbjnjrrviof6dz5/spack-src/src/Infrastructure/Trace/src/ESMCI_Trace.C:1707 ESMCI:TraceEventPhaseExit() Wrong argument specified  - Trace regions not properly nested exiting from region: [MED] IPDv03p7 Expected exit from: MED: (med_map_mod: RouteHandles_init)
20240731 223248.119 ERROR            PET0122 /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.1/cache/build_stage/spack-stage-esmf-8.5.0-ffgsxo7vntsk3fr5lwbjnjrrviof6dz5/spack-src/src/Infrastructure/Trace/src/ESMCI_Trace.C:1665 ESMCI::TraceEventCompPhaseExit() Wrong argument specified  - Internal subroutine call returned Error
20240731 223248.119 ERROR            PET0122 ESMCI_FTable.C:832 ESMCI_FTableCallEntryPointVMHop Wrong argument specified  - Internal subroutine call returned Error
20240731 223248.119 ERROR            PET0122 ESMCI_FTable.C:1100 c_esmc_compwait Wrong argument specified  - Internal subroutine call returned Error
20240731 223248.119 ERROR            PET0122 ESMF_Comp.F90:1256 ESMF_CompExecute Wrong argument specified  - Internal subroutine call returned Error
20240731 223248.119 ERROR            PET0122 ESMF_GridComp.F90:1426 ESMF_GridCompInitialize Wrong argument specified  - Internal subroutine call returned Error
20240731 223248.119 ERROR            PET0122 UFS Driver Grid Comp:src/addon/NUOPC/src/NUOPC_Driver.F90:2892 Wrong argument specified  - Failed calling phase 'IPDv03p7' Initialize for modelComp 4: MED
20240731 223248.119 ERROR            PET0122 UFS Driver Grid Comp:src/addon/NUOPC/src/NUOPC_Driver.F90:2696 Wrong argument specified  - Passing error in return code
20240731 223248.119 ERROR            PET0122 UFS Driver Grid Comp:src/addon/NUOPC/src/NUOPC_Driver.F90:2454 Wrong argument specified  - Passing error in return code
20240731 223248.119 ERROR            PET0122 UFS Driver Grid Comp:src/addon/NUOPC/src/NUOPC_Driver.F90:489 Wrong argument specified  - Passing error in return code
20240731 223248.119 ERROR            PET0122 /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.1/cache/build_stage/spack-stage-esmf-8.5.0-ffgsxo7vntsk3fr5lwbjnjrrviof6dz5/spack-src/src/Infrastructure/Trace/src/ESMCI_Trace.C:1707 ESMCI:TraceEventPhaseExit() Wrong argument specified  - Trace regions not properly nested exiting from region: [UFS Driver Grid Comp] Init 1 Expected exit from: MED: (med_map_mod: RouteHandles_init)
20240731 223248.119 ERROR            PET0122 /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.1/cache/build_stage/spack-stage-esmf-8.5.0-ffgsxo7vntsk3fr5lwbjnjrrviof6dz5/spack-src/src/Infrastructure/Trace/src/ESMCI_Trace.C:1665 ESMCI::TraceEventCompPhaseExit() Wrong argument specified  - Internal subroutine call returned Error
20240731 223248.119 ERROR            PET0122 ESMCI_FTable.C:832 ESMCI_FTableCallEntryPointVMHop Wrong argument specified  - Internal subroutine call returned Error
20240731 223248.119 ERROR            PET0122 ESMCI_FTable.C:1100 c_esmc_compwait Wrong argument specified  - Internal subroutine call returned Error
20240731 223248.119 ERROR            PET0122 ESMF_Comp.F90:1256 ESMF_CompExecute Wrong argument specified  - Internal subroutine call returned Error
20240731 223248.119 ERROR            PET0122 ESMF_GridComp.F90:1426 ESMF_GridCompInitialize Wrong argument specified  - Internal subroutine call returned Error
20240731 223248.119 ERROR            PET0122 UFS.F90:393 Wrong argument specified  - Aborting UFS
20240731 223248.119 INFO             PET0122 Finalizing ESMF
20240731 223248.119 ERROR            PET0122 /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.1/cache/build_stage/spack-stage-esmf-8.5.0-ffgsxo7vntsk3fr5lwbjnjrrviof6dz5/spack-src/src/Infrastructure/Trace/src/ESMCI_Trace.C:1816 ESMCI::TraceEventRegionExit() Wrong argument specified  - Trace regions not properly nested exiting from region: [ESMF] Expected exit from: MED: (med_map_mod: RouteHandles_init)
20240731 223248.119 ERROR            PET0122 /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.1/cache/build_stage/spack-stage-esmf-8.5.0-ffgsxo7vntsk3fr5lwbjnjrrviof6dz5/spack-src/src/Infrastructure/Trace/src/ESMCI_Trace.C:1258 ESMCI::TraceClose() Wrong argument specified  - Internal subroutine call returned Error

Hi @uturuncoglu , do you know if there is a way for ESMF to locate the problematic triangulation and report it in the log file?

Thank you!

@uturuncoglu (Collaborator)

@yunfangsun It seems there is an issue with the mesh, most likely the WW3 mesh. There is no way on the WW3 side to write out the mesh, but the CMEPS mediator can dump the mesh in VTK format. Please set dbug_flag = 30 in the ufs.configure ALL component section (or the MED section). Once you run it, it should write VTK files of the mesh, and then we can check them for any issue in the mesh.

@yunfangsun (Collaborator Author)

Hi @uturuncoglu ,

I have set dbug_flag = 30 in ufs.configure:

# MED #
MED_model:                      cmeps
MED_petlist_bounds:             0 1599
MED_omp_num_threads:            1
MED_attributes::
  ATM_model = datm
  OCN_model = schism
  WAV_model = ww3
  history_n = 1
  history_option = nhours
  history_ymd = -999
  coupling_mode = coastal
  pio_typename = PNETCDF
  pio_numiotasks = 32
  dbug_flag = 30
::

However, I don't see any VTK files in the folder /work2/noaa/nosofs/yunfangs/stmp/yunfangs/FV3_RT/rt_324728_atmschww3_06132024/coastal_ian_subset_atm2sch2ww3_intel_5_4.

Is there anything wrong in my ufs.configure?

@yunfangsun (Collaborator Author)

Hi @sbanihash,

Thank you for helping me check the log files. Additional copies of the files are located at:

ATM+WW3: /work2/noaa/nos-surge/yunfangs/stmp/yunfangs/FV3_RT/rt_3388781_atmww3_052024/coastal_ian_subset_atm2ww3_intel

ATM+SCH+WW3: /work2/noaa/nos-surge/yunfangs/stmp/yunfangs/FV3_RT/rt_324728_atmschww3_06132024/coastal_ian_subset_atm2sch2ww3_intel_test4_1

@uturuncoglu (Collaborator)

@yunfangsun You could also try setting it in the code: go to mediator/med_constants_mod.F90 and set med_constants_dbug_flag to something large like 30. You need to recompile the code from scratch before running.

@yunfangsun (Collaborator Author)

To test for a forcing issue, the forcing flags were turned off in the ATM+SCH+WW3 case:

input%forcing%winds      = 'F'
input%forcing%currents   = 'F'

The run still can't go through.

To test for a memory issue, the WW3 component of ATM+SCH+WW3 was given the same number of cores (5000 cores); ATM+SCH+WW3 still failed.

@yunfangsun (Collaborator Author)

yunfangsun commented Aug 8, 2024

The mesh quality of the subset mesh has been evaluated:

[Screenshots: mesh-quality statistics for the subset mesh]

Two nodes are almost identical; the correction has been made:

[Screenshot: location of the nearly identical node pair]

@felicio93

Per Yunfang's request, I reran the mesh subsetting/merging process using OCSMesh's latest functions.

This is the code I used:

import ocsmesh
import geopandas as gpd

path = r"PATH/subset/"
shape = gpd.read_file(path+'file34.shp')                     # subsetting polygon
highres = ocsmesh.Mesh.open(path+'ECG120.2dm', crs=4326)     # high-resolution mesh
lowres = ocsmesh.Mesh.open(path+'final_mesh.2dm', crs=4326)  # low-resolution mesh

# Clip the high-resolution mesh to the polygon
highres_clip = ocsmesh.utils.clip_mesh_by_shape(highres.msh_t,
                                                shape=shape.union_all(),
                                                inverse=False,
                                                fit_inside=False,
                                                check_cross_edges=False,
                                                adjacent_layers=0)
# Merge the clipped high-resolution mesh into the low-resolution mesh,
# then clean up duplicate nodes/elements and re-tag IDs
merged = ocsmesh.utils.merge_overlapping_meshes([lowres.msh_t, highres_clip],
                                                adjacent_layers=2)
ocsmesh.utils.cleanup_duplicates(merged)
ocsmesh.utils.put_id_tags(merged)

ocsmesh.Mesh(merged).write(path+"merged.2dm", format='2dm', overwrite=True)

@yunfangsun you will probably have to interpolate the DEM back to the mesh again.
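For reference, a minimal sketch of that DEM re-interpolation step, continuing the script above. It assumes the DEM samples are available as arrays dem_x, dem_y, dem_z in the mesh CRS (placeholders, not files from this issue) and that the msh_t object exposes the jigsawpy-style vert2['coord'] and value attributes; verify both names against your OCSMesh version.

import numpy as np
from scipy.interpolate import griddata

def interpolate_dem_onto_mesh(msh, dem_x, dem_y, dem_z):
    """Sample scattered DEM points onto the nodes of a jigsawpy-style msh_t."""
    coords = msh.vert2['coord']                        # (n_nodes, 2) node coordinates
    dem_xy = np.column_stack((dem_x, dem_y))
    depths = griddata(dem_xy, dem_z, coords, method='linear')
    # Nodes outside the DEM's convex hull come back as NaN; use nearest-neighbour there.
    missing = np.isnan(depths)
    depths[missing] = griddata(dem_xy, dem_z, coords[missing], method='nearest')
    msh.value = depths.reshape(-1, 1)                  # store the values back on the mesh
    return msh

# e.g. interpolate_dem_onto_mesh(merged, dem_x, dem_y, dem_z) before the write() call above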

The final mesh was uploaded here
The mesh can be opened in SMS without any errors, which suggests no invalid elements were created during the merging process.

Please let me know if there is anything else I can help with. Also, let me know if you encounter any problems with this mesh.

@janahaddad changed the title from "combined Atl mesh" to "coastal_ian_atlantic_datm2sch2ww3" on Aug 20, 2024
@yunfangsun (Collaborator Author)

I have manually corrected the mesh using SMS.
The original mesh has 2470094 nodes and 4886065 cells. After removing the duplicated and nearly duplicated nodes, the final mesh has 2470026 nodes and 4885987 cells (no node positions were changed).

[Screenshot: corrected mesh region]

The worst element quality, computed in MATLAB, is now as follows:

[Screenshots: worst element quality statistics]

This mesh is acceptable.
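As a side note, nearly coincident node pairs like the ones fixed above can also be flagged programmatically. A minimal sketch, assuming the standard gr3 layout (comment line, an "NE NN" line, then NN records of "id x y depth") and an illustrative 1e-6 degree tolerance:

import numpy as np
from scipy.spatial import cKDTree

def find_near_duplicate_nodes(hgrid_path, tol=1e-6):
    with open(hgrid_path) as f:
        f.readline()                                   # mesh name / comment line
        ne, nn = map(int, f.readline().split()[:2])    # element count, node count
        xy = np.array([f.readline().split()[1:3] for _ in range(nn)], dtype=float)
    # All node pairs closer than tol (in mesh units, here degrees).
    pairs = cKDTree(xy).query_pairs(r=tol)
    return sorted((i + 1, j + 1) for i, j in pairs)    # 1-based node ids

print(find_near_duplicate_nodes("hgrid.gr3"))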

@yunfangsun (Collaborator Author)

Based on the above mesh, new ATM+SCHISM, ATM+WW3, and ATM+SCHISM+WW3 configurations have been developed for the current version of UFS-Coastal.

The ATM+SCHISM case is located at /work2/noaa/nosofs/yunfangs/hurricane_ian/atm_sch_new/coastal_ian_subset_atm2sch_intel_1; this configuration works for the Hurricane Ian case and has been tested by @mansurjisan.

The ATM+WW3 case is located at /work2/noaa/nosofs/yunfangs/hurricane_ian/atm_ww_new/coastal_ian_subset_atm2ww3_intel_1; this configuration also works and the simulation has finished.

The ATM+SCHISM+WW3 case is located at /work2/noaa/nosofs/yunfangs/hurricane_ian/atm_sch_ww3_new/coastal_ian_subset_atm2sch2ww3_intel_1; however, the same error occurs.

@yunfangsun (Collaborator Author)

Now that the mesh quality is validated, a few ATM+SCHISM+WW3 configurations have been tested.

Since this mesh has the largest number of elements of all the existing UFS-Coastal applications, there may be a memory issue, so I have tried different numbers of cores for this case:

In /work2/noaa/nosofs/yunfangs/hurricane_ian/atm_sch_ww3_new/coastal_ian_subset_atm2sch2ww3_intel_1_1, 8000 cores are used; the job failed on processors 6163 and 6675.

In /work2/noaa/nosofs/yunfangs/hurricane_ian/atm_sch_ww3_new/coastal_ian_subset_atm2sch2ww3_intel_1_2, 10000 cores are used; the job failed on processors 8866 and 977.

In /work2/noaa/nosofs/yunfangs/hurricane_ian/atm_sch_ww3_new/coastal_ian_subset_atm2sch2ww3_intel_1_3, 12000 cores are used; the job failed on processors 4494, 9876, and 9875.

All of the error messages are similar:

20240821 131839.030 ERROR            PET04494 /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/cache/build_stage/spack-stage-esmf-8.6.0-rqrapepmgfb7kpri3ynqlxusquf6npfq/spack-src/src/Infrastructure/Mesh/src/ESMCI_Mesh_Glue.C:6233 ESMCI_meshcreatedual() Internal error: Bad condition  - /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/cache/build_stage/spack-stage-esmf-8.6.0-rqrapepmgfb7kpri3ynqlxusquf6npfq/spack-src/src/Infrastructure/Mesh/src/ESMCI_MeshDual.C, line:950: - there was a problem with triangulation (e.g. repeated points, clockwise poly, etc.)
20240821 131839.030 ERROR            PET04494 ESMCI_MeshCap.C:258 MeshCap::meshcreatedual() Internal error: Bad condition  - Internal subroutine call returned Error
20240821 131839.030 ERROR            PET04494 ESMF_Mesh.F90:3707 ESMF_MeshCreateDual() Internal error: Bad condition  - Internal subroutine call returned Error
20240821 131839.030 ERROR            PET04494 ESMF_FieldRegrid.F90:1414 getMeshWithNodesOnFieldLoc Internal error: Bad condition  - Internal subroutine call returned Error
20240821 131839.030 ERROR            PET04494 ESMF_FieldRegrid.F90:1011 ESMF_FieldRegridStoreNX Internal error: Bad condition  - Internal subroutine call returned Error
20240821 131839.030 ERROR            PET04494 med_map_mod.F90:503 Internal error: Bad condition  - Passing error in return code
20240821 131839.030 ERROR            PET04494 med_map_mod.F90:179 Internal error: Bad condition  - Passing error in return code
20240821 131839.030 ERROR            PET04494 med.F90:1871 Internal error: Bad condition  - Passing error in return code
20240821 131839.030 ERROR            PET04494 MED:src/addon/NUOPC/src/NUOPC_ModelBase.F90:1631 Internal error: Bad condition  - Passing error in return code
20240821 131839.030 ERROR            PET04494 /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/cache/build_stage/spack-stage-esmf-8.6.0-rqrapepmgfb7kpri3ynqlxusquf6npfq/spack-src/src/Infrastructure/Trace/src/ESMCI_Trace.C:1707 ESMCI:TraceEventPhaseExit() Wrong argument specified  - Trace regions not properly nested exiting from region: [MED] IPDv03p7 Expected exit from: MED: (med_map_mod: RouteHandles_init)
20240821 131839.030 ERROR            PET04494 /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/cache/build_stage/spack-stage-esmf-8.6.0-rqrapepmgfb7kpri3ynqlxusquf6npfq/spack-src/src/Infrastructure/Trace/src/ESMCI_Trace.C:1665 ESMCI::TraceEventCompPhaseExit() Wrong argument specified  - Internal subroutine call returned Error
20240821 131839.030 ERROR            PET04494 ESMCI_FTable.C:832 ESMCI_FTableCallEntryPointVMHop Wrong argument specified  - Internal subroutine call returned Error
20240821 131839.030 ERROR            PET04494 ESMCI_FTable.C:1100 c_esmc_compwait Wrong argument specified  - Internal subroutine call returned Error
20240821 131839.030 ERROR            PET04494 ESMF_Comp.F90:1285 ESMF_CompExecute Wrong argument specified  - Internal subroutine call returned Error
20240821 131839.030 ERROR            PET04494 ESMF_GridComp.F90:1433 ESMF_GridCompInitialize Wrong argument specified  - Internal subroutine call returned Error
20240821 131839.030 ERROR            PET04494 UFS Driver Grid Comp:src/addon/NUOPC/src/NUOPC_Driver.F90:2895 Wrong argument specified  - Failed calling phase 'IPDv03p7' Initialize for modelComp 4: MED
20240821 131839.030 ERROR            PET04494 UFS Driver Grid Comp:src/addon/NUOPC/src/NUOPC_Driver.F90:2699 Wrong argument specified  - Passing error in return code
20240821 131839.030 ERROR            PET04494 UFS Driver Grid Comp:src/addon/NUOPC/src/NUOPC_Driver.F90:2457 Wrong argument specified  - Passing error in return code
20240821 131839.030 ERROR            PET04494 UFS Driver Grid Comp:src/addon/NUOPC/src/NUOPC_Driver.F90:492 Wrong argument specified  - Passing error in return code
20240821 131839.030 ERROR            PET04494 /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/cache/build_stage/spack-stage-esmf-8.6.0-rqrapepmgfb7kpri3ynqlxusquf6npfq/spack-src/src/Infrastructure/Trace/src/ESMCI_Trace.C:1707 ESMCI:TraceEventPhaseExit() Wrong argument specified  - Trace regions not properly nested exiting from region: [UFS Driver Grid Comp] Init 1 Expected exit from: MED: (med_map_mod: RouteHandles_init)
20240821 131839.030 ERROR            PET04494 /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/cache/build_stage/spack-stage-esmf-8.6.0-rqrapepmgfb7kpri3ynqlxusquf6npfq/spack-src/src/Infrastructure/Trace/src/ESMCI_Trace.C:1665 ESMCI::TraceEventCompPhaseExit() Wrong argument specified  - Internal subroutine call returned Error
20240821 131839.030 ERROR            PET04494 ESMCI_FTable.C:832 ESMCI_FTableCallEntryPointVMHop Wrong argument specified  - Internal subroutine call returned Error
20240821 131839.030 ERROR            PET04494 ESMCI_FTable.C:1100 c_esmc_compwait Wrong argument specified  - Internal subroutine call returned Error
20240821 131839.030 ERROR            PET04494 ESMF_Comp.F90:1285 ESMF_CompExecute Wrong argument specified  - Internal subroutine call returned Error
20240821 131839.030 ERROR            PET04494 ESMF_GridComp.F90:1433 ESMF_GridCompInitialize Wrong argument specified  - Internal subroutine call returned Error
20240821 131839.030 ERROR            PET04494 UFS.F90:393 Wrong argument specified  - Aborting UFS
20240821 131839.030 INFO             PET04494  Finalizing ESMF with endflag==ESMF_END_ABORT
20240821 131839.030 ERROR            PET04494 /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/cache/build_stage/spack-stage-esmf-8.6.0-rqrapepmgfb7kpri3ynqlxusquf6npfq/spack-src/src/Infrastructure/Trace/src/ESMCI_Trace.C:1816 ESMCI::TraceEventRegionExit() Wrong argument specified  - Trace regions not properly nested exiting from region: [ESMF] Expected exit from: MED: (med_map_mod: RouteHandles_init)
20240821 131839.030 ERROR            PET04494 /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/cache/build_stage/spack-stage-esmf-8.6.0-rqrapepmgfb7kpri3ynqlxusquf6npfq/spack-src/src/Infrastructure/Trace/src/ESMCI_Trace.C:1258 ESMCI::TraceClose() Wrong argument specified  - Internal subroutine call returned Error
20240821 131839.030 ERROR            PET04494 ESMF_Trace.F90:102 ESMF_TraceClose() Wrong argument specified  - Internal subroutine call returned Error

@yunfangsun (Collaborator Author)

yunfangsun commented Aug 26, 2024

To pin down the exact location where the run breaks:

Both the subset mesh (/work2/noaa/nosofs/yunfangs/hurricane_ian/atm_sch_ww3_new/coastal_ian_subset_atm2sch2ww3_intel_1_9_1_9_1) and the coarse SCHISM mesh (/work2/noaa/nosofs/yunfangs/hurricane_ian/atm_sch_ww3_new/coastal_ian_subset_atm2sch2ww3_intel_coarse_mesh_new_1) are used, with the following core counts:
100 for CMEPS
20 for DATM (CDEPS), 40 for SCHISM, 40 for WW3

With debug_level = 2, the mesh on each core is written out in VTK format for SCHISM; however, this does not work for WW3.

export_2022-09-15T00:00:00_ocean_mask.40.39.vtk
export_2022-09-15T00:00:00_ocn_current_zonal.40.30.vtk

For WW3, Diagnostic = 1 is used, but it gives outputs for the whole domain rather than the per-core subdomains:

diagnostic_WAV_InitializeIPDv01p5_enter_import_2022_09_15_00_00_00_000_Sa_u10m.nc
diagnostic_WAV_InitializeIPDv01p5_enter_import_2022_09_15_00_00_00_000_Sa_v10m.nc
diagnostic_WAV_InitializeIPDv01p5_enter_import_2022_09_15_00_00_00_000_So_u.nc
diagnostic_WAV_InitializeIPDv01p5_enter_import_2022_09_15_00_00_00_000_So_v.nc
diagnostic_WAV_InitializeIPDvXp07_enter_import_2022_09_15_00_00_00_000_Sa_u10m.nc
diagnostic_WAV_InitializeIPDvXp07_enter_import_2022_09_15_00_00_00_000_Sa_v10m.nc
diagnostic_WAV_InitializeIPDvXp07_enter_import_2022_09_15_00_00_00_000_So_u.nc
diagnostic_WAV_InitializeIPDvXp07_enter_import_2022_09_15_00_00_00_000_So_v.nc

@yunfangsun (Collaborator Author)

yunfangsun commented Aug 26, 2024

Comparing the two cases above:

The subset-mesh configuration stops in CMEPS before the following lines:

PET84 after  med_map_RouteHandles_init
20240825 130547.125 INFO             PET84  Map type bilnr_nstod, destcomp ocn,  mapnorm one  Sa_pslv
20240825 130547.126 INFO             PET84  Map type bilnr_nstod, destcomp ocn,  mapnorm one  Sw_wavsuu
20240825 130547.127 INFO             PET84  Map type bilnr_nstod, destcomp wav,  mapnorm one  Sa_u10m
20240825 130547.127 INFO             PET84  Map type bilnr_nstod, destcomp wav,  mapnorm one  So_u
20240825 130547.136 INFO             PET84 (med.F90:DataInitialize): called

The subset-mesh run breaks in Infrastructure/Mesh/src/ESMCI_MeshDual.C, in the ghost-mesh part.

@yunfangsun (Collaborator Author)

Hi @uturuncoglu ,

The configuration with the mesh problem is located at /work2/noaa/nos-surge/yunfangs/stmp/yunfangs/FV3_RT/ufuk/coastal_ian_subset_atm2sch2ww3_intel_1_9_1

The configuration without the problem is located at /work2/noaa/nos-surge/yunfangs/stmp/yunfangs/FV3_RT/ufuk/coastal_ian_subset_atm2sch2ww3_intel_coarse_mesh_new_1

Thank you!

@uturuncoglu (Collaborator)

@yunfangsun I was able to create the VTK files, but since they are very high resolution it is hard to find the problematic region, so I have asked one of my colleagues for help; I'll update you when I know more. In the meantime, I plotted those VTK files in ParaView (a small section is shown in the figure below), and I wonder whether the two meshes are exactly the same. From the plot, it seems they are not. Maybe the difference comes from SCHISM creating its mesh using the ESMF API, while WW3 creates its mesh by reading the SCRIP grid definition (a netCDF file).

[Screenshot: overlay of the SCHISM and WW3 meshes from the VTK output]

@yunfangsun (Collaborator Author)

Hi @uturuncoglu ,

SCHISM and WW3 are using exactly the same mesh.

WW3 ESMF mesh:

netcdf scrip_ww3_esmf {
dimensions:
	nodeCount = 4995917 ;
	elementCount = 2470026 ;
	maxNodePElement = 11 ;
	coordDim = 2 ;
variables:
	double nodeCoords(nodeCount, coordDim) ;
		nodeCoords:units = "degrees" ;
	int elementConn(elementCount, maxNodePElement) ;
		elementConn:long_name = "Node indices that define the element connectivity" ;
		elementConn:_FillValue = -1 ;
	int numElementConn(elementCount) ;
		numElementConn:long_name = "Number of nodes per element" ;
	double centerCoords(elementCount, coordDim) ;
		centerCoords:units = "degrees" ;
	int elementMask(elementCount) ;
		elementMask:units = "unitless" ;

// global attributes:
		:gridType = "unstructured mesh" ;
		:version = "0.9" ;
		:inputFile = "scrip.nc" ;
		:timeGenerated = "Tue Aug 20 14:31:27 2024" ;
}

SCHISM hgrid file:

EPSG:4326
4885987 2470026
1 -77.95614850 35.30211970 -17.62350390
2 -77.95471320 35.30315010 -18.46252900
3 -77.95556400 35.30068700 -17.86483240

@uturuncoglu (Collaborator)

@yunfangsun It seems there is an issue with the triangulation. I need to run the case with a specific version of ESMF that will output the region causing the issue. I think I can use your directory /work2/noaa/nos-surge/yunfangs/stmp/yunfangs/FV3_RT/ufuk/coastal_ian_subset_atm2sch2ww3_intel_1_9_1. Right?

@uturuncoglu (Collaborator)

@yunfangsun They seem to be the same, but actually they are not. Creating the mesh from a SCRIP file might use the same coordinates but end up with a different number of elements. So maybe, on the WW3 side, the ESMF call needs an extra argument to produce the exact mesh, but I am not sure.

@yunfangsun (Collaborator Author)

Hi @uturuncoglu , Yes, the folder is /work2/noaa/nos-surge/yunfangs/stmp/yunfangs/FV3_RT/ufuk/coastal_ian_subset_atm2sch2ww3_intel_1_9_1

@yunfangsun (Collaborator Author)

Hi @uturuncoglu ,

To produce the scrip_ww3_esmf.nc file, I first ran the WW3 stand-alone case to get a scrip.nc file, and then used ESMF_Scrip2Unstruct scrip.nc scrip_ww3_esmf.nc 0 ESMF to produce scrip_ww3_esmf.nc.

@uturuncoglu (Collaborator)

@yunfangsun That is great; yes, that is the preferred way. Once I have the ESMF build with the debug feature, I'll try to run your case and we can see the issue. After that we can look at the WW3 side and its mesh generation.

@uturuncoglu (Collaborator)

@yunfangsun BTW, I am getting a permission error for the following files:

cp: cannot open '/work2/noaa/nos-surge/yunfangs/stmp/yunfangs/FV3_RT/ufuk/coastal_ian_subset_atm2sch2ww3_intel_1_9_1/atlantic.msh' for reading: Permission denied
cp: cannot open '/work2/noaa/nos-surge/yunfangs/stmp/yunfangs/FV3_RT/ufuk/coastal_ian_subset_atm2sch2ww3_intel_1_9_1/.atlantic.msh.swp' for reading: Permission denied

Are those used by the case? If not, it is not important.

@yunfangsun (Collaborator Author)

Hi @uturuncoglu ,
I have changed the permissions of atlantic.msh; could you please try again? Thank you!

@uturuncoglu (Collaborator)

@yunfangsun Do you remember the command you used to convert the SCRIP file to an ESMF mesh file? It seems that ESMF created a dual mesh for WW3. The command should be something like ESMF_Scrip2Unstruct input_SCRIP.nc output_ESMFmesh.nc 0, but if you pass 1 as the last argument rather than 0, it converts the mesh to its dual (basically swapping nodes with elements). So it would be nice to check that. Of course, this is not related to the error we are seeing in the mesh, but it could explain why the two sides do not end up with identical meshes.

@yunfangsun (Collaborator Author)

Hi @uturuncoglu ,

I am using ESMF_Scrip2Unstruct scrip.nc scrip_ww3_esmf.nc 0 ESMF,
and it works for the coarse SCHISM mesh and the HSOFS mesh.

@uturuncoglu (Collaborator)

@yunfangsun Okay, but that does not mean both sides have the same mesh in the coarse case; it is just running. Do you still have the scrip.nc file around?

@yunfangsun (Collaborator Author)

Hi @uturuncoglu ,

Yes, the scrip.nc is located at /work2/noaa/nos-surge/yunfangs/stmp/yunfangs/FV3_RT/ufuk/coastal_ian_subset_atm2sch2ww3_intel_1_9_1

@yunfangsun (Collaborator Author)

Hi @uturuncoglu

Yes, atlantic.msh is the one used by WW3 to generate the SCRIP file.

And I have copied the ww3_grid.inp and ww3_grid.out to /work2/noaa/nos-surge/yunfangs/stmp/yunfangs/FV3_RT/ufuk/coastal_ian_subset_atm2sch2ww3_intel_1_1_new_msh09182024/

@uturuncoglu (Collaborator)

@yunfangsun @janahaddad Just to confirm the mesh resolution difference, I ran the coastal_ike_shinnecock_atm2sch2ww3 RT and checked the meshes there. It seems we have the same issue there too: the SCHISM mesh is coarser than the WW3 mesh (see the following plot; white is WW3 and red is SCHISM).

[Screenshot: overlay of the WW3 (white) and SCHISM (red) meshes for the Shinnecock case]

As a next step, I'll try to run SCHISM with node-based decomposition to see whether it has the same issue. This case would be easier to debug than the Ian case since it uses fewer processors, and it shows the same issue, which is helpful. I'll keep you updated.

PS: I am also working on the interpolation issue. I need to run the case again to get more debug output, which requires updating the debug version of ESMF and running the case against it. I'll keep you updated on that too.

@uturuncoglu (Collaborator)

@yunfangsun @janahaddad Okay, I ran the same configuration, but this time SCHISM uses node-based decomposition and no CMEPS mediator. The result is the same. See the following plot (left: SCHISM with element-based decomposition; right: node-based decomposition).

[Screenshot: SCHISM mesh with element-based (left) and node-based (right) decomposition]

I think something is going on in SCHISM's mesh generation and it creates nodes on element centers. Since we implemented the element-based decomposition approach using the node-based one as a reference (to be compatible with CMEPS), I think we also inherit this issue there.

@josephzhang8 @platipodium I am not sure whether you have checked this before, but it seems that SCHISM's mesh generation has an issue: it does not create exactly the same mesh as WW3, and this is the case for both the element- and node-based decomposition algorithms. I'll check both and try to solve the issue. In the meantime, please let me know if you have any information or suggestions.

@platipodium (Member)

> @josephzhang8 @platipodium I am not sure whether you have checked this before, but it seems that SCHISM's mesh generation has an issue: it does not create exactly the same mesh as WW3, and this is the case for both the element- and node-based decomposition algorithms. I'll check both and try to solve the issue. In the meantime, please let me know if you have any information or suggestions.

@uturuncoglu We certainly did not check this before in the detail you did. I may have been too naive in implementing the decomposition, and it is quite possible that we thought we told it to decompose on nodes but in fact let it decompose on element centers. I would look at the arguments of the MeshCreate call; I believe we gave it node coordinates (need to check) but may have provided some information such that the actual decomposition is on the element centers (indirectly calculated from the node coordinates). Please go ahead and make the necessary changes.

@saeed-moghimi-noaa

@uturuncoglu

Hi Ufuk,
Would doing the same test for ADCIRC be helpful? From what I remember, we checked this when I was developing the node-based / communicator-only data exchange for ADC-WW3 coupling.

Thanks

@josephzhang8 (Collaborator)

The native decomposition inside SCHISM is element based, not node based; I think this is the root cause of the confusion. Once the decomposition is done, nodes/sides/elements inside SCHISM are assigned to each MPI process, so resident (non-ghost) elements are exclusive to each process, but resident nodes/sides can be present in more than one process as a result of the decomposition.

I don't know exactly how ESMF distributes the information among processes; I was told we only need resident entities, not ghosts. But what about the case where nodes/sides are resident on more than one process? The same issue exists with node-based decomposition; in that case, elements will be resident on more than one process.

@josephzhang8 (Collaborator)

I suggest the three of us (Ufuk, Carsten, and I) have a meeting to discuss this ASAP. I'm available tomorrow between 10 and noon ET. Thx.

@uturuncoglu (Collaborator)

@saeed-moghimi-noaa Yes, I could check the ADCIRC configuration, but since we have no WW3 component there I am not sure whether it would be helpful. Anyway, let me try.

@uturuncoglu (Collaborator)

@platipodium I am not sure whether this is simply a decomposition issue. Maybe it is just the way the mesh is created with the API. Anyway, I am planning to get some suggestions internally from the ESMF team; I'll update you when I have more information. @josephzhang8 Yes, we could meet about this issue, but let's wait until I discuss it with the team tomorrow (we have a core team call every Wednesday).

@uturuncoglu (Collaborator)

uturuncoglu commented Oct 15, 2024

@saeed-moghimi-noaa I am seeing the same problem in the ADCIRC model. Here is the same plot, but in this case the last column is the ADCIRC grid; it is the same as SCHISM.

[Screenshot: mesh overlay comparison including the ADCIRC grid]

I think this is a result of the way the grid is generated on the ocean model side. Anyway, I'll try to debug it, and if we solve it for SCHISM we can port the same fix to ADCIRC too.

@uturuncoglu (Collaborator)

@yunfangsun @janahaddad BTW, I checked the coastal_ike_shinnecock_atm2sch2ww3 case: the WW3 ESMF mesh file has 6496 nodes and 3070 elements, but the SCHISM hgrid file has 5780 elements and 3070 nodes. So it seems the element counts do not match. I am not sure who created those files and how; there could also be an issue with the configuration. I think all of this needs to be checked carefully. As far as I know, @mansurjisan was working on these RTs and creating the files with Python from scratch, but I am not sure whether he noticed this difference.

@uturuncoglu (Collaborator)

@yunfangsun If you don't mind, could you share some information about the mesh creation process? How are you creating the hgrid.ll or hgrid.gr3 file? Do you create the WW3 grid first and then convert it to the SCHISM grid, or the opposite?

@mansurjisan @yunfangsun @janahaddad I am checking a single element from the coastal_ike_shinnecock_atm2sch2ww3 case that has an extra node in the middle of the element for WW3 but not for SCHISM. It seems that the coordinates of the extra node found in the WW3 mesh are not present in the hgrid.gr3 file. I am assuming the mesh files found in coastal_ike_shinnecock_atm2sch2ww3 are correct and consistent. If so, there are only two possibilities: (1) WW3 is adding an extra node to that element (I am not sure this is the case, but I'll check the scrip file), or (2) the SCHISM preprocessing script or mesh generation is removing that extra node, or never had it. Is there any way to read in the scrip.nc file (or inlet.msh in the coastal_ike_shinnecock_atm2sch2ww3 case) and create an hgrid.gr3 file from it?

I could also do a similar check for the Ian case, but let's focus on this low-resolution case first since it would be easier for me to debug.

@yunfangsun (Collaborator Author)

yunfangsun commented Oct 16, 2024

Hi @uturuncoglu,

The SCHISM mesh files (hgrid.ll or hgrid.gr3) are made with mesh creation programs such as OCSMesh, SMS, etc. I was using hgrid.gr3 to convert to the WW3 mesh file inlet.msh, then used the WW3 program to generate scrip.nc from inlet.msh, and finally used ESMF_Scrip2Unstruct to convert that to the ESMF mesh.
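For reference, a minimal sketch of that hgrid.gr3-to-GMSH step (the format WW3's .msh files use), assuming a triangles-only gr3 file with the usual "comment, NE NN, node records, element records" layout. Real WW3 .msh files also list the open-boundary nodes as additional point elements and may expect a particular tag convention, so compare the output against an existing inlet.msh before relying on it; the depth column is copied through unchanged.

def hgrid_to_msh(hgrid_path, msh_path):
    with open(hgrid_path) as f:
        f.readline()                                           # comment / projection line
        ne, nn = map(int, f.readline().split()[:2])
        nodes = [f.readline().split()[:4] for _ in range(nn)]  # id x y depth
        elems = [f.readline().split()[:5] for _ in range(ne)]  # id 3 n1 n2 n3

    with open(msh_path, "w") as out:
        out.write("$MeshFormat\n2.2 0 8\n$EndMeshFormat\n")
        out.write(f"$Nodes\n{nn}\n")
        for nid, x, y, depth in nodes:
            out.write(f"{nid} {x} {y} {depth}\n")
        out.write("$EndNodes\n")
        out.write(f"$Elements\n{ne}\n")
        for eid, _, n1, n2, n3 in elems:                       # GMSH element type 2 = triangle
            out.write(f"{eid} 2 2 0 0 {n1} {n2} {n3}\n")
        out.write("$EndElements\n")

hgrid_to_msh("hgrid.gr3", "inlet.msh")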

For the RT case coastal_ike_shinnecock_atm2sch2ww3_intel, both SCHISM and WW3 use the same mesh. The SCHISM mesh file hgrid.gr3 has 5780 cells and 3070 nodes, and the WW3 mesh file inlet.msh has 5855 cells and 3070 nodes; the difference of 75 cells corresponds to the open-boundary node cells.

However, the WW3 ESMF mesh ends up with 6496 nodes and 3070 cells:

netcdf mesh.shinnecock.cdf5 {
dimensions:
	nodeCount = 6496 ;
	elementCount = 3070 ;
	maxNodePElement = 9 ;
	coordDim = 2 ;

When I developed the Hurricane Ian case with the three meshes, I also used the same mesh for both SCHISM and WW3.

For the low-resolution SCHISM mesh at /work2/noaa/nos-surge/yunfangs/hurricane_ian/ufs_atm2sch2ww3_crs:

SCHISM uses 1069847 cells and 557481 nodes, and WW3 uses the same mesh, as follows:

netcdf ww_noobc_crs_coupled {
dimensions:
	time = UNLIMITED ; // (384 currently)
	ny = 1 ;
	nx = 557481 ;
	ne = 1069847 ;
	nn = 3 ;
	noswll = 4 ;

For the middle-resolution HSOFS mesh at /work2/noaa/nos-surge/yunfangs/hurricane_ian/ufs_atm2sch2ww3_hsofs:

SCHISM uses 3564104 cells and 1813443 nodes, and WW3's ESMF mesh is as follows:

netcdf hsofs_ESMFmesh {
dimensions:
	nodeCount = 3690012 ;
	elementCount = 1813443 ;
	maxNodePElement = 10 ;
	coordDim = 2 ;

For the high-resolution combined mesh at /work2/noaa/nos-surge/yunfangs/stmp/yunfangs/FV3_RT/ufuk/coastal_ian_subset_atm2sch2ww3_intel_1_1_new_msh09182024:

SCHISM uses 4885987 cells and 2470026 nodes, and WW3's ESMF mesh is as follows:

netcdf scrip_ww3_esmf {
dimensions:
	nodeCount = 4995917 ;
	elementCount = 2470026 ;
	maxNodePElement = 11 ;
	coordDim = 2 ;

@uturuncoglu (Collaborator)

uturuncoglu commented Oct 16, 2024

@yunfangsun If I am following correctly, the source mesh is created by an external meshing tool and the resulting mesh is used to create both the WW3 and SCHISM meshes. Right? Does this tool create the hgrid.gr3 file directly? Are you using any other tool to go from the mesh generation tool to the SCHISM mesh? If not, I am surprised, since I could not find the center nodes in the SCHISM mesh file. Maybe WW3 is doing something special and adding those center nodes. How can we be sure that the mesh file produced by the mesh tool and the hgrid.gr3 are the same? If we could plot them and they match each other and the VTK files that I am producing, then WW3 is doing something special that I don't know about.

@platipodium (Member)

> I suggest the three of us (Ufuk, Carsten, and I) have a meeting to discuss this ASAP. I'm available tomorrow between 10 and noon ET. Thx.

Are you still on daylight saving time? 10-12 ET would then be 16-18 CEST. I am traveling but could find some time in that slot as well.

@platipodium (Member)

> ... How can we be sure that the mesh file produced by the mesh tool and the hgrid.gr3 are the same? If we could plot them and they match each other and the VTK files that I am producing, then WW3 is doing something special that I don't know about.

Does this call for a quick Hgrid2Scrip conversion tool? ... SCHISM output is UGRID, so internally we kind of do this already.

@uturuncoglu (Collaborator)

@platipodium At this point I am not sure about the source of the problem and I am trying to narrow it down. The first thing we can do is make sure that the source grid (created by some meshing tool) is converted correctly to both model input formats (the SCRIP file for WW3 and hgrid.gr3 for SCHISM). My initial impression is that we do not have the same information on the SCHISM side as we have in WW3, so I just want to be sure that SCHISM is reading the correct information and has all the nodes that are available to WW3. This can be done by creating a mesh plot right after the meshing tool's output and comparing it with the information found in hgrid.gr3. Of course, I am assuming the meshing tool has an additional step to convert its internal mesh representation to the hgrid.gr3 format. If they are the same, then I plan to look at the SCHISM cap (especially the mesh creation part) to identify the source (I did not see anything obvious in my initial pass). Anyway, I could set up a call on Friday morning (around 9-9:30 MT) if that works for everyone. I also suggest including @yunfangsun and @mansurjisan, since they have experience creating these configurations.

@josephzhang8 (Collaborator)

Friday 9am MT works for me. Thx @uturuncoglu.

@josephzhang8 (Collaborator)

As a reference, the plot below shows the element-based decomposition used in SCHISM:

[Image: element-based domain decomposition of the SCHISM mesh]

@mansurjisan

Hi @uturuncoglu ,

I’ve primarily been working on the atm2sch configuration, so I haven’t had a chance to dive into the mesh issue with the atm2sch2ww3 setup yet. I’ll join the Friday 9 am MT meeting and look forward to learning more about it then.

-Mansur

@saeed-moghimi-noaa

saeed-moghimi-noaa commented Oct 16, 2024

From meeting on 10/16/2024

@janahaddad (Collaborator)

@saeed-moghimi-noaa @josephzhang8 @danishyo @aliabdolali please note I'm moving the comment from our meeting today (above) to oceanmodeling/WW3#11, where Dan was originally posting updates on this topic.

@platipodium (Member)

> @platipodium I am not sure whether this is simply a decomposition issue. Maybe it is just the way the mesh is created with the API.

Just saying that, despite the fact that Joseph and I are pretty confident (with your help) about our part of the mesh generation, there is always the possibility of a problem due to lack of testing.

@uturuncoglu (Collaborator)

@yunfangsun @josephzhang8 @platipodium @janahaddad @saeed-moghimi-noaa I have some updates about the issues we have faced here:

  1. I tracked down the interpolation error with help from the ESMF team. If you remember, the interpolation was giving an error in the high-resolution configuration during the interpolation-weights (route handle) generation step. With the help of a debug version of ESMF, we traced the issue into the ESMF library, and it turns out we are hitting an edge case on the library side. Bob is working on a fix in the ESMF library, and once it is available I'll test it with the Ian configuration. If the fix works, it will be available in the next beta snapshot and eventually go into the ESMF 8.8 release. So stay tuned; I'll update you when I have something new.

  2. The second issue was the unmatched grids on the WW3 and SCHISM sides. I tracked this one to WW3 as well and found an issue with WW3's SCRIP generation. The SCRIP generation part was doing something special with the mesh and ended up with a different mesh (only for the unstructured case) than what is defined in the text mesh input file. The resulting mesh is not completely different, but it swaps nodes with elements (the dual mesh). I also discussed this with Denise from NOAA/EMC, and she also thinks there could be an issue there. Anyway, I fixed the SCRIP generation part (the ww3_grid tool, the https://github.com/NOAA-EMC/WW3/blob/abc77b992c54d0b05169c624c35cebf25da97a68/model/src/wmscrpmd.F90#L547 routine), and with the fix I could get a mesh identical to the ocean one (a simple check for this is sketched below). @aliabdolali I am not sure why this routine is fixed to use DUALAPPROACH all the time. After that I tried to use this mesh with WW3, but it did not work correctly: the WW3 NUOPC cap tries to remove the ghost points, and since we don't have them in this mesh, it removes part of the domain. Anyway, Denise will try my fix to the SCRIP generation part with a global application to see what happens. We probably also need a fix in the cap (around https://github.com/NOAA-EMC/WW3/blob/c7004b658b9dae9fc473d4e6511dfc1cf8e6a7bd/model/src/wav_comp_nuopc.F90#L747 and maybe in the model itself, https://github.com/NOAA-EMC/WW3/blob/abc77b992c54d0b05169c624c35cebf25da97a68/model/src/w3parall.F90#L1489), but we are not sure yet. I'll keep looking at the issue and maybe we can discuss it at the next meeting. @aliabdolali please let me know if you have any extra information about the SCRIP implementation or any suggestions. @janahaddad if you don't mind, could you add a discussion topic about this issue to our next call? @platipodium @josephzhang8 At this point, the SCHISM side looks fine, since we tracked the issue down to the WW3 side. Thanks to all of you for your help.
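A minimal sketch of that identity check, assuming the fixed ESMF mesh file is named scrip_ww3_esmf.nc, the ocean mesh is hgrid.gr3, and that rounding to 1e-6 degrees is enough to absorb write-precision differences (all illustrative choices, not values taken from the fix itself):

import numpy as np
from netCDF4 import Dataset

with Dataset("scrip_ww3_esmf.nc") as nc:
    ww3_xy = np.round(np.asarray(nc.variables["nodeCoords"][:]), 6)   # (nodeCount, 2)

with open("hgrid.gr3") as f:
    f.readline()                                       # comment / projection line
    ne, nn = map(int, f.readline().split()[:2])
    sch_xy = np.round(np.array([f.readline().split()[1:3] for _ in range(nn)],
                               dtype=float), 6)

ww3_set = {tuple(p) for p in ww3_xy}
sch_set = {tuple(p) for p in sch_xy}
print("WW3 nodes:", len(ww3_set), "| SCHISM nodes:", len(sch_set))
print("node coordinates identical:", ww3_set == sch_set)
print("only in WW3:", len(ww3_set - sch_set), "| only in SCHISM:", len(sch_set - ww3_set))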

@josephzhang8 (Collaborator)

@uturuncoglu: thank you so much for working on these issues!
I know there are multiple meetings related to UFS, but feel free to ask me to join whenever you need me.

@uturuncoglu (Collaborator)

@janahaddad @yunfangsun @aliabdolali @saeed-moghimi-noaa @josephzhang8 JFYI, I created an issue on the NOAA-EMC side to track this: NOAA-EMC/WW3#1319.
