FATES landuse default fluh_timeseries crashes FatesColdLUH test on izumi #2653

Open
glemieux opened this issue Jul 16, 2024 · 0 comments

Brief summary of bug

Updating the default fluh_timeseries file to a newer, larger version causes the model to crash during initialization.

General bug information

CTSM version you are using: ctsm5.2.011-123-g322097f0a

Does this bug cause significantly incorrect results in the model's science? Yes (crashes model)

Configurations affected: configurations matching FatesColdLUH2 testmod

Details of bug

This was discovered while testing NGEET/fates#1223. The FatesColdLUH2 test in the fates suite fails during the RUN phase, very early in the process. From the lnd.log file, it looks like the finundated "reading file ub" (upper bound) step isn't reporting the correct file that it's reading from, but I think that might be a red herring. Note that this doesn't appear to be an issue on derecho or perlmutter.

I can confirm that switching the fluh_timeseries to an older file with a shorter time length does not present this issue. That said, the size of the file does not appear to be the issue, based on an attempt to run the test case with a copy of the same file truncated to a shorter time. I will also note that the older file uses the classic netCDF format, whereas the newer file is CDF5. I'm not sure how relevant that is, though, since the flandusepftdat file used in this test does not present an issue when used in conjunction with the older fluh_timeseries file.
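
For anyone wanting to double-check the on-disk format of each input file without relying on a particular netCDF build, here is a minimal sketch that inspects the leading magic bytes. It assumes the standard netCDF signatures (`CDF\x01`, `CDF\x02`, `CDF\x05`) and the HDF5 signature at offset 0; the function name `netcdf_kind` is hypothetical, and the file paths are supplied by the caller.

```python
import sys

# Leading bytes that identify each classic-family netCDF variant.
_MAGIC = {
    b"CDF\x01": "classic (CDF-1)",
    b"CDF\x02": "64-bit offset (CDF-2)",
    b"CDF\x05": "64-bit data (CDF-5)",
}
# netCDF-4 files are HDF5 files; this sketch only checks for the
# HDF5 signature at offset 0 (no user block).
_HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def netcdf_kind(path):
    """Return a label for the netCDF variant of the file at `path`."""
    with open(path, "rb") as f:
        header = f.read(8)
    if header == _HDF5_SIGNATURE:
        return "netCDF-4 (HDF5)"
    return _MAGIC.get(header[:4], "unknown")


if __name__ == "__main__":
    # Usage: python check_format.py <file1.nc> <file2.nc> ...
    for name in sys.argv[1:]:
        print(f"{name}: {netcdf_kind(name)}")
```

Running this against the older and newer fluh_timeseries files and the flandusepftdat file would confirm which format each one actually uses, which may help narrow down whether the format difference matters on izumi.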

It is possible that the newer file, which was generated with the fates land use tool, introduces an issue stemming from an update made to the tool since the original default was created (the original file was created when the tool was still part of the fates repository). See issue NGEET/tools-fates-landusedata#5 for investigating potential causes on that side.

Important details of your setup / configuration so we can reproduce the bug

fates tag: sci.1.77.0_api.36.0.0

Important output or errors that show the problem

lnd.log

successfully initialized sdat
(shr_strdata_readstrm) opening   : /fs/cgd/csm/inputdata/lnd/clm2/paramdata/finundated_inversiondata_0.9x1.25_c170706.nc
(shr_strdata_readstrm) setting pio descriptor : /fs/cgd/csm/inputdata/lnd/clm2/paramdata/finundated_inversiondata_0.9x1.25_c170706.nc
(shr_strdata_set_stream_iodesc) setting iodesc for : FWS_TWS_A with dimlens(1), dimlens(2) =      288       192   variable as time dimension time
(shr_strdata_readstrm) reading file lb: /fs/cgd/csm/inputdata/lnd/clm2/paramdata/finundated_inversiondata_0.9x1.25_c170706.nc       1
(shr_strdata_readstrm) reading file ub: /fs/cgd/csm/inputdata/lnd/clm2/

cesm.log

Obtained 10 stack frames.
/scratch/cluster/glemieux/ctsm-tests/tests_0716-152356iz/ERS_D_Ld3.f45_f45_mg37.I2000Clm50FatesCruRsGs.izumi_nag.clm-FatesColdLUH2.0716-152356iz/bld/cesm.exe() [0x336f214]
/scratch/cluster/glemieux/ctsm-tests/tests_0716-152356iz/ERS_D_Ld3.f45_f45_mg37.I2000Clm50FatesCruRsGs.izumi_nag.clm-FatesColdLUH2.0716-152356iz/bld/cesm.exe() [0x336f748]
/scratch/cluster/glemieux/ctsm-tests/tests_0716-152356iz/ERS_D_Ld3.f45_f45_mg37.I2000Clm50FatesCruRsGs.izumi_nag.clm-FatesColdLUH2.0716-152356iz/bld/cesm.exe() [0x336fcc8]
/scratch/cluster/glemieux/ctsm-tests/tests_0716-152356iz/ERS_D_Ld3.f45_f45_mg37.I2000Clm50FatesCruRsGs.izumi_nag.clm-FatesColdLUH2.0716-152356iz/bld/cesm.exe() [0x33727f9]
/scratch/cluster/glemieux/ctsm-tests/tests_0716-152356iz/ERS_D_Ld3.f45_f45_mg37.I2000Clm50FatesCruRsGs.izumi_nag.clm-FatesColdLUH2.0716-152356iz/bld/cesm.exe(PIOc_openfile+0x11) [0x336e611]
/scratch/cluster/glemieux/ctsm-tests/tests_0716-152356iz/ERS_D_Ld3.f45_f45_mg37.I2000Clm50FatesCruRsGs.izumi_nag.clm-FatesColdLUH2.0716-152356iz/bld/cesm.exe() [0x33230e9]
/scratch/cluster/glemieux/ctsm-tests/tests_0716-152356iz/ERS_D_Ld3.f45_f45_mg37.I2000Clm50FatesCruRsGs.izumi_nag.clm-FatesColdLUH2.0716-152356iz/bld/cesm.exe() [0xa2a117]
/scratch/cluster/glemieux/ctsm-tests/tests_0716-152356iz/ERS_D_Ld3.f45_f45_mg37.I2000Clm50FatesCruRsGs.izumi_nag.clm-FatesColdLUH2.0716-152356iz/bld/cesm.exe() [0xaa737f]
/scratch/cluster/glemieux/ctsm-tests/tests_0716-152356iz/ERS_D_Ld3.f45_f45_mg37.I2000Clm50FatesCruRsGs.izumi_nag.clm-FatesColdLUH2.0716-152356iz/bld/cesm.exe() [0xb3ef07]
/scratch/cluster/glemieux/ctsm-tests/tests_0716-152356iz/ERS_D_Ld3.f45_f45_mg37.I2000Clm50FatesCruRsGs.izumi_nag.clm-FatesColdLUH2.0716-152356iz/bld/cesm.exe() [0xa8a293]
[cli_1]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 1
[[email protected]] HYDT_bscd_pbs_wait_for_completion (tools/bootstrap/external/pbs_wait.c:67): tm_poll(obit_event) failed with TM error 17002
[[email protected]] HYDT_bsci_wait_for_completion (tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion