[v2.11.0 release] Recipe failures on DKRZ: OSError: [Errno -101] NetCDF: HDF error #3702

Open
ehogan opened this issue Jul 1, 2024 · 5 comments

@ehogan
Contributor

ehogan commented Jul 1, 2024

Describe the bug
During ESMValCore v2.11.0rc2 testing, the following recipes failed with OSError: [Errno -101] NetCDF: HDF error:

  • recipe_collins13ipcc.yml
  File "/miniforge3/envs/rc2-env/lib/python3.11/site-packages/iris/fileformats/netcdf/_thread_safe_nc.py", line 384, in __setitem__
    dataset = netCDF4.Dataset(self.path, "r+")
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src/netCDF4/_netCDF4.pyx", line 2470, in netCDF4._netCDF4.Dataset.__init__
  File "src/netCDF4/_netCDF4.pyx", line 2107, in netCDF4._netCDF4._ensure_nc_success
OSError: [Errno -101] NetCDF: HDF error: '/esmvaltool_output/recipe_collins13ipcc_20240627_164355/preproc/IAV_calc_tas/tas/CMIP5_CCSM4_Amon_piControl_r1i1p1_tas_0250-1300.nc'
  • recipe_ipccwg1ar6ch3_atmosphere.yml
  File "/miniforge3/envs/rc2-env/lib/python3.11/site-packages/iris/fileformats/netcdf/_thread_safe_nc.py", line 384, in __setitem__
    dataset = netCDF4.Dataset(self.path, "r+")
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src/netCDF4/_netCDF4.pyx", line 2470, in netCDF4._netCDF4.Dataset.__init__
  File "src/netCDF4/_netCDF4.pyx", line 2107, in netCDF4._netCDF4._ensure_nc_success
OSError: [Errno -101] NetCDF: HDF error: '/esmvaltool_output/recipe_ipccwg1ar6ch3_atmosphere_20240627_170126/preproc/IAV_calc_tas_cmip5/tas/CMIP5_bcc-csm1-1-m_Amon_piControl_r1i1p1_tas_0001-0400.nc'
  • recipe_tebaldi21esd.yml
  File "/miniforge3/envs/rc2-env/lib/python3.11/site-packages/iris/fileformats/netcdf/_thread_safe_nc.py", line 384, in __setitem__
    dataset = netCDF4.Dataset(self.path, "r+")
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src/netCDF4/_netCDF4.pyx", line 2470, in netCDF4._netCDF4.Dataset.__init__
  File "src/netCDF4/_netCDF4.pyx", line 2107, in netCDF4._netCDF4._ensure_nc_success
OSError: [Errno -101] NetCDF: HDF error: '/esmvaltool_output/recipe_tebaldi21esd_20240627_164355/preproc/fig6d_IAV/tas/CMIP6_MIROC6_Amon_piControl_r1i1p1f1_pr_gn_3200-3500.nc'
  • recipe_preprocessor_derive_test.yml
  File "/miniforge3/envs/rc2-env/lib/python3.11/site-packages/iris/fileformats/netcdf/_thread_safe_nc.py", line 384, in __setitem__
    dataset = netCDF4.Dataset(self.path, "r+")
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src/netCDF4/_netCDF4.pyx", line 2470, in netCDF4._netCDF4.Dataset.__init__
  File "src/netCDF4/_netCDF4.pyx", line 2107, in netCDF4._netCDF4._ensure_nc_success
OSError: [Errno -101] NetCDF: HDF error: '/esmvaltool_output/recipe_preprocessor_derive_test_20240627_180859/preproc/cmip5/ohc/CMIP5_CCSM4_Omon_historical_r1i1p1_ohc_2004-2005.nc'

The first three recipes were already failing during ESMValTool v2.10.0 testing; the last one started failing during ESMValCore v2.11.0rc1 testing.

I intend to add these to the list of broken recipes via #3662.

@bouweandela
Member

@ehogan Have you tried running any of these recipes on a different machine, e.g. Jasmin? If I remember correctly, we previously thought the Levante filesystem caused these issues. I just ran recipe_preprocessor_derive_test.yml (except for the cmip6/toz variable, see #3709) on my laptop and it runs without HDF errors.

@ehogan
Contributor Author

ehogan commented Jul 2, 2024

I haven't. I have just set up the other three recipes to run on JASMIN, but I won't have the results until tomorrow (given the timings from the v2.9.0 testing):

  • recipe_collins13ipcc.yml (time=7:49:15, mem=272.6 GB)
  • recipe_ipccwg1ar6ch3_atmosphere.yml (time=1:42:01, mem=179.0 GB)
  • recipe_tebaldi21esd.yml (time=1:50:04, mem=135.0 GB)

@ehogan
Contributor Author

ehogan commented Jul 3, 2024

I am struggling to get the first two recipes to run on JASMIN (they keep failing with what appear to be various memory-related errors), and the third recipe failed due to missing data, even though I had search_esgf: when_missing set in the ESMValTool user configuration file 😞

@ehogan
Contributor Author

ehogan commented Jul 3, 2024

Even though there were memory errors, the recipe_collins13ipcc.yml recipe has just completed on JASMIN 🥳

2024-07-03 14:19:22,220 UTC [33611] INFO    Time for running the recipe was: 7:35:57.567925
2024-07-03 14:19:22,518 UTC [33611] INFO    Maximum memory used (estimate): 128.9 GB
[...]
2024-07-03 14:20:02,500 UTC [33611] INFO    Run was successful
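To compare a run like this against the v2.9.0 timings quoted earlier, the run time, peak memory estimate and success status can be pulled out of the main log. The helper below is a minimal sketch; summarize_run is a hypothetical name, not part of ESMValTool, and the regular expressions assume the INFO lines keep exactly the wording shown in the excerpt above.

```python
import re
from datetime import timedelta

# Hypothetical patterns: they assume the exact wording of the INFO lines above.
TIME_RE = re.compile(
    r"Time for running the recipe was: (\d+):(\d{2}):(\d{2}(?:\.\d+)?)"
)
MEM_RE = re.compile(r"Maximum memory used \(estimate\): ([\d.]+) GB")


def summarize_run(log_text: str) -> dict:
    """Extract run time, peak memory estimate (GB) and success flag from a log."""
    summary = {"runtime": None, "max_memory_gb": None, "successful": False}
    for line in log_text.splitlines():
        if m := TIME_RE.search(line):
            hours, minutes, seconds = m.groups()
            summary["runtime"] = timedelta(
                hours=int(hours), minutes=int(minutes), seconds=float(seconds)
            )
        elif m := MEM_RE.search(line):
            summary["max_memory_gb"] = float(m.group(1))
        elif "Run was successful" in line:
            summary["successful"] = True
    return summary
```

Applied to the excerpt above, this would report a run time of 7:35:57.567925, a peak memory estimate of 128.9 GB and a successful run. Lines that do not match are simply ignored, so the "[...]" elision does not matter.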

ehogan changed the title Recipe failures: OSError: [Errno -101] NetCDF: HDF error Recipe failures on DKRZ: OSError: [Errno -101] NetCDF: HDF error Jul 3, 2024
@ehogan
Contributor Author

ehogan commented Jul 4, 2024

The second recipe also failed due to missing data 😞

valeriupredoi changed the title Recipe failures on DKRZ: OSError: [Errno -101] NetCDF: HDF error [v2.11.0 release] Recipe failures on DKRZ: OSError: [Errno -101] NetCDF: HDF error Sep 25, 2024