Replies: 8 comments 4 replies
-
Hi Joey, Thank you for reaching out. I'm sure this can be solved quickly. What is the path to the software stack in the Orion modules that you use?
-
Please make sure your HAFS modulefile has the following location of the software stack on Orion: |
-
Hi Joey, |
-
Hi @jak5808, I checked in with our Software Integration (SI) Team, which has been working with you. They are troubleshooting this issue, but they may not be able to get back to you until after the holidays, since many team members are off this week and/or next. I'm hopeful they will get you an answer sooner rather than later, but in case they can't, I wanted to make sure you know that we haven't forgotten your question! Best,
-
@jak5808 -
Is it the correct domain decomposition? |
-
Hi @jak5808 , Just wanted to check in and see if this issue got resolved or if you have further questions for @natalie-perlin ? Thanks, |
-
Hey Gillian,
Thanks for checking in. I missed Natalie's last response, for which I
apologize, but at this point it seems like the issue has been resolved. I
have not received any similar error messages recently, although I will keep
an eye out. Did the SI Team figure out what was going wrong? I haven't
changed any of my model configurations or domain decompositions so it was
nothing on my end.
Thanks again,
Joey
-
Joey, |
-
System: Orion
HAFS version: this issue has occurred across different HAFS branches, including develop
Last month, a number of module libraries were moved from /work/noaa/epic-ps to a new location (/work/noaa/epic/) after the epic-ps directory that contained the old libraries was removed. Ever since that move, many of my jobs fail because the modules are "unreachable". I can rewind the jobs until they succeed, but it can take as many as 4 or 5 rewinds before the modules behave, which is a pain for ensemble jobs. The failures occur as soon as the executable is called, with the following error messages:
An example log file can be found here:
/work2/noaa/aoml-hafsda/knisely/HAFS_May_23_hfsb_dualres_ensda_olbc_augens06/2022082700/00L/hafs_forecast_ens010.log.0
I am currently rerunning these experiments with a clean .bashrc to see whether the issue is due to conflicting modules or environment settings. However, I have talked to multiple fellow community members who are experiencing this too, so I think that is unlikely to be the cause. Please let me know if you would like any more information from me.
Thanks,
Joey
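For anyone who wants to rule out a stale reference to the removed /work/noaa/epic-ps tree, here is a minimal sketch of a scan for that string in a directory of modulefiles or shell startup files. It assumes the files are plain text; the function name `find_stale_references` and the directory you pass in are hypothetical and not part of HAFS. It only reports leftover references and says nothing about why a module load would fail intermittently.

```python
from pathlib import Path

# Old location named in the post above; the new one is /work/noaa/epic/.
OLD_PREFIX = "/work/noaa/epic-ps"


def find_stale_references(root, needle=OLD_PREFIX):
    """Return (path, line_number, line) for every line under root that contains needle."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            # errors="replace" keeps the scan going if a file is not valid UTF-8.
            text = path.read_text(errors="replace")
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), number, line.strip()))
    return hits
```

Calling `find_stale_references("modulefiles")` from the top of a checkout (the directory name here is an assumption) returns an empty list if nothing still points at the old path.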