install spack-stack-1.4.0 on gaea C5 after Gaea's upgrade@20241015 #1347

Open
jieshunzhu opened this issue Oct 16, 2024 · 33 comments
@jieshunzhu

Hello,

I am running an old version of soca-science that needs spack-stack-1.4.0. Previously we followed Dom's instructions and installed spack-stack-1.4.0 in my own directory. But since Gaea's recent upgrade, the old compilers (e.g., PrgEnv-intel/8.3.3; intel-classic/2022.2.1; cray-mpich/8.1.25) are no longer available.

Has anyone tried installing any version of spack-stack since the Gaea upgrade? Could you please tell me what to modify and how to install spack-stack-1.4.0?

Thanks,
Jieshun

@jieshunzhu changed the title from "build spack-stack-1.4.0 on gaea C5 after Gaea's upgrade@20241015" to "install spack-stack-1.4.0 on gaea C5 after Gaea's upgrade@20241015" Oct 16, 2024
@climbfuji added the NOAA-EMC and OAR-EPIC (NOAA Oceanic and Atmospheric Research and Earth Prediction Innovation Center) labels Oct 16, 2024
@RatkoVasic-NOAA
Collaborator

@jieshunzhu we are re-installing spack-stack 1.6.0, 1.7.0 and 1.8.0 on Gaea-C5. Currently, I'm installing 1.6.0.
I suggest you look at the differences between the YAML files in my old and new installations:

/ncrc/proj/epic/spack-stack/spack-stack-1.6.0/envs/unified-env-OLD-BEFORE_C5-UPGRADE/site/
and
/ncrc/proj/epic/spack-stack/spack-stack-1.6.0/envs/unified-env/site/

Only compilers.yaml and packages.yaml should be changed.

Since you already installed 1.4.0 before, follow the same instructions and just change these two YAML files.
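
The comparison suggested above could be sketched as follows, assuming bash and a standard diff. The function name diff_site_configs is hypothetical; the two file names and the example paths are the ones given in this comment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show how compilers.yaml and packages.yaml differ
# between an old and a new site directory.
diff_site_configs() {
    local old_dir=$1 new_dir=$2 f
    for f in compilers.yaml packages.yaml; do
        echo "=== $f ==="
        # diff exits 1 when the files differ; that is expected here.
        diff -u "$old_dir/$f" "$new_dir/$f" || true
    done
}

# Example (paths from the comment above):
# diff_site_configs \
#   /ncrc/proj/epic/spack-stack/spack-stack-1.6.0/envs/unified-env-OLD-BEFORE_C5-UPGRADE/site \
#   /ncrc/proj/epic/spack-stack/spack-stack-1.6.0/envs/unified-env/site
```

The lines marked with + and - in the output are the candidates to carry over into the 1.4.0 site configuration.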

Also, before installation make sure new modules are loaded:

module load PrgEnv-intel/8.5.0      (should already be loaded by default)
module load intel-classic/2023.2.0
module load cray-mpich/8.1.28       (should already be loaded by default)
module load python/3.9.12           (should already be loaded by default)
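
One way to confirm the modules are in place before installing is sketched below. It assumes the module system exports the colon-separated LOADEDMODULES variable (Lmod and Environment Modules normally do); the function name missing_modules is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which required modules are absent from
# $LOADEDMODULES (assumed to be a colon-separated list of name/version entries).
missing_modules() {
    local want missing=0
    for want in "$@"; do
        case ":${LOADEDMODULES:-}:" in
            *":$want:"*) ;;                          # exact name/version is loaded
            *) echo "not loaded: $want"; missing=1 ;;
        esac
    done
    return "$missing"
}

# Example, using the versions listed above:
# missing_modules PrgEnv-intel/8.5.0 intel-classic/2023.2.0 \
#                 cray-mpich/8.1.28 python/3.9.12
```

The function returns a non-zero status if anything is missing, so it can gate an install script.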

@jieshunzhu
Author

@RatkoVasic-NOAA @climbfuji I tried to reinstall spack-stack 1.4.0 by modifying compilers.yaml and packages.yaml, but "spack install" failed. The error appears to be related to "mapl". My installation log is here: /gpfs/f5/cfsrl/scratch/JieShun.Zhu/util/spack-stack/c5/spack-stack/log5.install

When you have time, could you please help me take a look at it? Thanks a lot!

@RatkoVasic-NOAA
Collaborator

@jieshunzhu
Maybe we can find something in the build cache:

Ratko.Vasic@gaea58:/gpfs/f5/epic/scratch/Ratko.Vasic/WM-1.6.0/ufs-weather-model/tests> ll -d /gpfs/f5/cfsrl/scratch/JieShun.Zhu/util/spack-stack/c5/spack-stack/cache/build_stage
drwx------ 19 JieShun.Zhu ncep 4096 Oct 18 08:12 /gpfs/f5/cfsrl/scratch/JieShun.Zhu/util/spack-stack/c5/spack-stack/cache/build_stage

Can you please grant read permission on that directory?
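
A minimal sketch of opening a directory tree for reading, assuming a POSIX chmod. The function name share_readonly is hypothetical, and the parent directories of the tree must already be traversable by the other user.

```shell
#!/usr/bin/env bash
# Hypothetical helper: let group members and others read a directory tree.
# Capital X adds execute (traverse) permission to directories, and to files
# that are already executable, but not to ordinary files.
share_readonly() {
    local dir=$1
    chmod -R go+rX "$dir"
}

# Example (path from the listing above):
# share_readonly /gpfs/f5/cfsrl/scratch/JieShun.Zhu/util/spack-stack/c5/spack-stack/cache/build_stage
```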

@jieshunzhu
Author

@RatkoVasic-NOAA Thanks for the quick response. I have changed the access permissions.

@RatkoVasic-NOAA
Collaborator

I looked at the log file. The MAPL installation crashed without any meaningful message:

tail /gpfs/f5/cfsrl/scratch/JieShun.Zhu/util/spack-stack/c5/spack-stack/cache/build_stage/spack-stage-mapl-2.35.2-dply6wc65o7iuanmpy3ubasn5zhlvyb4/spack-build-out.txt

/usr/bin/ranlib ../lib/libMAPL.pfio.a
make[2]: Leaving directory '/gpfs/f5/cfsrl/scratch/JieShun.Zhu/util/spack-stack/c5/spack-stack/cache/build_stage/spack-stage-mapl-2.35.2-dply6wc65o7iuanmpy3ubasn5zhlvyb4/spack-build-dply6wc'
[ 50%] Built target MAPL.pfio
make[1]: Leaving directory '/gpfs/f5/cfsrl/scratch/JieShun.Zhu/util/spack-stack/c5/spack-stack/cache/build_stage/spack-stage-mapl-2.35.2-dply6wc65o7iuanmpy3ubasn5zhlvyb4/spack-build-dply6wc'
make: *** [Makefile:169: all] Error 2
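
The tail of the log shows only make's final summary. In a parallel build, the compiler message that triggered the failure is often further up in spack-build-out.txt. The sketch below searches for it, assuming GNU grep; the function name first_build_errors and the choice of the word "error" as the search term are assumptions, not part of spack.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first few log lines that contain the word
# "error" (any case), with line numbers, skipping make's own "***" summaries.
first_build_errors() {
    local log=$1 max=${2:-10}
    grep -n -i -w -e 'error' "$log" \
        | grep -v -E '^[0-9]+:make(\[[0-9]+\])?: \*\*\*' \
        | head -n "$max"
}

# Example:
# first_build_errors /path/to/spack-build-out.txt
```

If this prints nothing, the failure may have been reported with different wording, and the log has to be read from the first failing target upward.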

Do you know whether it is important to keep that MAPL version (mapl@2.35.2 and the esmf version paired with it)?
If not, you can try the next MAPL version: mapl@2.40.3/esmf@8.5.0

For that you'll have to replace files:

/gpfs/f5/cfsrl/scratch/JieShun.Zhu/util/spack-stack/c5/spack-stack/spack/var/spack/repos/builtin/packages/mapl/package.py
/gpfs/f5/cfsrl/scratch/JieShun.Zhu/util/spack-stack/c5/spack-stack/spack/var/spack/repos/builtin/packages/esmf/package.py

with:

/ncrc/proj/epic/spack-stack/spack-stack-1.5.1/spack/var/spack/repos/builtin/packages/mapl/package.py
/ncrc/proj/epic/spack-stack/spack-stack-1.5.1/spack/var/spack/repos/builtin/packages/esmf/package.py

And update version numbers (mapl and esmf) in:

/gpfs/f5/cfsrl/scratch/JieShun.Zhu/util/spack-stack/c5/spack-stack/envs/unified-dev/common/modules.yaml
/gpfs/f5/cfsrl/scratch/JieShun.Zhu/util/spack-stack/c5/spack-stack/envs/unified-dev/common/packages.yaml

@jieshunzhu
Author

Thanks @RatkoVasic-NOAA. Let me try your suggestion....

@jieshunzhu
Author

@RatkoVasic-NOAA I tried to build with mapl@2.40.3/esmf@8.5.0, but at the "spack concretize" step I got the error "==> Error: FilePatch: Patch file backport-b571b3f-from-develop-to-v2.40.3.patch for package builtin.mapl does not exist."

Here is my directory: /gpfs/f5/cfsrl/scratch/JieShun.Zhu/util/spack-stack/c5_20241021t/spack-stack.
Can you help me take a look at my problem when you are available?

In addition, I used the script: /ncrc/proj/epic/spack-stack/spack-stack-1.5.1/spack/var/spack/repos/builtin/packages/mapl/package.py.
Is my error related to this script?

@RatkoVasic-NOAA
Collaborator

@jieshunzhu Yes, the recipe also needs its patch files, so the best approach may be to replace your entire mapl and esmf directories:

/gpfs/f5/cfsrl/scratch/JieShun.Zhu/util/spack-stack/c5_20241021t/spack-stack/spack/var/spack/repos/builtin/packages/mapl/
/gpfs/f5/cfsrl/scratch/JieShun.Zhu/util/spack-stack/c5_20241021t/spack-stack/spack/var/spack/repos/builtin/packages/esmf/

with

/ncrc/proj/epic/spack-stack/spack-stack-1.5.1/spack/var/spack/repos/builtin/packages/mapl/
/ncrc/proj/epic/spack-stack/spack-stack-1.5.1/spack/var/spack/repos/builtin/packages/esmf/
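
A minimal sketch of that directory swap, assuming bash and a cp that supports -a. The function name replace_recipe_dir and the separate backup location are assumptions; the backup is kept outside the package repository so that spack does not scan it as a package.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace one package recipe directory with a copy from
# another spack checkout, moving the original into a backup directory first.
replace_recipe_dir() {
    local src=${1%/} dst=${2%/} backup_root=$3
    [ -d "$src" ] || { echo "missing source: $src" >&2; return 1; }
    if [ -d "$dst" ]; then
        mkdir -p "$backup_root" || return 1
        mv "$dst" "$backup_root/$(basename "$dst").$(date +%Y%m%d%H%M%S)" || return 1
    fi
    # -a copies the whole tree, including any *.patch files next to package.py
    cp -a "$src" "$dst"
}

# Example (paths from the comment above; the backup directory is hypothetical):
# new=/ncrc/proj/epic/spack-stack/spack-stack-1.5.1/spack/var/spack/repos/builtin/packages
# old=/gpfs/f5/cfsrl/scratch/JieShun.Zhu/util/spack-stack/c5_20241021t/spack-stack/spack/var/spack/repos/builtin/packages
# for p in mapl esmf; do replace_recipe_dir "$new/$p" "$old/$p" "$HOME/recipe-backup"; done
```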

@jieshunzhu
Author

jieshunzhu commented Oct 21, 2024

@RatkoVasic-NOAA Thanks for your quick response. I got your idea and will try replacing the two whole directories.

@jieshunzhu
Author

@RatkoVasic-NOAA By replacing the two directories, I got another error in the "spack concretize" step, saying "==> Error: too many values to unpack (expected 1)".

@RatkoVasic-NOAA
Collaborator

I don't know whether this is causing the problem, but you have all compilers listed in spack.yaml:

grep compilers:   /gpfs/f5/cfsrl/scratch/JieShun.Zhu/util/spack-stack/c5_20241021t/spack-stack/envs/unified-dev/spack.yaml
  - compilers: ['%aocc', '%apple-clang', '%gcc', '%intel']

@RatkoVasic-NOAA
Collaborator

Also, you have

      mapl:
        suffixes:
          ^esmf@8.2.0~debug snapshot=none: 'esmf-8.2.0'
          ^esmf@8.2.0+debug snapshot=none: 'esmf-8.2.0-debug'
          ^esmf@8.3.0b09~debug snapshot=b09: 'esmf-8.3.0b09'
          ^esmf@8.3.0b09+debug snapshot=b09: 'esmf-8.3.0b09-debug'
          ^esmf@8.3.0~debug snapshot=none: 'esmf-8.3.0'
          ^esmf@8.3.0+debug snapshot=none: 'esmf-8.3.0-debug'
          ^esmf@8.4.0~debug snapshot=none: 'esmf-8.4.0'
          ^esmf@8.4.0+debug snapshot=none: 'esmf-8.4.0-debug'
          ^esmf@8.4.1~debug snapshot=none: 'esmf-8.4.1'
          ^esmf@8.4.1+debug snapshot=none: 'esmf-8.4.1-debug'
          ^esmf@8.4.2~debug snapshot=none: 'esmf-8.4.2'
          ^esmf@8.4.2+debug snapshot=none: 'esmf-8.4.2-debug'
          ^esmf@8.5.0~debug snapshot=none: 'esmf-8.5.0'
          ^esmf@8.5.0+debug snapshot=none: 'esmf-8.5.0-debug'

twice in /gpfs/f5/cfsrl/scratch/JieShun.Zhu/util/spack-stack/c5_20241021t/spack-stack/envs/unified-dev/common/modules.yaml
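
A quick way to see where a block such as mapl: is repeated is sketched below, assuming GNU grep. The function name yaml_key_lines is hypothetical, and it matches only a key that sits alone on its line. Two hits are not necessarily a mistake if the file has separate sections that each carry their own copy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every line where a given YAML key opens a mapping
# (the key alone on its line), so repeated blocks are easy to spot.
yaml_key_lines() {
    local key=$1 file=$2
    grep -n -E "^[[:space:]]*${key}:[[:space:]]*$" "$file"
}

# Example:
# yaml_key_lines mapl envs/unified-dev/common/modules.yaml
```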

@jieshunzhu
Author

@RatkoVasic-NOAA It looks like modules.yaml in spack-stack 1.4.0 contains sections for both "tcl" and "lmod", which may be why the mapl block appears twice. In spack-stack 1.6.0 these are split into two separate files, modules_lmod.yaml and modules_tcl.yaml.

@jieshunzhu
Author

@climbfuji @RatkoVasic-NOAA @AlexanderRichert-NOAA Hi all, after many tests I still get the same error, "==> Error: too many values to unpack (expected 1)", in "spack concretize". Could you please give me a hint about what is wrong? Thanks.

@AlexanderRichert-NOAA
Collaborator

Can you provide exact steps to reproduce?

@jieshunzhu
Author

jieshunzhu commented Oct 21, 2024

Thanks for your reply, Alex. Here are my steps.

  1. git clone --recurse-submodules https://github.com/JCSDA/spack-stack -b release/1.4.0
  2. update configs/sites/gaea-c5 with that at /gpfs/f5/cfsrl/scratch/JieShun.Zhu/util/spack-stack/c5_20241021t/spack-stack/configs/sites
  3. replace spack/var/spack/repos/builtin/packages/esmf & mapl with those at /ncrc/proj/epic/spack-stack/spack-stack-1.5.1/
  4. update modules.yaml & packages.yaml with that at /gpfs/f5/cfsrl/scratch/JieShun.Zhu/util/spack-stack/c5_20241021t/spack-stack/configs/common
  5. source setup.sh
  6. spack stack create env --site=gaea-c5 --template=unified-dev --name=unified-dev
  7. spack env activate -p envs/unified-dev
  8. spack concretize >log4.concretize 2>&1 &

Step 2 switches to the new compilers on Gaea-C5; steps 3 and 4 switch to mapl@2.40.3/esmf@8.5.0, following the suggestion by RatkoVasic-NOAA.
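
Because step 8 runs in the background with its output redirected, a failure shows up only in the log file. A minimal sketch for checking it afterwards, assuming spack's "==> Error" prefix as quoted in this thread; the function name spack_log_ok is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if a spack log has no line starting with
# "==> Error"; otherwise print those lines with their line numbers.
spack_log_ok() {
    local log=$1
    if grep -n -e '^==> Error' "$log"; then
        return 1
    fi
    return 0
}

# Example, after the background job has finished:
# wait; spack_log_ok log4.concretize && echo "concretize finished cleanly"
```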

@AlexanderRichert-NOAA
Collaborator

I am unable to reproduce the issue with those steps; it concretizes without error. Can you try deleting or relocating your ~/.spack directory? There's always a chance that the problem comes from using multiple spack versions with the same user cache, bootstrap directory, and so on.
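
Relocating rather than deleting keeps the old state available in case it needs to be restored. A minimal sketch, assuming bash; the function name stash_user_spack and the .bak.<timestamp> naming are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move the per-user spack directory out of the way
# instead of deleting it, so it can be restored later.
stash_user_spack() {
    local dir=${1:-$HOME/.spack}
    [ -e "$dir" ] || { echo "nothing to move: $dir"; return 0; }
    local backup
    backup="$dir.bak.$(date +%Y%m%d%H%M%S)"
    mv "$dir" "$backup" && echo "moved $dir to $backup"
}

# Example:
# stash_user_spack            # defaults to ~/.spack
```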

@jieshunzhu
Author

Thanks @AlexanderRichert-NOAA. Let me try that. Meanwhile, can you also share your directory with me?

@AlexanderRichert-NOAA
Collaborator

/ncrc/home1/Alexander.Richert/spack-stack-1.4.0/envs/unified-dev

@jieshunzhu
Author

Thanks, Alex!

It looks like modules.yaml and packages.yaml under envs/unified-dev/common were not updated. Would you please update them and try "concretize" again?

In addition, I just deleted ~/.spack, and tried "concretize" again. I still got the same error.

@AlexanderRichert-NOAA
Collaborator

I recopied packages.yaml and modules.yaml from /gpfs/f5/cfsrl/scratch/JieShun.Zhu/util/spack-stack/c5_20241021t/spack-stack/configs/common. Running diff -rq /gpfs/f5/cfsrl/scratch/JieShun.Zhu/util/spack-stack/c5_20241021t/spack-stack/configs/common/ /ncrc/home1/Alexander.Richert/spack-stack-1.4.0/envs/unified-dev/common/ now shows no differences.

Are you doing any module loads in your shell rc file, or running in a conda env, anything like that?

@jieshunzhu
Author

jieshunzhu commented Oct 22, 2024

Thanks Alex. Please let me know whether you get the same error I did after using the new packages.yaml and modules.yaml.

No, I did not do any additional module loads in my rc files, and I am not in a conda env.
In fact, I can run "concretize" if I use the same packages.yaml and modules.yaml you tested originally.

@AlexanderRichert-NOAA
Collaborator

AlexanderRichert-NOAA commented Oct 22, 2024

Okay, that did it. The issue is that the older copy of spack used by spack-stack 1.4.0 doesn't have the newer any_of/one_of logic for the packages:<package>:require configuration. The most straightforward solution is to rewrite the esmf entry:

    esmf:
      version: [8.5.0]
      variants: ~xerces ~pnetcdf snapshot=none ~shared +external-parallelio
      require: 'fflags="-fp-model precise" cxxflags="-fp-model precise"'

edit: The above isn't working, I'll update when I have a working yaml entry...

@AlexanderRichert-NOAA
Copy link
Collaborator

I can't find a clean way to add the -fp-model precise flags because those flags are being set in compilers.yaml and I don't think the older spack knows how to combine those. So you can either delete those or maybe modify the recipe (or implement all those gcc-related flags in packages.yaml rather than compilers.yaml).
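
Before deleting or moving the flags, it can help to see every place they are set. A minimal sketch, assuming GNU grep (for --include); the function name find_flag_settings is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every YAML file under an environment directory
# that mentions a given compiler flag, with line numbers.
find_flag_settings() {
    local env_dir=$1 flag=$2
    # -F treats the flag as a fixed string; "--" is needed because the flag
    # itself starts with "-".
    grep -rn -F --include='*.yaml' -- "$flag" "$env_dir"
}

# Example:
# find_flag_settings envs/unified-dev '-fp-model precise'
```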

@jieshunzhu
Author

@AlexanderRichert-NOAA Thanks for the instructions. I chose to delete the flags, and now the "concretize" step completes. I am on the "install" step and will post an update here. Thanks again for your help.

@jieshunzhu
Author

@AlexanderRichert-NOAA @RatkoVasic-NOAA I tried two more reinstallations using the steps above, and both failed in the "install" step. Both failures report "Error: Exception occurred in writer daemon!", but for different packages (one during pcre2, the other during wgrib2).

  • The first failure: /gpfs/f5/cfsrl/scratch/JieShun.Zhu/util/spack-stack/c5/spack-stack_Fail. (the directory name was spack-stack, and later renamed as spack-stack_Fail)
  • The second failure: /gpfs/f5/cfsrl/scratch/JieShun.Zhu/util/spack-stack/c5/spack-stack (For this one, I tried to ensure that I deleted ~/.spack before installation).

Could you please help me take a look at them?

@RatkoVasic-NOAA
Collaborator

@jieshunzhu since this is a very old version and we have spent quite some time on it without success, I suggest you switch to a newer version of spack-stack. Version 1.5.1 will be supported for the next 2 months, so I'd suggest updating your work to at least spack-stack 1.6.0.
@climbfuji @AlexanderRichert-NOAA, your opinion?

@jieshunzhu
Author

@RatkoVasic-NOAA Thanks for your suggestion. The problem for me is that I am still using an old version of soca-science. Early this year, I tried to build it with spack-stack 1.4.1, and even that failed, whereas spack-stack 1.4.0 works for me. In any case, I will try again with 1.6.0 or another version of spack-stack and share my updates here when done. Thanks again.

@climbfuji
Collaborator

You might as well jump up to 1.8.0 so that you have a stable release to work with for the next 11 months.

@jieshunzhu
Author

jieshunzhu commented Nov 12, 2024

@climbfuji @RatkoVasic-NOAA @AlexanderRichert-NOAA I started with 1.8.0 but ran into many issues, so I switched to 1.6.0. The "ecbuild" step now nearly succeeds, but it fails on "bufr". You can see my log file here: /gpfs/f5/cpchso/scratch/JieShun.Zhu/ng-godas/CODEs/JZstable-nightly.20220729_iodaconvert20241112c5spack1.6.0/build/ecbuild.out.

Here are the options I used with "ecbuild":

  • ecbuild --build=release -DMPIEXEC_NUMPROC_FLAG="-n" -DBUILD_PYTHON_BINDINGS=ON -DENABLE_LORENZ95_MODEL=OFF -DENABLE_QG_MODEL=OFF -Dbufr_FOUND=False -Dcrtm_FOUND=False -DPython3_EXECUTABLE:FILEPATH=/sw/gaea-c5/spack-envs/base/opt/linux-sles15-x86_64/gcc-7.5.0/python-3.9.12-n7xlvqjslbeeaexdaibryhi7miqjjoa2/bin/python3 ../soca-science/bundle >ecbuild.out 2>&1

@AlexanderRichert-NOAA
Collaborator

This reflects a change in the bufr library, I believe, where it no longer builds a separate _d target. I suggest checking the documentation and/or creating an issue under the NCEPLIBS-bufr repo to see what's involved in updating your code to use bufr v12+.
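
To gauge how much code is affected, one option is to search the source tree for CMake files that still mention the old name. This is a sketch assuming GNU grep; the function name find_bufr_d_refs is hypothetical, and the search string bufr_d is an assumption based on the _d suffix mentioned above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list CMake files under a source tree that still refer
# to a name containing "bufr_d", with line numbers.
find_bufr_d_refs() {
    local src_dir=$1
    grep -rn --include='CMakeLists.txt' --include='*.cmake' -e 'bufr_d' "$src_dir"
}

# Example:
# find_bufr_d_refs ../soca-science
```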

@jieshunzhu
Author

@AlexanderRichert-NOAA Thanks for the quick response. I will take a look at the documentation.

@jieshunzhu
Author

I think I can build my code with spack-stack-1.7.0 after some changes to my code, but I still cannot use spack-stack-1.8.0. Let me test it further.
