
Unpin or update many packages (mostly Python) in configs/common/packages.yaml, fix S4 site config #1384

Open · climbfuji wants to merge 19 commits into develop

Conversation

@climbfuji (Collaborator) commented Nov 17, 2024

Summary

In preparation for spack-stack-1.9.0, this PR unpins or updates several packages in configs/common/packages.yaml (mostly Python packages). Most notably, py-shapely (@ericlingerfelt FYI) and py-numpy (@DavidHuber-NOAA FYI) are updated.

The py-numpy update may require bug fixes with the Intel classic compiler that @DavidHuber-NOAA worked on and that are currently under review in spack develop (see #1276).

Included is an update of the S4 site config, which had several flaws that prevented building and testing this PR.
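As an illustration, the kind of change this PR makes in configs/common/packages.yaml looks roughly like the following (a sketch with hypothetical specs; the exact versions touched by the PR are in the diff):

```yaml
# Hypothetical sketch of an unpin/update in configs/common/packages.yaml;
# version numbers here are examples, not the exact specs from this PR.
packages:
  py-shapely:
    # before: pinned to an old release
    # require: ['@1.8.0']
    # after: let the concretizer pick a newer 2.x release
    require: ['@2:']
  py-numpy:
    # after: a minimum version instead of an exact pin
    require: ['@1.26:']
```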

Testing

Applications affected

All.

Systems affected

None directly.

Dependencies

Issue(s) addressed

Resolves #1065

Checklist

  • This PR addresses one issue/problem/enhancement, or has a very good reason for not doing so.
  • These changes have been tested on the affected systems and applications.
  • All dependency PRs/issues have been resolved and this PR can be merged.

@climbfuji climbfuji changed the title WIP - Unpin or update many packages (mostly Python) in configs/common/packages.yaml Unpin or update many packages (mostly Python) in configs/common/packages.yaml Dec 6, 2024
@climbfuji climbfuji marked this pull request as ready for review December 6, 2024 18:59
meson:
require:
- '@1.5.1:'
#meson:
@climbfuji (Collaborator, Author) commented:

Note to self. Remove all these commented-out packages once we are sure it all works as expected

@climbfuji (Collaborator, Author):

All, the NEPTUNE tests passed. Please start testing with UFS, JEDI, ... we need this PR in as soon as possible for spack-stack 1.9.0 (downstream PRs depend on it). Thanks!

@srherbener (Collaborator):

@climbfuji I'll test JEDI/Skylab.

@climbfuji (Collaborator, Author):

Thanks very much @srherbener! You should expect problems with the shapely update (@ericlingerfelt may know more), but hopefully nothing else.

@srherbener (Collaborator):

I'm getting concretize errors like this on several platforms:

[ue-intel] [sherbener@s4-submit ue-intel]$ spack concretize | tee log.concretize
==> Warning: Use of plain text `access_token` in mirror config is deprecated, use environment variables instead (access_token_variable)
==> Error: failed to concretize `jedi-geos-env%intel ^esmf@=8.6.1`, `ufs-srw-app-env%intel ^esmf@=8.6.1`, `global-workflow-env%intel ^esmf@=8.6.1`, `ewok-env%intel~cylc+ecflow`, ..., `gsi-env%intel` for the following reasons:
     1. cannot satisfy a requirement for package 'py-setuptools'.
==> Using cached archive: /data/users/sherbener/projects/spack-stack/cache/source_cache/blobs/sha256/3cc99d42a12e6c34bc187d5146cbdba71ca16d19163fe106f1a4cd3e59b01d2c
==> Using cached archive: /data/users/sherbener/projects/spack-stack/cache/source_cache/blobs/sha256/e8518de25baff7a74bdb42193e6e4b0496e7d0688434c42ce4bdc92fe4293a09
==> Installing "clingo-bootstrap@=spack%gcc@=10.2.1~docs+ipo+optimized+python+static_libstdcpp build_system=cmake build_type=Release generator=make patches=bebb819,ec99431 arch=linux-centos7-x86_64" from a buildcache

This particular message is from S4. Does this need another PR to be merged, or an update for the spack submodule commit hash?

Or perhaps pilot error. I did the following (on S4) before attempting to do the spack-stack build:

module purge
module load intel/2023.2 miniconda/3.8-s4

Is that the issue?

Thanks!

@climbfuji (Collaborator, Author):

Definitely no miniconda; those days are long gone.

@srherbener (Collaborator):

@stiggy87 tried this PR in an AMI using Ubuntu 24.04 and gcc 12.3. I'm attempting S4 using [email protected]. In @stiggy87's case, version 1.26.4 gets selected for py-numpy, and in my case version 1.23.5 gets selected. It just so happens that, according to the dependencies set up in the py-numpy package.py script, [email protected] is limited to py-setuptools@63 or older, whereas [email protected] is okay with py-setuptools@69. I'm guessing a similar thing is happening with the CI tests.

I think this makes sense, but I'm not clear on why the py-numpy versions come out different in the two cases.

@srherbener (Collaborator):

Could the different py-numpy versions be caused by the version of python3? In my case it's 3.6.8 and in @stiggy87's case it's 3.11.7.

@climbfuji (Collaborator, Author) commented Dec 11, 2024:

Your Python is not 3.6.8. That version is only used to run spack commands, not inside the environment. It's actually pretty simple: in the S4 site config, py-numpy is hardcoded to 1.23 ^openblas. Please try changing that to 1.26.4 ^openblas.

@climbfuji (Collaborator, Author):

> Your Python is not 3.6.8. That version is only used to run spack commands, not inside the environment. It's actually pretty simple: in the S4 site config, py-numpy is hardcoded to 1.23 ^openblas. Please try changing that to 1.26.4 ^openblas.

Or just remove the :1.23 part, leaving only ^openblas (might be the better solution).
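The suggested edit to the S4 site config would look roughly like this (a sketch; the exact spec string in S4's site packages.yaml may differ slightly):

```yaml
# Sketch of the py-numpy entry in the S4 site packages.yaml.
packages:
  py-numpy:
    # before: version pinned, which forces [email protected]
    # require: ['@:1.23 ^openblas']
    # after: keep the openblas provider, drop the version pin
    require: ['^openblas']
```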

@srherbener (Collaborator):

@climbfuji what you say makes sense; the concretization should follow what is specified in the config, not what SPACK_PYTHON is set to.

Setting SPACK_PYTHON to a 3.11.7 version didn't help (as expected).

I'm trying again after removing the ':1.23.5' part of the py-numpy spec in the site/packages.yaml.

@srherbener (Collaborator):

Concretize is working now! Thanks @climbfuji

@climbfuji (Collaborator, Author):

> Concretize is working now! Thanks @climbfuji

Yay. Hopefully the rest will be less bumpy. I will update all the site configs accordingly.

@srherbener (Collaborator):

The install failed on building antlr. Here is the compile error:

                 from /data/users/sherbener/projects/spack-stack/cache/build_stage/spack-stage-antlr-2.7.7-gauwfia2hci6fltmfweq4d2ygty7o4nk/spack-src/lib/cpp/src/ANTLRUtil.cpp(9):
/usr/include/wchar.h(396): error: identifier "_Float32" is undefined
  extern _Float32 wcstof32 (const wchar_t *__restrict __nptr,

I noticed that there seems to be something suspicious with the antlr spec:

[ue-intel] [sherbener@s4-submit ue-intel]$ spack spec antlr
==> Warning: Use of plain text `access_token` in mirror config is deprecated, use environment variables instead (access_token_variable)
 -   [email protected]%[email protected]+cxx~java~pic~python build_system=autotools patches=33897ad arch=linux-rocky8-skylake_avx512
[e]      ^[email protected]%[email protected] build_system=autotools arch=linux-rocky8-skylake
[e]      ^[email protected]%[email protected]~guile build_system=generic patches=ca60bd9,fe5b60d arch=linux-rocky8-skylake

The arch entries for the dependencies of antlr do not have the _avx512 suffix, whereas the entry for antlr itself does. I seem to recall a recent issue/PR about the AVX512 instruction set, but I'm not sure which form is expected (with or without the _avx512 suffix). Either way, it doesn't seem right that there is a mix of the two; the arch should be consistent for all the packages.

Any thoughts? Thanks!

@climbfuji (Collaborator, Author):

I have no idea, never seen this. The antlr version is the same as it has always been, no updates. Not sure if the _avx512 has anything to do with it, but you might be able to force all packages to build for arch=linux-rocky8-skylake?

@climbfuji (Collaborator, Author):

> I have no idea, never seen this. The antlr version is the same as it has always been, no updates. Not sure if the _avx512 has anything to do with it, but you might be able to force all packages to build for arch=linux-rocky8-skylake?

FYI, I am trying to build on Derecho (same compiler, [email protected]) - will let you know if I run into the same issue or not.
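If forcing a uniform target turns out to be necessary, one way to do it in a spack site config is a blanket requirement on all packages (a sketch; spack also supports the older packages:all:target: list syntax):

```yaml
# Sketch: require a common target for every package so the
# concretizer cannot mix skylake and skylake_avx512 specs.
packages:
  all:
    require: ['target=skylake']
```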

@srherbener (Collaborator):

With the knowledge about the py-numpy version requirement in the site/packages.yaml, I went back to Orion and attempted to build there. I'm attempting to build with GNU/OpenMPI and getting this error from concretize:

[ue-gnu] orion-login-2[20] herbener$ spack concretize 2>&1 | tee log.concretize
==> Warning: Use of plain text `access_token` in mirror config is deprecated, use environment variables instead (access_token_variable)
==> Error: failed to concretize `jedi-ufs-env%gnu ^esmf@=8.6.1`, `gsi-env%gnu`, `ewok-env%gnu~cylc+ecflow`, `jedi-fv3-env%gnu`, ..., `esmf@=8.6.1%gnu snapshot=none` for the following reasons:
     1. Cannot set the required compiler: madis%gnu

After some poking around I discovered that madis seems to want to use the intel compiler, which I think led to the concretize error above. Here is what spack spec madis returns, in my unified-env, gnu environment.

[ue-gnu] orion-login-2[25] herbener$ spack spec madis
==> Warning: Use of plain text `access_token` in mirror config is deprecated, use environment variables instead (access_token_variable)
 -   [email protected]%[email protected]+pic~pnetcdf build_system=makefile arch=linux-rocky9-skylake_avx512
[e]      ^[email protected]%[email protected] build_system=autotools arch=linux-rocky9-skylake_avx512
 -       ^[email protected]%[email protected]~guile build_system=generic patches=ca60bd9,fe5b60d arch=linux-rocky9-skylake_avx512
 -       ^[email protected]%[email protected]~doc+pic+shared build_system=autotools arch=linux-rocky9-skylake_avx512
 -           ^[email protected]%[email protected]~classic-names+envmods~external-libfabric~generic-names~ilp64 build_system=generic arch=linux-rocky9-skylake_avx512
 -           ^[email protected]%[email protected]~blosc~byterange+dap~fsync~hdf4~jna~logging+mpi~nczarr_zip+optimize~parallel-netcdf+pic~shared~szip~zstd build_system=autotools patches=0161eb8 arch=linux-rocky9-skylake_avx512
[e]              ^[email protected]%[email protected]+gssapi+ldap~libidn2~librtmp~libssh~libssh2+nghttp2 build_system=autotools libs=shared,static tls=openssl arch=linux-rocky9-skylake_avx512
 -               ^[email protected]%[email protected]~cxx+fortran+hl~ipo~java~map+mpi+shared~subfiling~szip+threadsafe+tools api=default build_system=cmake build_type=Release generator=make patches=82088c8,f42732a arch=linux-rocky9-skylake_avx512
 -                   ^[email protected]%[email protected]~doc~ncurses+ownlibs~qtgui build_system=generic build_type=Release patches=dbc3892 arch=linux-rocky9-skylake_avx512
 -                   ^[email protected]%[email protected]+internal_glib build_system=autotools arch=linux-rocky9-skylake_avx512
 -                   ^[email protected]%[email protected]+compat~new_strategies+opt+pic+shared build_system=autotools arch=linux-rocky9-skylake_avx512
 -               ^[email protected]%[email protected]+pic~python+shared build_system=autotools arch=linux-rocky9-skylake_avx512
 -                   ^[email protected]%[email protected]~pic build_system=autotools libs=shared,static arch=linux-rocky9-skylake_avx512

Any ideas why madis wants to use the intel compiler instead of gcc?

Thanks!

@climbfuji (Collaborator, Author):

Unfortunately, I do not have the bandwidth to debug all tier-1 site configs. We need to share that work, and yes, sometimes it takes hours or days. Can you try to resolve the Orion issues yourself or with the help of EPIC, please?

As far as S4 is concerned: I think the problem is that the site config is bad/wrong. If you look at compilers.yaml, you see that Intel uses gcc@13 as the GCC backend. We've known for years (and there are several open issues in spack-stack) that Intel classic doesn't do well with a gcc@12 or newer backend. I hopped on S4, added [email protected] as the backend for Intel classic, removed the py-numpy variant and also forced arch=linux-rocky8-skylake (the latter is probably not needed, but I don't like that some stuff is concretized with avx512 while some stuff isn't).

The gcc@10 backend change allowed me to compile antlr. I'll try to build the entire stack on S4 with Intel, and if it works I'll update the site config in the PR and point you to the stack for testing.
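For reference, the GCC backend that Intel classic picks up is typically controlled by the modules listed for the compiler entry in the site's compilers.yaml; a sketch with placeholder paths and module names (not the actual S4 values):

```yaml
# Sketch of an S4-style compilers.yaml entry; paths and module
# names are placeholders, not the real S4 configuration.
compilers:
- compiler:
    spec: [email protected]
    paths:
      cc: /opt/intel/bin/icc
      cxx: /opt/intel/bin/icpc
      f77: /opt/intel/bin/ifort
      fc: /opt/intel/bin/ifort
    modules:
    - icc/2021.10.0
    - gcc/10.2.0   # gcc@10 backend instead of gcc@13
    operating_system: rocky8
    target: x86_64
```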

@srherbener (Collaborator):

Sorry @climbfuji about all the questions. I'll spread out the questions elsewhere and I greatly appreciate the help and support that you provide. Unfortunately, I don't have the bandwidth for debugging all of these site configurations either. My real need is to just build somewhere so I can test this PR with skylab.

@climbfuji (Collaborator, Author):

> Sorry @climbfuji about all the questions. I'll spread out the questions elsewhere and I greatly appreciate the help and support that you provide. Unfortunately, I don't have the bandwidth for debugging all of these site configurations either. My real need is to just build somewhere so I can test this PR with skylab.

Hopefully S4 will be ready for you by the end of today so that you can test next week!

@InnocentSouopgui-NOAA (Collaborator):

> Sorry @climbfuji about all the questions. I'll spread out the questions elsewhere and I greatly appreciate the help and support that you provide. Unfortunately, I don't have the bandwidth for debugging all of these site configurations either. My real need is to just build somewhere so I can test this PR with skylab.
>
> Hopefully S4 will be ready for you by the end of today so that you can test next week!

If the readiness of S4 refers to the network problem experienced last week, it has been resolved since Tuesday. I am assisting users with extra libraries there before getting to the task I have on spack-stack for S4.

@climbfuji (Collaborator, Author):

> > Sorry @climbfuji about all the questions. I'll spread out the questions elsewhere and I greatly appreciate the help and support that you provide. Unfortunately, I don't have the bandwidth for debugging all of these site configurations either. My real need is to just build somewhere so I can test this PR with skylab.
> >
> > Hopefully S4 will be ready for you by the end of today so that you can test next week!
>
> If the readiness of S4 refers to the network problem experienced last week, it has been resolved since Tuesday. I am assisting users with extra libraries there before getting to the task I have on spack-stack for s4.

No, just bugs in the site config.

@srherbener (Collaborator):

> > Sorry @climbfuji about all the questions. I'll spread out the questions elsewhere and I greatly appreciate the help and support that you provide. Unfortunately, I don't have the bandwidth for debugging all of these site configurations either. My real need is to just build somewhere so I can test this PR with skylab.
>
> Hopefully S4 will be ready for you by the end of today so that you can test next week!

Thanks @climbfuji - much appreciated!

@climbfuji (Collaborator, Author):

> > > Sorry @climbfuji about all the questions. I'll spread out the questions elsewhere and I greatly appreciate the help and support that you provide. Unfortunately, I don't have the bandwidth for debugging all of these site configurations either. My real need is to just build somewhere so I can test this PR with skylab.
> >
> > Hopefully S4 will be ready for you by the end of today so that you can test next week!
>
> Thanks @climbfuji - much appreciated!

@srherbener /data/users/dheinzeller/spst-unpin-update/envs/ue-intel-2021.10.0/install/modulefiles/Core

@climbfuji climbfuji changed the title Unpin or update many packages (mostly Python) in configs/common/packages.yaml Unpin or update many packages (mostly Python) in configs/common/packages.yaml, fix S4 site config Dec 16, 2024
@srherbener (Collaborator):

> @srherbener /data/users/dheinzeller/spst-unpin-update/envs/ue-intel-2021.10.0/install/modulefiles/Core

Thanks @climbfuji! I am building jedi-bundle now, and will run ctests and skylab.

@srherbener (Collaborator):

> > Okay, I'll forge ahead with python 3.6.8. Note that the documentation here https://spack-stack.readthedocs.io/en/latest/PreConfiguredSites.html#create-local-environment says to make sure you have python 3.8+ available and that it is the default.
>
> That is from the old days when we forced spack to use an external Python in the environment. We should remove that. All we need is Python 3.6+ to drive spack and to build the environments.

Note, I just submitted a PR (#1420) to correct the documentation.

@srherbener (Collaborator):

jedi-bundle looks good, I'm running skylab now.

@@ -294,4 +305,5 @@ packages:
wrf-io:
require: '@1.2.0'
zstd:
require: '@1.5.2 +programs'
#require: '@1.5.2 +programs'
A reviewer (Collaborator) commented:

Is there a reason not to pin the zstd version? It's not a huge issue, but I would tend to lean toward pinning versions for packages that are used by UWM; in other words, I would generally put zstd in the same category as hdf5, netcdf, etc.

Status: In Progress
Successfully merging this pull request may close these issues.

Update shapely from 1.8.0 to latest version 2.x.y
4 participants