CMake 3.27.9 causes SCORPIO configuration errors on Frontier with crayclanggpu when OMP_NUM_THREADS > 1 #6750

Open
dqwu opened this issue Nov 16, 2024 · 2 comments
Labels
CMake build system Frontier SCORPIO The E3SM I/O library (derived from PIO)

dqwu commented Nov 16, 2024

PR #6689 explicitly loads the Core/24.07 module on Frontier, and the only CMake module available under Core/24.07 is cmake/3.27.9. With this CMake version, the crayclanggpu build fails when OMP_NUM_THREADS > 1, particularly now that PR #6747 has re-enabled PIO_ENABLE_TOOLS for SCORPIO.

Steps to Reproduce on Frontier

git clone https://github.com/E3SM-Project/E3SM.git
cd E3SM

git submodule update --init --recursive

cd cime/scripts

./create_newcase --machine=frontier --compiler=crayclanggpu --case X_f19_g16 --compset X --res f19_g16
cd X_f19_g16

./xmlchange LND_NTHRDS=2

./case.setup

./case.build

CMake Error Message

CMake Error at /autofs/nccs-svm1_sw/frontier/spack-envs/core-24.07/opt/gcc-7.5.0/cmake-3.27.9-pyxnvhiskwepbw5itqyipzyhhfw3yitk/share/cmake-3.27/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find MPI (missing: MPI_Fortran_FOUND) (found version "3.1")
Call Stack (most recent call first):
  /autofs/nccs-svm1_sw/frontier/spack-envs/core-24.07/opt/gcc-7.5.0/cmake-3.27.9-pyxnvhiskwepbw5itqyipzyhhfw3yitk/share/cmake-3.27/Modules/FindPackageHandleStandardArgs.cmake:600 (_FPHSA_FAILURE_MESSAGE)
  /autofs/nccs-svm1_sw/frontier/spack-envs/core-24.07/opt/gcc-7.5.0/cmake-3.27.9-pyxnvhiskwepbw5itqyipzyhhfw3yitk/share/cmake-3.27/Modules/FindMPI.cmake:1837 (find_package_handle_standard_args)
  tools/spio_finfo/CMakeLists.txt:21 (find_package)
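The failing call is the find_package at tools/spio_finfo/CMakeLists.txt:21. To check whether FindMPI alone reproduces the problem, independent of the rest of SCORPIO's build logic, one option is a throwaway project that only runs the Fortran MPI lookup. This is a minimal sketch under assumptions: the helper name write_mpi_probe and the probe directory are hypothetical, and the exact arguments SCORPIO passes to find_package(MPI) are not shown in the error, so REQUIRED COMPONENTS Fortran is a guess.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal CMake project whose only job is to run
# FindMPI for Fortran, mirroring the find_package call that fails in
# tools/spio_finfo/CMakeLists.txt. "REQUIRED COMPONENTS Fortran" is an assumption;
# SCORPIO's actual arguments may differ.
write_mpi_probe() {
  dir="$1"
  mkdir -p "$dir" || return 1
  cat > "$dir/CMakeLists.txt" <<'EOF'
cmake_minimum_required(VERSION 3.20)
project(mpi_probe C Fortran)
find_package(MPI REQUIRED COMPONENTS Fortran)
message(STATUS "MPI_Fortran_FOUND=${MPI_Fortran_FOUND}")
EOF
}

# Usage (on Frontier, after loading the same modules as the failing build):
#   write_mpi_probe mpi_probe
#   FC=ftn CC=cc LDFLAGS="-fopenmp" cmake -S mpi_probe -B mpi_probe/build
```

If the probe fails with the same "Could NOT find MPI (missing: MPI_Fortran_FOUND)" message under cmake/3.27.9 but passes under cmake/3.21.3, that would point at FindMPI itself rather than at SCORPIO; whether it reproduces is untested.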

This issue is also reproducible with standalone SCORPIO builds. It appears to be tied to CMake 3.22 and later, as described in E3SM-Project/scorpio#517, which reports a similar issue when CMAKE_SYSTEM_NAME is set to Catamount.

Tests with Different CMake Versions

[Failing with CMake/3.27.9]

. /usr/share/lmod/lmod/init/sh
module reset
module switch Core Core/24.07
module load cmake/3.27.9
module load craype-accel-amd-gfx90a rocm/5.4.0

git clone https://github.com/E3SM-Project/scorpio.git
cd scorpio

mkdir build1
cd build1

FC=ftn CC=cc CXX=mpicxx \
LDFLAGS="-fopenmp" \
cmake -Wno-dev \
-DWITH_NETCDF=OFF \
-DPnetCDF_PATH=/opt/cray/pe/parallel-netcdf/1.12.3.1/crayclang/14.0 \
..

[Failing with CMake/3.22.2]

. /usr/share/lmod/lmod/init/sh
module reset
module switch Core Core/24.00
module load cmake/3.22.2
module load craype-accel-amd-gfx90a rocm/5.4.0

git clone https://github.com/E3SM-Project/scorpio.git
cd scorpio

mkdir build2
cd build2

FC=ftn CC=cc CXX=mpicxx \
LDFLAGS="-fopenmp" \
cmake -Wno-dev \
-DWITH_NETCDF=OFF \
-DPnetCDF_PATH=/opt/cray/pe/parallel-netcdf/1.12.3.1/crayclang/14.0 \
..

[Working with CMake/3.21.3]

. /usr/share/lmod/lmod/init/sh
module reset
module switch Core Core/24.00
module load cmake/3.21.3
module load craype-accel-amd-gfx90a rocm/5.4.0

git clone https://github.com/E3SM-Project/scorpio.git
cd scorpio

mkdir build3
cd build3

FC=ftn CC=cc CXX=mpicxx \
LDFLAGS="-fopenmp" \
cmake -Wno-dev \
-DWITH_NETCDF=OFF \
-DPnetCDF_PATH=/opt/cray/pe/parallel-netcdf/1.12.3.1/crayclang/14.0 \
..

[Working with /usr/bin/cmake (3.20.4)]

. /usr/share/lmod/lmod/init/sh
module reset
module switch Core Core/24.07
module load craype-accel-amd-gfx90a rocm/5.4.0

git clone https://github.com/E3SM-Project/scorpio.git
cd scorpio

mkdir build4
cd build4

FC=ftn CC=cc CXX=mpicxx \
LDFLAGS="-fopenmp" \
/usr/bin/cmake -Wno-dev \
-DWITH_NETCDF=OFF \
-DPnetCDF_PATH=/opt/cray/pe/parallel-netcdf/1.12.3.1/crayclang/14.0 \
..
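The three module-based configurations above differ only in the Core and cmake modules they load, so the comparison can be scripted. This is a minimal sketch, assuming it is run from the top of a SCORPIO clone on Frontier where the Lmod init script /usr/share/lmod/lmod/init/sh exists; the function names, the build-cmake-<version> directories, and the cmake-<version>.log files are hypothetical. The /usr/bin/cmake case is left out because it loads no cmake module.

```shell
#!/bin/sh
# Hypothetical helper: map each CMake module tested above to the Core module
# it was loaded with. Returns non-zero for versions not covered by the tests.
core_for_cmake() {
  case "$1" in
    3.27.9) echo "Core/24.07" ;;
    3.22.2|3.21.3) echo "Core/24.00" ;;
    *) return 1 ;;
  esac
}

# Hypothetical driver: configure SCORPIO once per CMake module, each in its own
# build directory, logging to cmake-<version>.log. Requires Lmod's `module`.
run_matrix() {
  for v in 3.27.9 3.22.2 3.21.3; do
    core=$(core_for_cmake "$v") || continue
    (
      . /usr/share/lmod/lmod/init/sh
      module reset
      module switch Core "$core"
      module load "cmake/$v"
      module load craype-accel-amd-gfx90a rocm/5.4.0
      mkdir -p "build-cmake-$v" && cd "build-cmake-$v" || exit 1
      FC=ftn CC=cc CXX=mpicxx LDFLAGS="-fopenmp" \
        cmake -Wno-dev -DWITH_NETCDF=OFF \
        -DPnetCDF_PATH=/opt/cray/pe/parallel-netcdf/1.12.3.1/crayclang/14.0 ..
    ) > "cmake-$v.log" 2>&1 && echo "cmake/$v: OK" || echo "cmake/$v: FAILED"
  done
}
```

Each configuration runs in a subshell so that module changes and the cd do not leak into the next iteration.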

Possible Fixes

  1. Switch to the older Core/24.00 module to use cmake/3.21.3 with the crayclanggpu compiler.
  2. Continue using the latest Core/24.07, but use the default system CMake (version 3.20.4, located at /usr/bin/cmake).
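Both fixes come down to making sure the cmake picked up for a crayclanggpu build is older than 3.22. A minimal guard is sketched below, assuming GNU sort (for -V) and treating every release from 3.22 onward as suspect, which extrapolates from the two failing versions tested here (3.22.2 and 3.27.9). Builds using GNU Fortran are reported further down to work with cmake/3.27.9, so the warning is only meaningful for Cray Fortran. The function names are hypothetical.

```shell
#!/bin/sh
# Succeeds (exit 0) if version $1 >= version $2, using GNU `sort -V` ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Succeeds if the version is in the range assumed to be affected.
# Observed: 3.22.2 and 3.27.9 fail; 3.20.4 and 3.21.3 work.
# Treating all versions >= 3.22 as affected is an extrapolation.
cmake_version_suspect() {
  version_ge "$1" 3.22
}

# Print the version of the first `cmake` on PATH, e.g. "3.27.9".
# Relies on the first line of `cmake --version` being "cmake version X.Y.Z".
current_cmake_version() {
  cmake --version | head -n1 | awk '{print $3}'
}

# Example check before configuring (only runs if a cmake is on PATH).
if command -v cmake >/dev/null 2>&1; then
  v=$(current_cmake_version)
  if cmake_version_suspect "$v"; then
    echo "warning: cmake $v is >= 3.22; a crayclanggpu configure with OpenMP may fail to find MPI Fortran" >&2
  fi
fi
```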
@dqwu dqwu added SCORPIO The E3SM I/O library (derived from PIO) CMake build system Frontier labels Nov 16, 2024

dqwu commented Nov 16, 2024

@trey-ornl This issue appears to have been introduced in CMake 3.22 and persists through at least 3.27. In the tests above, the error occurs with both cmake/3.22.2 and cmake/3.27.9, but not with cmake/3.21.3 or the system-installed CMake 3.20.4. This suggests a long-standing CMake bug that has yet to be resolved, which is why crayclang-scream still uses cmake/3.21.3.

trey-ornl commented

@dqwu Yes, there appears to be a disagreement between CMake and Cray Fortran that emerges at CMake 3.22. I find it odd to use CXX=mpicxx, and I'm surprised it works. For frontier-scream-gpu with crayclang-scream, we load Core/24.00 and cmake/3.21.3. The newer compiler configuration, craygnuamdgpu, uses GNU Fortran, which works with the default Core/24.07 and cmake/3.27.9.
