Update Frontier machine/compilers following system update #6579

Merged: grnydawn merged 1 commit into master from ykim/frontier/updates_July_2024 on Sep 6, 2024

grnydawn
Contributor

@grnydawn grnydawn commented Sep 2, 2024

  • Retain current software modules instead of updating to the latest versions to prioritize reliability.
  • Add linker options to use GCC 12.2, addressing linker errors.
  • Utilize Fortran linker to resolve additional linker errors.
  • Replace hipcc with mpicxx for MPICXX macro in the GPU compiler definitions.
  • Adjust compiler priority to prioritize reliability over performance.
  • Temporarily comment out ADIOS2 configurations.

@grnydawn grnydawn self-assigned this Sep 2, 2024
@grnydawn grnydawn added the Frontier, Cray (Cray compiler related issues), GNU (GNU compiler related issues), and AMD-compiler (Issues related to AMD Compiler) labels Sep 2, 2024

github-actions bot commented Sep 2, 2024

PR Preview Action v1.4.7
🚀 Deployed preview to https://E3SM-Project.github.io/E3SM/pr-preview/pr-6579/
on branch gh-pages at 2024-09-05 18:59 UTC

@rljacob
Member

rljacob commented Sep 3, 2024

How was this tested?

@grnydawn
Contributor Author

grnydawn commented Sep 3, 2024

The following table summarizes the test results from running the e3sm_developer test suite. Debug cases were excluded because building them with the crayclang compiler took an excessive amount of time.

| Test Result | crayclang (Current) | crayclang (Latest) | amdclang (Current) | amdclang (Latest) | gnu (Current) | gnu (Latest) |
|---|---|---|---|---|---|---|
| PASS | 63 | 11 | 58 | 44 | 65 | 71 |
| FAIL | 9 | 60 | 14 | 38 | 7 | 12 |
| Total | 72 | 71 | 72 | 82 | 72 | 83 |
| Avg. Build Time (secs) | 1137 | 1619 | 289 | 293 | 133 | 123 |
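
Because the current and latest module sets ran different numbers of tests, pass rates are easier to compare than raw counts. The following is a minimal sketch that recomputes them from the table; the dictionary layout and the `pass_rate` helper are illustrative names, not part of the E3SM tooling.

```python
# (passed, failed) counts copied from the results table above.
results = {
    "crayclang (Current)": (63, 9),
    "crayclang (Latest)": (11, 60),
    "amdclang (Current)": (58, 14),
    "amdclang (Latest)": (44, 38),
    "gnu (Current)": (65, 7),
    "gnu (Latest)": (71, 12),
}


def pass_rate(passed: int, failed: int) -> float:
    """Return the fraction of tests that passed."""
    return passed / (passed + failed)


# Pass rate for every compiler/module-set combination.
rates = {name: pass_rate(*counts) for name, counts in results.items()}

for name, rate in rates.items():
    print(f"{name:<20} {rate:6.1%}")
```

On these numbers the current module set has the higher pass rate for all three compilers, including gnu, where the latest set passes more tests in absolute terms but also runs more of them.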

@grnydawn
Contributor Author

grnydawn commented Sep 3, 2024

The crayclang compiler (both the current and latest versions) has internal compiler issues (a segfault from the optcg compiler-internal module) and excessive compile times. OLCF tickets for the latest version: OLCFHELP-19210, OLCFHELP-19356, and OLCFHELP-19435.

The amdclang compiler has a segfault issue during the compilation of some test cases. Essentially, AMD has asked us to wait until they release a new Fortran compiler based on LLVM.

@rljacob
Member

rljacob commented Sep 4, 2024

Try testing with this suite: e3sm_gpucxx

@grnydawn
Contributor Author

grnydawn commented Sep 4, 2024

@rljacob, I got the following test results with the e3sm_gpucxx test suite.

| Test | amdclang | amdclanggpu | crayclang | crayclanggpu | gnu | gnugpu |
|---|---|---|---|---|---|---|
| SCREAM | FAIL MODEL_BUILD | FAIL SHAREDLIB_BUILD | FAIL MODEL_BUILD | FAIL MODEL_BUILD | FAIL MODEL_BUILD | FAIL MODEL_BUILD |
| MMF | PASS | PASS | PEND MODEL_BUILD | PEND MODEL_BUILD | PASS | PASS |

All SCREAM failures except the amdclanggpu case are caused by the following error:

-- Configuring incomplete, errors occurred!
...
CMake Error at cmake/build_eamxx.cmake:34 (include):
include could not find requested file:
No macro file found: .../cmake_macros/frontier.cmake in build_eamxx.cmake

@grnydawn
Contributor Author

grnydawn commented Sep 4, 2024

The following CMake statements in "$E3SM/components/cmake/build_eamxx.cmake" caused the error:

# Prefer a machine-compiler file; otherwise fall back to the machine-only file.
set(SCREAM_MACH_FILE_ROOT ${CMAKE_SOURCE_DIR}/eamxx/cmake/machine-files)
if (EXISTS ${SCREAM_MACH_FILE_ROOT}/${MACH}-${COMPILER}.cmake)
    include(${SCREAM_MACH_FILE_ROOT}/${MACH}-${COMPILER}.cmake)
else()
    include(${SCREAM_MACH_FILE_ROOT}/${MACH}.cmake)
endif()

The only cmake file that Scream currently supports for Frontier is "frontier-scream-gpu.cmake". So, none of the above machine/compiler names worked.
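
To make the lookup order concrete, here is a minimal Python sketch that mimics the two-step search. It assumes the machine name passed in is `frontier` and that the machine-files directory contains only `frontier-scream-gpu.cmake` for Frontier, as stated above; `resolve_machine_file` is a hypothetical helper, not part of E3SM or CIME.

```python
from typing import Optional, Set


def resolve_machine_file(mach: str, compiler: str, available: Set[str]) -> Optional[str]:
    """Mimic the lookup: prefer <mach>-<compiler>.cmake, else fall back to <mach>.cmake."""
    specific = f"{mach}-{compiler}.cmake"
    if specific in available:
        return specific
    fallback = f"{mach}.cmake"
    # The CMake code includes the fallback unconditionally; this sketch
    # returns None for a missing file instead of raising a configure error.
    return fallback if fallback in available else None


# Assumption: the only Frontier file present, per the comment above.
available_files = {"frontier-scream-gpu.cmake"}
compilers = ["amdclang", "amdclanggpu", "crayclang", "crayclanggpu", "gnu", "gnugpu"]

resolved = {c: resolve_machine_file("frontier", c, available_files) for c in compilers}
for compiler, path in resolved.items():
    print(f"frontier / {compiler}: {path or 'no machine file found'}")
```

Under these assumptions neither `frontier-<compiler>.cmake` nor `frontier.cmake` exists for any of the six compilers, so every lookup comes back empty. In this sketch, only a machine name of `frontier-scream-gpu` would resolve, through the fallback branch.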

@jgfouca
Member

jgfouca commented Sep 4, 2024

@grnydawn , eamxx won't work if there's no eamxx machine file. Eamxx clears all E3SM settings and manages its own flags, so you will end up with no compiler flags if there's no eamxx machine file.

@jgfouca
Member

jgfouca commented Sep 4, 2024

Eamxx is currently using its own machine files for frontier, so you probably don't need to worry about this for now.

@rljacob
Member

rljacob commented Sep 4, 2024

What stops EAMxx from using the regular E3SM build system components?

@jgfouca
Member

jgfouca commented Sep 4, 2024

@rljacob, a long time ago, we thought eamxx would need to manage its own flags, even when part of a CIME build. I think we have since decided that we should just use the CIME system, but I haven't gotten around to changing things yet.

@grnydawn grnydawn force-pushed the ykim/frontier/updates_July_2024 branch from 54f3748 to 25fe732 Compare September 5, 2024 18:07
@grnydawn
Contributor Author

grnydawn commented Sep 5, 2024

If there are no objections, I will merge this PR to the next branch, and, if no new issues arise, into the master branch.

grnydawn added a commit that referenced this pull request Sep 5, 2024
Update Frontier machine/compilers following system update

[BFB] no code change. retain current software modules
@grnydawn grnydawn merged commit 7470acf into master Sep 6, 2024
21 checks passed
@grnydawn grnydawn deleted the ykim/frontier/updates_July_2024 branch September 6, 2024 17:32
@grnydawn grnydawn restored the ykim/frontier/updates_July_2024 branch September 6, 2024 17:33