Print LmodError when loading GCCcore-12.2.0-based modules on zen4 #841

Draft
wants to merge 6 commits into
base: 2023.06-software.eessi.io

Conversation

casparvl
Collaborator

@casparvl casparvl commented Dec 10, 2024

Implements the idea from https://gitlab.com/eessi/support/-/issues/37#note_2159031831

However, this does not currently work: the first module to be installed that has GCCcore-12.2.0 as a dependency tries to load it (even with --module-only), and that load fails:

== creating module...
  >> generating module file @ /home/casparl/eessi/versions/2023.06/software/linux/x86_64/amd/zen4/modules/all/GCCcore/12.2.0.lua
== ... (took < 1 sec)
== permissions [skipped]
== packaging [skipped]
  >> running command:
        [started at: 2024-12-10 18:12:47]
        [working dir: /gpfs/home4/casparl/eessi/versions/2023.06/software/linux/x86_64/amd/zen4/modules]
        [output logged in /scratch-local/casparl.8987353/eb-bscq9hvh/easybuild-run_cmd-pxom1tke.log]
        bzip2 /home/casparl/eessi/versions/2023.06/software/linux/x86_64/amd/zen4/software/GCCcore/12.2.0/easybuild/easybuild-GCCcore-12.2.0-20241210.181247.log
  >> command completed: exit 0, ran in < 1s
== COMPLETED: Installation ended successfully (took 1 secs)
== Results of the build can be found in the log file(s) /home/casparl/eessi/versions/2023.06/software/linux/x86_64/amd/zen4/software/GCCcore/12.2.0/easybuild/easybuild-GCCcore-12.2.0-20241210.181247.log.bz2
== processing EasyBuild easyconfig /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen4/software/EasyBuild/4.9.4/easybuild/easyconfigs/p/pkgconf/pkgconf-1.9.3-GCCcore-12.2.0.eb
== building and installing pkgconf/1.9.3-GCCcore-12.2.0...
  >> installation prefix: /home/casparl/eessi/versions/2023.06/software/linux/x86_64/amd/zen4/software/pkgconf/1.9.3-GCCcore-12.2.0
== fetching files [skipped]
== creating build dir, resetting environment...
  >> build dir: /tmp/casparl/easybuild/build/pkgconf/1.9.3/GCCcore-12.2.0
== Running post-ready hook...
== ... (took < 1 sec)
== unpacking [skipped]
== patching [skipped]
== preparing...
== Running pre-prepare hook...
== ... (took < 1 sec)
== FAILED: Installation ended unsuccessfully (build directory: /tmp/casparl/easybuild/build/pkgconf/1.9.3/GCCcore-12.2.0): build failed (first 300 chars): Module command '/usr/share/lmod/lmod/libexec/lmod python show GCCcore/12.2.0' failed with exit code 1; stderr: Lmod has detected
the following error: Unable to load module because of error when evaluating modulefile:
     /home/casparl/eessi/versions/2023.06/software/linux/x86_64/amd/zen4/modules/al (took 0 secs)
== Results of the build can be found in the log file(s) /scratch-local/casparl.8987353/eb-bscq9hvh/easybuild-pkgconf-1.9.3-20241210.181248.DZrFb.log
0:00:02  1 out of 37 easyconfigs done: GCCcore/12.2.0 (OK)
ERROR: Build of /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen4/software/EasyBuild/4.9.4/easybuild/easyconfigs/p/pkgconf/pkgconf-1.9.3-GCCcore-12.2.0.eb failed (err: "build failed (first 300 chars): Module command '/usr/share/lmod/lmod/libexec/lmod python show GCCcore/12.2.0' failed with exit code 1; stderr: Lmod has detected the following error: Unable to load module because of error when evaluating modulefile:\n     /home/casparl/eessi/versions/2023.06/software/linux/x86_64/amd/zen4/modules/al")

Maybe we can make that pre-prepare hook do nothing, or skip the prepare phase. Not sure...


eessi-bot bot commented Dec 10, 2024

Instance eessi-bot-mc-aws is configured to build for:

  • architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/skylake_avx512, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
  • repositories: eessi.io-2023.06-compat, eessi-hpc.org-2023.06-software, eessi-hpc.org-2023.06-compat, eessi.io-2023.06-software

@riscv-eessi-io-bot

Instance eessi-bot-riscv is configured to build for:

  • architectures: riscv64/generic
  • repositories: riscv.eessi.io-20240402


eessi-bot bot commented Dec 10, 2024

Instance eessi-bot-mc-azure is configured to build for:

  • architectures: x86_64/amd/zen4
  • repositories: eessi.io-2023.06-compat, eessi.io-2023.06-software

…e of the step-hooks, so we can unset it at the end
@casparvl
Collaborator Author

casparvl commented Dec 11, 2024

Ok, this PR is more or less ready, but we should add a known-issues page explaining that the zen4 tree is missing 2022b / GCCcore 12.2.0. I think it can be very simple and state that, because of issues observed with the OpenBLAS from that toolchain generation, we decided not to support it. It also makes sense: zen4 was only released at the end of 2022, so the 2022b stack would have had very little support for it.

@casparvl casparvl marked this pull request as ready for review December 11, 2024 16:57
@casparvl casparvl marked this pull request as draft December 11, 2024 16:58
@casparvl
Collaborator Author

Ok, I'll need EESSI/docs#357 to be merged first. Then, I'll put a link to that part of the docs in the LmodError.

EESSI_IGNORE_ZEN4_GCC1220_ENVVAR="EESSI_IGNORE_LMOD_ERROR_ZEN4_GCC1220"

def is_gcccore_1220_based(ecname, ecversion, tcname, tcversion):
"""Checks if this easyconfig either _is_ or _uses_ a GCCcore-12.2.2 based toolchain"""
Member

Suggested change
"""Checks if this easyconfig either _is_ or _uses_ a GCCcore-12.2.2 based toolchain"""
"""Checks if this easyconfig either _is_ or _uses_ a GCCcore-12.2.0 based toolchain"""

# Need to escape newline character so that the newline character actually ends up in the module file
# (otherwise, it splits the string, and a 2-line string ends up in the modulefile, resulting in syntax error)
errmsg = "EasyConfigs using toolchains based on GCCcore-12.2.0 are not supported for the Zen4 architecture.\\n"
errmsg += "See https://www.eessi.io/docs/known_issues/eessi-2023.06/"
Member

Suggested change
errmsg += "See https://www.eessi.io/docs/known_issues/eessi-2023.06/"
errmsg += "See https://www.eessi.io/docs/known_issues/eessi-2023.06/#gcc-1220-and-foss-2022b-based-modules-cannot-be-loaded-on-zen4-architecture"
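
A minimal sketch of why the double backslash matters, assuming the hook writes an `LmodError(...)` call into the generated Lua module file. The helper name `build_lmod_error_guard` and the `os.getenv` guard around the error are hypothetical illustrations for this comment, not the PR's actual code:

```python
# Environment variable name taken from the PR; everything else is illustrative.
ENVVAR = "EESSI_IGNORE_LMOD_ERROR_ZEN4_GCC1220"


def build_lmod_error_guard(errmsg, envvar=ENVVAR):
    """Return a Lua snippet that raises LmodError unless `envvar` is set.

    Hypothetical helper: it only shows how the Python string ends up
    as text inside the module file.
    """
    return (
        f'if (not os.getenv("{envvar}")) then\n'
        f'    LmodError("{errmsg}")\n'
        "end\n"
    )


# "\\n" is a literal backslash followed by 'n' in the Python string, so the
# Lua string literal stays on one line and Lua interprets the escape itself.
errmsg = "EasyConfigs using toolchains based on GCCcore-12.2.0 are not supported for the Zen4 architecture.\\n"
errmsg += "See https://www.eessi.io/docs/known_issues/eessi-2023.06/"
snippet = build_lmod_error_guard(errmsg)

# With a plain "\n", Python inserts a real line break, which splits the
# Lua string literal across two lines in the module file.
broken = build_lmod_error_guard("first line\nsecond line")

print(snippet)
```

In the escaped variant the `LmodError(...)` call occupies a single line of the module file; in the unescaped variant the string literal is split over two lines, which is the syntax error described in the code comment above.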

# Make sure a single environment variable name is used for this throughout the hooks
EESSI_IGNORE_ZEN4_GCC1220_ENVVAR="EESSI_IGNORE_LMOD_ERROR_ZEN4_GCC1220"

def is_gcccore_1220_based(ecname, ecversion, tcname, tcversion):
Member

This feels brittle: there's no sanity checking on the arguments. If you give the arguments in the wrong order, the function will happily proceed. I don't particularly want you to bend over backwards to check the arguments, but I think kwargs with default None would at least be clearer and less error-prone.
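
A rough sketch of what that could look like, using keyword-only arguments so that a call with positional arguments in the wrong order fails loudly. The toolchain set `GCCCORE_1220_TOOLCHAINS` and the function body are assumptions for illustration; the PR's actual logic may differ:

```python
# Assumed set of (name, version) pairs that are, or build on, GCCcore 12.2.0.
# This list is a guess for the sketch, not taken from the PR.
GCCCORE_1220_TOOLCHAINS = {
    ("GCCcore", "12.2.0"),
    ("GCC", "12.2.0"),
    ("gfbf", "2022b"),
    ("gompi", "2022b"),
    ("foss", "2022b"),
}


def is_gcccore_1220_based(*, ecname=None, ecversion=None, tcname=None, tcversion=None):
    """Check whether an easyconfig either is, or uses, a GCCcore-12.2.0 based toolchain.

    The bare `*` makes every argument keyword-only, so callers have to
    name each value explicitly.
    """
    is_toolchain_itself = (ecname, ecversion) in GCCCORE_1220_TOOLCHAINS
    uses_toolchain = (tcname, tcversion) in GCCCORE_1220_TOOLCHAINS
    return is_toolchain_itself or uses_toolchain


# Usage: every value is named at the call site.
print(is_gcccore_1220_based(ecname="pkgconf", ecversion="1.9.3",
                            tcname="GCCcore", tcversion="12.2.0"))
```

A positional call such as `is_gcccore_1220_based("GCCcore", "12.2.0", "system", "system")` raises a `TypeError` instead of silently mixing up names and versions.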

EESSI_FORCE_ATTR)


# We do this as early as possible - and remove it all the way in the last step hook (post_testcases_hook)
Member

This comment is no longer accurate (I think)



# We do this as early as possible - and remove it all the way in the last step hook (post_testcases_hook)
def pre_prepare_hook_ignore_zen4_gcccore1220_error(self, *args, **kwargs):
Member

Is there really any harm in collapsing this into your pre_fetch hook? It would be nice to keep the setting and unsetting unified for our future selves to understand better what would need to be changed.

Member

Or is the environment variable not persistent?

os.environ[EESSI_IGNORE_ZEN4_GCC1220_ENVVAR] = "1"


def post_prepare_hook_ignore_zen4_gcccore1220_error(self, *args, **kwargs):
Member

Same here: is there any harm in just adding this to post_module_hook?
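
One way to keep the setting and unsetting visibly paired, whichever hooks end up calling them, is a pair of small helpers. This is only a sketch: the helper names are hypothetical, and whether the variable survives between steps is an open question here (the build log in the PR description shows a "resetting environment" step between fetch and prepare, which may be relevant):

```python
import os

# Variable name as used in the PR.
EESSI_IGNORE_ZEN4_GCC1220_ENVVAR = "EESSI_IGNORE_LMOD_ERROR_ZEN4_GCC1220"


def set_ignore_zen4_gcccore1220_error():
    """Hypothetical helper: mark the LmodError as ignorable for this process."""
    os.environ[EESSI_IGNORE_ZEN4_GCC1220_ENVVAR] = "1"


def unset_ignore_zen4_gcccore1220_error():
    """Hypothetical helper: remove the marker again.

    pop() with a default does not raise if the variable is already gone,
    for example after the environment was reset in between.
    """
    os.environ.pop(EESSI_IGNORE_ZEN4_GCC1220_ENVVAR, None)


# Usage: call the first helper from the earliest hook that needs it and the
# second from the matching later hook.
set_ignore_zen4_gcccore1220_error()
was_set = os.environ.get(EESSI_IGNORE_ZEN4_GCC1220_ENVVAR)
unset_ignore_zen4_gcccore1220_error()
unset_ignore_zen4_gcccore1220_error()  # second call is a harmless no-op
```

Keeping both helpers next to each other in the hooks file would make it easier to see later what needs to change, regardless of which step hooks call them.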
