{2023.06}[foss/2023a] cuDNN 8.9.2.26 w/ CUDA 12.1.1 (part 1) #772

Merged
53 commits
4c5c5a3
{2023.06}[foss/2023a] cuDNN 8.9.2.26 w/ CUDA 12.1.1
truib Oct 2, 2024
c5c7dcc
Merge branch '2023.06-software.eessi.io' of github-trz:EESSI/software…
truib Oct 2, 2024
da7c1e4
use post sanity-check hook for cuDNN
truib Oct 2, 2024
454d2bb
install cuDNN under host_injections before installing it under /cvmfs
truib Oct 3, 2024
6824d75
use post_postproc hook to convert some cuDNN files to symlinks
truib Oct 3, 2024
0e92d8d
Merge branch '2023.06-software.eessi.io' of github-trz:EESSI/software…
truib Oct 3, 2024
044e168
explain idea for extension_based and reformat its definition
truib Oct 3, 2024
f983fed
explain why we need to obtain the extension and improve cond expr
truib Oct 3, 2024
6e95efd
use local var for conditional expression + slightly reorder code
truib Oct 3, 2024
21916fd
code golf
truib Oct 3, 2024
bf26846
improve comment (also anticipating additional cu* libraries in the fu…
truib Oct 3, 2024
e1ba74f
improve parameter name
truib Oct 3, 2024
1f1eada
explain use of rstrip
truib Oct 3, 2024
fc00d0c
raise error if search string wasn't found
truib Oct 3, 2024
e968608
improved docstring
truib Oct 3, 2024
9c6c3a4
Merge branch '2023.06-software.eessi.io' of github-trz:EESSI/software…
truib Oct 4, 2024
97d5b67
use TMPDIR as base for temporary storage
truib Oct 4, 2024
8e7a0e8
attempt to use a single easystack file for CUDA/cu* packages
truib Oct 4, 2024
57f5a48
various improvements for inject_gpu_property
truib Oct 4, 2024
21ffc18
various improvements for install_cuda_and_libraries.sh
truib Oct 4, 2024
a3edc20
show available *CUDA* modules for easier debugging
truib Oct 4, 2024
7f601dc
print and adjust MODULEPATH
truib Oct 4, 2024
e3101b0
implement option 3 to install module files in hidden directory
truib Oct 4, 2024
e9018fb
Move to gpu_support/nvidia subdir
Oct 9, 2024
54753c3
Make comment more explicit that this is only about nvidia GPU support
Oct 9, 2024
2b76a54
Moved easystack file
Oct 9, 2024
086ba5a
Change how EESSI_SITE_INSTALL is used
Oct 9, 2024
ecd30ca
First attempt at making this loop over EasyStack files, loading the c…
Oct 9, 2024
18cdaa2
not sure why this is not working, see if this solves it
Oct 9, 2024
5716424
This was the only way in which I got this to work. Otherwise, it does…
Oct 9, 2024
9f3853c
Added include-easyblocks-from-commit
Oct 9, 2024
b9017d4
Easystack no longer passed as option. Comment is outdated, since we n…
Oct 9, 2024
eea2879
Make sure easystack file for host_injections is shipped
Oct 9, 2024
33199d7
Remove rebuild, change comments that were out of date
Oct 9, 2024
06cd2ea
Only loop over the easystacks with CUDA in the name
Oct 9, 2024
a27683e
Merge branch '2023.06-software.eessi.io' of github-trz:EESSI/software…
truib Oct 14, 2024
d5572ea
Merge branch 'EESSI:2023.06-software.eessi.io' into 2023.06-software.…
trz42 Oct 14, 2024
b68fdfa
Merge branch '2023.06-software.eessi.io-cuDNN-8.9.2.26-part-1' of git…
truib Oct 14, 2024
0e7c9d8
add a bit more debug output, use *SITE_SOFTWARE_PATH and minor tweaks
truib Oct 15, 2024
d57f8d8
replace TAB with WHITESPACEs
truib Oct 15, 2024
02d3e1e
show more msgs when building and init full environment
truib Oct 15, 2024
b28017a
use zero length env vars
truib Oct 15, 2024
20aacdc
fix syntax issue
truib Oct 15, 2024
88bdb88
tweak variable expansion in test
truib Oct 15, 2024
7b8ba8b
dont use hooks when installing into host_injections
truib Oct 15, 2024
a95d546
revert variable expansion and unset certain variables instead
truib Oct 15, 2024
45fa6b1
log if Lmod rc/SitePackage are being created
truib Oct 15, 2024
c3482b2
show full path to Lmod RC/SitePackage when created
truib Oct 15, 2024
db90ca7
adjust path to lua files if building accelerator software
truib Oct 15, 2024
6a0223c
small typo fixed
truib Oct 16, 2024
affe37b
add comment to clarify setting of MODULEPATH
trz42 Oct 16, 2024
d2d95e9
clarify need for option
trz42 Oct 16, 2024
77f3bc9
revert to silent sourcing, keep initialising full environment and cla…
truib Oct 16, 2024
34 changes: 26 additions & 8 deletions EESSI-install-software.sh
@@ -161,19 +161,21 @@ _eessi_software_path=${EESSI_PREFIX}/software/${EESSI_OS_TYPE}/${EESSI_SOFTWARE_
 _lmod_cfg_dir=${_eessi_software_path}/.lmod
 _lmod_rc_file=${_lmod_cfg_dir}/lmodrc.lua
 if [ ! -f ${_lmod_rc_file} ]; then
+    echo "Lmod file '${_lmod_rc_file}' does not exist yet; creating it..."
     command -V python3
     python3 ${TOPDIR}/create_lmodrc.py ${_eessi_software_path}
 fi
 _lmod_sitepackage_file=${_lmod_cfg_dir}/SitePackage.lua
 if [ ! -f ${_lmod_sitepackage_file} ]; then
+    echo "Lmod file '${_lmod_sitepackage_file}' does not exist yet; creating it..."
     command -V python3
     python3 ${TOPDIR}/create_lmodsitepackage.py ${_eessi_software_path}
 fi

 # Set all the EESSI environment variables (respecting $EESSI_SOFTWARE_SUBDIR_OVERRIDE)
-# $EESSI_SILENT - don't print any messages
-# $EESSI_BASIC_ENV - give a basic set of environment variables
-EESSI_SILENT=1 EESSI_BASIC_ENV=1 source $TOPDIR/init/eessi_environment_variables
+# $EESSI_SILENT - don't print any messages if set (use 'unset EESSI_SILENT' to let script show messages)
+# $EESSI_BASIC_ENV - give a basic set of environment variables if set (use 'EESSI_BASIC_ENV=' to let script initialise a full environment)
+EESSI_SILENT=1 EESSI_BASIC_ENV= source $TOPDIR/init/eessi_environment_variables
Collaborator

Wait, now I'm confused. I think @boegel's point was rather to adapt the comments (or remove them), wasn't it? Because EESSI_BASIC_ENV= will set EESSI_BASIC_ENV, which will result in a failure because one of the environment variables would then be missing (EESSI_SOFTWARE_PATH or something? I don't remember).

Contributor

Yeah, I was only pointing out that the comments didn't agree with the code

Contributor

We check with [ ! -z $EESSI_BASIC_ENV ], which is false if $EESSI_BASIC_ENV is defined but empty (or undefined):

bash-3.2$ EESSI_BASIC_ENV=; if [ ! -z $EESSI_BASIC_ENV ]; then echo "EESSI_BASIC_ENV is set: '$EESSI_BASIC_ENV'"; fi
bash-3.2$ EESSI_BASIC_ENV=1; if [ ! -z $EESSI_BASIC_ENV ]; then echo "EESSI_BASIC_ENV is set: '$EESSI_BASIC_ENV'"; fi
EESSI_BASIC_ENV is set: '1'
bash-3.2$ unset EESSI_BASIC_ENV; if [ ! -z $EESSI_BASIC_ENV ]; then echo "EESSI_BASIC_ENV is set: '$EESSI_BASIC_ENV'"; fi
bash-3.2$
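
For readers less used to shell tests, the three cases in the transcript above can be restated as a small Python model. This is only an illustration of the semantics under discussion (the real check is the Bash test quoted in the comment); the helper name `basic_env_requested` is made up for this sketch.

```python
def basic_env_requested(env):
    """Model of `[ ! -z $EESSI_BASIC_ENV ]`: true only if the variable is set AND non-empty."""
    return bool(env.get("EESSI_BASIC_ENV", ""))


# the three cases from the transcript: defined but empty, set to 1, unset
cases = {
    "empty": {"EESSI_BASIC_ENV": ""},
    "set to 1": {"EESSI_BASIC_ENV": "1"},
    "unset": {},
}
results = {label: basic_env_requested(env) for label, env in cases.items()}
for label, requested in results.items():
    print("%-8s -> basic environment requested: %s" % (label, requested))
```

In this model, `EESSI_BASIC_ENV=` behaves exactly like leaving the variable unset, which matches the updated code comment saying that the empty assignment lets the script initialise a full environment.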


if [[ -z ${EESSI_SOFTWARE_SUBDIR} ]]; then
fatal_error "Failed to determine software subdirectory?!"
@@ -243,12 +245,13 @@ if [[ "${EESSI_CVMFS_REPO}" != /cvmfs/dev.eessi.io ]]; then
     ${TOPDIR}/install_scripts.sh --prefix ${EESSI_PREFIX}
 fi

-# Install full CUDA SDK in host_injections
+# Install full CUDA SDK and cu* libraries in host_injections
 # Hardcode this for now, see if it works
 # TODO: We should make a nice yaml and loop over all CUDA versions in that yaml to figure out what to install
 # Allow skipping CUDA SDK install in e.g. CI environments
 # The install_cuda... script uses EasyBuild. So, we need to check if we have EB
 # or skip this step.
+echo "Going to install full CUDA SDK and cu* libraries under host_injections if necessary"
 module_avail_out=$TMPDIR/ml.out
 module avail 2>&1 | grep EasyBuild &> ${module_avail_out}
 if [[ $? -eq 0 ]]; then
@@ -258,10 +261,15 @@ else
     export skip_cuda_install=True
 fi

+temp_install_storage=${TMPDIR}/temp_install_storage
+mkdir -p ${temp_install_storage}
 if [ -z "${skip_cuda_install}" ] || [ ! "${skip_cuda_install}" ]; then
-    ${EESSI_PREFIX}/scripts/gpu_support/nvidia/install_cuda_host_injections.sh -c 12.1.1 --accept-cuda-eula
+    ${EESSI_PREFIX}/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh \
+        -t ${temp_install_storage} \
+        --accept-cuda-eula \
+        --accept-cudnn-eula
 else
-    echo "Skipping installation of CUDA SDK in host_injections, since the --skip-cuda-install flag was passed OR no EasyBuild module was found"
+    echo "Skipping installation of CUDA SDK and cu* libraries in host_injections, since the --skip-cuda-install flag was passed OR no EasyBuild module was found"
 fi

 # Install NVIDIA drivers in host_injections (if they exist)
@@ -318,20 +326,30 @@ else
     done
 fi

-echo ">> Creating/updating Lmod RC file..."
 export LMOD_CONFIG_DIR="${EASYBUILD_INSTALLPATH}/.lmod"
 lmod_rc_file="$LMOD_CONFIG_DIR/lmodrc.lua"
+if [[ ! -z ${EESSI_ACCELERATOR_TARGET} ]]; then
+    # EESSI_ACCELERATOR_TARGET is set, so let's remove the accelerator path from $lmod_rc_file
+    lmod_rc_file=$(echo ${lmod_rc_file} | sed "s@/accel/${EESSI_ACCELERATOR_TARGET}@@")
+    echo "Path to lmodrc.lua changed to '${lmod_rc_file}'"
+fi
 lmodrc_changed=$(cat ${pr_diff} | grep '^+++' | cut -f2 -d' ' | sed 's@^[a-z]/@@g' | grep '^create_lmodrc.py$' > /dev/null; echo $?)
 if [ ! -f $lmod_rc_file ] || [ ${lmodrc_changed} == '0' ]; then
+    echo ">> Creating/updating Lmod RC file (${lmod_rc_file})..."
     python3 $TOPDIR/create_lmodrc.py ${EASYBUILD_INSTALLPATH}
     check_exit_code $? "$lmod_rc_file created" "Failed to create $lmod_rc_file"
 fi

-echo ">> Creating/updating Lmod SitePackage.lua ..."
 export LMOD_PACKAGE_PATH="${EASYBUILD_INSTALLPATH}/.lmod"
 lmod_sitepackage_file="$LMOD_PACKAGE_PATH/SitePackage.lua"
+if [[ ! -z ${EESSI_ACCELERATOR_TARGET} ]]; then
+    # EESSI_ACCELERATOR_TARGET is set, so let's remove the accelerator path from $lmod_sitepackage_file
+    lmod_sitepackage_file=$(echo ${lmod_sitepackage_file} | sed "s@/accel/${EESSI_ACCELERATOR_TARGET}@@")
+    echo "Path to SitePackage.lua changed to '${lmod_sitepackage_file}'"
+fi
 sitepackage_changed=$(cat ${pr_diff} | grep '^+++' | cut -f2 -d' ' | sed 's@^[a-z]/@@g' | grep '^create_lmodsitepackage.py$' > /dev/null; echo $?)
 if [ ! -f "$lmod_sitepackage_file" ] || [ "${sitepackage_changed}" == '0' ]; then
+    echo ">> Creating/updating Lmod SitePackage.lua (${lmod_sitepackage_file})..."
     python3 $TOPDIR/create_lmodsitepackage.py ${EASYBUILD_INSTALLPATH}
     check_exit_code $? "$lmod_sitepackage_file created" "Failed to create $lmod_sitepackage_file"
 fi
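
The `lmodrc_changed` / `sitepackage_changed` pipelines and the `sed` call in this hunk are compact, so here is a rough Python restatement of what they compute. It is a sketch for explanation only and not part of the PR: the function names, the sample diff and the sample path are hypothetical, and `nvidia/cc80` is an assumed example value for `EESSI_ACCELERATOR_TARGET`.

```python
import re


def file_changed_in_diff(diff_text, filename):
    """True if the unified diff has a '+++ <prefix>/<filename>' header for exactly this file."""
    for line in diff_text.splitlines():
        if line.startswith('+++'):
            # second space-separated field, like `cut -f2 -d' '`
            fields = line.split(' ')
            if len(fields) > 1:
                # drop the leading 'a/' or 'b/' prefix, like `sed 's@^[a-z]/@@g'`
                path = re.sub(r'^[a-z]/', '', fields[1])
                if path == filename:
                    return True
    return False


def strip_accel_subdir(path, accel_target):
    """Remove the first '/accel/<target>' component, like the sed substitution without 'g'."""
    if not accel_target:
        return path
    return path.replace("/accel/%s" % accel_target, '', 1)


# hypothetical sample data
sample_diff = "\n".join([
    "diff --git a/create_lmodrc.py b/create_lmodrc.py",
    "--- a/create_lmodrc.py",
    "+++ b/create_lmodrc.py",
    "@@ -1 +1 @@",
    "-old",
    "+new",
])
accel_path = ("/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2"
              "/accel/nvidia/cc80/.lmod/lmodrc.lua")
print(file_changed_in_diff(sample_diff, "create_lmodrc.py"))
print(strip_accel_subdir(accel_path, "nvidia/cc80"))
```

Note that the shell version stores the exit status of `grep`, so `'0'` there corresponds to `True` here.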
@@ -0,0 +1,3 @@
+easyconfigs:
+  - CUDA-12.1.1.eb
+  - cuDNN-8.9.2.26-CUDA-12.1.1.eb
Contributor

Just for completeness' sake, I would also list CUDA-12.1.1.eb in here.

It's already installed, so won't make a difference w.r.t. produced installation, but it looks a bit strange not having it listed explicitly in an accel easystack file...

I was also wondering whether we need both this easystack file and the one under scripts/gpu_support/nvidia.

I understand we do because the latter gets shipped with EESSI, but I'm wondering if we should symlink them or something, since they should stay in sync, no?

Collaborator Author

Keeping them in sync would be good. Not sure it works with a symlink though.

If we keep the file under scripts/gpu_support/nvidia and use a symlink from easystacks/.../accel/nvidia/some_easystack_file, will our build procedure notice a change (say, the addition of another package such as cuTENSOR)?

If we symlink in the other direction, will our install_scripts.sh file just copy the symlink or the easystack file the symlink points to?

Collaborator Author

Made an attempt to arrange for that (having only one file, or rather no duplicates with the same content) in 8e7a0e8

Collaborator

After today's GPU tiger meeting, we agreed to separate these after all. The list of what is needed for host_injections is different from what we install. Only the modules that we have to strip down need to be installed in host_injections. Keeping that in a separate file is OK: it's not duplication, since it is the unique list of packages that are needed in host_injections (for NVIDIA GPU support).

205 changes: 156 additions & 49 deletions eb_hooks.py
@@ -756,64 +756,170 @@ def post_postproc_cuda(self, *args, **kwargs):
         if 'libcudart' not in allowlist:
             raise EasyBuildError("Did not find 'libcudart' in allowlist: %s" % allowlist)

-        # iterate over all files in the CUDA installation directory
-        for dir_path, _, files in os.walk(self.installdir):
-            for filename in files:
-                full_path = os.path.join(dir_path, filename)
-                # we only really care about real files, i.e. not symlinks
-                if not os.path.islink(full_path):
-                    # check if the current file name stub is part of the allowlist
-                    basename = filename.split('.')[0]
-                    if basename in allowlist:
-                        self.log.debug("%s is found in allowlist, so keeping it: %s", basename, full_path)
-                    else:
-                        self.log.debug("%s is not found in allowlist, so replacing it with symlink: %s",
-                                       basename, full_path)
-                        # if it is not in the allowlist, delete the file and create a symlink to host_injections
-
-                        # the host_injections path is under a fixed repo/location for CUDA
-                        host_inj_path = re.sub(EESSI_INSTALLATION_REGEX, HOST_INJECTIONS_LOCATION, full_path)
-                        # CUDA itself doesn't care about compute capability so remove this duplication from
-                        # under host_injections (symlink to a single CUDA installation for all compute
-                        # capabilities)
-                        accel_subdir = os.getenv("EESSI_ACCELERATOR_TARGET")
-                        if accel_subdir:
-                            host_inj_path = host_inj_path.replace("/accel/%s" % accel_subdir, '')
-                        # make sure source and target of symlink are not the same
-                        if full_path == host_inj_path:
-                            raise EasyBuildError("Source (%s) and target (%s) are the same location, are you sure you "
-                                                 "are using this hook for an EESSI installation?",
-                                                 full_path, host_inj_path)
-                        remove_file(full_path)
-                        symlink(host_inj_path, full_path)
+        # replace files that are not distributable with symlinks into
+        # host_injections
+        replace_non_distributable_files_with_symlinks(self.log, self.installdir, self.name, allowlist)
     else:
         raise EasyBuildError("CUDA-specific hook triggered for non-CUDA easyconfig?!")
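
The symlink target used by these hooks is derived from the installation path in two steps: swap the EESSI `versions` prefix for the `host_injections` location, then drop the accelerator subdirectory. The sketch below restates that logic in isolation. The values given for `EESSI_INSTALLATION_REGEX` and `HOST_INJECTIONS_LOCATION` are assumptions for illustration (the diff does not show their definitions), and the function name, sample path and `nvidia/cc80` target are hypothetical.

```python
import re

# ASSUMED values, for illustration only; the real constants are defined elsewhere in eb_hooks.py
EESSI_INSTALLATION_REGEX = r"^/cvmfs/[^/]*\.eessi\.io/versions/"
HOST_INJECTIONS_LOCATION = "/cvmfs/software.eessi.io/host_injections/"


def host_injections_target(full_path, accel_target=None):
    """Return the path under host_injections that a stripped file should point to."""
    target = re.sub(EESSI_INSTALLATION_REGEX, HOST_INJECTIONS_LOCATION, full_path)
    if accel_target:
        # one installation serves all compute capabilities, so drop '/accel/<target>'
        target = target.replace("/accel/%s" % accel_target, '')
    if target == full_path:
        # nothing was rewritten, so this is not an EESSI installation path
        raise ValueError("source and target are the same location: %s" % full_path)
    return target


full_path = ("/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2"
             "/accel/nvidia/cc80/software/cuDNN/8.9.2.26-CUDA-12.1.1/lib/libcudnn.so.8.9.2")
target = host_injections_target(full_path, accel_target="nvidia/cc80")
print(target)
```

The equality check mirrors the guard in the hook: if the regular expression does not match, the path comes back unchanged and creating a symlink would replace the file with a link to itself.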


+def post_postproc_cudnn(self, *args, **kwargs):
+    """
+    Remove files from cuDNN installation that we are not allowed to ship,
+    and replace them with a symlink to a corresponding installation under host_injections.
+    """
+
+    # We need to check if we are doing an EESSI-distributed installation
+    eessi_installation = bool(re.search(EESSI_INSTALLATION_REGEX, self.installdir))
+
+    if self.name == 'cuDNN' and eessi_installation:
+        print_msg("Replacing files in cuDNN installation that we can not ship with symlinks to host_injections...")
+
+        allowlist = ['LICENSE']
+
+        # read cuDNN LICENSE, construct allowlist based on section "2. Distribution" that specifies list of files that can be shipped
+        license_path = os.path.join(self.installdir, 'LICENSE')
+        search_string = "2. Distribution. The following portions of the SDK are distributable under the Agreement:"
+        found_search_string = False
+        with open(license_path) as infile:
+            for line in infile:
+                if line.strip().startswith(search_string):
+                    found_search_string = True
+                    # remove search string, split into words, remove trailing
+                    # dots '.' and only retain words starting with a dot '.'
+                    distributable = line[len(search_string):]
+                    # distributable looks like ' the runtime files .so and .dll.'
+                    # note the '.' after '.dll'
+                    for word in distributable.split():
+                        if word[0] == '.':
+                            # rstrip is used to remove the '.' after '.dll'
+                            allowlist.append(word.rstrip('.'))
+        if not found_search_string:
+            # search string wasn't found in LICENSE file
+            raise EasyBuildError("search string '%s' was not found in license file '%s'; "
+                                 "hence installation may be replaced by symlinks only",
+                                 search_string, license_path)
+
+        allowlist = sorted(set(allowlist))
+        self.log.info("Allowlist for files in cuDNN installation that can be redistributed: " + ', '.join(allowlist))
+
+        # replace files that are not distributable with symlinks into
+        # host_injections
+        replace_non_distributable_files_with_symlinks(self.log, self.installdir, self.name, allowlist)
+    else:
+        raise EasyBuildError("cuDNN-specific hook triggered for non-cuDNN easyconfig?!")
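
To make the allowlist construction in `post_postproc_cudnn` easier to follow, here is a minimal standalone sketch of the same parsing idea. It is not the hook itself: `build_allowlist` is a hypothetical helper, it raises `ValueError` instead of `EasyBuildError`, it slices the stripped line, and the sample line is modelled on the code comment rather than copied from the real cuDNN LICENSE file.

```python
def build_allowlist(license_lines, search_string):
    """Return a sorted allowlist: 'LICENSE' plus every suffix listed after search_string."""
    allowlist = ['LICENSE']
    found = False
    for line in license_lines:
        stripped = line.strip()
        if stripped.startswith(search_string):
            found = True
            # text after the search string, e.g. ' the runtime files .so and .dll.'
            distributable = stripped[len(search_string):]
            for word in distributable.split():
                if word.startswith('.'):
                    # drop the sentence-ending '.' that follows '.dll'
                    allowlist.append(word.rstrip('.'))
    if not found:
        raise ValueError("search string %r not found in license text" % search_string)
    return sorted(set(allowlist))


SEARCH = "2. Distribution. The following portions of the SDK are distributable under the Agreement:"
# hypothetical sample text, modelled on the comment in the hook
sample = [
    "1. License.\n",
    SEARCH + " the runtime files .so and .dll.\n",
]
allowlist = build_allowlist(sample, SEARCH)
print(allowlist)
```

Failing loudly when the search string is missing matters here: with an allowlist containing only `LICENSE`, nearly every file in the installation would be replaced by a symlink.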


+def replace_non_distributable_files_with_symlinks(log, install_dir, pkg_name, allowlist):
+    """
+    Replace files that cannot be distributed with symlinks into host_injections
+    """
+    # Different packages use different ways to specify which files or file
+    # 'types' may be redistributed. For CUDA, the 'EULA.txt' lists full file
+    # names. For cuDNN, the 'LICENSE' lists file endings/suffixes (e.g., '.so')
+    # that can be redistributed.
+    # The map 'extension_based' defines which of these two ways are employed. If
+    # full file names are used it maps a package name (key) to False (value). If
+    # endings/suffixes are used, it maps a package name to True. Later we can
+    # easily use this data structure to employ the correct method for
+    # postprocessing an installation.
+    extension_based = {
+        "CUDA": False,
+        "cuDNN": True,
+    }
+    if not pkg_name in extension_based:
+        raise EasyBuildError("Don't know how to strip non-distributable files from package %s.", pkg_name)
+
+    # iterate over all files in the package installation directory
+    for dir_path, _, files in os.walk(install_dir):
+        for filename in files:
+            full_path = os.path.join(dir_path, filename)
+            # we only really care about real files, i.e. not symlinks
+            if not os.path.islink(full_path):
+                check_by_extension = extension_based[pkg_name] and '.' in filename
+                if check_by_extension:
+                    # if the allowlist only contains extensions, we have to
+                    # determine that from filename. we assume the extension is
+                    # the second element when splitting the filename at dots
+                    # (e.g., for 'libcudnn_adv_infer.so.8.9.2' the extension
+                    # would be '.so')
+                    extension = '.' + filename.split('.')[1]
+                # check if the current file name stub or its extension is part of the allowlist
+                basename = filename.split('.')[0]
+                if basename in allowlist:
+                    log.debug("%s is found in allowlist, so keeping it: %s", basename, full_path)
+                elif check_by_extension and extension in allowlist:
+                    log.debug("%s is found in allowlist, so keeping it: %s", extension, full_path)
+                else:
+                    print_name = filename if extension_based[pkg_name] else basename
+                    log.debug("%s is not found in allowlist, so replacing it with symlink: %s",
Contributor

This is a bit confusing, since filename will never be explicitly in the allowlist (only extensions will be)

Collaborator Author

The allowlist is created in the post_postproc_{cuda,cudnn} function. For CUDA it contains 'EULA' (note without suffix '.txt'), 'README' and a list of file name "stubs" (only the first component when the full file names are split at '.'). For cuDNN, it contains 'LICENSE', '.so', ...

+                              print_name, full_path)
+                    # the host_injections path is under a fixed repo/location for CUDA or cuDNN
+                    host_inj_path = re.sub(EESSI_INSTALLATION_REGEX, HOST_INJECTIONS_LOCATION, full_path)
+                    # CUDA and cu* libraries themselves don't care about compute capability so remove this
+                    # duplication from under host_injections (symlink to a single CUDA or cu* library
+                    # installation for all compute capabilities)
+                    accel_subdir = os.getenv("EESSI_ACCELERATOR_TARGET")
+                    if accel_subdir:
+                        host_inj_path = host_inj_path.replace("/accel/%s" % accel_subdir, '')
+                    # make sure source and target of symlink are not the same
+                    if full_path == host_inj_path:
+                        raise EasyBuildError("Source (%s) and target (%s) are the same location, are you sure you "
+                                             "are using this hook for an EESSI installation?",
+                                             full_path, host_inj_path)
+                    remove_file(full_path)
+                    symlink(host_inj_path, full_path)
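
The keep-or-replace decision in `replace_non_distributable_files_with_symlinks` (and the review question about what the allowlist actually contains) can be tried out on plain file names. The sketch below isolates that decision; `is_distributable` is a hypothetical helper, and both allowlists are illustrative stand-ins, not the lists derived from the real license files.

```python
def is_distributable(filename, allowlist, extension_based):
    """Mirror of the keep-or-replace decision for a single file name."""
    # the "stub" is everything before the first dot
    basename = filename.split('.')[0]
    if basename in allowlist:
        return True
    if extension_based and '.' in filename:
        # the "extension" is the second dot-separated component,
        # e.g. '.so' for 'libcudnn_adv_infer.so.8.9.2'
        extension = '.' + filename.split('.')[1]
        return extension in allowlist
    return False


# illustrative allowlists (assumed contents)
cudnn_allowlist = ['.dll', '.so', 'LICENSE']      # suffix-based, as for cuDNN
cuda_allowlist = ['EULA', 'README', 'libcudart']  # stub-based, as for CUDA

cudnn_checks = {
    name: is_distributable(name, cudnn_allowlist, extension_based=True)
    for name in ['libcudnn_adv_infer.so.8.9.2', 'cudnn.h', 'libcudnn_static.a', 'LICENSE']
}
cuda_checks = {
    name: is_distributable(name, cuda_allowlist, extension_based=False)
    for name in ['libcudart.so.12', 'EULA.txt', 'nvcc']
}
print(cudnn_checks)
print(cuda_checks)
```

For cuDNN, `LICENSE` is kept via the stub check and the shared library via its `.so` suffix, while the header and the static archive would be replaced by symlinks.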


 def inject_gpu_property(ec):
     """
-    Add 'gpu' property, via modluafooter easyconfig parameter
+    Add 'gpu' property and EESSI<PACKAGE>VERSION envvars via modluafooter
+    easyconfig parameter, and drop dependencies to build dependencies
     """
     ec_dict = ec.asdict()
-    # Check if CUDA is in the dependencies, if so add the 'gpu' Lmod property
-    if ('CUDA' in [dep[0] for dep in iter(ec_dict['dependencies'])]):
-        ec.log.info("Injecting gpu as Lmod arch property and envvar with CUDA version")
-        key = 'modluafooter'
-        value = 'add_property("arch","gpu")'
-        cuda_version = 0
-        for dep in iter(ec_dict['dependencies']):
-            # Make CUDA a build dependency only (rpathing saves us from link errors)
-            if 'CUDA' in dep[0]:
-                cuda_version = dep[1]
-                ec_dict['dependencies'].remove(dep)
-                if dep not in ec_dict['builddependencies']:
-                    ec_dict['builddependencies'].append(dep)
-        value = '\n'.join([value, 'setenv("EESSICUDAVERSION","%s")' % cuda_version])
-        if key in ec_dict:
-            if value not in ec_dict[key]:
-                ec[key] = '\n'.join([ec_dict[key], value])
+    # Check if CUDA, cuDNN, you-name-it is in the dependencies, if so
+    # - drop dependency to build dependency
+    # - add 'gpu' Lmod property
+    # - add envvar with package version
+    pkg_names = ( "CUDA", "cuDNN" )
+    pkg_versions = { }
+    add_gpu_property = ''
+
+    for pkg_name in pkg_names:
+        # Check if pkg_name is in the dependencies, if so drop dependency to build
+        # dependency and set variable for later adding the 'gpu' Lmod property
+        # to '.remove' dependencies from ec_dict['dependencies'] we make a copy,
+        # iterate over the copy and can then safely use '.remove' on the original
+        # ec_dict['dependencies'].
+        deps = ec_dict['dependencies'][:]
+        if (pkg_name in [dep[0] for dep in deps]):
+            add_gpu_property = 'add_property("arch","gpu")'
+            for dep in deps:
+                if pkg_name == dep[0]:
+                    # make pkg_name a build dependency only (rpathing saves us from link errors)
+                    ec.log.info("Dropping dependency on %s to build dependency" % pkg_name)
+                    ec_dict['dependencies'].remove(dep)
+                    if dep not in ec_dict['builddependencies']:
+                        ec_dict['builddependencies'].append(dep)
+                    # take note of version for creating the modluafooter
+                    pkg_versions[pkg_name] = dep[1]
+    if add_gpu_property:
+        ec.log.info("Injecting gpu as Lmod arch property and envvars for dependencies with their version")
+        modluafooter = 'modluafooter'
+        extra_mod_footer_lines = [add_gpu_property]
+        for pkg_name, version in pkg_versions.items():
+            envvar = "EESSI%sVERSION" % pkg_name.upper()
+            extra_mod_footer_lines.append('setenv("%s","%s")' % (envvar, version))
+        # take into account that modluafooter may already be set
+        if modluafooter in ec_dict:
+            value = ec_dict[modluafooter]
+            for line in extra_mod_footer_lines:
+                if not line in value:
+                    value = '\n'.join([value, line])
+            ec[modluafooter] = value
         else:
-            ec[key] = value
+            ec[modluafooter] = '\n'.join(extra_mod_footer_lines)

     return ec
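
As a rough illustration of what the reworked `inject_gpu_property` does to an easyconfig, the sketch below applies the same steps to a plain dictionary. This is a simplified model, not the hook: `inject_gpu_settings` is a hypothetical name, dependencies are reduced to `(name, version)` tuples, there is no logging, and the zlib and CMake entries are made-up sample data.

```python
def inject_gpu_settings(ec_dict, pkg_names=("CUDA", "cuDNN")):
    """Plain-dict model: demote GPU dependencies and extend the modluafooter."""
    pkg_versions = {}
    # iterate over a copy so that removing from the original list is safe
    for dep in ec_dict['dependencies'][:]:
        if dep[0] in pkg_names:
            ec_dict['dependencies'].remove(dep)
            if dep not in ec_dict['builddependencies']:
                ec_dict['builddependencies'].append(dep)
            pkg_versions[dep[0]] = dep[1]
    if pkg_versions:
        lines = ['add_property("arch","gpu")']
        for name, version in pkg_versions.items():
            lines.append('setenv("EESSI%sVERSION","%s")' % (name.upper(), version))
        # keep an existing footer and only append lines that are not present yet
        footer = ec_dict.get('modluafooter', '')
        for line in lines:
            if line not in footer:
                footer = '\n'.join([footer, line]) if footer else line
        ec_dict['modluafooter'] = footer
    return ec_dict


# hypothetical easyconfig data
ec = {
    'dependencies': [('CUDA', '12.1.1'), ('cuDNN', '8.9.2.26'), ('zlib', '1.2.13')],
    'builddependencies': [('CMake', '3.26.3')],
}
result = inject_gpu_settings(ec)
print(result['dependencies'])
print(result['builddependencies'])
print(result['modluafooter'])
```

The resulting footer carries one `setenv` line per demoted package, named after the upper-cased package name (`EESSICUDAVERSION`, `EESSICUDNNVERSION`).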


@@ -873,4 +979,5 @@ def inject_gpu_property(ec):

 POST_POSTPROC_HOOKS = {
     'CUDA': post_postproc_cuda,
+    'cuDNN': post_postproc_cudnn,
 }
8 changes: 4 additions & 4 deletions init/eessi_environment_variables
@@ -153,10 +153,10 @@ if [ -d $EESSI_PREFIX ]; then
fi

# Fix wrong path for RHEL >=8 libcurl
# This is required here because we ship curl in our compat layer. If we only provided
# curl as a module file we could instead do this via a `modluafooter` in an EasyBuild
# hook (or via an Lmod hook)
rhel_libcurl_file="/etc/pki/tls/certs/ca-bundle.crt"
# This is required here because we ship curl in our compat layer. If we only provided
# curl as a module file we could instead do this via a `modluafooter` in an EasyBuild
# hook (or via an Lmod hook)
rhel_libcurl_file="/etc/pki/tls/certs/ca-bundle.crt"
if [ -f $rhel_libcurl_file ]; then
show_msg "Found libcurl CAs file at RHEL location, setting CURL_CA_BUNDLE"
export CURL_CA_BUNDLE=$rhel_libcurl_file
11 changes: 10 additions & 1 deletion install_scripts.sh
@@ -122,10 +122,19 @@ copy_files_by_list ${TOPDIR}/scripts ${INSTALL_PREFIX}/scripts "${script_files[@

 # Copy files for the scripts/gpu_support/nvidia directory
 nvidia_files=(
-    install_cuda_host_injections.sh link_nvidia_host_libraries.sh
+    install_cuda_and_libraries.sh
+    install_cuda_host_injections.sh
Contributor

Shouldn't we deprecate this script in favor of install_cuda_and_libraries.sh?

We should definitely update the documentation at https://www.eessi.io/docs/gpu if we're going forward with install_cuda_and_libraries.sh

Collaborator Author

We could/should do that, but maybe in a separate PR that coordinates the change in the docs?

Collaborator

I agree, deprecation can be done later. First, this has to be deployed. Then, we can put it in the docs, and people can start using it. Only then should we deprecate the old method.

Contributor

follow-up on this via #789

+    link_nvidia_host_libraries.sh
 )
 copy_files_by_list ${TOPDIR}/scripts/gpu_support/nvidia ${INSTALL_PREFIX}/scripts/gpu_support/nvidia "${nvidia_files[@]}"
+
+# Easystacks to be used to install software in host injections
+host_injections_easystacks=(
+    eessi-2023.06-eb-4.9.4-2023a-CUDA-host-injections.yml
Contributor

Happy to follow up on this in a future PR, but I wonder whether we need a hardcoded list here. Can't we use a glob like eessi-*-CUDA-host-injections.yml, so we don't need to remember to update this list whenever an additional easystack file is added under scripts/gpu_support/nvidia/easystacks?

+)
+copy_files_by_list ${TOPDIR}/scripts/gpu_support/nvidia/easystacks \
+    ${INSTALL_PREFIX}/scripts/gpu_support/nvidia/easystacks "${host_injections_easystacks[@]}"
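
The glob suggested in the review comment on this hunk can be sketched as follows. Python is used here only to illustrate the matching idea (the actual script is Bash and keeps the hardcoded list in this PR); the function name and the two non-matching file names are hypothetical, while the pattern is the one proposed by the reviewer.

```python
import fnmatch


def select_host_injections_easystacks(filenames, pattern="eessi-*-CUDA-host-injections.yml"):
    """Return the file names that match the suggested glob, in sorted order."""
    return sorted(name for name in filenames if fnmatch.fnmatchcase(name, pattern))


# the first name is from the diff; the others are made up to show non-matches
candidates = [
    "eessi-2023.06-eb-4.9.4-2023a-CUDA-host-injections.yml",
    "eessi-2023.06-eb-4.9.4-2023a-CUDA.yml",
    "README.md",
]
selected = select_host_injections_easystacks(candidates)
print(selected)
```

With such a pattern, a newly added host_injections easystack would be picked up without editing the list, at the cost of also shipping any file that happens to match the pattern.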

# Copy over EasyBuild hooks file used for installations
hook_files=(
eb_hooks.py
@@ -0,0 +1,9 @@
+# This EasyStack provides a list of all the EasyConfigs that should be installed in host_injections
+# for nvidia GPU support, because they cannot (fully) be shipped as part of EESSI due to license constraints
+easyconfigs:
+  - CUDA-12.1.1.eb
+  - cuDNN-8.9.2.26-CUDA-12.1.1.eb:
+      options:
+        # needed to enforce acceptance of EULA in cuDNN easyblock,
+        # see https://github.com/easybuilders/easybuild-easyblocks/pull/3473
+        include-easyblocks-from-commit: 11afb88ec55e0ca431cbe823696aa43e2a9bfca8