{2023.06}[foss/2023a] cuDNN 8.9.2.26 w/ CUDA 12.1.1 (part 1) #772
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
easyconfigs: | ||
- CUDA-12.1.1.eb | ||
- cuDNN-8.9.2.26-CUDA-12.1.1.eb | ||
Just for completeness' sake, I would also list It's already installed, so won't make a difference w.r.t. the produced installation, but it looks a bit strange not having it listed explicitly in an I was also wondering whether we need both this easystack file and the one under I understand we do because the latter gets shipped with EESSI, but I'm wondering if we should symlink them or something, since they should stay in sync, no?
Keeping them in sync would be good. Not sure it works with a symlink though. If we keep the file under If we symlink in the other direction, will our
Made an attempt to arrange for that (only having one file ... or better not duplicates with the same content) in 8e7a0e8
After today's GPU tiger meeting, we agreed to separate these after all. The list of what is needed for |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -756,64 +756,170 @@ def post_postproc_cuda(self, *args, **kwargs): | |
if 'libcudart' not in allowlist: | ||
raise EasyBuildError("Did not find 'libcudart' in allowlist: %s" % allowlist) | ||
|
||
# iterate over all files in the CUDA installation directory | ||
for dir_path, _, files in os.walk(self.installdir): | ||
for filename in files: | ||
full_path = os.path.join(dir_path, filename) | ||
# we only really care about real files, i.e. not symlinks | ||
if not os.path.islink(full_path): | ||
# check if the current file name stub is part of the allowlist | ||
basename = filename.split('.')[0] | ||
if basename in allowlist: | ||
self.log.debug("%s is found in allowlist, so keeping it: %s", basename, full_path) | ||
else: | ||
self.log.debug("%s is not found in allowlist, so replacing it with symlink: %s", | ||
basename, full_path) | ||
# if it is not in the allowlist, delete the file and create a symlink to host_injections | ||
|
||
# the host_injections path is under a fixed repo/location for CUDA | ||
host_inj_path = re.sub(EESSI_INSTALLATION_REGEX, HOST_INJECTIONS_LOCATION, full_path) | ||
# CUDA itself doesn't care about compute capability so remove this duplication from | ||
# under host_injections (symlink to a single CUDA installation for all compute | ||
# capabilities) | ||
accel_subdir = os.getenv("EESSI_ACCELERATOR_TARGET") | ||
if accel_subdir: | ||
host_inj_path = host_inj_path.replace("/accel/%s" % accel_subdir, '') | ||
# make sure source and target of symlink are not the same | ||
if full_path == host_inj_path: | ||
raise EasyBuildError("Source (%s) and target (%s) are the same location, are you sure you " | ||
"are using this hook for an EESSI installation?", | ||
full_path, host_inj_path) | ||
remove_file(full_path) | ||
symlink(host_inj_path, full_path) | ||
# replace files that are not distributable with symlinks into | ||
# host_injections | ||
replace_non_distributable_files_with_symlinks(self.log, self.installdir, self.name, allowlist) | ||
else: | ||
raise EasyBuildError("CUDA-specific hook triggered for non-CUDA easyconfig?!") | ||
|
||
|
||
def post_postproc_cudnn(self, *args, **kwargs): | ||
""" | ||
Remove files from cuDNN installation that we are not allowed to ship, | ||
and replace them with a symlink to a corresponding installation under host_injections. | ||
""" | ||
|
||
# We need to check if we are doing an EESSI-distributed installation | ||
eessi_installation = bool(re.search(EESSI_INSTALLATION_REGEX, self.installdir)) | ||
|
||
if self.name == 'cuDNN' and eessi_installation: | ||
print_msg("Replacing files in cuDNN installation that we can not ship with symlinks to host_injections...") | ||
|
||
allowlist = ['LICENSE'] | ||
|
||
# read cuDNN LICENSE, construct allowlist based on section "2. Distribution" that specifies list of files that can be shipped | ||
license_path = os.path.join(self.installdir, 'LICENSE') | ||
search_string = "2. Distribution. The following portions of the SDK are distributable under the Agreement:" | ||
|
||
found_search_string = False | ||
with open(license_path) as infile: | ||
for line in infile: | ||
if line.strip().startswith(search_string): | ||
found_search_string = True | ||
# remove search string, split into words, remove trailing | ||
# dots '.' and only retain words starting with a dot '.' | ||
distributable = line[len(search_string):] | ||
# distributable looks like ' the runtime files .so and .dll.' | ||
# note the '.' after '.dll' | ||
for word in distributable.split(): | ||
if word[0] == '.': | ||
# rstrip is used to remove the '.' after '.dll' | ||
allowlist.append(word.rstrip('.')) | ||
|
||
if not found_search_string: | ||
# search string wasn't found in LICENSE file | ||
raise EasyBuildError("search string '%s' was not found in license file '%s'; " | ||
"hence installation may be replaced by symlinks only", | ||
search_string, license_path) | ||
|
||
allowlist = sorted(set(allowlist)) | ||
self.log.info("Allowlist for files in cuDNN installation that can be redistributed: " + ', '.join(allowlist)) | ||
|
||
# replace files that are not distributable with symlinks into | ||
# host_injections | ||
replace_non_distributable_files_with_symlinks(self.log, self.installdir, self.name, allowlist) | ||
else: | ||
raise EasyBuildError("cuDNN-specific hook triggered for non-cuDNN easyconfig?!") | ||
|
||
|
||
def replace_non_distributable_files_with_symlinks(log, install_dir, pkg_name, allowlist): | ||
""" | ||
Replace files that cannot be distributed with symlinks into host_injections | ||
""" | ||
# Different packages use different ways to specify which files or file | ||
# 'types' may be redistributed. For CUDA, the 'EULA.txt' lists full file | ||
# names. For cuDNN, the 'LICENSE' lists file endings/suffixes (e.g., '.so') | ||
# that can be redistributed. | ||
# The map 'extension_based' defines which of these two ways are employed. If | ||
# full file names are used it maps a package name (key) to False (value). If | ||
# endings/suffixes are used, it maps a package name to True. Later we can | ||
# easily use this data structure to employ the correct method for | ||
# postprocessing an installation. | ||
extension_based = { | ||
"CUDA": False, | ||
"cuDNN": True, | ||
} | ||
if pkg_name not in extension_based: | ||
raise EasyBuildError("Don't know how to strip non-distributable files from package %s.", pkg_name) | ||
|
||
# iterate over all files in the package installation directory | ||
for dir_path, _, files in os.walk(install_dir): | ||
for filename in files: | ||
full_path = os.path.join(dir_path, filename) | ||
# we only really care about real files, i.e. not symlinks | ||
if not os.path.islink(full_path): | ||
check_by_extension = extension_based[pkg_name] and '.' in filename | ||
if check_by_extension: | ||
# if the allowlist only contains extensions, we have to | ||
# determine that from filename. we assume the extension is | ||
# the second element when splitting the filename at dots | ||
# (e.g., for 'libcudnn_adv_infer.so.8.9.2' the extension | ||
# would be '.so') | ||
extension = '.' + filename.split('.')[1] | ||
# check if the current file name stub or its extension is part of the allowlist | ||
basename = filename.split('.')[0] | ||
if basename in allowlist: | ||
log.debug("%s is found in allowlist, so keeping it: %s", basename, full_path) | ||
elif check_by_extension and extension in allowlist: | ||
log.debug("%s is found in allowlist, so keeping it: %s", extension, full_path) | ||
else: | ||
print_name = filename if extension_based[pkg_name] else basename | ||
log.debug("%s is not found in allowlist, so replacing it with symlink: %s", | ||
This is a bit confusing, since
The allowlist is created in the |
||
print_name, full_path) | ||
# the host_injections path is under a fixed repo/location for CUDA or cuDNN | ||
host_inj_path = re.sub(EESSI_INSTALLATION_REGEX, HOST_INJECTIONS_LOCATION, full_path) | ||
# CUDA and cu* libraries themselves don't care about compute capability so remove this | ||
# duplication from under host_injections (symlink to a single CUDA or cu* library | ||
# installation for all compute capabilities) | ||
accel_subdir = os.getenv("EESSI_ACCELERATOR_TARGET") | ||
if accel_subdir: | ||
host_inj_path = host_inj_path.replace("/accel/%s" % accel_subdir, '') | ||
# make sure source and target of symlink are not the same | ||
if full_path == host_inj_path: | ||
raise EasyBuildError("Source (%s) and target (%s) are the same location, are you sure you " | ||
"are using this hook for an EESSI installation?", | ||
full_path, host_inj_path) | ||
remove_file(full_path) | ||
symlink(host_inj_path, full_path) | ||
|
||
|
||
def inject_gpu_property(ec): | ||
""" | ||
Add 'gpu' property, via modluafooter easyconfig parameter | ||
Add 'gpu' property and EESSI<PACKAGE>VERSION envvars via modluafooter | ||
easyconfig parameter, and drop dependencies to build dependencies | ||
""" | ||
ec_dict = ec.asdict() | ||
# Check if CUDA is in the dependencies, if so add the 'gpu' Lmod property | ||
if ('CUDA' in [dep[0] for dep in iter(ec_dict['dependencies'])]): | ||
ec.log.info("Injecting gpu as Lmod arch property and envvar with CUDA version") | ||
key = 'modluafooter' | ||
value = 'add_property("arch","gpu")' | ||
cuda_version = 0 | ||
for dep in iter(ec_dict['dependencies']): | ||
# Make CUDA a build dependency only (rpathing saves us from link errors) | ||
if 'CUDA' in dep[0]: | ||
cuda_version = dep[1] | ||
ec_dict['dependencies'].remove(dep) | ||
if dep not in ec_dict['builddependencies']: | ||
ec_dict['builddependencies'].append(dep) | ||
value = '\n'.join([value, 'setenv("EESSICUDAVERSION","%s")' % cuda_version]) | ||
if key in ec_dict: | ||
if value not in ec_dict[key]: | ||
ec[key] = '\n'.join([ec_dict[key], value]) | ||
# Check if CUDA, cuDNN, you-name-it is in the dependencies, if so | ||
# - drop dependency to build dependency | ||
# - add 'gpu' Lmod property | ||
# - add envvar with package version | ||
pkg_names = ("CUDA", "cuDNN") | ||
pkg_versions = {} | ||
add_gpu_property = '' | ||
|
||
for pkg_name in pkg_names: | ||
# Check if pkg_name is in the dependencies; if so, drop it to a build | ||
# dependency and set a variable for adding the 'gpu' Lmod property later. | ||
# To '.remove' entries from ec_dict['dependencies'] we make a copy, | ||
# iterate over the copy, and can then safely use '.remove' on the original | ||
# ec_dict['dependencies']. | ||
deps = ec_dict['dependencies'][:] | ||
if (pkg_name in [dep[0] for dep in deps]): | ||
add_gpu_property = 'add_property("arch","gpu")' | ||
for dep in deps: | ||
if pkg_name == dep[0]: | ||
# make pkg_name a build dependency only (rpathing saves us from link errors) | ||
ec.log.info("Dropping dependency on %s to build dependency" % pkg_name) | ||
ec_dict['dependencies'].remove(dep) | ||
if dep not in ec_dict['builddependencies']: | ||
ec_dict['builddependencies'].append(dep) | ||
# take note of version for creating the modluafooter | ||
pkg_versions[pkg_name] = dep[1] | ||
if add_gpu_property: | ||
ec.log.info("Injecting gpu as Lmod arch property and envvars for dependencies with their version") | ||
modluafooter = 'modluafooter' | ||
extra_mod_footer_lines = [add_gpu_property] | ||
for pkg_name, version in pkg_versions.items(): | ||
envvar = "EESSI%sVERSION" % pkg_name.upper() | ||
|
||
extra_mod_footer_lines.append('setenv("%s","%s")' % (envvar, version)) | ||
# take into account that modluafooter may already be set | ||
if modluafooter in ec_dict: | ||
value = ec_dict[modluafooter] | ||
for line in extra_mod_footer_lines: | ||
if line not in value: | ||
value = '\n'.join([value, line]) | ||
ec[modluafooter] = value | ||
else: | ||
ec[key] = value | ||
ec[modluafooter] = '\n'.join(extra_mod_footer_lines) | ||
|
||
return ec | ||
|
||
|
||
|
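The footer handling in `inject_gpu_property` can be tried out in isolation. The following is a minimal sketch, not the hook itself: `build_modluafooter` is a hypothetical helper name, it takes a plain string (or `None`) instead of an easyconfig object, and it treats an empty existing footer as unset, which is a simplification of the `if modluafooter in ec_dict` check in the diff.

```python
def build_modluafooter(existing, pkg_versions):
    """Return modluafooter text with the 'gpu' property and version envvars.

    Hypothetical stand-alone helper (not part of eb_hooks.py).
    existing:     current modluafooter value, or None/'' if unset (assumption)
    pkg_versions: dict mapping package name to version, e.g. {"CUDA": "12.1.1"}
    """
    lines = ['add_property("arch","gpu")']
    for pkg_name, version in pkg_versions.items():
        # same naming scheme as in the hook: EESSI<PACKAGE>VERSION
        envvar = "EESSI%sVERSION" % pkg_name.upper()
        lines.append('setenv("%s","%s")' % (envvar, version))

    if not existing:
        return '\n'.join(lines)

    # only append lines that are not already present in the footer
    value = existing
    for line in lines:
        if line not in value:
            value = '\n'.join([value, line])
    return value


if __name__ == "__main__":
    print(build_modluafooter(None, {"CUDA": "12.1.1", "cuDNN": "8.9.2.26"}))
```

Because lines already present are skipped, calling the helper twice on its own output leaves the footer unchanged, which mirrors the `if line not in value` guard in the hook.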
@@ -873,4 +979,5 @@ def inject_gpu_property(ec): | |
|
||
POST_POSTPROC_HOOKS = { | ||
'CUDA': post_postproc_cuda, | ||
'cuDNN': post_postproc_cudnn, | ||
} |
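To make the cuDNN allowlist logic from `post_postproc_cudnn` and `replace_non_distributable_files_with_symlinks` easier to follow, here is a small stand-alone sketch. The helper names `parse_allowlist` and `is_distributable` are hypothetical, it raises `ValueError` rather than `EasyBuildError`, and the sample license line is built from the example given in the code comment (' the runtime files .so and .dll.'), not copied from an actual LICENSE file. Unlike the hook, the sketch slices the stripped line, which is an assumption about how leading whitespace should be handled.

```python
SEARCH_STRING = ("2. Distribution. The following portions of the SDK "
                 "are distributable under the Agreement:")


def parse_allowlist(license_lines):
    """Collect distributable file suffixes (e.g. '.so') from license text."""
    allowlist = ['LICENSE']
    found = False
    for line in license_lines:
        stripped = line.strip()
        if stripped.startswith(SEARCH_STRING):
            found = True
            # keep only words starting with '.', dropping a trailing '.'
            for word in stripped[len(SEARCH_STRING):].split():
                if word.startswith('.'):
                    allowlist.append(word.rstrip('.'))
    if not found:
        raise ValueError("search string not found in license text")
    return sorted(set(allowlist))


def is_distributable(filename, allowlist):
    """Check the file name stub, then the second dot-separated element."""
    parts = filename.split('.')
    if parts[0] in allowlist:
        return True
    # e.g. 'libcudnn_adv_infer.so.8.9.2' -> extension '.so'
    return len(parts) > 1 and '.' + parts[1] in allowlist


if __name__ == "__main__":
    sample = [SEARCH_STRING + " the runtime files .so and .dll."]
    allowlist = parse_allowlist(sample)
    for name in ("LICENSE", "libcudnn_adv_infer.so.8.9.2", "cudnn.h"):
        print(name, is_distributable(name, allowlist))
```

In the hook, files for which this check fails are removed and replaced with symlinks into host_injections; the sketch only covers the keep-or-replace decision.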
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -122,10 +122,19 @@ copy_files_by_list ${TOPDIR}/scripts ${INSTALL_PREFIX}/scripts "${script_files[@ | |
|
||
# Copy files for the scripts/gpu_support/nvidia directory | ||
nvidia_files=( | ||
install_cuda_host_injections.sh link_nvidia_host_libraries.sh | ||
install_cuda_and_libraries.sh | ||
install_cuda_host_injections.sh | ||
Shouldn't we deprecate this script in favor of We should definitely update the documentation at https://www.eessi.io/docs/gpu if we're going forward with
We could/should do that, but maybe in a separate PR that coordinates the change in the docs?
I agree, deprecation can be done later. First, this has to be deployed. Then, we can put it in the docs, and people can start using it. Only then should we deprecate the old method.
Follow-up on this via #789 |
||
link_nvidia_host_libraries.sh | ||
) | ||
copy_files_by_list ${TOPDIR}/scripts/gpu_support/nvidia ${INSTALL_PREFIX}/scripts/gpu_support/nvidia "${nvidia_files[@]}" | ||
|
||
# Easystacks to be used to install software in host injections | ||
host_injections_easystacks=( | ||
eessi-2023.06-eb-4.9.4-2023a-CUDA-host-injections.yml | ||
Happy to follow up on this in a future PR, but I wonder if we need a hardcoded list here, can't we use a glob here like |
||
) | ||
copy_files_by_list ${TOPDIR}/scripts/gpu_support/nvidia/easystacks \ | ||
${INSTALL_PREFIX}/scripts/gpu_support/nvidia/easystacks "${host_injections_easystacks[@]}" | ||
|
||
# Copy over EasyBuild hooks file used for installations | ||
hook_files=( | ||
eb_hooks.py | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
# This EasyStack provides a list of all the EasyConfigs that should be installed in host_injections | ||
# for nvidia GPU support, because they cannot (fully) be shipped as part of EESSI due to license constraints | ||
easyconfigs: | ||
- CUDA-12.1.1.eb | ||
- cuDNN-8.9.2.26-CUDA-12.1.1.eb: | ||
options: | ||
# Needed for support for --accept-eula-for option | ||
|
||
include-easyblocks-from-commit: 11afb88ec55e0ca431cbe823696aa43e2a9bfca8 |
This doesn't agree with what's going on below?
Reverted it back to sourcing it silently. However, we need the full environment to be initialised at this stage or some needed environment variable is not set (particularly, EESSI_SITE_SOFTWARE_PATH). Improved the comments. Will repeat tests (removing host-injections, building, ...). See 77f3bc9