-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
{2023.06}[foss/2023a] cuDNN 8.9.2.26 w/ CUDA 12.1.1 (part 1) #772
Changes from 6 commits
4c5c5a3
c5c7dcc
da7c1e4
454d2bb
6824d75
0e92d8d
044e168
f983fed
6e95efd
21916fd
bf26846
e1ba74f
1f1eada
fc00d0c
e968608
9c6c3a4
97d5b67
8e7a0e8
57f5a48
21ffc18
a3edc20
7f601dc
e3101b0
e9018fb
54753c3
2b76a54
086ba5a
ecd30ca
18cdaa2
5716424
9f3853c
b9017d4
eea2879
33199d7
06cd2ea
a27683e
d5572ea
b68fdfa
0e7c9d8
d57f8d8
02d3e1e
b28017a
20aacdc
88bdb88
7b8ba8b
a95d546
45fa6b1
c3482b2
db90ca7
6a0223c
affe37b
d2d95e9
77f3bc9
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
easyconfigs: | ||
- cuDNN-8.9.2.26-CUDA-12.1.1.eb | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just for completeness' sake, I would also list It's already installed, so won't make a difference w.r.t. produced installation, but it looks a bit strange not having it listed explicitly in an I was also wondering whether we need both this easystack file and the one under I understand we do because the latter gets shipped with EESSI, but I'm wondering if we should symlink them or something, since they should stay in sync, no? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Keeping them in sync would be good. Not sure it works with a symlink though. If we keep the file under If we symlink in the other direction, will our There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Made an attempt to arrange for that (only having one file ... or better not duplicates with the same content) in 8e7a0e8 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. After todays GPU tiger meeting, we agreed to separate these after all. The list of what is needed for |
Original file line number | Diff line number | Diff line change | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -756,64 +756,141 @@ def post_postproc_cuda(self, *args, **kwargs): | |||||||||||||
if 'libcudart' not in allowlist: | ||||||||||||||
raise EasyBuildError("Did not find 'libcudart' in allowlist: %s" % allowlist) | ||||||||||||||
|
||||||||||||||
# iterate over all files in the CUDA installation directory | ||||||||||||||
for dir_path, _, files in os.walk(self.installdir): | ||||||||||||||
for filename in files: | ||||||||||||||
full_path = os.path.join(dir_path, filename) | ||||||||||||||
# we only really care about real files, i.e. not symlinks | ||||||||||||||
if not os.path.islink(full_path): | ||||||||||||||
# check if the current file name stub is part of the allowlist | ||||||||||||||
basename = filename.split('.')[0] | ||||||||||||||
if basename in allowlist: | ||||||||||||||
self.log.debug("%s is found in allowlist, so keeping it: %s", basename, full_path) | ||||||||||||||
else: | ||||||||||||||
self.log.debug("%s is not found in allowlist, so replacing it with symlink: %s", | ||||||||||||||
basename, full_path) | ||||||||||||||
# if it is not in the allowlist, delete the file and create a symlink to host_injections | ||||||||||||||
|
||||||||||||||
# the host_injections path is under a fixed repo/location for CUDA | ||||||||||||||
host_inj_path = re.sub(EESSI_INSTALLATION_REGEX, HOST_INJECTIONS_LOCATION, full_path) | ||||||||||||||
# CUDA itself doesn't care about compute capability so remove this duplication from | ||||||||||||||
# under host_injections (symlink to a single CUDA installation for all compute | ||||||||||||||
# capabilities) | ||||||||||||||
accel_subdir = os.getenv("EESSI_ACCELERATOR_TARGET") | ||||||||||||||
if accel_subdir: | ||||||||||||||
host_inj_path = host_inj_path.replace("/accel/%s" % accel_subdir, '') | ||||||||||||||
# make sure source and target of symlink are not the same | ||||||||||||||
if full_path == host_inj_path: | ||||||||||||||
raise EasyBuildError("Source (%s) and target (%s) are the same location, are you sure you " | ||||||||||||||
"are using this hook for an EESSI installation?", | ||||||||||||||
full_path, host_inj_path) | ||||||||||||||
remove_file(full_path) | ||||||||||||||
symlink(host_inj_path, full_path) | ||||||||||||||
# replace files that are not distributable with symlinks into | ||||||||||||||
# host_injections | ||||||||||||||
replace_non_distributable_files_with_symlinks(self.log, self.installdir, self.name, allowlist) | ||||||||||||||
else: | ||||||||||||||
raise EasyBuildError("CUDA-specific hook triggered for non-CUDA easyconfig?!") | ||||||||||||||
|
||||||||||||||
|
||||||||||||||
def post_postproc_cudnn(self, *args, **kwargs): | ||||||||||||||
""" | ||||||||||||||
Remove files from cuDNN installation that we are not allowed to ship, | ||||||||||||||
and replace them with a symlink to a corresponding installation under host_injections. | ||||||||||||||
""" | ||||||||||||||
|
||||||||||||||
# We need to check if we are doing an EESSI-distributed installation | ||||||||||||||
eessi_installation = bool(re.search(EESSI_INSTALLATION_REGEX, self.installdir)) | ||||||||||||||
|
||||||||||||||
if self.name == 'cuDNN' and eessi_installation: | ||||||||||||||
print_msg("Replacing files in cuDNN installation that we can not ship with symlinks to host_injections...") | ||||||||||||||
|
||||||||||||||
allowlist = ['LICENSE'] | ||||||||||||||
|
||||||||||||||
# read cuDNN LICENSE, construct allowlist based on section "2. Distribution" that specifies list of files that can be shipped | ||||||||||||||
license_path = os.path.join(self.installdir, 'LICENSE') | ||||||||||||||
search_string = "2. Distribution. The following portions of the SDK are distributable under the Agreement:" | ||||||||||||||
casparvl marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||
with open(license_path) as infile: | ||||||||||||||
for line in infile: | ||||||||||||||
if line.strip().startswith(search_string): | ||||||||||||||
# remove search string, split into words, remove trailing | ||||||||||||||
# dots '.' and only retain words starting with a dot '.' | ||||||||||||||
distributable = line[len(search_string):] | ||||||||||||||
for word in distributable.split(): | ||||||||||||||
if word[0] == '.': | ||||||||||||||
allowlist.append(word.rstrip('.')) | ||||||||||||||
casparvl marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||
|
||||||||||||||
allowlist = sorted(set(allowlist)) | ||||||||||||||
self.log.info("Allowlist for files in cuDNN installation that can be redistributed: " + ', '.join(allowlist)) | ||||||||||||||
|
||||||||||||||
# replace files that are not distributable with symlinks into | ||||||||||||||
# host_injections | ||||||||||||||
replace_non_distributable_files_with_symlinks(self.log, self.installdir, self.name, allowlist) | ||||||||||||||
else: | ||||||||||||||
raise EasyBuildError("cuDNN-specific hook triggered for non-cuDNN easyconfig?!") | ||||||||||||||
|
||||||||||||||
|
||||||||||||||
def replace_non_distributable_files_with_symlinks(log, install_dir, package, allowlist): | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. changed in e1ba74f |
||||||||||||||
""" | ||||||||||||||
Replace files that cannot be distributed with symlinks into host_injections | ||||||||||||||
""" | ||||||||||||||
extension_based = { "CUDA": False, "cuDNN": True } | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would reformat this for clarify, and add a comment above to explain what
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Addressed in 044e168 |
||||||||||||||
if not package in extension_based: | ||||||||||||||
raise EasyBuildError("Don't know how to strip non-distributable files from package %s.", package) | ||||||||||||||
|
||||||||||||||
# iterate over all files in the package installation directory | ||||||||||||||
for dir_path, _, files in os.walk(install_dir): | ||||||||||||||
for filename in files: | ||||||||||||||
full_path = os.path.join(dir_path, filename) | ||||||||||||||
# we only really care about real files, i.e. not symlinks | ||||||||||||||
if not os.path.islink(full_path): | ||||||||||||||
# check if the current file name stub is part of the allowlist | ||||||||||||||
basename = filename.split('.')[0] | ||||||||||||||
if extension_based[package]: | ||||||||||||||
if '.' in filename: | ||||||||||||||
extension = '.' + filename.split('.')[1] | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
I'm not sure what this does, so probably deserve a comment above, perhaps with an example? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. addressed in f983fed |
||||||||||||||
if basename in allowlist: | ||||||||||||||
log.debug("%s is found in allowlist, so keeping it: %s", basename, full_path) | ||||||||||||||
elif extension_based[package] and '.' in filename and extension in allowlist: | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We're using the same condition twice, so we should introduce a local variable for this, something like:
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. implemented in 6e95efd |
||||||||||||||
log.debug("%s is found in allowlist, so keeping it: %s", extension, full_path) | ||||||||||||||
else: | ||||||||||||||
if extension_based[package]: | ||||||||||||||
print_name = filename | ||||||||||||||
else: | ||||||||||||||
print_name = basename | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. code golf:
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. implemented in 21916fd |
||||||||||||||
log.debug("%s is not found in allowlist, so replacing it with symlink: %s", | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a bit confusing, since There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The allowlist is created in the |
||||||||||||||
print_name, full_path) | ||||||||||||||
# the host_injections path is under a fixed repo/location for CUDA or cuDNN | ||||||||||||||
host_inj_path = re.sub(EESSI_INSTALLATION_REGEX, HOST_INJECTIONS_LOCATION, full_path) | ||||||||||||||
# CUDA and cuDNN itself don't care about compute capability so remove this duplication from | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. addressed in bf26846 |
||||||||||||||
# under host_injections (symlink to a single CUDA or cuDNN installation for all compute | ||||||||||||||
# capabilities) | ||||||||||||||
accel_subdir = os.getenv("EESSI_ACCELERATOR_TARGET") | ||||||||||||||
if accel_subdir: | ||||||||||||||
host_inj_path = host_inj_path.replace("/accel/%s" % accel_subdir, '') | ||||||||||||||
# make sure source and target of symlink are not the same | ||||||||||||||
if full_path == host_inj_path: | ||||||||||||||
raise EasyBuildError("Source (%s) and target (%s) are the same location, are you sure you " | ||||||||||||||
"are using this hook for an EESSI installation?", | ||||||||||||||
full_path, host_inj_path) | ||||||||||||||
remove_file(full_path) | ||||||||||||||
symlink(host_inj_path, full_path) | ||||||||||||||
|
||||||||||||||
|
||||||||||||||
def inject_gpu_property(ec): | ||||||||||||||
""" | ||||||||||||||
Add 'gpu' property, via modluafooter easyconfig parameter | ||||||||||||||
Add 'gpu' property EESSI<PACKAGE>VERSION envvars and drop dependencies to | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. improved in e968608 |
||||||||||||||
build dependencies, via modluafooter easyconfig parameter | ||||||||||||||
""" | ||||||||||||||
ec_dict = ec.asdict() | ||||||||||||||
# Check if CUDA is in the dependencies, if so add the 'gpu' Lmod property | ||||||||||||||
if ('CUDA' in [dep[0] for dep in iter(ec_dict['dependencies'])]): | ||||||||||||||
ec.log.info("Injecting gpu as Lmod arch property and envvar with CUDA version") | ||||||||||||||
# Check if CUDA, cuDNN, you-name-it is in the dependencies, if so | ||||||||||||||
# - drop dependency to build dependency | ||||||||||||||
# - add 'gpu' Lmod property | ||||||||||||||
# - add envvar with package version | ||||||||||||||
packages_list = ( "CUDA", "cuDNN" ) | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This should be a constant, which is also used in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could be a constant. Not sure if it "should" be? If we would want to use that to also check if the mapping in Not sure if either adds much value. |
||||||||||||||
packages_version = { } | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done in 57f5a48 |
||||||||||||||
add_gpu_property = '' | ||||||||||||||
|
||||||||||||||
for package in packages_list: | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
(it's a tuple, not a list, so a bit misleading otherwise, and no need to get very wordy with the loop variable) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. List of packages stored in a tuple 😛 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Changed in 57f5a48 |
||||||||||||||
# Check if package is in the dependencies, if so drop dependency to build | ||||||||||||||
# dependency and set variable for later adding the 'gpu' Lmod property | ||||||||||||||
if (package in [dep[0] for dep in iter(ec_dict['dependencies'])]): | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Using There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah, it's because we're iterating over a list that we're modifying with Maybe we should use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Changed in 57f5a48 |
||||||||||||||
add_gpu_property = 'add_property("arch","gpu")' | ||||||||||||||
for dep in iter(ec_dict['dependencies']): | ||||||||||||||
if package in dep[0]: | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm, this should be There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Changed in 57f5a48 |
||||||||||||||
# make package a build dependency only (rpathing saves us from link errors) | ||||||||||||||
ec.log.info("Dropping dependency on %s to build dependency" % package) | ||||||||||||||
ec_dict['dependencies'].remove(dep) | ||||||||||||||
if dep not in ec_dict['builddependencies']: | ||||||||||||||
ec_dict['builddependencies'].append(dep) | ||||||||||||||
# take note of version for creating the modluafooter | ||||||||||||||
packages_version[package] = dep[1] | ||||||||||||||
if add_gpu_property: | ||||||||||||||
ec.log.info("Injecting gpu as Lmod arch property and envvars for dependencies with their version") | ||||||||||||||
key = 'modluafooter' | ||||||||||||||
value = 'add_property("arch","gpu")' | ||||||||||||||
cuda_version = 0 | ||||||||||||||
for dep in iter(ec_dict['dependencies']): | ||||||||||||||
# Make CUDA a build dependency only (rpathing saves us from link errors) | ||||||||||||||
if 'CUDA' in dep[0]: | ||||||||||||||
cuda_version = dep[1] | ||||||||||||||
ec_dict['dependencies'].remove(dep) | ||||||||||||||
if dep not in ec_dict['builddependencies']: | ||||||||||||||
ec_dict['builddependencies'].append(dep) | ||||||||||||||
value = '\n'.join([value, 'setenv("EESSICUDAVERSION","%s")' % cuda_version]) | ||||||||||||||
if key in ec_dict: | ||||||||||||||
if value not in ec_dict[key]: | ||||||||||||||
ec[key] = '\n'.join([ec_dict[key], value]) | ||||||||||||||
values = [add_gpu_property] | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. changed in 57f5a48 |
||||||||||||||
for package, version in packages_version.items(): | ||||||||||||||
envvar = "EESSI%sVERSION" % package.upper() | ||||||||||||||
values.append('setenv("%s","%s")' % (envvar, version)) | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why are we actually injecting these? Is that just so we can tell which There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't know why it was added for CUDA. For cuDNN, I just do the same. Maybe it was inspired by some
If we want to change that, we should probably do the change in a separate PR before adding |
||||||||||||||
if not key in ec_dict: | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. positive logic is easier to follow: # take into account that modluafooter may already be set
if key in ec_dict:
...
else:
... and maybe we should rename There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. changed in 57f5a48 |
||||||||||||||
ec[key] = '\n'.join(values) | ||||||||||||||
else: | ||||||||||||||
ec[key] = value | ||||||||||||||
new_value = ec_dict[key] | ||||||||||||||
for value in values: | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. changed in 57f5a48 |
||||||||||||||
if not value in new_value: | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. having a hit here seems unlikely to me, but ok, doesn't hurt I guess :) |
||||||||||||||
new_value = '\n'.join([new_value, value]) | ||||||||||||||
ec[key] = new_value | ||||||||||||||
|
||||||||||||||
return ec | ||||||||||||||
|
||||||||||||||
|
||||||||||||||
|
@@ -873,4 +950,5 @@ def inject_gpu_property(ec): | |||||||||||||
|
||||||||||||||
POST_POSTPROC_HOOKS = { | ||||||||||||||
'CUDA': post_postproc_cuda, | ||||||||||||||
'cuDNN': post_postproc_cudnn, | ||||||||||||||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -122,7 +122,10 @@ copy_files_by_list ${TOPDIR}/scripts ${INSTALL_PREFIX}/scripts "${script_files[@ | |
|
||
# Copy files for the scripts/gpu_support/nvidia directory | ||
nvidia_files=( | ||
install_cuda_host_injections.sh link_nvidia_host_libraries.sh | ||
eessi-2023.06-cuda-and-libraries.yml | ||
install_cuda_and_libraries.sh | ||
install_cuda_host_injections.sh | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Shouldn't we deprecate this script in favor of We should definitely update the documentation at https://www.eessi.io/docs/gpu if we're going forward with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We could/should do that, but maybe in a separate PR that coordinates the change in the docs? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree, deprecation can be done later. First, this has to be deployed. Then, we can put it in the docs, and people can start using it. Only then should we deprecate the old method. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. follow-up on this via #789 |
||
link_nvidia_host_libraries.sh | ||
) | ||
copy_files_by_list ${TOPDIR}/scripts/gpu_support/nvidia ${INSTALL_PREFIX}/scripts/gpu_support/nvidia "${nvidia_files[@]}" | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
easyconfigs: | ||
- CUDA-12.1.1.eb | ||
- cuDNN-8.9.2.26-CUDA-12.1.1.eb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@trz42 This doesn't look like an ideal path to use, even if it's temporary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reusing
TMPDIR
via 97d5b67