{2023.06}[foss/2023a] cuDNN 8.9.2.26 w/ CUDA 12.1.1 (part 1) #772

Merged
53 commits
4c5c5a3
{2023.06}[foss/2023a] cuDNN 8.9.2.26 w/ CUDA 12.1.1
truib Oct 2, 2024
c5c7dcc
Merge branch '2023.06-software.eessi.io' of github-trz:EESSI/software…
truib Oct 2, 2024
da7c1e4
use post sanity-check hook for cuDNN
truib Oct 2, 2024
454d2bb
install cuDNN under host_injections before installing it under /cvmfs
truib Oct 3, 2024
6824d75
use post_postproc hook to convert some cuDNN files to symlinks
truib Oct 3, 2024
0e92d8d
Merge branch '2023.06-software.eessi.io' of github-trz:EESSI/software…
truib Oct 3, 2024
044e168
explain idea for extension_based and reformat its definition
truib Oct 3, 2024
f983fed
explain why we need to obtain the extension and improve cond expr
truib Oct 3, 2024
6e95efd
use local var for conditional expression + slightly reorder code
truib Oct 3, 2024
21916fd
code golf
truib Oct 3, 2024
bf26846
improve comment (also anticipating additional cu* libraries in the fu…
truib Oct 3, 2024
e1ba74f
improve parameter name
truib Oct 3, 2024
1f1eada
explain use of rstrip
truib Oct 3, 2024
fc00d0c
raise error if search string wasn't found
truib Oct 3, 2024
e968608
improved docstring
truib Oct 3, 2024
9c6c3a4
Merge branch '2023.06-software.eessi.io' of github-trz:EESSI/software…
truib Oct 4, 2024
97d5b67
use TMPDIR as base for temporary storage
truib Oct 4, 2024
8e7a0e8
attempt to use a single easystack file for CUDA/cu* packages
truib Oct 4, 2024
57f5a48
various improvements for inject_gpu_property
truib Oct 4, 2024
21ffc18
various improvements for install_cuda_and_libraries.sh
truib Oct 4, 2024
a3edc20
show available *CUDA* modules for easier debugging
truib Oct 4, 2024
7f601dc
print and adjust MODULEPATH
truib Oct 4, 2024
e3101b0
implement option 3 to install module files in hidden directory
truib Oct 4, 2024
e9018fb
Move to gpu_support/nvidia subdir
Oct 9, 2024
54753c3
Make comment more explicit that this is only about nvidia GPU support
Oct 9, 2024
2b76a54
Moved easystack file
Oct 9, 2024
086ba5a
Change how EESSI_SITE_INSTALL is used
Oct 9, 2024
ecd30ca
First attempt at making this loop over EasyStack files, loading the c…
Oct 9, 2024
18cdaa2
not sure why this is not working, see if this solves it
Oct 9, 2024
5716424
This was the only way in which I got this to work. Otherwise, it does…
Oct 9, 2024
9f3853c
Added include-easyblocks-from-commit
Oct 9, 2024
b9017d4
Easystack no longer passed as option. Comment is outdated, since we n…
Oct 9, 2024
eea2879
Make sure easystack file for host_injections is shipped
Oct 9, 2024
33199d7
Remove rebuild, change comments that were out of date
Oct 9, 2024
06cd2ea
Only loop over the easystacks with CUDA in the name
Oct 9, 2024
a27683e
Merge branch '2023.06-software.eessi.io' of github-trz:EESSI/software…
truib Oct 14, 2024
d5572ea
Merge branch 'EESSI:2023.06-software.eessi.io' into 2023.06-software.…
trz42 Oct 14, 2024
b68fdfa
Merge branch '2023.06-software.eessi.io-cuDNN-8.9.2.26-part-1' of git…
truib Oct 14, 2024
0e7c9d8
add a bit more debug output, use *SITE_SOFTWARE_PATH and minor tweaks
truib Oct 15, 2024
d57f8d8
replace TAB with WHITESPACEs
truib Oct 15, 2024
02d3e1e
show more msgs when building and init full environment
truib Oct 15, 2024
b28017a
use zero length env vars
truib Oct 15, 2024
20aacdc
fix syntax issue
truib Oct 15, 2024
88bdb88
tweak variable expansion in test
truib Oct 15, 2024
7b8ba8b
dont use hooks when installing into host_injections
truib Oct 15, 2024
a95d546
revert variable expansion and unset certain variables instead
truib Oct 15, 2024
45fa6b1
log if Lmod rc/SitePackage are being created
truib Oct 15, 2024
c3482b2
show full path to Lmod RC/SitePackage when created
truib Oct 15, 2024
db90ca7
adjust path to lua files if building accelerator software
truib Oct 15, 2024
6a0223c
small typo fixed
truib Oct 16, 2024
affe37b
add comment to clarify setting of MODULEPATH
trz42 Oct 16, 2024
d2d95e9
clarify need for option
trz42 Oct 16, 2024
77f3bc9
revert to silent sourcing, keep initialising full environment and cla…
truib Oct 16, 2024
9 changes: 6 additions & 3 deletions EESSI-install-software.sh
@@ -234,7 +234,7 @@ pr_diff=$(ls [0-9]*.diff | head -1)
# for now, this just reinstalls all scripts. Note the most elegant, but works
${TOPDIR}/install_scripts.sh --prefix ${EESSI_PREFIX}

- # Install full CUDA SDK in host_injections
+ # Install full CUDA SDK and cu* libraries in host_injections
# Hardcode this for now, see if it works
# TODO: We should make a nice yaml and loop over all CUDA versions in that yaml to figure out what to install
# Allow skipping CUDA SDK install in e.g. CI environments
@@ -250,9 +250,12 @@ else
fi

if [ -z "${skip_cuda_install}" ] || [ ! "${skip_cuda_install}" ]; then
- ${EESSI_PREFIX}/scripts/gpu_support/nvidia/install_cuda_host_injections.sh -c 12.1.1 --accept-cuda-eula
+ ${EESSI_PREFIX}/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh \
+ -e ${EESSI_PREFIX}/scripts/gpu_support/nvidia/eessi-2023.06-cuda-and-libraries.yml \
+ -t /tmp/temp \
Contributor

@trz42 This doesn't look like an ideal path to use, even if it's temporary?

Collaborator Author

Reusing TMPDIR via 97d5b67

+ --accept-cuda-eula
else
- echo "Skipping installation of CUDA SDK in host_injections, since the --skip-cuda-install flag was passed OR no EasyBuild module was found"
+ echo "Skipping installation of CUDA SDK and cu* libraries in host_injections, since the --skip-cuda-install flag was passed OR no EasyBuild module was found"
fi

# Install NVIDIA drivers in host_injections (if they exist)
@@ -0,0 +1,2 @@
easyconfigs:
- cuDNN-8.9.2.26-CUDA-12.1.1.eb
Contributor

Just for completeness' sake, I would also list CUDA-12.1.1.eb in here.

It's already installed, so won't make a difference w.r.t. produced installation, but it looks a bit strange not having it listed explicitly in an accel easystack file...

I was also wondering whether we need both this easystack file and the one under scripts/gpu_support/nvidia.

I understand we do because the latter gets shipped with EESSI, but I'm wondering if we should symlink them or something, since they should stay in sync, no?

Collaborator Author

Keeping them in sync would be good. Not sure it works with a symlink though.

If we keep the file under scripts/gpu_support/nvidia and use a symlink from easystacks/.../accel/nvidia/some_easystack_file, will a change (the addition of another package, say cuTENSOR) let our build procedure notice the change?

If we symlink in the other direction, will our install_scripts.sh file just copy the symlink or the easystack file the symlink points to?

Collaborator Author

Made an attempt to arrange for that (having only one file, or rather no duplicates with the same content) in 8e7a0e8

Collaborator

After today's GPU tiger meeting, we agreed to separate these after all. The list of what is needed for host_injections is different from what we install. Only the modules that we have to strip down need to be installed in host_injections. Keeping that in a separate file is OK: it's not duplication, since that is the unique list of packages that are needed in host_injections (for NVIDIA GPU support).

174 changes: 126 additions & 48 deletions eb_hooks.py
@@ -756,64 +756,141 @@ def post_postproc_cuda(self, *args, **kwargs):
if 'libcudart' not in allowlist:
raise EasyBuildError("Did not find 'libcudart' in allowlist: %s" % allowlist)

# iterate over all files in the CUDA installation directory
for dir_path, _, files in os.walk(self.installdir):
for filename in files:
full_path = os.path.join(dir_path, filename)
# we only really care about real files, i.e. not symlinks
if not os.path.islink(full_path):
# check if the current file name stub is part of the allowlist
basename = filename.split('.')[0]
if basename in allowlist:
self.log.debug("%s is found in allowlist, so keeping it: %s", basename, full_path)
else:
self.log.debug("%s is not found in allowlist, so replacing it with symlink: %s",
basename, full_path)
# if it is not in the allowlist, delete the file and create a symlink to host_injections

# the host_injections path is under a fixed repo/location for CUDA
host_inj_path = re.sub(EESSI_INSTALLATION_REGEX, HOST_INJECTIONS_LOCATION, full_path)
# CUDA itself doesn't care about compute capability so remove this duplication from
# under host_injections (symlink to a single CUDA installation for all compute
# capabilities)
accel_subdir = os.getenv("EESSI_ACCELERATOR_TARGET")
if accel_subdir:
host_inj_path = host_inj_path.replace("/accel/%s" % accel_subdir, '')
# make sure source and target of symlink are not the same
if full_path == host_inj_path:
raise EasyBuildError("Source (%s) and target (%s) are the same location, are you sure you "
"are using this hook for an EESSI installation?",
full_path, host_inj_path)
remove_file(full_path)
symlink(host_inj_path, full_path)
# replace files that are not distributable with symlinks into
# host_injections
replace_non_distributable_files_with_symlinks(self.log, self.installdir, self.name, allowlist)
else:
raise EasyBuildError("CUDA-specific hook triggered for non-CUDA easyconfig?!")


def post_postproc_cudnn(self, *args, **kwargs):
"""
Remove files from cuDNN installation that we are not allowed to ship,
and replace them with a symlink to a corresponding installation under host_injections.
"""

# We need to check if we are doing an EESSI-distributed installation
eessi_installation = bool(re.search(EESSI_INSTALLATION_REGEX, self.installdir))

if self.name == 'cuDNN' and eessi_installation:
print_msg("Replacing files in cuDNN installation that we can not ship with symlinks to host_injections...")

allowlist = ['LICENSE']

# read cuDNN LICENSE, construct allowlist based on section "2. Distribution" that specifies list of files that can be shipped
license_path = os.path.join(self.installdir, 'LICENSE')
search_string = "2. Distribution. The following portions of the SDK are distributable under the Agreement:"
casparvl marked this conversation as resolved.
with open(license_path) as infile:
for line in infile:
if line.strip().startswith(search_string):
# remove search string, split into words, remove trailing
# dots '.' and only retain words starting with a dot '.'
distributable = line[len(search_string):]
for word in distributable.split():
if word[0] == '.':
allowlist.append(word.rstrip('.'))
casparvl marked this conversation as resolved.

allowlist = sorted(set(allowlist))
self.log.info("Allowlist for files in cuDNN installation that can be redistributed: " + ', '.join(allowlist))

# replace files that are not distributable with symlinks into
# host_injections
replace_non_distributable_files_with_symlinks(self.log, self.installdir, self.name, allowlist)
else:
raise EasyBuildError("cuDNN-specific hook triggered for non-cuDNN easyconfig?!")
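
The allowlist construction in post_postproc_cudnn can be tried in isolation. The following is a minimal sketch, not the code from the PR: the function name `build_cudnn_allowlist` and the sample license line are made up for illustration (the real cuDNN LICENSE text is not reproduced here), and a plain `ValueError` stands in for `EasyBuildError`. It also slices the stripped line, which avoids an offset error if the license line has leading whitespace.

```python
SEARCH_STRING = "2. Distribution. The following portions of the SDK are distributable under the Agreement:"


def build_cudnn_allowlist(license_lines):
    """Return a sorted allowlist: 'LICENSE' plus dot-prefixed words from the distribution clause."""
    allowlist = ['LICENSE']
    found = False
    for line in license_lines:
        stripped = line.strip()
        if stripped.startswith(SEARCH_STRING):
            found = True
            # everything after the search string lists what may be redistributed
            distributable = stripped[len(SEARCH_STRING):]
            for word in distributable.split():
                # keep only words that look like file extensions (leading dot);
                # rstrip('.') drops a sentence-ending dot, e.g. '.dll.' -> '.dll'
                if word.startswith('.'):
                    allowlist.append(word.rstrip('.'))
    if not found:
        # mirrors the idea of commit fc00d0c: fail loudly if the clause is missing
        raise ValueError("Search string not found in LICENSE")
    return sorted(set(allowlist))


# made-up sample input, NOT the real cuDNN license text
sample_license = [
    "1. Some other clause.",
    SEARCH_STRING + " the runtime files .so and .dll.",
]
print(build_cudnn_allowlist(sample_license))  # ['.dll', '.so', 'LICENSE']
```

Failing when the search string is absent matters because a silently empty allowlist would turn every file except LICENSE into a symlink.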


def replace_non_distributable_files_with_symlinks(log, install_dir, package, allowlist):
Contributor

Maybe use pkg_name rather than package (latter is kind of vague)

Collaborator Author

changed in e1ba74f

"""
Replace files that cannot be distributed with symlinks into host_injections
"""
extension_based = { "CUDA": False, "cuDNN": True }
Contributor

I would reformat this for clarity, and add a comment above to explain what True or False means.

Suggested change
extension_based = { "CUDA": False, "cuDNN": True }
extension_based = {
"CUDA": False,
"cuDNN": True,
}

Collaborator Author

Addressed in 044e168

if not package in extension_based:
raise EasyBuildError("Don't know how to strip non-distributable files from package %s.", package)

# iterate over all files in the package installation directory
for dir_path, _, files in os.walk(install_dir):
for filename in files:
full_path = os.path.join(dir_path, filename)
# we only really care about real files, i.e. not symlinks
if not os.path.islink(full_path):
# check if the current file name stub is part of the allowlist
basename = filename.split('.')[0]
if extension_based[package]:
if '.' in filename:
extension = '.' + filename.split('.')[1]
Contributor

Suggested change
if extension_based[package]:
if '.' in filename:
extension = '.' + filename.split('.')[1]
if extension_based[package] and '.' in filename:
extension = '.' + filename.split('.')[1]

I'm not sure what this does, so it probably deserves a comment above, perhaps with an example?

Collaborator Author

addressed in f983fed

if basename in allowlist:
log.debug("%s is found in allowlist, so keeping it: %s", basename, full_path)
elif extension_based[package] and '.' in filename and extension in allowlist:
Contributor

We're using the same condition twice, so we should introduce a local variable for this, something like:

Suggested change
elif extension_based[package] and '.' in filename and extension in allowlist:
check_by_extension = extension_based[package] and '.' in filename
if check_by_extension:
extension = '.' + filename.split('.')[1]
...
elif check_by_extension and extension in allowlist:

Collaborator Author

implemented in 6e95efd
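
To make the stub versus extension check concrete, here is a small sketch of the keep/replace decision with the suggested local variable. The helper name `keep_file` and the example file names and allowlists are illustrative assumptions, not taken from the PR.

```python
def keep_file(filename, allowlist, by_extension):
    """Return True if the file may stay, judged by its name stub or (optionally) its extension."""
    # stub: first component when splitting at '.', e.g. 'libcudart.so.12' -> 'libcudart'
    basename = filename.split('.')[0]
    # only consider the extension for packages whose allowlist contains extensions
    check_by_extension = by_extension and '.' in filename
    if check_by_extension:
        # second component with a leading dot, e.g. 'libcudnn.so.8.9.2' -> '.so'
        extension = '.' + filename.split('.')[1]
    if basename in allowlist:
        return True
    elif check_by_extension and extension in allowlist:
        return True
    return False


# hypothetical allowlists in the two styles discussed in this thread
cuda_style = ['EULA', 'README', 'libcudart']   # name stubs
cudnn_style = ['LICENSE', '.so']               # extensions plus LICENSE

print(keep_file('libcudart.so.12', cuda_style, False))
print(keep_file('libcudnn.so.8.9.2', cudnn_style, True))
print(keep_file('cudnn.h', cudnn_style, True))
```

Note that only the second dot-separated component is treated as the extension, so a name such as `libfoo.1.so` would be checked against `.1` rather than `.so`.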

log.debug("%s is found in allowlist, so keeping it: %s", extension, full_path)
else:
if extension_based[package]:
print_name = filename
else:
print_name = basename
Contributor

code golf:

Suggested change
if extension_based[package]:
print_name = filename
else:
print_name = basename
print_name = filename if extension_based[package] else basename

Collaborator Author

implemented in 21916fd

log.debug("%s is not found in allowlist, so replacing it with symlink: %s",
Contributor

This is a bit confusing, since filename will never be explicitly in the allowlist (only extensions will be)

Collaborator Author

The allowlist is created in the post_postproc_{cuda,cudnn} function. For CUDA it contains 'EULA' (note without suffix '.txt'), 'README' and a list of file name "stubs" (only the first component when the full file names are split at '.'). For cuDNN, it contains 'LICENSE', '.so', ...

print_name, full_path)
# the host_injections path is under a fixed repo/location for CUDA or cuDNN
host_inj_path = re.sub(EESSI_INSTALLATION_REGEX, HOST_INJECTIONS_LOCATION, full_path)
# CUDA and cuDNN itself don't care about compute capability so remove this duplication from
Contributor

Suggested change
# CUDA and cuDNN itself don't care about compute capability so remove this duplication from
# CUDA and cu* libraries themselves don't care about compute capability so remove this duplication from

Collaborator Author

addressed in bf26846

# under host_injections (symlink to a single CUDA or cuDNN installation for all compute
# capabilities)
accel_subdir = os.getenv("EESSI_ACCELERATOR_TARGET")
if accel_subdir:
host_inj_path = host_inj_path.replace("/accel/%s" % accel_subdir, '')
# make sure source and target of symlink are not the same
if full_path == host_inj_path:
raise EasyBuildError("Source (%s) and target (%s) are the same location, are you sure you "
"are using this hook for an EESSI installation?",
full_path, host_inj_path)
remove_file(full_path)
symlink(host_inj_path, full_path)
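
The path rewriting at the end of this function can be illustrated on its own. This is a sketch under stated assumptions: the two constants below are assumed values chosen for illustration (the real `EESSI_INSTALLATION_REGEX` and `HOST_INJECTIONS_LOCATION` are defined elsewhere in eb_hooks.py and are not shown in this diff), the library file name is hypothetical, and the accelerator target is passed as a parameter instead of being read from `EESSI_ACCELERATOR_TARGET`.

```python
import re

# assumed values for illustration only, see the note above
EXAMPLE_INSTALLATION_REGEX = r"^/cvmfs/[^/]*\.eessi\.io/versions/"
EXAMPLE_HOST_INJECTIONS_LOCATION = "/cvmfs/software.eessi.io/host_injections/"


def host_injections_path(full_path, accel_subdir=None):
    """Map a file in an EESSI installation to its counterpart under host_injections."""
    host_inj_path = re.sub(EXAMPLE_INSTALLATION_REGEX, EXAMPLE_HOST_INJECTIONS_LOCATION, full_path)
    # one installation under host_injections serves all compute capabilities,
    # so the accelerator-specific subdirectory is dropped
    if accel_subdir:
        host_inj_path = host_inj_path.replace("/accel/%s" % accel_subdir, '')
    # a symlink pointing at itself would be useless, so refuse identical paths
    if full_path == host_inj_path:
        raise ValueError("Source and target are the same location: %s" % full_path)
    return host_inj_path


example_path = ("/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2"
                "/accel/nvidia/cc80/software/cuDNN/8.9.2.26-CUDA-12.1.1/lib/libcudnn.so.8.9.2")
print(host_injections_path(example_path, accel_subdir="nvidia/cc80"))
```

The identical-path check is what catches the hook being run outside an EESSI installation, where the regular expression does not match and nothing is rewritten.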


def inject_gpu_property(ec):
"""
Add 'gpu' property, via modluafooter easyconfig parameter
Add 'gpu' property EESSI<PACKAGE>VERSION envvars and drop dependencies to
Contributor

Suggested change
Add 'gpu' property EESSI<PACKAGE>VERSION envvars and drop dependencies to
Add 'gpu' property + EESSI<PACKAGE>VERSION envvars, and drop dependencies to

Collaborator Author

improved in e968608

build dependencies, via modluafooter easyconfig parameter
"""
ec_dict = ec.asdict()
# Check if CUDA is in the dependencies, if so add the 'gpu' Lmod property
if ('CUDA' in [dep[0] for dep in iter(ec_dict['dependencies'])]):
ec.log.info("Injecting gpu as Lmod arch property and envvar with CUDA version")
# Check if CUDA, cuDNN, you-name-it is in the dependencies, if so
# - drop dependency to build dependency
# - add 'gpu' Lmod property
# - add envvar with package version
packages_list = ( "CUDA", "cuDNN" )
Contributor

This should be a constant, which is also used in replace_non_distributable_files_with_symlinks to check whether the mapping in extension_based is complete?

Collaborator Author

Could be a constant. Not sure if it "should" be?

If we would want to use that to also check if the mapping in extension_based is complete, we would assume that we call this function (replace...) for all (CUDA-dependent) packages we run inject_gpu_property for?

Not sure if either adds much value.

packages_version = { }
Contributor

pkg_versions

Collaborator Author

done in 57f5a48

add_gpu_property = ''

for package in packages_list:
Contributor

Suggested change
for package in packages_list:
for pkg_name in pkg_names:

(it's a tuple, not a list, so a bit misleading otherwise, and no need to get very wordy with the loop variable)

Collaborator Author

List of packages stored in a tuple 😛

Collaborator Author

Changed in 57f5a48

# Check if package is in the dependencies, if so drop dependency to build
# dependency and set variable for later adding the 'gpu' Lmod property
if (package in [dep[0] for dep in iter(ec_dict['dependencies'])]):
Contributor

Using iter seems overkill to me, also below, I don't see the point in doing that?

Contributor

Ah, it's because we're iterating over a list that we're modifying with .remove.

Maybe we should use deps = ec_dict['dependencies'][:] instead (make a copy) and loop over deps, while still doing the .remove on ec_dict['dependencies'], and add a comment to clarify why that's done?

Collaborator Author

Changed in 57f5a48
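
The reason for looping over a copy can be shown with plain lists. In this sketch the dependency tuples are simplified to (name, version) pairs and the function names are made up. Wrapping the list in `iter()` does not protect against the problem, because it yields the same list iterator that a plain `for` loop would use.

```python
def demote_naive(deps, pkg_names):
    """Remove matching entries while iterating over the same list (buggy on purpose)."""
    deps = list(deps)
    for dep in iter(deps):
        if dep[0] in pkg_names:
            # removing shifts later entries left, so the next one is skipped
            deps.remove(dep)
    return deps


def demote_with_copy(deps, pkg_names):
    """Iterate over a shallow copy, so removing from the original cannot skip entries."""
    deps = list(deps)
    builddeps = []
    for dep in deps[:]:
        if dep[0] in pkg_names:
            deps.remove(dep)
            if dep not in builddeps:
                builddeps.append(dep)
    return deps, builddeps


example_deps = [('CUDA', '12.1.1'), ('cuDNN', '8.9.2.26'), ('GCC', '12.3.0')]
print(demote_naive(example_deps, ('CUDA', 'cuDNN')))      # cuDNN is left behind
print(demote_with_copy(example_deps, ('CUDA', 'cuDNN')))
```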

add_gpu_property = 'add_property("arch","gpu")'
for dep in iter(ec_dict['dependencies']):
if package in dep[0]:
Contributor

Hmm, this should be ==, not in, we're comparing software names here?

Collaborator Author

Changed in 57f5a48
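
The difference between `in` and `==` is easy to demonstrate: a substring test on the software name also matches packages such as UCX-CUDA. The sketch below uses simplified (name, version) tuples and made-up function names. The module listing quoted further down in this review shows OSU-Micro-Benchmarks with EESSICUDAVERSION set to "1.14.1", which would be consistent with this kind of mismatch, although the thread does not confirm the cause.

```python
def version_by_substring(deps, pkg_name):
    """Old behaviour: the last dependency whose name contains pkg_name wins."""
    version = None
    for name, ver in deps:
        if pkg_name in name:  # also true for 'UCX-CUDA'
            version = ver
    return version


def version_by_equality(deps, pkg_name):
    """Suggested behaviour: compare software names exactly."""
    for name, ver in deps:
        if name == pkg_name:
            return ver
    return None


example_deps = [('CUDA', '12.1.1'), ('UCX-CUDA', '1.14.1')]
print(version_by_substring(example_deps, 'CUDA'))  # 1.14.1
print(version_by_equality(example_deps, 'CUDA'))   # 12.1.1
```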

# make package a build dependency only (rpathing saves us from link errors)
ec.log.info("Dropping dependency on %s to build dependency" % package)
ec_dict['dependencies'].remove(dep)
if dep not in ec_dict['builddependencies']:
ec_dict['builddependencies'].append(dep)
# take note of version for creating the modluafooter
packages_version[package] = dep[1]
if add_gpu_property:
ec.log.info("Injecting gpu as Lmod arch property and envvars for dependencies with their version")
key = 'modluafooter'
value = 'add_property("arch","gpu")'
cuda_version = 0
for dep in iter(ec_dict['dependencies']):
# Make CUDA a build dependency only (rpathing saves us from link errors)
if 'CUDA' in dep[0]:
cuda_version = dep[1]
ec_dict['dependencies'].remove(dep)
if dep not in ec_dict['builddependencies']:
ec_dict['builddependencies'].append(dep)
value = '\n'.join([value, 'setenv("EESSICUDAVERSION","%s")' % cuda_version])
if key in ec_dict:
if value not in ec_dict[key]:
ec[key] = '\n'.join([ec_dict[key], value])
values = [add_gpu_property]
Contributor

values is a bit confusing here, maybe extra_mod_footer_lines is better?

Collaborator Author

changed in 57f5a48

for package, version in packages_version.items():
envvar = "EESSI%sVERSION" % package.upper()
values.append('setenv("%s","%s")' % (envvar, version))
Contributor

Why are we actually injecting these?

Is that just so we can tell which CUDA or cuDNN we're using (since it's a build dep no module will be loaded for these)?
Should we use EESSI_<pkgname>_VERSION to make this a bit more readable for humans?

Collaborator Author

I don't know why it was added for CUDA. For cuDNN, I just do the same.

Maybe it was inspired by some EBVERSIONCUDA envvar name? If we change the names we would have to rebuild some module files. See list below

$ grep VERSION /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/modules/all/*/*.lua | sed -e 's/.*all//'
/CUDA/12.1.1.lua:setenv("EBVERSIONCUDA", "12.1.1")
/CUDA-Samples/12.1-GCC-12.3.0-CUDA-12.1.1.lua:setenv("EBVERSIONCUDAMINSAMPLES", "12.1")
/CUDA-Samples/12.1-GCC-12.3.0-CUDA-12.1.1.lua:setenv("EESSICUDAVERSION","12.1.1")
/ESPResSo/4.2.2-foss-2023a-CUDA-12.1.1.lua:setenv("EBVERSIONESPRESSO", "4.2.2")
/ESPResSo/4.2.2-foss-2023a-CUDA-12.1.1.lua:setenv("EESSICUDAVERSION","12.1.1")
/LAMMPS/2Aug2023_update2-foss-2023a-kokkos-CUDA-12.1.1.lua:setenv("EBVERSIONLAMMPS", "2Aug2023_update2")
/LAMMPS/2Aug2023_update2-foss-2023a-kokkos-CUDA-12.1.1.lua:setenv("EESSICUDAVERSION","12.1.1")
/NCCL/2.18.3-GCCcore-12.3.0-CUDA-12.1.1.lua:setenv("EBVERSIONNCCL", "2.18.3")
/NCCL/2.18.3-GCCcore-12.3.0-CUDA-12.1.1.lua:setenv("EESSICUDAVERSION","12.1.1")
/OSU-Micro-Benchmarks/7.2-gompi-2023a-CUDA-12.1.1.lua:setenv("EBVERSIONOSUMINMICROMINBENCHMARKS", "7.2")
/OSU-Micro-Benchmarks/7.2-gompi-2023a-CUDA-12.1.1.lua:setenv("EESSICUDAVERSION","1.14.1")
/UCC-CUDA/1.2.0-GCCcore-12.3.0-CUDA-12.1.1.lua:setenv("EBVERSIONUCCMINCUDA", "1.2.0")
/UCC-CUDA/1.2.0-GCCcore-12.3.0-CUDA-12.1.1.lua:setenv("EESSICUDAVERSION","12.1.1")
/UCX-CUDA/1.14.1-GCCcore-12.3.0-CUDA-12.1.1.lua:setenv("EBVERSIONUCXMINCUDA", "1.14.1")
/UCX-CUDA/1.14.1-GCCcore-12.3.0-CUDA-12.1.1.lua:setenv("EESSICUDAVERSION","12.1.1")

If we want to change that, we should probably do the change in a separate PR before adding cuDNN.

if not key in ec_dict:
Contributor

positive logic is easier to follow:

# take into account that modluafooter may already be set
if key in ec_dict:
    ...
else:
    ...

and maybe we should rename key to modluafooter to avoid confusion (or mod_footer if you want to plan ahead to a time where we may also have Tcl modules, but I doubt that'll happen)

Collaborator Author

changed in 57f5a48

ec[key] = '\n'.join(values)
else:
ec[key] = value
new_value = ec_dict[key]
for value in values:
Contributor

if values is renamed to extra_mod_footer_lines and you iterate here with for line in extra_mod_footer_lines, then new_value can just be value :)

Collaborator Author

changed in 57f5a48

if not value in new_value:
Contributor

having a hit here seems unlikely to me, but ok, doesn't hurt I guess :)

new_value = '\n'.join([new_value, value])
ec[key] = new_value

return ec
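
Putting the review suggestions for the module footer together (positive logic, `extra_mod_footer_lines`, a single `value`), the footer assembly could look roughly like the sketch below. It is a standalone illustration that works on plain strings and a dict; the function name `extend_modluafooter` is hypothetical, and the version that was actually merged may differ.

```python
def extend_modluafooter(existing, pkg_versions):
    """Return a modluafooter value with the 'gpu' property and one version envvar per package."""
    extra_mod_footer_lines = ['add_property("arch","gpu")']
    for pkg_name, version in pkg_versions.items():
        envvar = "EESSI%sVERSION" % pkg_name.upper()
        extra_mod_footer_lines.append('setenv("%s","%s")' % (envvar, version))

    # take into account that modluafooter may already be set
    if existing:
        value = existing
        for line in extra_mod_footer_lines:
            # substring check, as in the diff: do not add a line twice
            if line not in value:
                value = '\n'.join([value, line])
    else:
        value = '\n'.join(extra_mod_footer_lines)
    return value


example_versions = {'CUDA': '12.1.1', 'cuDNN': '8.9.2.26'}
footer = extend_modluafooter(None, example_versions)
print(footer)
```

Applying the function to its own output returns the same string, so running the hook on an easyconfig that already carries these lines does not duplicate them.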


@@ -873,4 +950,5 @@ def inject_gpu_property(ec):

POST_POSTPROC_HOOKS = {
'CUDA': post_postproc_cuda,
'cuDNN': post_postproc_cudnn,
}
5 changes: 4 additions & 1 deletion install_scripts.sh
@@ -122,7 +122,10 @@ copy_files_by_list ${TOPDIR}/scripts ${INSTALL_PREFIX}/scripts "${script_files[@

# Copy files for the scripts/gpu_support/nvidia directory
nvidia_files=(
- install_cuda_host_injections.sh link_nvidia_host_libraries.sh
+ eessi-2023.06-cuda-and-libraries.yml
+ install_cuda_and_libraries.sh
+ install_cuda_host_injections.sh
Contributor

Shouldn't we deprecate this script in favor of install_cuda_and_libraries.sh?

We should definitely update the documentation at https://www.eessi.io/docs/gpu if we're going forward with install_cuda_and_libraries.sh

Collaborator Author

We could/should do that, but maybe in a separate PR that coordinates the change in the docs?

Collaborator

I agree, deprecation can be done later. First, this has to be deployed. Then, we can put it in the docs, and people can start using it. Only then should we deprecate the old method.

Contributor

follow-up on this via #789

+ link_nvidia_host_libraries.sh
)
copy_files_by_list ${TOPDIR}/scripts/gpu_support/nvidia ${INSTALL_PREFIX}/scripts/gpu_support/nvidia "${nvidia_files[@]}"

@@ -0,0 +1,3 @@
easyconfigs:
- CUDA-12.1.1.eb
- cuDNN-8.9.2.26-CUDA-12.1.1.eb