{2023.06}[foss/2023a] PyTorch v2.1.2 w/ CUDA 12.1.1 #825
base: 2023.06-software.eessi.io
@@ -1,3 +1,4 @@
easyconfigs:
  - CUDA-12.1.1.eb
  - cuDNN-8.9.2.26-CUDA-12.1.1.eb
  - PyTorch-2.1.2-foss-2023a-CUDA-12.1.1.eb
@@ -604,6 +604,32 @@ def pre_configure_hook_LAMMPS_zen4(self, *args, **kwargs):
        raise EasyBuildError("LAMMPS-specific hook triggered for non-LAMMPS easyconfig?!")
def pre_configure_hook_pytorch_add_cupti_libdir(self, *args, **kwargs):
    """
    Pre-configure hook for PyTorch: add directory $EESSI_SOFTWARE_PATH/software/CUDA/12.1.1/extras/CUPTI/lib64 to LIBRARY_PATH
    """
    if self.name == 'PyTorch' and self.version == '2.1.2':
        if 'cudaver' in self.cfg.template_values and self.cfg.template_values['cudaver'] == '12.1.1':
            _cudaver = self.cfg.template_values['cudaver']
            print_msg("pre_configure_hook_pytorch_add_cupti_libdir: CUDA version: '%s'" % _cudaver)
            _library_path = os.getenv('LIBRARY_PATH')
            print_msg("pre_configure_hook_pytorch_add_cupti_libdir: library_path: '%s'", _library_path)
            _eessi_software_path = os.getenv('EESSI_SOFTWARE_PATH')
            print_msg("pre_configure_hook_pytorch_add_cupti_libdir: eessi_software_path: '%s'", _eessi_software_path)
            _cupti_lib_dir = os.path.join(_eessi_software_path, 'software', 'CUDA', _cudaver, 'extras', 'CUPTI', 'lib64')
            print_msg("pre_configure_hook_pytorch_add_cupti_libdir: cupti_lib_dir: '%s'", _cupti_lib_dir)
            if _library_path:
                env.setvar('LIBRARY_PATH', ':'.join([_library_path, _cupti_lib_dir]))
(inline review comments on the line above)
Review comment: This seems like a bug in our CUDA installation/module, no? I'm fine with proceeding like this for now; even if we also fix it somewhere else, this won't cause trouble, but there's probably a more general fix for this?
Reply: Right, we might find a better solution by changing the CUDA module, e.g. by adding the directory to LIBRARY_PATH through the module. It could be a worthwhile effort to try.
Reply: @boegel could it be that A little before that line
            else:
                env.setvar('LIBRARY_PATH', _cupti_lib_dir)
            print_msg("pre_configure_hook_pytorch_add_cupti_libdir: LIBRARY_PATH: '%s'", os.getenv('LIBRARY_PATH'))
        else:
            print_msg("PyTorch/2.1.2-specific pre_configure hook triggered for non-CUDA or non-CUDA/12.1.1 easyconfig; NOT adding CUPTI lib64 dir to LIBRARY_PATH")
    else:
        raise EasyBuildError("PyTorch/2.1.2-specific pre_configure hook triggered for non-PyTorch/2.1.2 easyconfig?!")
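The branch at the end of the hook implements a common pattern: append a directory to a colon-separated path variable, handling the case where the variable is not set yet. A minimal standalone sketch of that logic, using plain `os.environ` instead of EasyBuild's `env.setvar` (the helper name `append_to_library_path` is illustrative, not part of the PR):

```python
import os

def append_to_library_path(lib_dir):
    """Append lib_dir to LIBRARY_PATH, creating the variable if it is unset."""
    current = os.getenv('LIBRARY_PATH')
    # Only join with ':' when LIBRARY_PATH already holds a value,
    # so we never emit a leading or trailing separator.
    os.environ['LIBRARY_PATH'] = ':'.join([current, lib_dir]) if current else lib_dir
```
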
def pre_test_hook(self, *args, **kwargs):
    """Main pre-test hook: trigger custom functions based on software name."""
    if self.name in PRE_TEST_HOOKS:
@@ -995,6 +1021,7 @@ def inject_gpu_property(ec):
    'OpenBLAS': pre_configure_hook_openblas_optarch_generic,
    'WRF': pre_configure_hook_wrf_aarch64,
    'LAMMPS': pre_configure_hook_LAMMPS_zen4,
    'PyTorch': pre_configure_hook_pytorch_add_cupti_libdir,
    'Score-P': pre_configure_hook_score_p,
}
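The second hunk registers the new function in a name-to-hook mapping, and the hooks then dispatch on the software name, as the `pre_test_hook` snippet above shows for the test step. A minimal sketch of that dispatch pattern (the dict name `PRE_CONFIGURE_HOOKS` is assumed here by analogy with `PRE_TEST_HOOKS`; the hook body is illustrative):

```python
def pre_configure_hook_pytorch(self, *args, **kwargs):
    # Stand-in for a real software-specific hook; just record that it ran.
    self.configured = 'PyTorch hook ran'

# Map software names to their pre-configure hook functions.
PRE_CONFIGURE_HOOKS = {
    'PyTorch': pre_configure_hook_pytorch,
}

def pre_configure_hook(self, *args, **kwargs):
    """Main pre-configure hook: dispatch to a registered software-specific hook."""
    if self.name in PRE_CONFIGURE_HOOKS:
        PRE_CONFIGURE_HOOKS[self.name](self, *args, **kwargs)
```

Software without a registered hook simply passes through untouched, which is why adding one dict entry is all the registration the PR needs.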
Review comment: I don't see a reason to make this specific to a particular PyTorch version or CUDA version?
Reply: We only know that the failure happens in this specific case. If we apply it to other cases, we will not know whether it was necessary or not.
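An earlier review reply suggests that a more general fix would live in the CUDA module itself rather than in a PyTorch-specific hook. In EasyBuild terms that could be expressed with the `modextrapaths` easyconfig parameter, which makes the generated module extend environment variables with directories relative to the installation prefix. A hedged sketch of what such a CUDA easyconfig change might look like (an untested assumption, not part of this PR):

```python
# Hypothetical addition to the CUDA 12.1.1 easyconfig: have the CUDA module
# itself put the CUPTI library directory on LIBRARY_PATH (the path is taken
# relative to the CUDA installation directory), so per-application hooks like
# the PyTorch one above would become unnecessary.
modextrapaths = {
    'LIBRARY_PATH': 'extras/CUPTI/lib64',
}
```
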