README instructions outdated (?), build fails with CUDA enabled #200

Open
samuelpmishLLNL opened this issue Oct 19, 2021 · 5 comments

@samuelpmishLLNL

samuelpmishLLNL commented Oct 19, 2021

On an Ubuntu 20.04 machine with CUDA 11.4 and g++ 9.3, I followed the instructions in the README:

$ git clone git@github.com:LLNL/CHAI.git
...
$ cd CHAI
$ git submodule update --init --recursive
...
$ mkdir build && cd build
$ cmake -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda ../
...
-- CUDA Support is Off 
...
CMake Warning:
  Manually-specified variables were not used by the project:

    CUDA_TOOLKIT_ROOT_DIR

So it seems the toolkit directory is being ignored and CUDA is not actually being enabled (?). If we force CUDA on with -DENABLE_CUDA=ON, CMake configures as one would expect, but the library itself fails to build:

$ cmake -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda -DENABLE_CUDA=ON ../
...
-- CUDA Support is ON
...
-- Configuring done
-- Generating done
-- Build files have been written to: ...
$ make -j
[  0%] Building CXX object blt/thirdparty_builtin/googletest-master-2020-01-07/googletest/CMakeFiles/gtest.dir/src/gtest-all.cc.o
[  0%] Building CXX object blt/tests/smoke/CMakeFiles/blt_cuda_version_smoke.dir/blt_cuda_version_smoke.cpp.o
[  3%] Building CUDA object blt/tests/smoke/CMakeFiles/blt_cuda_smoke.dir/blt_cuda_smoke.cpp.o
...
[ 94%] Linking CUDA device code CMakeFiles/chai-example.exe.dir/cmake_device_link.o
/usr/bin/ld: ../lib/libumpire.a(Allocator.cpp.o): in function `__sti____cudaRegisterAll()':
tmpxft_0001b8fd_00000000-6_Allocator.cudafe1.cpp:(.text+0xee3): undefined reference to `__cudaRegisterLinkedBinary_44_tmpxft_0001b8fd_00000000_7_Allocator_cpp1_ii_a17095a1'
/usr/bin/ld: ../lib/libumpire.a(Replay.cpp.o): in function `__sti____cudaRegisterAll()':
tmpxft_0001b8fe_00000000-6_Replay.cudafe1.cpp:(.text+0x6fb): undefined reference to `__cudaRegisterLinkedBinary_41_tmpxft_0001b8fe_00000000_7_Replay_cpp1_ii_5eca6429'
/usr/bin/ld: ../lib/libumpire.a(ResourceManager.cpp.o): in function `__sti____cudaRegisterAll()':
tmpxft_0001b8f9_00000000-6_ResourceManager.cudafe1.cpp:(.text+0xe1a3): undefined reference to `__cudaRegisterLinkedBinary_50_tmpxft_0001b8f9_00000000_7_ResourceManager_cpp1_ii_42a9a1b2'

There are many more errors like this.
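
One way to double-check which options a given configure run actually recorded is to read them back from CMakeCache.txt in the build directory. The following is a minimal sketch, not part of the CHAI instructions: the helper name `cache_value` is hypothetical, and it assumes the usual `NAME:TYPE=VALUE` layout of CMake cache entries.

```shell
#!/bin/sh
# cache_value FILE NAME: print the value of cache entry NAME from FILE.
# Hypothetical helper; assumes CMake's "NAME:TYPE=VALUE" cache line format.
cache_value() {
    sed -n "s/^$2:[A-Za-z]*=//p" "$1"
}

# Example use from inside the build directory:
#   cache_value CMakeCache.txt ENABLE_CUDA
#   cache_value CMakeCache.txt CUDA_TOOLKIT_ROOT_DIR
```

An empty result means the entry is not in the cache at all, which helps distinguish "option never set" from "option set but ignored by the project".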

@davidbeckingsale
Member

What version of CMake are you using?

@samuelpmishLLNL
Author

> What version of CMake are you using?

3.20.1

@davidbeckingsale
Member

Okay, so there are two things here - the ENABLE_CUDA option is required (but at one point in time was the default, so wasn't needed). For the build errors, I'm not sure. We have a build configuration in CI very similar to what you describe and it's working fine: https://github.com/LLNL/CHAI/blob/develop/Dockerfile#L54

@samuelpmishLLNL
Author

> Okay, so there are two things here - the ENABLE_CUDA option is required (but at one point in time was the default, so wasn't needed)

Then can you please modify the main README to provide up-to-date instructions on how to build?

> For the build errors, I'm not sure. We have a build configuration in CI very similar to what you describe and it's working fine

Our build of CHAI was broken when installing through Spack, so I tried compiling manually, and both approaches produced the same errors shown above.

> FROM axom/compilers:nvcc-10 AS nvcc

Perhaps it's worth testing against the most recent major release of CUDA (v10.0 is ~3 years old).

@samuelpmishLLNL
Author

samuelpmishLLNL commented Oct 20, 2021

Update: when configuring with an old version of CMake (3.14), CHAI does build without error:

$ /path/to/cmake-3.14.0/bin/cmake -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda -DENABLE_CUDA=ON ../
...
$ make 
...
[100%] Built target managed_ptr_tests
[100%] Linking CUDA device code CMakeFiles/managed_array_tests.dir/cmake_device_link.o
[100%] Linking CXX executable ../../bin/managed_array_tests
[100%] Built target managed_array_tests
[100%] Linking CUDA device code CMakeFiles/primary_pool_tests.dir/cmake_device_link.o
[100%] Linking CXX executable ../../../../../bin/primary_pool_tests
[100%] Built target primary_pool_tests
$

Perhaps the discrepancy is related to some of the recent changes to CMake's built-in support for CUDA. It would be good if CHAI could discover which versions of CMake it does support, and indicate that on the README / documentation.
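
As a stopgap until supported versions are documented, a build script could at least warn when the CMake in use is newer than one known to work. A rough sketch follows. The function names are hypothetical, 3.14.0 is used as the threshold only because it is the one version reported in this thread to build cleanly, and `sort -V` assumes GNU coreutils.

```shell
#!/bin/sh
# version_le A B: succeed if version A <= version B (relies on GNU sort -V).
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# check_cmake_version VERSION: warn if VERSION is newer than the one
# reported to work in this thread (3.14.0). The threshold is an assumption
# drawn from this issue, not an official support statement.
check_cmake_version() {
    known_good="3.14.0"
    if version_le "$1" "$known_good"; then
        echo "ok: CMake $1 <= $known_good"
    else
        echo "warning: CMake $1 is newer than $known_good; CUDA link errors were reported with 3.20.1"
    fi
}

# Example use:
#   check_cmake_version "$(cmake --version | head -n 1 | awk '{print $3}')"
```

This only flags a possible mismatch. It does not establish which versions between 3.14.0 and 3.20.1 actually work.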
