cmake updates worth considering #9

Open
svenevs opened this issue Oct 19, 2022 · 6 comments

Comments

@svenevs

svenevs commented Oct 19, 2022

Redirect from #8 (comment)

This project should consider

  1. Stop using the deprecated find_package(CUDA). From a quick glance, this project only needs the CUDA runtime (so there is no need for project(... LANGUAGES CUDA), since there are no CUDA source files). The update process may be painful depending on what you were or were not using from find_package(CUDA), but you should only need to link against CUDA::cudart.
  2. Right now a hard dependency on hdf5-shared shows up in the generated cmake config files that are installed. That may be problematic for an application that is consuming hdf5-static? Unclear what the right solution is.
  3. The header file directory is not included in the interface library include directories for the generated cmake config files that are installed -- see NOTE 4 at the bottom of the linked code.
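
For item 3, a minimal sketch of what exporting the header directory could look like. It assumes the target is named hdf5_vfd_gds, that the headers live in a src directory, and that they install under the standard include directory; the paths are guesses for illustration, not taken from the current CMakeLists.txt:

```cmake
# Assumes the target already exists, e.g. add_library(hdf5_vfd_gds ...).
# The src/ location and the install destination are illustrative guesses.
include(GNUInstallDirs)
target_include_directories(hdf5_vfd_gds
  PUBLIC
    # Used while building inside this source tree.
    $<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/src>
    # Recorded in the exported target for installed consumers
    # (a relative path here is taken relative to the install prefix).
    $<INSTALL_INTERFACE:${CMAKE_INSTALL_INCLUDEDIR}>
)
```

With something like this in place, the installed config files should carry the include directory on the exported target, so consumers would not need to add it by hand.
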
@brtnfld
Contributor

brtnfld commented Oct 20, 2022

We are working on updating CMake; first, we are trying to get it to work with GitHub actions. It looks to me that the installation instructions are not quite right. I've removed the use of find_package(CUDA), but it still has an issue finding/linking the cuda libraries. I'm wondering, though, if we should be using FindCUDAToolkit instead.

We are currently experimenting with it here (no successful build yet):
https://github.com/brtnfld/vfd-gds

@svenevs
Author

svenevs commented Oct 20, 2022

> We are working on updating CMake; first, we are trying to get it to work with GitHub actions.

Always an adventure 🙂 Are CMake changes needed to get them running at all? It could be an environment problem; I don't see anything overtly wrong with the current system -- it's just the old CMake style.

> I've removed the use of find_package(CUDA), but it still has an issue finding/linking the cuda libraries.

The process is a bit more involved, partly because you have to adopt some target-based approaches related to the packaging.

From what I can see, this library needs to compile and link against (1) the CUDA runtime and (2) cufile specifically. In the older CMake style, you'd accumulate include directories, libraries to link, and compiler definitions in list variables. The target

add_library(hdf5_vfd_gds ${HDF5_VFD_GDS_SRCS})

is going to ultimately include / link against

vfd-gds/CMakeLists.txt

Lines 85 to 93 in 4906c76

set(HDF5_VFD_GDS_EXT_INCLUDE_DEPENDENCIES
  ${HDF5_VFD_GDS_EXT_INCLUDE_DEPENDENCIES}
  ${CUDA_INCLUDE_DIRS}
)
set(HDF5_VFD_GDS_EXT_LIB_DEPENDENCIES
  ${HDF5_VFD_GDS_EXT_LIB_DEPENDENCIES}
  ${CUDA_LIBRARIES}
  ${HDF5_VFD_GDS_CUFILE_LIB}
)

where CUDA_INCLUDE_DIRS and CUDA_LIBRARIES came from find_package(CUDA). So to replace it using the target-based approach, you would instead do something like this (untested, I'm not on a CUDA machine):

find_package(CUDAToolkit)
# ... at some point after `add_library` aka this target exists ...
# note: you can `add_library(target-name "")` at the beginning of your CMake with no sources
# so that the target exists immediately and then do `target_sources` later on, mentioning because
# your dependencies search happens before the `add_subdirectory`
# https://cmake.org/cmake/help/latest/module/FindCUDAToolkit.html#cuda-toolkit-rt-lib
target_link_libraries(hdf5_vfd_gds PUBLIC CUDA::cudart)  # or CUDA::cudart_static but less common
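
The note about creating the target early could be sketched roughly as follows. This is untested as well; the project name and the source file path are placeholders for illustration, not this project's real layout:

```cmake
cmake_minimum_required(VERSION 3.17)  # FindCUDAToolkit was added in CMake 3.17
project(hdf5_vfd_gds_sketch LANGUAGES C)  # placeholder project name

# Create the target up front so that dependencies can attach to it right away.
# Since CMake 3.11 the sources may be omitted and supplied later.
add_library(hdf5_vfd_gds "")

find_package(CUDAToolkit REQUIRED)
target_link_libraries(hdf5_vfd_gds PUBLIC CUDA::cudart)

# Later on, e.g. in the file pulled in by add_subdirectory():
target_sources(hdf5_vfd_gds
  PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/src/example_source.c  # placeholder file name
)
```

The absolute path in target_sources sidesteps any question about which directory a relative path would be resolved against when the call lives in a subdirectory.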

Similarly, for cufile you can use CUDA::cuFile (but this was added in CMake 3.25, so that would become your minimum version). If that's too recent, you can keep the find_library there, but instead of

set (HDF5_VFD_GDS_CUFILE_DIR ${CUDA_TOOLKIT_ROOT_DIR} CACHE PATH "Cufile installation directory for Nvidia GDS support")

just do the find_library after finding CUDAToolkit and use CUDAToolkit_LIBRARY_DIR for the search path (CUDA_TOOLKIT_ROOT_DIR also comes from find_package(CUDA), so it goes away). That cache option (set(... CACHE ...)) can probably just be removed?
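
Putting the two cufile options together, a hedged sketch (untested, and it assumes the hdf5_vfd_gds target already exists) might look like this. HDF5_VFD_GDS_CUFILE_LIB is the variable name already used in the project; the error message text is made up:

```cmake
find_package(CUDAToolkit REQUIRED)

if(TARGET CUDA::cuFile)
  # Provided by FindCUDAToolkit from CMake 3.25 on, when cuFile is installed.
  target_link_libraries(hdf5_vfd_gds PUBLIC CUDA::cuFile)
else()
  # Fallback for older CMake: look next to the other toolkit libraries.
  find_library(HDF5_VFD_GDS_CUFILE_LIB
    NAMES cufile
    HINTS ${CUDAToolkit_LIBRARY_DIR}
  )
  if(NOT HDF5_VFD_GDS_CUFILE_LIB)
    message(FATAL_ERROR "cufile library not found in ${CUDAToolkit_LIBRARY_DIR}")
  endif()
  target_link_libraries(hdf5_vfd_gds PUBLIC ${HDF5_VFD_GDS_CUFILE_LIB})
endif()
```

One caveat with the fallback branch: linking a full library path publicly means that absolute path can end up in the exported target, which may not be valid on a consumer's machine.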

Note that once you start linking against imported targets in your build system, consumers need those targets too, so your installed config file has to call find_dependency(CUDAToolkit); more info here.
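
As a rough illustration of the config-file side, assuming the package config is generated with configure_package_config_file() from CMakePackageConfigHelpers (the template name and the targets file name below are hypothetical, not this project's current files):

```cmake
# hdf5_vfd_gds-config.cmake.in (hypothetical template name)
@PACKAGE_INIT@

include(CMakeFindDependencyMacro)
# Re-find what the exported target links against, so that CUDA::cudart
# exists in the consumer's project before the targets file is loaded.
find_dependency(CUDAToolkit)

# Hypothetical name for the file written by install(EXPORT ...).
include("${CMAKE_CURRENT_LIST_DIR}/hdf5_vfd_gds-targets.cmake")
```

The ordering matters here: the find_dependency call has to come before the targets file is included, since that file refers to CUDA::cudart by name.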

Hope that helps clarify some!

@brtnfld
Contributor

brtnfld commented Oct 20, 2022

Indeed, if I don't use the CMake changes from PR #5, it compiles successfully. But when I try to run the tests, it fails because it is looking for libcuda.so.1, while what is installed is:

libOpenCL.so
libOpenCL.so.1
libOpenCL.so.1.0
libOpenCL.so.1.0.0
libcudadevrt.a
libcudart.so
libcudart.so.11.0
libcudart.so.11.8.89
libcudart_static.a
libcufile.so
libcufile.so.0
libcufile.so.1.4.0
libcufile_rdma.so
libcufile_rdma.so.1
libcufile_rdma.so.1.4.0
libcufile_rdma_static.a
libcufile_static.a
libculibos.a

@svenevs
Author

svenevs commented Oct 21, 2022

> But when I try to run the tests it fails because it is looking for libcuda.so.1

Hmm, that does seem like an installation problem. That's a pretty critical library to be missing (it's the driver API, as opposed to the runtime API, cudart).

sudo apt-get install -y cuda-nvcc-11-8 libcufile-dev-11-8

I'm not sure about the Ubuntu packaging, and the -nvcc package specifically; the docs say to install the meta package. I think cuda-11-8 will also be available (worth testing in a docker image rather than debugging package names via GH actions 😉).

Alternatively if you wanted to get fancy you might be able to download the runfile and cache the download rather than reinstalling the cuda packages each time. Or possibly use one of the cuda docker images?

> it compiles successfully. But when I try to run the tests it fails

All that said, even if libcuda.so ends up being installed / that is resolved -- can you run GPU / CUDA anything in the GH hosted runners? I don't think they have GPUs attached, and if a GPU is required to actually run the tests you'd need to have a CI setup that has one or host it.

@brtnfld
Contributor

brtnfld commented Oct 21, 2022

Right, I don't think it works without a GPU attached. So we decided to use GitHub Actions only to test whether it builds.
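
One way to keep a build-only CI job while still allowing tests on machines with a GPU would be to gate test registration behind an option. A sketch, where the option name and the test directory are made up for illustration:

```cmake
# Hypothetical option: leave OFF on GitHub-hosted runners, turn ON where
# GPU hardware is actually available.
option(HDF5_VFD_GDS_ENABLE_GPU_TESTS "Register tests that need a GPU at run time" OFF)

include(CTest)  # defines BUILD_TESTING and calls enable_testing() when it is ON
if(BUILD_TESTING AND HDF5_VFD_GDS_ENABLE_GPU_TESTS)
  add_subdirectory(test)  # assumed location of the test programs
endif()
```

With the option left at its default, ctest simply has nothing GPU-dependent to run, so the CI job only verifies that the library configures and compiles.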

I'm not aware of any ECP machines which support GPU storage direct. Is there currently a system that E4S will be installed on that supports it that you know of?

@svenevs
Author

svenevs commented Oct 25, 2022

> I'm not aware of any ECP machines which support GPU storage direct. Is there currently a system that E4S will be installed on that supports it that you know of?

I am so sorry, it turns out this package was more in the "nice to have" category. I mistakenly thought one of the sites did have GDS enabled which is why I got so excited about just stamping a new release w/o any cmake changes 🙃
