[WIP] new CUDA CI+development Docker container #1162

Draft
wants to merge 30 commits into
base: develop

BenWibking
Collaborator

@BenWibking BenWibking commented Aug 27, 2024

PR Summary

Updates the CUDA CI image to CUDA 12.1, Ubuntu 22.04, and Clang 19.

Adds a .devcontainer/devcontainer.json file to support GitHub Codespaces and VSCode Dev Containers out of the box.

The old container was based on an image that was no longer supported upstream. Unfortunately, the new image is also deprecated, since we have to use an old CUDA 12 minor version. (Ascent cannot be built with CUDA 12.4+ due to a lack of VTK-m support for those versions.)

This container should be built as a multi-arch image for both x86_64 and ARM support, e.g.:

docker build --file Dockerfile.nvcc --tag parthenon-hpc-lab/new-ci-image --platform linux/amd64,linux/arm64 .

NOTE: It must be built in a VM with a large amount of memory. I was not able to build the container image without 15 GB RAM and 4 GB swap for the Docker VM (otherwise, the VTK-m build will be killed by the OOM killer).
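As a rough pre-flight check, something like the following could be run on the build VM (or inside any Linux container on it) before starting the image build. This is only a sketch: the function name `check_build_memory` is made up for illustration, the 15 GB / 4 GB thresholds are simply the values that worked for me above, and the 5% slack is an assumption to allow for `MemTotal` being reported slightly below the configured VM size.

```shell
#!/bin/sh
# Hypothetical pre-flight check; thresholds are the values reported in this PR.
REQUIRED_RAM_GB=15
REQUIRED_SWAP_GB=4

# Print "ok" if the meminfo file ($1, default /proc/meminfo) reports roughly
# the required RAM and swap, otherwise print "insufficient".
check_build_memory() {
    meminfo="${1:-/proc/meminfo}"
    awk -v ram="$REQUIRED_RAM_GB" -v swap="$REQUIRED_SWAP_GB" '
        /^MemTotal:/  { mem_kb = $2 }
        /^SwapTotal:/ { swap_kb = $2 }
        END {
            # /proc/meminfo reports kiB; 1 GiB = 1048576 kiB.
            # Assumed 5% slack, since MemTotal excludes kernel-reserved memory.
            need_mem  = ram  * 1048576 * 0.95
            need_swap = swap * 1048576 * 0.95
            if (mem_kb >= need_mem && swap_kb >= need_swap) print "ok"
            else print "insufficient"
        }' "$meminfo"
}
```

Calling `check_build_memory` with no argument reads `/proc/meminfo`; a container normally sees the Docker VM's totals there, so the result should reflect the VM's resource settings rather than the host's.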

PR Checklist

  • Code passes cpplint
  • New features are documented.
  • Adds a test for any bugs fixed. Adds tests for new features.
  • Code is formatted
  • Changes are summarized in CHANGELOG.md
  • Change is breaking (API, behavior, ...)
    • Change is additionally added to CHANGELOG.md in the breaking section
    • PR is marked as breaking
    • Short summary API changes at the top of the PR (plus optionally with an automated update/fix script)
  • CI has been triggered on Darwin for performance regression tests.
  • Docs build
  • (@lanl.gov employees) Update copyright on changed files

Update to CUDA 12.6, Ubuntu 24.04, and Clang 19.
@BenWibking
Collaborator Author

BenWibking commented Aug 27, 2024

Ascent cannot be compiled with CUDA 12.4+, because VTK-m does not support it (https://gitlab.kitware.com/vtk/vtk-m/-/issues/790).

vtk-m also fails to build with gcc 12+:

56.81 [  8%] Building CXX object vtkm/cont/CMakeFiles/vtkm_cont.dir/DataSetBuilderUniform.cxx.o
56.81 cd /usr/local/build/vtk-m-v2.1.0/vtkm/cont && /usr/bin/g++ -DVTKMDIY_MPI_AS_LIB -DVTKMDIY_NO_THREADS -Dvtkm_cont_EXPORTS -I/usr/local/source/vtk-m-v2.1.0/vtkm/thirdparty/optionparser -I/usr/local/source/vtk-m-v2.1.0/vtkm/thirdparty/diy -I/usr/local/source/vtk-m-v2.1.0/vtkm/thirdparty/lcl/vtkmlcl -I/usr/local/source/vtk-m-v2.1.0/vtkm/thirdparty/loguru -I/usr/local/source/vtk-m-v2.1.0 -I/usr/local/build/vtk-m-v2.1.0/include -isystem /usr/local/source/vtk-m-v2.1.0/vtkm/thirdparty/diy/vtkmdiy/include -isystem /usr/local/build/vtk-m-v2.1.0/vtkm/thirdparty/diy/vtkmdiy/include/vtkmdiy/mpi -O3 -DNDEBUG -std=c++14 -fPIC -fvisibility=hidden -Wall -Wcast-align -Wextra -Wpointer-arith -Wformat -Wformat-security -Wshadow -Wunused -fno-common -Wno-unused-function -Wchar-subscripts -Wfloat-conversion -Wodr -ffunction-sections -MD -MT vtkm/cont/CMakeFiles/vtkm_cont.dir/DataSetBuilderUniform.cxx.o -MF CMakeFiles/vtkm_cont.dir/DataSetBuilderUniform.cxx.o.d -o CMakeFiles/vtkm_cont.dir/DataSetBuilderUniform.cxx.o -c /usr/local/source/vtk-m-v2.1.0/vtkm/cont/DataSetBuilderUniform.cxx
56.84 /usr/local/source/vtk-m-v2.1.0/vtkm/exec/cuda/internal/ThrustPatches.h(213): error: this declaration has no storage class or type specifier
56.84     __thrust_exec_check_disable__
56.84     ^
56.84
56.84 /usr/local/source/vtk-m-v2.1.0/vtkm/exec/cuda/internal/ThrustPatches.h(214): warning #1835-D: attribute "__host__" does not apply here
56.84       __attribute__((host)) __attribute__((device))
56.84                      ^
56.84
56.84 Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
56.84
56.84 /usr/local/source/vtk-m-v2.1.0/vtkm/exec/cuda/internal/ThrustPatches.h(214): warning #1835-D: attribute "__device__" does not apply here
56.84       __attribute__((host)) __attribute__((device))
56.84                                            ^
56.84
56.84 /usr/local/source/vtk-m-v2.1.0/vtkm/exec/cuda/internal/ThrustPatches.h(215): error: expected a ";"
56.84       stateless_resource_allocator()
56.84       ^
56.84
56.84 /usr/local/source/vtk-m-v2.1.0/vtkm/exec/cuda/internal/ThrustPatches.h(236): warning #12-D: parsing restarts here after previous syntax error
56.84   };
56.84   ^
56.84
56.97 /usr/local/source/vtk-m-v2.1.0/vtkm/exec/cuda/internal/WrappedOperators.h(198): error: namespace "thrust::THRUST_200500_800_NS::detail" has no member class "is_arithmetic"
56.97     : public thrust::detail::is_arithmetic<T>
56.97                              ^
56.97
56.97 /usr/local/source/vtk-m-v2.1.0/vtkm/exec/cuda/internal/WrappedOperators.h(198): error: class or struct definition is missing
56.97     : public thrust::detail::is_arithmetic<T>
56.97                                           ^

@BenWibking
Collaborator Author

BenWibking commented Aug 28, 2024

@pgrete Do we need Ascent in the CI container anymore? It doesn't appear to work with either CUDA 12.6 or gcc 12+.

@pgrete
Collaborator

pgrete commented Aug 28, 2024

Given that we still support Ascent (and are still, in principle, interested in using it), it'd be great if we could keep it.
Also, based on the Spack recipe and an issue on GitHub, I'm under the impression that it should work with CUDA 12.
I'm wondering if it's possible to get the entire software stack built with Spack in a single line (and then just keep the libs/binaries in the container).

@BenWibking
Collaborator Author

BenWibking commented Aug 28, 2024

Given that we still support Ascent (and are still, in principle, interested in using it), it'd be great if we could keep it. Also, based on the Spack recipe and an issue on GitHub, I'm under the impression that it should work with CUDA 12. I'm wondering if it's possible to get the entire software stack built with Spack in a single line (and then just keep the libs/binaries in the container).

The GitLab VTK-m issues suggest it doesn't work with CUDA 12.4 or 12.5, and it failed for me with 12.6. Maybe it works with earlier minor versions. However, 12.6 is the only version that has an Ubuntu 24.04 container.

Feel free to push changes to this branch -- I can't work on this more at the moment.

@BenWibking
Collaborator Author

@pgrete However, everything except Ascent works now, both on x86-64 and arm64.

@BenWibking
Collaborator Author

I've checked that Ascent builds with CUDA 12.0. So it stops working somewhere between CUDA 12.1-12.4.
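To make the current state of knowledge explicit, a build script could classify the CUDA version before attempting a CUDA-enabled Ascent build. The sketch below encodes only what has been reported in this thread (12.0 builds; 12.4, 12.5, and 12.6 fail; 12.1 through 12.3 are not settled). The function name `ascent_cuda_status` and the status strings are hypothetical.

```shell
# Hypothetical helper: classify a CUDA "major.minor" version for a
# CUDA-enabled Ascent/VTK-m build, based solely on reports in this thread.
ascent_cuda_status() {
    case "$1" in
        *.*) ;;                          # expect "major.minor"
        *) echo "unknown"; return 0 ;;
    esac
    major="${1%%.*}"
    rest="${1#*.}"
    minor="${rest%%.*}"
    if [ "$major" -ne 12 ]; then
        echo "unknown"                   # nothing reported outside CUDA 12
    elif [ "$minor" -eq 0 ]; then
        echo "reported-working"          # 12.0 was checked to build
    elif [ "$minor" -ge 4 ]; then
        echo "reported-failing"          # 12.4+ lacks VTK-m support
    else
        echo "untested"                  # 12.1-12.3: breakage point not narrowed down
    fi
}
```

The version string could be taken from `nvcc --version`, e.g. with `sed -n 's/.*release \([0-9]*\.[0-9]*\).*/\1/p'`, though that parsing is an assumption about the output format and may need adjusting.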

@pgrete
Collaborator

pgrete commented Aug 29, 2024

Thanks for tackling the version inconsistencies.
With regard to the content of the PR itself, I suggest pinning the versions (e.g., of adios2 and openpmd) so that the script also works when executed in the future.
Also, I looked at the diff between the build_ascent.sh in the PR and the one in the Ascent repo.
How about pinning the version of Ascent, then getting a copy of that version, applying a few-line patch, and compiling from there?
I'm happy to make those changes if you agree (you already did the heavy lifting in figuring out what's compatible).
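Roughly, the idea could look like the sketch below: clone each dependency at a pinned tag and, where needed, apply a small local patch before building. The helper name `fetch_pinned` is hypothetical, and the repository URLs, tags, and patch file are deliberately left as arguments because no specific versions have been agreed on here.

```shell
# Hypothetical helper: shallow-clone a repository at a pinned tag and
# optionally apply a local patch on top of it.
#   $1 = repository URL, $2 = tag to pin, $3 = destination directory,
#   $4 = optional patch file (use an absolute path, since git runs in $3)
fetch_pinned() {
    repo_url="$1"
    tag="$2"
    dest="$3"
    patch_file="${4:-}"

    # --branch accepts tags as well as branches; --depth 1 keeps the clone small.
    git clone --depth 1 --branch "$tag" "$repo_url" "$dest" || return 1

    if [ -n "$patch_file" ]; then
        git -C "$dest" apply "$patch_file" || return 1
    fi
}
```

Because the tag is fixed, rebuilding the image later should fetch the same sources, which is the point of pinning.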

@BenWibking
Collaborator Author

Thanks for tackling the version inconsistencies. With regard to the content of the PR itself, I suggest pinning the versions (e.g., of adios2 and openpmd) so that the script also works when executed in the future.

adios2 and openpmd build now with the version in this PR.

Also, I looked at the diff between the build_ascent.sh in the PR and the one in the Ascent repo. How about pinning the version of Ascent, then getting a copy of that version, applying a few-line patch, and compiling from there?

I don't follow your suggestion. Can you explain what you mean by this? Do you want us to download the build_ascent.sh script from the Ascent repo and then apply a patch to it?

@BenWibking
Collaborator Author

BenWibking commented Aug 29, 2024

@pgrete Everything builds now if I set enable_cuda=OFF in the Ascent build. It fails when enable_cuda=ON for both ARM and x86-64 Docker image builds.

I can't easily test x86_64 Docker builds with Ascent CUDA support, since those build extremely slowly on my (M1) Mac laptop. Can you try an x86_64 Docker image build?

Update: The issue is that when CUDA support for Ascent is enabled, the compiler is OOM killed. I am running Docker with resource settings set to 12 GB RAM and 1 GB swap. This is apparently not enough. The actual error message is:

Killed
	gmake[2]: *** [vtkm/filter/flow/CMakeFiles/vtkm_filter_flow.dir/build.make:163: vtkm/filter/flow/CMakeFiles/vtkm_filter_flow.dir/FilterParticleAdvectionUnsteadyState.cxx.o] Error 137
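For reference, the 137 in that message follows the usual shell convention of 128 plus the signal number, so it corresponds to signal 9 (SIGKILL), which is what the kernel OOM killer sends. A small sketch for decoding such statuses (the function name `describe_exit_status` is just for illustration):

```shell
# Decode a shell-style exit status: values above 128 conventionally mean
# "terminated by signal (status - 128)".
describe_exit_status() {
    status="$1"
    if [ "$status" -gt 128 ]; then
        sig=$((status - 128))
        # `kill -l <number>` prints the signal name without the SIG prefix.
        echo "killed by signal $sig (SIG$(kill -l "$sig"))"
    else
        echo "exited with status $status"
    fi
}

describe_exit_status 137   # the status from the gmake error above
```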

@BenWibking
Collaborator Author

BenWibking commented Aug 29, 2024

@pgrete I finally got it to build with 15 GB of RAM and 4 GB of swap (for the Docker VM). Can you build it and verify that it works, and if so, upload it to DockerHub?

@pgrete
Collaborator

pgrete commented Aug 30, 2024

I just pushed the changes I had in mind (using the build script from the Ascent source with a small patch) and using fixed versions for ADIOS2 and OpenPMD.
Everything (the Docker image) builds, BUT it doesn't work with Parthenon...

$ cmake -B build-ascent -DCMAKE_BUILD_TYPE=Release -DMACHINE_VARIANT=cuda-mpi -DMACHINE_CFG=$(pwd)/cmake/machinecfg/CI.cmake -DPARTHENON_ENABLE_ASCENT=ON -DAscent_DIR=/usr/local/ascent-checkout/lib/cmake/ascent

...

-- Building performance tests.
-- Building regression tests.
-- Creating BLT MPI targets...
-- FindMPI Enabled  (ENABLE_FIND_MPI == ON)
-- Found MPI_C: /opt/openmpi/lib/libmpi.so (found version "3.1") 
-- Found MPI: TRUE (found version "3.1")  
-- BLT MPI Compile Flags:  
-- BLT MPI Include Paths:  /opt/openmpi/include
-- BLT MPI Libraries:      /opt/openmpi/lib/libmpi.so
-- BLT MPI Link Flags:     SHELL:-Wl,-rpath -Wl,/opt/openmpi/lib -Wl,--enable-new-dtags
-- MPI Executable:       /opt/openmpi/bin/mpiexec
-- MPI Num Proc Flag:    -n
-- MPI Command Append:   
-- Creating BLT CUDA targets...
CMake Error at /usr/share/cmake-3.22/Modules/CMakeDetermineCompilerId.cmake:726 (message):
  Compiling the CUDA compiler identification source file
  "CMakeCUDACompilerId.cu" failed.

  Compiler: /usr/local/cuda/bin/nvcc

  Build flags:

  Id flags:
  --keep;--keep-dir;tmp;-ccbin=/parthenon/external/Kokkos/bin/nvcc_wrapper -v

  

  The output was:

  2

  #$ _NVVM_BRANCH_=nvvm

  #$ _SPACE_=

  #$ _CUDART_=cudart

  #$ _HERE_=/usr/local/cuda/bin

  #$ _THERE_=/usr/local/cuda/bin

  #$ _TARGET_SIZE_=

  #$ _TARGET_DIR_=

  #$ _TARGET_DIR_=targets/x86_64-linux

  #$ TOP=/usr/local/cuda/bin/..

  #$ NVVMIR_LIBRARY_DIR=/usr/local/cuda/bin/../nvvm/libdevice

  #$
  LD_LIBRARY_PATH=/usr/local/cuda/bin/../lib:/opt/openmpi/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64


  #$
  PATH=/usr/local/cuda/bin/../nvvm/bin:/usr/local/cuda/bin:/opt/openmpi/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin


  #$ INCLUDES="-I/usr/local/cuda/bin/../targets/x86_64-linux/include"

  #$ LIBRARIES= "-L/usr/local/cuda/bin/../targets/x86_64-linux/lib/stubs"
  "-L/usr/local/cuda/bin/../targets/x86_64-linux/lib"

  #$ CUDAFE_FLAGS=

  #$ PTXAS_FLAGS=

  #$ rm tmp/a_dlink.reg.c

  #$ "/parthenon/external/Kokkos/bin"/nvcc_wrapper -D__CUDA_ARCH_LIST__=520
  -E -x c++ -D__CUDACC__ -D__NVCC__
  "-I/usr/local/cuda/bin/../targets/x86_64-linux/include"
  -D__CUDACC_VER_MAJOR__=12 -D__CUDACC_VER_MINOR__=0
  -D__CUDACC_VER_BUILD__=140 -D__CUDA_API_VER_MAJOR__=12
  -D__CUDA_API_VER_MINOR__=0 -D__NVCC_DIAG_PRAGMA_SUPPORT__=1 -include
  "cuda_runtime.h" -m64 "CMakeCUDACompilerId.cu" -o
  "tmp/CMakeCUDACompilerId.cpp4.ii"

  <command-line>: warning: "__CUDA_ARCH_LIST__" redefined

  <command-line>: note: this is the location of the previous definition

  #$ cudafe++ --c++17 --gnu_version=110400 --display_error_number
  --orig_src_file_name "CMakeCUDACompilerId.cu" --orig_src_path_name
  "/parthenon/build-ascent/CMakeFiles/3.22.1/CompilerIdCUDA/CMakeCUDACompilerId.cu"
  --allow_managed --m64 --parse_templates --gen_c_file_name
  "tmp/CMakeCUDACompilerId.cudafe1.cpp" --stub_file_name
  "CMakeCUDACompilerId.cudafe1.stub.c" --gen_module_id_file
  --module_id_file_name "tmp/CMakeCUDACompilerId.module_id"
  "tmp/CMakeCUDACompilerId.cpp4.ii"

  /usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(94):
  error: identifier "__match32_any_sync" is undefined

  

  /usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(98):
  error: identifier "__match32_any_sync" is undefined

  

  /usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(103):
  error: identifier "__match64_any_sync" is undefined

  

  /usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(104):
  error: identifier "__match32_any_sync" is undefined

  

  /usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(109):
  error: identifier "__match64_any_sync" is undefined

  

  /usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(110):
  error: identifier "__match32_any_sync" is undefined

  

  /usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(114):
  error: identifier "__match64_any_sync" is undefined

  

  /usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(118):
  error: identifier "__match64_any_sync" is undefined

  

  /usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(122):
  error: identifier "__match32_any_sync" is undefined

  

  /usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(126):
  error: identifier "__match64_any_sync" is undefined

  

  /usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(133):
  error: identifier "__match32_all_sync" is undefined

  

  /usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(137):
  error: identifier "__match32_all_sync" is undefined

  

  /usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(142):
  error: identifier "__match64_all_sync" is undefined

  

  /usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(143):
  error: identifier "__match32_all_sync" is undefined

  

  /usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(148):
  error: identifier "__match64_all_sync" is undefined

  

  /usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(149):
  error: identifier "__match32_all_sync" is undefined

  

  /usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(153):
  error: identifier "__match64_all_sync" is undefined

  

  /usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(157):
  error: identifier "__match64_all_sync" is undefined

  

  /usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(161):
  error: identifier "__match32_all_sync" is undefined

  

  /usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(165):
  error: identifier "__match64_all_sync" is undefined

  

  20 errors detected in the compilation of "CMakeCUDACompilerId.cu".

  # --error 0x2 --

  

  

Call Stack (most recent call first):
  /usr/share/cmake-3.22/Modules/CMakeDetermineCompilerId.cmake:6 (CMAKE_DETERMINE_COMPILER_ID_BUILD)
  /usr/share/cmake-3.22/Modules/CMakeDetermineCompilerId.cmake:48 (__determine_compiler_id_test)
  /usr/share/cmake-3.22/Modules/CMakeDetermineCUDACompiler.cmake:298 (CMAKE_DETERMINE_COMPILER_ID)
  /usr/local/ascent-checkout/lib/cmake/ascent/thirdparty/BLTSetupCUDA.cmake:67 (enable_language)
  /usr/local/ascent-checkout/lib/cmake/ascent/BLTSetupTargets.cmake:97 (include)
  /usr/local/ascent-checkout/lib/cmake/ascent/AscentConfig.cmake:156 (include)
  CMakeLists.txt:370 (find_package)


-- Configuring incomplete, errors occurred!
See also "/parthenon/build-ascent/CMakeFiles/CMakeOutput.log".
See also "/parthenon/build-ascent/CMakeFiles/CMakeError.log".

I don't get it...

$ cat /parthenon/build-ascent/CMakeFiles/CMakeError.log

#$ "/parthenon/external/Kokkos/bin"/nvcc_wrapper -D__CUDA_ARCH_LIST__=520 -E -x c++ -D__CUDACC__ -D__NVCC__  "-I/usr/local/cuda/bin/../targets/x86_64-linux/include"    -D__CUDACC_VER_MAJOR__=12 -D__CUDACC_VER_MINOR__=0 -D__CUDACC_VER_BUILD__=140 -D__CUDA_API_VER_MAJOR__=12 -D__CUDA_API_VER_MINOR__=0 -D__NVCC_DIAG_PRAGMA_SUPPORT__=1 -include "cuda_runtime.h" -m64 "CMakeCUDACompilerId.cu" -o "tmp/CMakeCUDACompilerId.cpp4.ii" 
<command-line>: warning: "__CUDA_ARCH_LIST__" redefined
<command-line>: note: this is the location of the previous definition
#$ cudafe++ --c++17 --gnu_version=110400 --display_error_number --orig_src_file_name "CMakeCUDACompilerId.cu" --orig_src_path_name "/parthenon/build-ascent/CMakeFiles/3.22.1/CompilerIdCUDA/CMakeCUDACompilerId.cu" --allow_managed  --m64 --parse_templates --gen_c_file_name "tmp/CMakeCUDACompilerId.cudafe1.cpp" --stub_file_name "CMakeCUDACompilerId.cudafe1.stub.c" --gen_module_id_file --module_id_file_name "tmp/CMakeCUDACompilerId.module_id" "tmp/CMakeCUDACompilerId.cpp4.ii" 
/usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(94): error: identifier "__match32_any_sync" is undefined

/usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(98): error: identifier "__match32_any_sync" is undefined

/usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(103): error: identifier "__match64_any_sync" is undefined

/usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(104): error: identifier "__match32_any_sync" is undefined

/usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(109): error: identifier "__match64_any_sync" is undefined

/usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(110): error: identifier "__match32_any_sync" is undefined

/usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(114): error: identifier "__match64_any_sync" is undefined

/usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(118): error: identifier "__match64_any_sync" is undefined

/usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(122): error: identifier "__match32_any_sync" is undefined

/usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(126): error: identifier "__match64_any_sync" is undefined

/usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(133): error: identifier "__match32_all_sync" is undefined

/usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(137): error: identifier "__match32_all_sync" is undefined

/usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(142): error: identifier "__match64_all_sync" is undefined

/usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(143): error: identifier "__match32_all_sync" is undefined

/usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(148): error: identifier "__match64_all_sync" is undefined

/usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(149): error: identifier "__match32_all_sync" is undefined

/usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(153): error: identifier "__match64_all_sync" is undefined

/usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(157): error: identifier "__match64_all_sync" is undefined

/usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(161): error: identifier "__match32_all_sync" is undefined

/usr/local/cuda/bin/../targets/x86_64-linux/include/crt/sm_70_rt.hpp(165): error: identifier "__match64_all_sync" is undefined

20 errors detected in the compilation of "CMakeCUDACompilerId.cu".
# --error 0x2 --
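Since the log is long and repetitive, it may help to collapse it to the distinct undefined identifiers and how often each appears. A minimal sketch, assuming the log has been saved to a file; the function name `summarize_undefined` is hypothetical:

```shell
# Hypothetical helper: list each undefined identifier reported in a compiler
# log ($1) together with the number of times it occurs, as "name count".
summarize_undefined() {
    grep -o 'identifier "[^"]*" is undefined' "$1" |
        sed 's/identifier "\([^"]*\)".*/\1/' |
        sort | uniq -c | awk '{ print $2, $1 }'
}
```

For example, `summarize_undefined /parthenon/build-ascent/CMakeFiles/CMakeError.log` would print one line per identifier instead of one line per error.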

@BenWibking
Collaborator Author

Just wondering, why is the CUDA_ARCH set to 520: -D__CUDA_ARCH_LIST__=520?

Maybe a CUDA 12.0 bug?

@BenWibking
Collaborator Author

One change I wanted to make was to have it run inside the container by default as a non-root user. This avoids the OpenMPI warnings about running as the root user, and also prevents container users from accidentally destroying system packages or the pre-installed dependencies.

@BenWibking
Collaborator Author

Since Parthenon builds without Ascent support enabled in the container, I think this is an Ascent (or maybe BLT) bug.

I have no interest in Ascent support, so I can't debug this any further.

@BenWibking
Collaborator Author

I wonder if it is possible to reproduce this with the BLT CalcPi tutorial: https://github.com/LLNL/blt/tree/develop/docs/tutorial/calc_pi

@BenWibking
Collaborator Author

I can't reproduce with BLT alone. It also doesn't work on CUDA 12.1 for me.

My guess is that this is an Ascent bug introduced in the past few months that only manifests for Kokkos apps.

@acreyes
Contributor

acreyes commented Aug 31, 2024

Just wondering, why is the CUDA_ARCH set to 520: -D__CUDA_ARCH_LIST__=520?

Maybe a CUDA 12.0 bug?

I think this is just what CMake uses to check the CUDA compiler, because it is the default in nvcc. The error looks like it's trying to use headers for the sm_70 architecture. I think this is the default for Kokkos' nvcc_wrapper, and maybe this is why the problem only occurs when building Parthenon (with CMAKE_CXX_COMPILER=nvcc_wrapper)?

@BenWibking BenWibking changed the title [WIP] update CI Docker container to CUDA 12 [WIP] new CUDA CI Docker container Aug 31, 2024
@BenWibking BenWibking changed the title [WIP] new CUDA CI Docker container [WIP] new CUDA CI+development Docker container Aug 31, 2024
@pgrete
Collaborator

pgrete commented Sep 2, 2024

I'm going to ask the Ascent devs.
