backport unreachable #2852

Open
wants to merge 43 commits into
base: main

davebayer
Contributor

@davebayer davebayer commented Nov 18, 2024

This PR backports cuda::std::unreachable to C++11, replacing old implementation (_CCCL_UNREACHABLE).
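For readers unfamiliar with the facility being backported, here is a minimal host-only sketch of what a C++11-compatible `unreachable` can look like. It is not the code from this PR: the `sketch` namespace, the `SKETCH_UNREACHABLE` macro, and `sign_to_index` are hypothetical names, and a real CUDA version would additionally need `__host__ __device__` annotations and a device-specific branch.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical macro (not _CCCL_UNREACHABLE): pick a compiler hint that marks
// a code path as impossible, falling back to std::abort() when none is known.
#if defined(_MSC_VER) && !defined(__clang__)
#  define SKETCH_UNREACHABLE() __assume(0)
#elif defined(__GNUC__) || defined(__clang__)
#  define SKETCH_UNREACHABLE() __builtin_unreachable()
#else
#  define SKETCH_UNREACHABLE() std::abort()
#endif

namespace sketch
{
// C++11-compatible stand-in for C++23 std::unreachable().
// Reaching this call at run time is undefined behavior.
[[noreturn]] inline void unreachable() noexcept
{
  SKETCH_UNREACHABLE();
}

// Usage: maps a sign in {-1, 0, 1} to an index in {0, 1, 2}.
inline int sign_to_index(int sign)
{
  assert(sign >= -1 && sign <= 1); // precondition, checked in debug builds only
  switch (sign)
  {
    case -1: return 0;
    case 0: return 1;
    case 1: return 2;
  }
  unreachable(); // tells the compiler that no other value can occur
}
} // namespace sketch
```

Wrapping the hint in a `[[noreturn]]` function rather than exposing only a macro mirrors the shape of the C++23 library function, so call sites would not need to change once the standard version is available.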

copy-pr-bot bot commented Nov 18, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@miscco
Collaborator

miscco commented Nov 18, 2024

> It also fixes the implementation, because for device code the implementation was never right as __CUDA_ARCH__ is never defined outside device code.

Can you elaborate on this a bit? Outside of __CUDA_ARCH__, the macro was simply defined as __builtin_unreachable for non-MSVC compilers.

@davebayer
Contributor Author

davebayer commented Nov 18, 2024

> > It also fixes the implementation, because for device code the implementation was never right as __CUDA_ARCH__ is never defined outside device code.
>
> Can you elaborate on this a bit? Outside of __CUDA_ARCH__, the macro was simply defined as __builtin_unreachable for non-MSVC compilers.

I didn't know that NVHPC does not define __CUDA_ARCH__ in device code. But I think that, at least for _CCCL_CUDACC_BELOW(11, 2) and _CCCL_CUDACC_BELOW(11, 3), the old implementation of unreachable should not work, because with nvcc __CUDA_ARCH__ is never defined outside __device__ or __global__ scope.
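The question in this exchange is which preprocessor branch each compilation pass actually sees. A minimal sketch, assuming nvcc's documented behavior that `__CUDA_ARCH__` is defined only during the device compilation pass; `SKETCH_DEVICE_PASS` and `sketch::device_pass` are hypothetical names, and the snippet makes no claim about NVHPC or about which builtin a given nvcc version accepts:

```cpp
// Hypothetical helper macro: 1 only while the compiler is generating device code.
#if defined(__CUDA_ARCH__)
#  define SKETCH_DEVICE_PASS 1
#else
#  define SKETCH_DEVICE_PASS 0
#endif

namespace sketch
{
// True when this translation unit is being compiled for the device.
// A plain host compiler (or nvcc's host pass) always yields false, so a
// device-only workaround guarded this way is never selected there.
constexpr bool device_pass() noexcept
{
  return SKETCH_DEVICE_PASS != 0;
}
} // namespace sketch
```

Because the branch is fixed per compilation pass, a guard on `__CUDA_ARCH__` can only affect what the device pass sees; the host pass always takes the fallback branch.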

@miscco
Collaborator

miscco commented Nov 18, 2024

/ok to test

@davebayer davebayer changed the title from "backport and fix unreachable" to "backport unreachable" on Nov 19, 2024
miscco and others added 29 commits November 22, 2024 16:23
* Move implementation of `_LIBCUDACXX_TEMPLATE` to CCCL

We have emulation for concepts in LIBCUDACXX that was guarded behind C++14.

But nothing requires C++14 for just the template headers, and we want to use them universally throughout the codebase.

Consequently, move them to CCCL proper and enable them unconditionally. To ensure that we do not add any hidden dependencies, this also adds a barebones implementation of `enable_if_t` and a trailing `enable_if_t`.
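As an illustration of the idea in this commit message, the following is a minimal sketch of a dependency-free `enable_if_t` that works in C++11, used once in the template parameter list and once in a trailing return type. Reading "trailing" as "constraint placed in the trailing return type" is an assumption, and every name in the `sketch` namespace is hypothetical rather than taken from CCCL.

```cpp
namespace sketch
{
// Barebones enable_if with no dependency on <type_traits>.
template <bool Cond, class T = void>
struct enable_if
{};

template <class T>
struct enable_if<true, T>
{
  using type = T;
};

// Alias templates are available from C++11 onward.
template <bool Cond, class T = void>
using enable_if_t = typename enable_if<Cond, T>::type;

// Toy constraint standing in for a real concept emulation.
template <class T>
struct is_int_like
{
  static constexpr bool value = false;
};
template <>
struct is_int_like<int>
{
  static constexpr bool value = true;
};
template <>
struct is_int_like<long>
{
  static constexpr bool value = true;
};

// Leading form: the constraint lives in the template parameter list.
template <class T, enable_if_t<is_int_like<T>::value, int> = 0>
constexpr T twice(T value)
{
  return value + value;
}

// Trailing form: the constraint lives in the trailing return type.
template <class T>
constexpr auto thrice(T value) -> enable_if_t<is_int_like<T>::value, T>
{
  return value + value + value;
}
} // namespace sketch
```

Calling `sketch::twice(1.5)` fails overload resolution instead of compiling, which is the behavior a `requires` clause would give in C++20.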
…ons (NVIDIA#2889)

NVHPC can consume older CTK headers for stdpar, so we need to try and avoid using those
Co-authored-by: Bernhard Manfred Gruber
Co-authored-by: Michael Schellenberger Costa
Co-authored-by: Bernhard Manfred Gruber
Co-authored-by: Eric Niebler
* Improve build instructions for libcu++
* Add section about the options for the build script
* Delegate more to the contributor guidelines
Co-authored-by: Michael Schellenberger Costa
Co-authored-by: Bernhard Manfred Gruber
Co-authored-by: Eric Niebler
agent_histogram.cuh:827:37: warning: comparison of different enumeration types
* Drop unused macros

* Do not return a custom struct

* Replace `__MDSPAN_FOLD_AND` with `__fold_and_v` when possible
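For context on the last bullet, a logical AND over a parameter pack can be written without C++17 fold expressions. The sketch below shows one well-known technique and is not the actual `__fold_and_v` from the codebase; `sketch::fold_and`, `sketch::fold_and_v`, and `sketch::bool_pack` are hypothetical names, and the variable template requires C++14.

```cpp
#include <type_traits>

namespace sketch
{
// Helper that only carries a pack of booleans in its type.
template <bool... Values>
struct bool_pack
{};

// All values are true exactly when prepending `true` and appending `true`
// produce the same pack; any `false` makes the two packs differ.
template <bool... Values>
struct fold_and : std::is_same<bool_pack<true, Values...>, bool_pack<Values..., true>>
{};

// Variable template shorthand (C++14).
template <bool... Values>
constexpr bool fold_and_v = fold_and<Values...>::value;
} // namespace sketch
```

An empty pack yields `true`, matching the identity element of logical AND.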
…VIDIA#2950)

* Implement CUDASTF_DOT_TIMING facility for ctx.cuda_kernel

* clang-format
This is adding the missing sm_100 identifier to nv/target

Fixes NVIDIA#2890
Overwrites all generated PTX header and documentation files and runs `pre-commit run --all-files`. Also exclude generated PTX headers from header check.
Overwrites all generated PTX tests and runs `pre-commit run --all-files`
* Implement copy_bytes for mdspan

* Add final conversion to mdspan and more tests

* mdspan fill_bytes

* Add docs

* Fix issues after rebase

* Help old GCC figure out the types

* Move runtime extents check to a function

* Fix clang and more old GCC fixes
@davebayer davebayer requested a review from a team as a code owner November 25, 2024 21:03
Projects
Status: In Progress
6 participants