Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix device/host call issues in CUDA (when not using relaxed constexpr workaround) #360

Merged
merged 13 commits into from
Oct 4, 2024

Conversation

nmm0
Copy link
Contributor

@nmm0 nmm0 commented Oct 2, 2024

This includes code that Christian wrote and will be included into Kokkos in kokkos/kokkos#7396

This addresses warnings for calling constexpr device code inside of Kokkos with BasicView

nmm0 added a commit to nmm0/kokkos that referenced this pull request Oct 2, 2024
nmm0 added a commit to nmm0/kokkos that referenced this pull request Oct 2, 2024
@@ -199,7 +199,15 @@ struct layout_stride {
template<class IntegralType>
MDSPAN_INLINE_FUNCTION
static constexpr const __strides_storage_t fill_strides(const std::array<IntegralType,extents_type::rank()>& s) {
// avoid warning for use of host std::array operator[]
#if defined(_MDSPAN_HAS_CUDA) || defined(_MDSPAN_HAS_HIP)
const IntegralType* s_ptr = reinterpret_cast<const IntegralType*>(&s);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to figure out what to do properly here, since reinterpret_cast can't be used in a constant expression, so we will remove it for now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::array isn't usable on Cuda devices anyway unless you are using -expt-relaxed-constexpr, right? Wouldn't an interface in terms of C-style arrays be a more natural choice?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and no: we have constructors which have to take std::array because that is the standard. Also a bunch of operations actually do work on device (i.e. construction, destruction, structured binding - because they don't go through member functions, or at least go through defaulted member functions).

That said the above casting approach doesn't work in constexpr so we changed this.

@nmm0
Copy link
Contributor Author

nmm0 commented Oct 2, 2024

  • Move device_support to utility
  • Make construct_sub_strides use a custom c-array like type instead of std::array
  • Remove fake-tuple type access
  • Make std::array constructors for layout_stride non-device

Copy link
Contributor

@tcclevenger tcclevenger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, if C++17 is the minimum req for Kokkos core, why do we test C++14 for mdspan. Is this the same for other tpls or containers/algorithms?

@@ -206,7 +205,7 @@ struct layout_stride {
class IntegralType,
// The is_convertible condition is added to make sfinae valid
// the extents_type::rank() > 0 is added to avoid use of non-standard zero length c-array
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// the extents_type::rank() > 0 is added to avoid use of non-standard zero length c-array

This part of the comment is no longer needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are users of mdspan which are not Kokkos users and rely on C++14.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but yeah good observation I just pushed something to remove the comments

@@ -206,7 +205,7 @@ struct layout_stride {
class IntegralType,
// The is_convertible condition is added to make sfinae valid
// the extents_type::rank() > 0 is added to avoid use of non-standard zero length c-array
(std::is_convertible<IntegralType, typename extents_type::index_type>::value && (extents_type::rank() > 0))
(std::is_convertible<IntegralType, typename extents_type::index_type>::value)
)
MDSPAN_INLINE_FUNCTION
// despite the requirement some compilers still complain about zero length array during parsing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment seems out-of-date now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

@@ -206,7 +205,7 @@ struct layout_stride {
class IntegralType,
// The is_convertible condition is added to make sfinae valid
// the extents_type::rank() > 0 is added to avoid use of non-standard zero length c-array
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// the extents_type::rank() > 0 is added to avoid use of non-standard zero length c-array

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

Comment on lines 69 to 71
template<class T>
MDSPAN_INLINE_FUNCTION
constexpr void maybe_unused_variable(const T&) {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function seems unused.

Suggested change
template<class T>
MDSPAN_INLINE_FUNCTION
constexpr void maybe_unused_variable(const T&) {}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

Comment on lines 84 to 85
MDSPAN_INLINE_FUNCTION_DEFAULTED
constexpr integral_constant(std::integral_constant<T,v>) {};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DEFAULTED seems inappropriate. Do you need the extra annotations at all here?

Suggested change
MDSPAN_INLINE_FUNCTION_DEFAULTED
constexpr integral_constant(std::integral_constant<T,v>) {};
MDSPAN_INLINE_FUNCTION
constexpr integral_constant(std::integral_constant<T,v>) {};

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't be host device in the 1st place

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it can. Nothing in this calls device functions. Constructor of std::integral_constant is defaulted, so it works too. The only thing which doesn't work of std::integral_constant is the conversion operator to its scaler type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then what Daniel said

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minus the inline)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah clear bug (funnily enough one of the primary bugs we had elsewhere)

using type = tuple_member<T, Idx>;
template<class Other>
MDSPAN_FUNCTION
constexpr auto operator + (Other v) const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using + seems a little weird. What | or || be slightly better?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we just need something which expands.

constexpr integral_constant(std::integral_constant<T,v>) {};

static constexpr T value = v;
MDSPAN_INLINE_FUNCTION constexpr operator value_type() const noexcept {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer MDSPAN_FUNCTION here and below

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done



// same as std::integral_constant but with __host__ __device__ annotations on
// the implicit conversion function and the call operator
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need/want conversion from to the std:: counterpart?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because there are places where we pass in std::integral_constant, and std::integral_constant is the standard approved way to pass certain ways to certain functions. For example submdspan(a, std::integral_constant<int, 1>()) is fine in device code.

MDSPAN_INLINE_FUNCTION constexpr value_type operator()() const noexcept {
return value;
}
MDSPAN_INLINE_FUNCTION constexpr operator std::integral_constant<T,v>() const noexcept {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No way that can be annotated host device

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course it works:

https://godbolt.org/z/bb8qdT31Y

We can use any standard library stuff as long as we don't call non-defaulted member functions. And we don't here.

Comment on lines 84 to 85
MDSPAN_INLINE_FUNCTION_DEFAULTED
constexpr integral_constant(std::integral_constant<T,v>) {};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't be host device in the 1st place

#endif

template<class T, size_t ... Idx>
constexpr auto c_array_to_std(std::index_sequence<Idx...>, const T(&values)[sizeof...(Idx)]) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer calling it to_std_array

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also thinking you could should write a for loop instead of a fold expression

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got rid of the function for now, not really needed for what we do.

template<class T, size_t N>
struct sub_strides
{
T values[N];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
T values[N];
T values[N > 0 ? N : 1];

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (and delete the specialization below) unless you prefer the specialization path.

@crtrott crtrott changed the title Remove constexpr device code warnings Fix device/host call issues in CUDA (when not using relaxed constexpr workaround) Oct 4, 2024
@crtrott crtrott merged commit 260f525 into kokkos:stable Oct 4, 2024
15 checks passed
@@ -476,7 +468,8 @@ struct layout_stride {
MDSPAN_INLINE_FUNCTION
constexpr index_type required_span_size() const noexcept {
index_type span_size = 1;
for(unsigned r = 0; r < extents_type::rank(); r++) {
// using int here to avoid warning about pointless comparison to 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose you mean the rank 0 case

Comment on lines +72 to +73
template <class IdxT1, class IdxT2, class IndexType>
struct index_pair_like<tuple<IdxT1, IdxT2>, IndexType> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth a comment "our device-compatible poor man's implementation of tuple" for the future reader that might be surprised to see that specialization interleaved with all the std:: ones.

Comment on lines +98 to +102
template<class Integral, Integral v>
MDSPAN_INLINE_FUNCTION
constexpr Integral first_of(const std::integral_constant<Integral, v>&) {
return integral_constant<Integral, v>();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is weird

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this one

Comment on lines +119 to +133
MDSPAN_TEMPLATE_REQUIRES(
class IdxT1, class IdxT2,
/* requires */ (index_pair_like<std::tuple<IdxT1, IdxT2>, size_t>::value)
)
constexpr auto first_of(const std::tuple<IdxT1, IdxT2>& i) {
return get<0>(i);
}

MDSPAN_TEMPLATE_REQUIRES(
class IdxT1, class IdxT2,
/* requires */ (index_pair_like<std::pair<IdxT1, IdxT2>, size_t>::value)
)
MDSPAN_INLINE_FUNCTION
constexpr auto first_of(const std::pair<IdxT1, IdxT2>& i) {
return i.first;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we use get() for all of these

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get on std::pair is not host-device capable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants