-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix device/host call issues in CUDA (when not using relaxed constexpr workaround) #360
Fix device/host call issues in CUDA (when not using relaxed constexpr workaround) #360
Conversation
@@ -199,7 +199,15 @@ struct layout_stride { | |||
template<class IntegralType> | |||
MDSPAN_INLINE_FUNCTION | |||
static constexpr const __strides_storage_t fill_strides(const std::array<IntegralType,extents_type::rank()>& s) { | |||
// avoid warning for use of host std::array operator[] | |||
#if defined(_MDSPAN_HAS_CUDA) || defined(_MDSPAN_HAS_HIP) | |||
const IntegralType* s_ptr = reinterpret_cast<const IntegralType*>(&s); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to figure out what to do properly here, since reinterpret_cast
can't be used in a constant expression, so we will remove it for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::array
isn't usable on Cuda devices anyway unless you are using -expt-relaxed-constexpr
, right? Wouldn't an interface in terms of C-style
arrays be a more natural choice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes and no: we have constructors which have to take std::array because that is the standard. Also a bunch of operations actually do work on device (i.e. construction, destruction, structured binding - because they don't go through member functions, or at least go through defaulted member functions).
That said the above casting approach doesn't work in constexpr so we changed this.
|
…s to work on device
… about what it's used for
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, if C++17 is the minimum req for Kokkos core, why do we test C++14 for mdspan. Is this the same for other tpls or containers/algorithms?
@@ -206,7 +205,7 @@ struct layout_stride { | |||
class IntegralType, | |||
// The is_convertible condition is added to make sfinae valid | |||
// the extents_type::rank() > 0 is added to avoid use of non-standard zero length c-array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// the extents_type::rank() > 0 is added to avoid use of non-standard zero length c-array |
This part of the comment is no longer needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are users of mdspan which are not Kokkos users and rely on C++14.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but yeah good observation I just pushed something to remove the comments
@@ -206,7 +205,7 @@ struct layout_stride { | |||
class IntegralType, | |||
// The is_convertible condition is added to make sfinae valid | |||
// the extents_type::rank() > 0 is added to avoid use of non-standard zero length c-array | |||
(std::is_convertible<IntegralType, typename extents_type::index_type>::value && (extents_type::rank() > 0)) | |||
(std::is_convertible<IntegralType, typename extents_type::index_type>::value) | |||
) | |||
MDSPAN_INLINE_FUNCTION | |||
// despite the requirement some compilers still complain about zero length array during parsing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment seems out-of-date now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
@@ -206,7 +205,7 @@ struct layout_stride { | |||
class IntegralType, | |||
// The is_convertible condition is added to make sfinae valid | |||
// the extents_type::rank() > 0 is added to avoid use of non-standard zero length c-array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// the extents_type::rank() > 0 is added to avoid use of non-standard zero length c-array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
template<class T> | ||
MDSPAN_INLINE_FUNCTION | ||
constexpr void maybe_unused_variable(const T&) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function seems unused.
template<class T> | |
MDSPAN_INLINE_FUNCTION | |
constexpr void maybe_unused_variable(const T&) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
MDSPAN_INLINE_FUNCTION_DEFAULTED | ||
constexpr integral_constant(std::integral_constant<T,v>) {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DEFAULTED
seems inappropriate. Do you need the extra annotations at all here?
MDSPAN_INLINE_FUNCTION_DEFAULTED | |
constexpr integral_constant(std::integral_constant<T,v>) {}; | |
MDSPAN_INLINE_FUNCTION | |
constexpr integral_constant(std::integral_constant<T,v>) {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't be host device in the 1st place
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes it can. Nothing in this calls device functions. Constructor of std::integral_constant is defaulted, so it works too. The only thing which doesn't work of std::integral_constant is the conversion operator to its scaler type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then what Daniel said
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(minus the inline)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah clear bug (funnily enough one of the primary bugs we had elsewhere)
using type = tuple_member<T, Idx>; | ||
template<class Other> | ||
MDSPAN_FUNCTION | ||
constexpr auto operator + (Other v) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using +
seems a little weird. What |
or ||
be slightly better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we just need something which expands.
constexpr integral_constant(std::integral_constant<T,v>) {}; | ||
|
||
static constexpr T value = v; | ||
MDSPAN_INLINE_FUNCTION constexpr operator value_type() const noexcept { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer MDSPAN_FUNCTION
here and below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
|
||
// same as std::integral_constant but with __host__ __device__ annotations on | ||
// the implicit conversion function and the call operator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need/want conversion from to the std:: counterpart?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because there are places where we pass in std::integral_constant, and std::integral_constant is the standard approved way to pass certain ways to certain functions. For example submdspan(a, std::integral_constant<int, 1>())
is fine in device code.
MDSPAN_INLINE_FUNCTION constexpr value_type operator()() const noexcept { | ||
return value; | ||
} | ||
MDSPAN_INLINE_FUNCTION constexpr operator std::integral_constant<T,v>() const noexcept { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No way that can be annotated host device
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course it works:
https://godbolt.org/z/bb8qdT31Y
We can use any standard library stuff as long as we don't call non-defaulted member functions. And we don't here.
MDSPAN_INLINE_FUNCTION_DEFAULTED | ||
constexpr integral_constant(std::integral_constant<T,v>) {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't be host device in the 1st place
#endif | ||
|
||
template<class T, size_t ... Idx> | ||
constexpr auto c_array_to_std(std::index_sequence<Idx...>, const T(&values)[sizeof...(Idx)]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer calling it to_std_array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also thinking you could should write a for loop instead of a fold expression
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got rid of the function for now, not really needed for what we do.
template<class T, size_t N> | ||
struct sub_strides | ||
{ | ||
T values[N]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
T values[N]; | |
T values[N > 0 ? N : 1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done (and delete the specialization below) unless you prefer the specialization path.
@@ -476,7 +468,8 @@ struct layout_stride { | |||
MDSPAN_INLINE_FUNCTION | |||
constexpr index_type required_span_size() const noexcept { | |||
index_type span_size = 1; | |||
for(unsigned r = 0; r < extents_type::rank(); r++) { | |||
// using int here to avoid warning about pointless comparison to 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose you mean the rank 0 case
template <class IdxT1, class IdxT2, class IndexType> | ||
struct index_pair_like<tuple<IdxT1, IdxT2>, IndexType> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth a comment "our device-compatible poor man's implementation of tuple" for the future reader that might be surprised to see that specialization interleaved with all the std:: ones.
template<class Integral, Integral v> | ||
MDSPAN_INLINE_FUNCTION | ||
constexpr Integral first_of(const std::integral_constant<Integral, v>&) { | ||
return integral_constant<Integral, v>(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is weird
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this one
MDSPAN_TEMPLATE_REQUIRES( | ||
class IdxT1, class IdxT2, | ||
/* requires */ (index_pair_like<std::tuple<IdxT1, IdxT2>, size_t>::value) | ||
) | ||
constexpr auto first_of(const std::tuple<IdxT1, IdxT2>& i) { | ||
return get<0>(i); | ||
} | ||
|
||
MDSPAN_TEMPLATE_REQUIRES( | ||
class IdxT1, class IdxT2, | ||
/* requires */ (index_pair_like<std::pair<IdxT1, IdxT2>, size_t>::value) | ||
) | ||
MDSPAN_INLINE_FUNCTION | ||
constexpr auto first_of(const std::pair<IdxT1, IdxT2>& i) { | ||
return i.first; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we use get() for all of these
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get on std::pair is not host-device capable.
This includes code that Christian wrote and will be included into Kokkos in kokkos/kokkos#7396
This addresses warnings for calling constexpr device code inside of Kokkos with BasicView