[ENHANCEMENT]: __device__ for_each that provides a mutable reference rather than a copy of the slot to the callback function #639
Labels
type: feature request
Is your feature request related to a problem? Please describe.
Feature request on behalf of @LutzCle who is implementing a mark join for an internal project.
tl;dr The algorithm requires setting a "(matching) slot has been seen" bit during probing which is explicitly excluded from the hash value generation and key comparison, so it doesn't break the probing sequence if set or unset.
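To illustrate what "excluded from the hash value generation and key comparison" means, here is a minimal host-side sketch. The 64-bit slot layout with the mark in the top bit, and every name in it (`seen_mask`, `key_bits`, `masked_hash`, `masked_equal`), are assumptions made for illustration; they are neither cuco API nor the layout used in the internal project.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical 64-bit slot: the top bit is the "seen" mark, the rest is the key.
constexpr std::uint64_t seen_mask = std::uint64_t{1} << 63;

constexpr std::uint64_t key_bits(std::uint64_t slot) { return slot & ~seen_mask; }
constexpr bool is_seen(std::uint64_t slot) { return (slot & seen_mask) != 0; }
constexpr std::uint64_t mark_seen(std::uint64_t slot) { return slot | seen_mask; }

// Hash and equality only look at the key bits, so setting or clearing the
// mark changes neither the probing sequence nor the comparison result.
struct masked_hash {
  std::size_t operator()(std::uint64_t slot) const {
    return std::hash<std::uint64_t>{}(key_bits(slot));
  }
};

struct masked_equal {
  constexpr bool operator()(std::uint64_t a, std::uint64_t b) const {
    return key_bits(a) == key_bits(b);
  }
};
```

With functors like these, a marked and an unmarked version of the same slot hash to the same bucket and compare equal, which is the property the mark join relies on.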
Our initial approach was to use the for_each device API and set the bit for each matching element, but unfortunately this doesn't work since the current for_each implementation returns a copy of the matching slot rather than a mutable reference.
Describe the solution you'd like
Provide both copy and mutable reference options for for_each.
Naming ideas: for_each(_reference) / for_each_copy / for_each_mutable, ...
Describe alternatives you've considered
The current workaround consists of basically re-implementing for_each with the proposed feature in a custom kernel, which involves using some of the internal cuco::detail machinery - not ideal.
Additional context
I'm not sure if this feature is a foot gun or not so I would like to spark a discussion around the potential dangers.
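To make the discussion concrete, here is a minimal host-side sketch of the two flavours on a toy linear-probing table. It is plain C++, not cuco code: `toy_table`, `toy_slot`, `toy_seen_bit`, `count_seen`, and the slot layout are all hypothetical, and the method names simply reuse two of the naming ideas above.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical slot layout: top bit is the "seen" mark, remaining bits are the key.
using toy_slot = std::uint64_t;
constexpr toy_slot toy_seen_bit = toy_slot{1} << 63;
constexpr toy_slot toy_empty    = ~toy_seen_bit;  // sentinel key for an unused slot

constexpr toy_slot toy_key(toy_slot s) { return s & ~toy_seen_bit; }

// Toy open-addressing multiset with linear probing (host-only stand-in).
template <std::size_t N>
struct toy_table {
  std::array<toy_slot, N> slots;
  toy_table() { slots.fill(toy_empty); }

  // Inserts into the first free slot of the probing sequence;
  // silently drops the key if the table is full.
  void insert(toy_slot key) {
    for (std::size_t i = 0, idx = toy_key(key) % N; i < N; ++i, idx = (idx + 1) % N) {
      if (toy_key(slots[idx]) == toy_empty) { slots[idx] = key; return; }
    }
  }

  // Copy flavour: the callback receives a copy of each matching slot.
  template <class F>
  void for_each_copy(toy_slot key, F f) const {
    for (std::size_t i = 0, idx = toy_key(key) % N; i < N; ++i, idx = (idx + 1) % N) {
      toy_slot const s = slots[idx];
      if (toy_key(s) == toy_empty) return;     // end of the probing sequence
      if (toy_key(s) == toy_key(key)) f(s);    // changes to the copy are lost
    }
  }

  // Mutable flavour: the callback receives a reference to the slot storage.
  template <class F>
  void for_each_mutable(toy_slot key, F f) {
    for (std::size_t i = 0, idx = toy_key(key) % N; i < N; ++i, idx = (idx + 1) % N) {
      if (toy_key(slots[idx]) == toy_empty) return;
      if (toy_key(slots[idx]) == toy_key(key)) f(slots[idx]);  // writes persist
    }
  }
};

// Counts how many slots matching `key` currently carry the mark.
template <std::size_t N>
std::size_t count_seen(toy_table<N> const& t, toy_slot key) {
  std::size_t n = 0;
  t.for_each_copy(key, [&](toy_slot s) { n += (s & toy_seen_bit) != 0; });
  return n;
}
```

Setting `toy_seen_bit` inside a `for_each_copy` callback leaves the table unchanged, while the same callback taking `toy_slot&` through `for_each_mutable` marks every matching slot. One possible foot gun visible in the sketch: nothing stops a `for_each_mutable` callback from overwriting the key bits (for example, writing `toy_empty`), which would cut the probing sequence short for later lookups. On the device, concurrent probes marking the same slot would presumably also need an atomic update, which this single-threaded sketch does not model.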