[CUDAX] Add copy_bytes and fill_bytes overloads for mdspan #2932
🟨 CI finished in 43m 38s: Pass: 81%/54 | Total: 4h 27m | Avg: 4m 57s | Max: 18m 23s | Hits: 82%/246
Modifications in project or dependencies?

| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| +/- | CUDA Experimental |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |
🏃 Runner counts (total jobs: 54)
# | Runner |
---|---|
43 | linux-amd64-cpu16 |
5 | linux-amd64-gpu-v100-latest-1 |
4 | linux-arm64-cpu16 |
2 | windows-amd64-cpu16 |
🟩 CI finished in 2h 00m: Pass: 100%/54 | Total: 4h 29m | Avg: 4m 59s | Max: 17m 52s | Hits: 84%/246
I was hoping we could get #2306 done, but it looks like this only copies between two identical layouts instead of arbitrary ones 🥲
I feel arbitrary layouts fit more into
Yes I think
* Implement copy_bytes for mdspan
* Add final conversion to mdspan and more tests
* mdspan fill_bytes
* Add docs
* Fix issues after rebase
* Help old GCC figure out the types
* Move runtime extents check to a function
* Fix clang and more old GCC fixes
This PR adds `copy_bytes` and `fill_bytes` overloads operating on `mdspan`s. Each input must either be a `cuda::std::mdspan` instance, launch-transform to one, or implicitly convert to one and expose the `mdspan` template arguments as member aliases, so that the destination type can be discovered (the last case will most likely be `mdarray`).

For `copy_bytes`, this version does not try to do anything clever to match shapes. Source and destination layouts need to be the same and extents need to be compatible. Any combination of static and dynamic extents is accepted, as long as each runtime extent is the same.

More test cases will be added once the `mdarray` type is available.
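
The extent-compatibility rule for `copy_bytes` described above can be illustrated with a small host-only sketch. This is not the code from this PR: `extents_compatible` is a hypothetical helper, and shapes are modelled as plain `std::array`s of runtime extent values rather than real `mdspan` extents. It only shows the idea that the static or dynamic declaration of an extent does not matter, while the runtime values must agree.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper, not part of cudax: two shapes are treated as
// compatible when they have the same rank and every runtime extent matches.
// Whether an extent was declared static or dynamic is irrelevant here,
// only its value at run time is compared.
template <std::size_t RankA, std::size_t RankB>
constexpr bool extents_compatible(const std::array<std::size_t, RankA>& src,
                                  const std::array<std::size_t, RankB>& dst)
{
  if constexpr (RankA != RankB)
  {
    return false; // shapes of different rank can never match
  }
  else
  {
    for (std::size_t i = 0; i < RankA; ++i)
    {
      if (src[i] != dst[i])
      {
        return false; // one mismatching runtime extent is enough to reject
      }
    }
    return true;
  }
}
```

Under this assumption a 3x4 source is accepted for a 3x4 destination and rejected for a 3x5 one. The sketch compares extents only and says nothing about layouts, which the description above separately requires to be the same.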