Recurse into some arrays of arrays? #60

Open
mcabbott opened this issue Feb 18, 2023 · 1 comment

I wonder whether adapt should treat an Array of Arrays as a container, like a Tuple, rather than as storage to be converted. That is, convert the innermost Arrays, not the outermost one:

julia> (rand(1,1), rand(1,1)) |> cu |> first
1×1 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
 0.37755206

julia> [rand(1,1), rand(1,1)] |> cu  # should perhaps work as above
ERROR: CuArray only supports element types that are allocated inline.
Matrix{Float64} is a mutable type

julia> [1:2, 3:4] |> cu  # should not change
2-element CuArray{UnitRange{Int64}, 1, CUDA.Mem.DeviceBuffer}:
 1:2
 3:4

julia> [SA[1,2.], SA[3,4.]] |> cu
2-element CuArray{SVector{2, Float64}, 1, CUDA.Mem.DeviceBuffer}:
 [1.0, 2.0]
 [3.0, 4.0]

julia> Any[1:2, 3:4] |> cu
ERROR: CuArray only supports element types that are allocated inline.

I'm not exactly sure what the rule would be; perhaps checking isbitstype(eltype(x)), and recursing only when that's false, is enough?
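A minimal sketch of that rule, using a hypothetical helper that is not part of Adapt.jl today: recurse elementwise when the element type is not stored inline, and otherwise hand the array to adapt as plain storage.

```julia
using Adapt

# Hypothetical helper (an assumption, not Adapt.jl API): treat an Array of
# non-isbits elements as a container and adapt each element, rather than
# converting the outer Array's storage.
maybe_recurse(to, xs::AbstractArray) =
    isbitstype(eltype(xs)) ? adapt(to, xs) :       # isbits elements: plain storage
        map(x -> adapt(to, x), xs)                 # otherwise: recurse elementwise
```

Under this rule [rand(1,1), rand(1,1)] would recurse into the inner Matrix{Float64} elements, while [1:2, 3:4] (whose UnitRange{Int64} elements are isbits) would be converted as storage, matching the examples above.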

This came up in JuliaGPU/CUDA.jl#1769, where CuIterator at present produces a Vector{CuArray}.

maleadt (Member) commented Feb 20, 2023

FWIW, CUDA.jl intentionally does not do this because it would imply allocating a device array on every kernel launch, at least when generalizing this to nested CuArrays. Currently, we only allow passing CuArray, because we can directly take its pointer and wrap it in a GPU object (CuDeviceArray) without having to allocate a GPU array. If we'd support nested arrays, say CuArray{CuArray}, we'd first need to download all elements back to the CPU (expensive), convert the elements to CuDeviceArray (cheap), and then allocate a new array (expensive) to upload those elements to (expensive).

This could work fine with Vector{CuArray} because CPU allocations are much cheaper, so I guess we could support adapting those just like we adapt tuples. But the difference in behavior is not ideal.
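That tuple-like behavior could be sketched as follows, with a hypothetical method that CUDA.jl does not actually define: adapt the elements and rebuild the outer Vector on the host, so only cheap CPU allocation is involved and the container never becomes a device array.

```julia
using Adapt

# Hypothetical method (an assumption), mirroring how Adapt handles Tuple:
# adapt each element and keep the outer Vector on the CPU, so no device
# allocation is needed for the container itself.
adapt_elementwise(to, xs::Vector{<:AbstractArray}) = map(x -> adapt(to, x), xs)
```

With such a rule, adapting a Vector of Arrays would yield a Vector{CuArray} rather than a CuArray{CuArray}, which is exactly the behavioral difference noted above.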
