
Should we make the GPU stuff a package extension? #47

Open
efaulhaber opened this issue Jul 1, 2024 · 5 comments

Comments

@efaulhaber
Member

efaulhaber commented Jul 1, 2024

@inline function parallel_foreach(f, iterator,
                                  x::Union{AbstractGPUArray, KernelAbstractions.Backend})
    # On the GPU, we can only loop over `1:N`. Therefore, we loop over `1:length(iterator)`
    # and index with `iterator[eachindex(iterator)[i]]`.
    # Note that this only works with vector-like iterators that support arbitrary indexing.
    indices = eachindex(iterator)
    ndrange = length(indices)

    # Skip empty loops
    ndrange == 0 && return

    backend = KernelAbstractions.get_backend(x)

    # Call the generic kernel that is defined below, which only calls a function with
    # the global GPU index.
    generic_kernel(backend)(ndrange = ndrange) do i
        @inline f(iterator[indices[i]])
    end

    KernelAbstractions.synchronize(backend)
end

I was thinking about making this a package extension, so that we don't add KernelAbstractions.jl as a hard dependency. However, users would then get nonspecific "scalar indexing" errors when they forget to load KernelAbstractions.jl.

This function dispatches on KernelAbstractions.Backend, so without importing KernelAbstractions, I can't do this:

@inline function parallel_foreach(f, iterator,
                                  x::Union{AbstractGPUArray, KernelAbstractions.Backend})
    error("Load KernelAbstractions.jl")
end
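For context, a package extension would be declared in Project.toml roughly like this (a sketch; the extension module name PointNeighborsKernelAbstractionsExt is hypothetical):

```toml
# KernelAbstractions.jl moves from [deps] to [weakdeps],
# and the extension module is only loaded when the user loads KA.jl.
[weakdeps]
KernelAbstractions = "63c18a36-062a-441e-b654-da1e3ab1ce7c"

[extensions]
PointNeighborsKernelAbstractionsExt = "KernelAbstractions"
```

The catch is exactly the dispatch problem above: the main package can no longer mention KernelAbstractions.Backend in a method signature, so it cannot define a friendly error fallback for that type.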

Thoughts? @sloede ?

Originally posted by @efaulhaber in #45 (comment)

@sloede
Member

sloede commented Jul 1, 2024

It would be good to get some input from @lchristm on this, regarding how he handled it.

IIRC, both KA.jl and GPUArrays(Core).jl are somewhat lightweight and could be included as regular dependencies. This allows you to throw around errors based on types, while the heavyweight functionality (and dependencies) can reside in a package extension.

But please double-check whether I am remembering this correctly; at least with KA.jl I am not 100% sure (I'm fairly certain that GPUArraysCore.jl is super cheap).
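The lightweight-dependency pattern described above can be sketched like this (illustrative only; AbstractDemoBackend and DemoGPUBackend are hypothetical stand-ins, not the PointNeighbors.jl API):

```julia
# The idea: a lightweight package (like GPUArraysCore.jl) provides the abstract
# type used for dispatch, so the main package can define a fallback that throws
# a clear error; a package extension later overrides it with the real kernel.
abstract type AbstractDemoBackend end             # stands in for KernelAbstractions.Backend
struct DemoGPUBackend <: AbstractDemoBackend end

# Fallback in the main package: a clear message instead of an
# unspecific "scalar indexing" failure.
function parallel_foreach(f, iterator, x::AbstractDemoBackend)
    error("GPU support for $(typeof(x)) requires loading the corresponding extension.")
end
```

The extension would then define a more specific method for the concrete backend type, which Julia's dispatch prefers over the fallback.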

@efaulhaber
Member Author

GPUArraysCore is a subproject of GPUArrays.jl. It's one file, 200 lines, and Adapt.jl is the only dependency.

@efaulhaber
Member Author

As discussed with @sloede, we will keep KernelAbstractions.jl as a dependency for now, as it's not too heavy and the extension is not trivial (as I explained above). We can still think about changing that in the future.

@lchristm
Member

lchristm commented Jul 1, 2024

The .jl files in KA.jl/src have less than 2k lines of code overall (including whitespace, docstrings, comments, ...). KA.jl is a very lightweight interface package which defines the kernel language and provides a simple CPU backend (<200 lines of code). All the heavy lifting is done in other packages that implement their own KA.jl backend, like CUDA.jl.

I don't think it negatively impacts load times enough to warrant creating a package extension. From a UX perspective a package extension would have some obvious drawbacks, as already pointed out in the initial post.

If GPUArrays is only used for dispatching, then it could be avoided, imo. At least in Trixi.jl, I track information about the backend, and whether KA.jl or "vanilla Trixi" is used, via the container types, which are available in every function where it matters and can easily be used for dispatching.
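The container-type dispatch idea can be sketched as follows (hypothetical types, not the actual Trixi.jl containers):

```julia
# The container itself records which backend is in use, so downstream
# functions can dispatch on the container type without depending on GPUArrays.
struct CPUContainer{A}
    data::A
end
struct KAContainer{A}    # would wrap a KernelAbstractions-backed array
    data::A
end

foreach_element(f, c::CPUContainer) = foreach(f, c.data)
foreach_element(f, c::KAContainer) =
    error("sketch: a KernelAbstractions.jl kernel would be launched here")
```

With this design, only the package extension (or a GPU backend package) ever constructs a KAContainer, so the main package never needs the GPU array types for dispatch.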

@efaulhaber
Member Author

> If GPUArrays is only used for dispatching, then it could be avoided, imo. At least in Trixi.jl, I track information about the backend, and whether KA.jl or "vanilla Trixi" is used, via the container types, which are available in every function where it matters and can easily be used for dispatching.

GPUArraysCore.jl is tiny and makes the macro much more flexible, especially when used outside PointNeighbors.jl/TrixiParticles.jl for benchmarking or so.
