[FEA] Allow libraries that implement __array_ufunc__ to override CUDAUFunc
#37
Comments
Duplicate of #36
Reopening this as it's a much better-written request than mine in #36. My notes from #36:
Awesome! Thank you both for getting this started.
An off the cuff two cents: but wouldn't Dispatch to other ufuncs is already handled quite well by awkward. i.e. an awkward array with
Just to bump this - has there been any further thought in this direction?
@lgray - thanks for the bump - I haven't been able to look into this further yet, as I don't have enough of a grasp of the concepts to sketch out an implementation plan without spending time doing some research... I think you might be a bit ahead of me in your thinking about this - do you have some thoughts about what the implementation should / could look like?
@gmarkall I don't really have recommendations on low-level implementation, but I do know how we would like things to operate at a high level. Essentially, we'd like our data scientists (high energy particle physics experiment scientists and PhD students) to be able to design analyses on their laptops for CPU and redeploy them on GPU with a few configuration changes, using awkward array. We can detect when data is on GPU vs. CPU and switch between kernels automatically and easily with awkward, so that's largely a matter of user interface. What we need from numba-cuda is for it to interact as seamlessly with awkward arrays that are on-device as it already does with host-side arrays and regular numba. Training users to write effective CUDA kernels with numba is a different matter entirely that we will not touch here. I'm just considering pretty simple ufuncs that you get through So really, on the backend we just need it to be able to identify ufuncs and then to distinguish when those ufuncs accept device-side arrays. So I think some scaffolding is missing, and not much else, essentially to smooth the experience on the user side? Off the top of my head, I'm not aware of the entry points I would need to change to give it a shot.
@gmarkall I was talking to @jpivarski and @ianna last week and I hadn't realized that cupy itself didn't implement a NEP13-like protocol when calling the cupy version of the ufunc. So this makes it clear why this had problems working in the first place, and now I agree that we need something like NEP13 so that awkward can detect and override the application of cupy-specific ufuncs. Then we can use that with numba and we're in a much better place. Is this a more accurate understanding of the situation from my side?
It sounds like you've mapped out the issue and what we need to do to resolve it a bit further - I still don't have any expertise in this area so I can't comment definitively, but what you said makes sense and it seems to give us a better understanding of the situation. Do we need to have a feature request in CuPy for NEP13 or some NEP13-like support?
Yes, I think we need to ask @leofang. I will open an issue on the CuPy GitHub.
Is your feature request related to a problem? Please describe.
Array-like objects that define an __array_ufunc__ method (NEP-13) can be used with ufuncs created by np.vectorize as follows:
We would like to have similar functionality on CUDA to allow the following:
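For reference, the host-side NEP-13 mechanism can be sketched in plain NumPy. This is not the snippet from the report: `Wrapped` is a hypothetical array-like, and `np.frompyfunc` is used here as an assumption because it is known to return a genuine `np.ufunc`, which goes through the `__array_ufunc__` override check.

```python
import numpy as np


class Wrapped:
    """Hypothetical array-like that intercepts ufunc calls via NEP-13."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only plain calls are handled in this sketch (no reduce, accumulate, ...).
        if method != "__call__":
            return NotImplemented
        # Unwrap our own type, apply the ufunc to the raw buffers, then rewrap.
        raw = [x.data if isinstance(x, Wrapped) else x for x in inputs]
        return Wrapped(getattr(ufunc, method)(*raw, **kwargs))


# np.frompyfunc builds a real ufunc from a Python scalar function.
add_one = np.frompyfunc(lambda x: x + 1, 1, 1)

# Both calls are routed to Wrapped.__array_ufunc__ instead of converting
# the argument to an ndarray first.
result = add_one(Wrapped([1, 2, 3]))
builtin = np.add(Wrapped([1, 2, 3]), 10)
```

The request is for a CUDA ufunc to give an argument the same chance to take over the call when it is handed an object like this.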
Describe the solution you'd like
Maybe this function is missing __array_ufunc__ handling? https://github.com/NVIDIA/numba-cuda/blob/main/numba_cuda/numba/cuda/deviceufunc.py#L241-L329
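One way to picture the missing handling is a check at the top of the ufunc's call path that defers to any argument defining `__array_ufunc__`. The sketch below is a guess at the shape of such a check, not numba-cuda's code: `SketchUFunc` and `Jagged` are hypothetical names, `np.vectorize` stands in for the real GPU launch path, and the rule for skipping NumPy's own types is a simplification of what NumPy actually does.

```python
import numpy as np


class SketchUFunc:
    """Hypothetical stand-in for a device ufunc; NOT numba-cuda's CUDAUFunc."""

    def __init__(self, scalar_func):
        # Placeholder for the real kernel launch (assumption for illustration).
        self._impl = np.vectorize(scalar_func)

    def __call__(self, *args, **kwargs):
        # NEP-13-style check: let a foreign array-like take over the call.
        for arg in args:
            if isinstance(arg, (np.ndarray, np.generic)):
                continue  # simplification: NumPy's own types never override
            override = getattr(type(arg), "__array_ufunc__", None)
            if override is not None:
                result = override(arg, self, "__call__", *args, **kwargs)
                if result is not NotImplemented:
                    return result
        return self._impl(*args, **kwargs)


class Jagged:
    """Hypothetical ragged container: a flat buffer plus per-row counts."""

    def __init__(self, flat, counts):
        self.flat = np.asarray(flat)
        self.counts = list(counts)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if method != "__call__":
            return NotImplemented
        # Apply the ufunc to the flat buffers and keep the row structure.
        raw = [x.flat if isinstance(x, Jagged) else x for x in inputs]
        return Jagged(ufunc(*raw, **kwargs), self.counts)


square = SketchUFunc(lambda x: x * x)
out = square(Jagged([1.0, 2.0, 3.0], [2, 1]))  # dispatched to Jagged
plain = square(np.array([1.0, 2.0]))           # ordinary path
```

With a hook like this, the flatten and rewrap steps live inside the array library's `__array_ufunc__` rather than in user code.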
Describe alternatives you've considered
If we wrap in a flatten/unflatten we are able to get this to work, which is a bit clunky.
Additional info
Version of Awkward Array is 2.6.6
Code to reproduce:
resulting in the output: