Clean CUDA operators #108

lukem12345 · 2024-10-03T16:22:54Z

In advance of adding more CUDA versions and wrappers of DEC operators (possibly based off of KernelAbstractions.jl), it is necessary to clean up the existing code.

For example, the restrictions on dispatching CUDA.jl kernel functions led to some redundant code, now deleted. The dispatch of the regular and inverse Hodge stars was also not optimal (, not exploiting the abstract DiscreteHodge type for example). Docstrings have also been cleaned. The docstrings for the internal functions to the wedge product are no longer exposed.

GeorgeR227 · 2024-10-03T18:50:05Z

ext/CombinatorialSpacesCUDAExt.jl

+ kernel = if (j,k) == (0,1)
+ dec_cu_ker_c_wedge_product_01!
+ elseif (j,k) == (0,2)
+ dec_cu_ker_c_wedge_product_02!
+ elseif (j,k) == (1,1)
+ dec_cu_ker_c_wedge_product_11!
+ else


Need a (1, 0) and a (2, 0) case.

Are there tests for that?

That doesn't correspond to any of the p wedge products and is not how this function is used internally.

True but this function is exposed publicly. At the very least we can assert that the larger form appears on the right. If we wanna do more, we can check the order of the forms and adjust the inputs as needed like what happens internally in Decapodes.

I don't think there are any tests for the flipped cases.

GeorgeR227 · 2024-10-03T18:50:58Z

ext/CombinatorialSpacesCUDAExt.jl

+function dec_cu_ker_c_wedge_product_02!(res, f, α, wedge_cache)
+ p, c = wedge_cache[1], wedge_cache[2]
+ i = (blockIdx().x - Int32(1)) * blockDim().x + threadIdx().x 
 stride = gridDim().x * blockDim().x
- i = index

- @inbounds while i <= Int32(length(wedge_terms))
- wedge_terms[i] = α[i] * (coeffs[Int32(1), i] * f[pv[Int32(1), i]] + coeffs[Int32(2), i] * f[pv[Int32(2), i]]
-  + coeffs[Int32(3), i] * f[pv[Int32(3), i]] + coeffs[Int32(4), i] * f[pv[Int32(4), i]]
-  + coeffs[Int32(5), i] * f[pv[Int32(5), i]] + coeffs[Int32(6), i] * f[pv[Int32(6), i]])
+ @inbounds while i <= Int32(length(res))
+ c1, c2, c3, c4, c5, c6 = c[Int32(1),i], c[Int32(2),i], c[Int32(3),i], c[Int32(4),i], c[Int32(5),i], c[Int32(6),i]
+ p1, p2, p3, p4, p5, p6 = p[Int32(1),i], p[Int32(2),i], p[Int32(3),i], p[Int32(4),i], p[Int32(5),i], p[Int32(6),i]
+ res[i] = α[i] * (c1*f[p1] + c2*f[p2] + c3*f[p3] + c4*f[p4] + c5*f[p5] + c6*f[p6])
 i += stride
 end
- return nothing
+ nothing
 end


Has the performance of this kernel been impacted by these changes, specifically benchmark times and register counts?

GeorgeR227 · 2024-10-03T18:51:15Z

ext/CombinatorialSpacesCUDAExt.jl

+function dec_cu_ker_c_wedge_product_11!(res, α, β, wedge_cache)
+ e, c = wedge_cache[1], wedge_cache[2]
+ i = (blockIdx().x - Int32(1)) * blockDim().x + threadIdx().x 
 stride = gridDim().x * blockDim().x
- i = index

- @inbounds while i <= Int32(length(wedge_terms))
+ @inbounds while i <= Int32(length(res))
 e0, e1, e2 = e[Int32(1), i], e[Int32(2), i], e[Int32(3), i]
+ c1, c2, c3 = c[Int32(1), i], c[Int32(2), i], c[Int32(3), i]
 ae0, ae1, ae2 = α[e0], α[e1], α[e2]
 be0, be1, be2 = β[e0], β[e1], β[e2]

- c1, c2, c3 = coeffs[Int32(1), i], coeffs[Int32(2), i], coeffs[Int32(3), i]
-
- wedge_terms[i] = (c1 * (ae2 * be1 - ae1 * be2)
- + c2 * (ae2 * be0 - ae0 * be2)
- + c3 * (ae1 * be0 - ae0 * be1))
+ res[i] =
+ (c1 * (ae2 * be1 - ae1 * be2) +
+ c2 * (ae2 * be0 - ae0 * be2) +
+ c3 * (ae1 * be0 - ae0 * be1))
 i += stride
 end
-
- return nothing
+ nothing
 end


Same performance questions as the other kernel.

GeorgeR227 · 2024-10-03T18:52:30Z

Please also let me know if the CUDA tests are passing after these changes.

Clean CUDA operator dispatch

4d05ecc

lukem12345 added the enhancement New feature or request label Oct 3, 2024

lukem12345 self-assigned this Oct 3, 2024

lukem12345 requested a review from jpfairbanks October 3, 2024 16:30

jpfairbanks requested a review from GeorgeR227 October 3, 2024 18:20

GeorgeR227 requested changes Oct 3, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clean CUDA operators #108

Clean CUDA operators #108

lukem12345 commented Oct 3, 2024 •

edited

Loading

GeorgeR227 Oct 3, 2024

lukem12345 Oct 3, 2024

lukem12345 Oct 3, 2024 •

edited

Loading

GeorgeR227 Oct 3, 2024

GeorgeR227 Oct 3, 2024

GeorgeR227 Oct 3, 2024

GeorgeR227 commented Oct 3, 2024

Clean CUDA operators #108

Are you sure you want to change the base?

Clean CUDA operators #108

Conversation

lukem12345 commented Oct 3, 2024 • edited Loading

GeorgeR227 Oct 3, 2024

Choose a reason for hiding this comment

lukem12345 Oct 3, 2024

Choose a reason for hiding this comment

lukem12345 Oct 3, 2024 • edited Loading

Choose a reason for hiding this comment

GeorgeR227 Oct 3, 2024

Choose a reason for hiding this comment

GeorgeR227 Oct 3, 2024

Choose a reason for hiding this comment

GeorgeR227 Oct 3, 2024

Choose a reason for hiding this comment

GeorgeR227 commented Oct 3, 2024

lukem12345 commented Oct 3, 2024 •

edited

Loading

lukem12345 Oct 3, 2024 •

edited

Loading