Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean CUDA operators #108

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Clean CUDA operators #108

wants to merge 1 commit into from

Conversation

lukem12345
Copy link
Member

@lukem12345 lukem12345 commented Oct 3, 2024

In advance of adding more CUDA versions and wrappers of DEC operators (possibly based off of KernelAbstractions.jl), it is necessary to clean up the existing code.

For example, the restrictions on dispatching CUDA.jl kernel functions led to some redundant code, now deleted. The dispatch of the regular and inverse Hodge stars was also not optimal (, not exploiting the abstract DiscreteHodge type for example). Docstrings have also been cleaned. The docstrings for the internal functions to the wedge product are no longer exposed.

@lukem12345 lukem12345 added the enhancement New feature or request label Oct 3, 2024
@lukem12345 lukem12345 self-assigned this Oct 3, 2024
Comment on lines +63 to +69
kernel = if (j,k) == (0,1)
dec_cu_ker_c_wedge_product_01!
elseif (j,k) == (0,2)
dec_cu_ker_c_wedge_product_02!
elseif (j,k) == (1,1)
dec_cu_ker_c_wedge_product_11!
else
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need a (1, 0) and a (2, 0) case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there tests for that?

Copy link
Member Author

@lukem12345 lukem12345 Oct 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't correspond to any of the p wedge products and is not how this function is used internally.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True but this function is exposed publicly. At the very least we can assert that the larger form appears on the right. If we wanna do more, we can check the order of the forms and adjust the inputs as needed like what happens internally in Decapodes.

I don't think there are any tests for the flipped cases.

Comment on lines +91 to 103
function dec_cu_ker_c_wedge_product_02!(res, f, α, wedge_cache)
p, c = wedge_cache[1], wedge_cache[2]
i = (blockIdx().x - Int32(1)) * blockDim().x + threadIdx().x
stride = gridDim().x * blockDim().x
i = index

@inbounds while i <= Int32(length(wedge_terms))
wedge_terms[i] = α[i] * (coeffs[Int32(1), i] * f[pv[Int32(1), i]] + coeffs[Int32(2), i] * f[pv[Int32(2), i]]
+ coeffs[Int32(3), i] * f[pv[Int32(3), i]] + coeffs[Int32(4), i] * f[pv[Int32(4), i]]
+ coeffs[Int32(5), i] * f[pv[Int32(5), i]] + coeffs[Int32(6), i] * f[pv[Int32(6), i]])
@inbounds while i <= Int32(length(res))
c1, c2, c3, c4, c5, c6 = c[Int32(1),i], c[Int32(2),i], c[Int32(3),i], c[Int32(4),i], c[Int32(5),i], c[Int32(6),i]
p1, p2, p3, p4, p5, p6 = p[Int32(1),i], p[Int32(2),i], p[Int32(3),i], p[Int32(4),i], p[Int32(5),i], p[Int32(6),i]
res[i] = α[i] * (c1*f[p1] + c2*f[p2] + c3*f[p3] + c4*f[p4] + c5*f[p5] + c6*f[p6])
i += stride
end
return nothing
nothing
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has the performance of this kernel been impacted by these changes, specifically benchmark times and register counts?

Comment on lines +105 to 123
function dec_cu_ker_c_wedge_product_11!(res, α, β, wedge_cache)
e, c = wedge_cache[1], wedge_cache[2]
i = (blockIdx().x - Int32(1)) * blockDim().x + threadIdx().x
stride = gridDim().x * blockDim().x
i = index

@inbounds while i <= Int32(length(wedge_terms))
@inbounds while i <= Int32(length(res))
e0, e1, e2 = e[Int32(1), i], e[Int32(2), i], e[Int32(3), i]
c1, c2, c3 = c[Int32(1), i], c[Int32(2), i], c[Int32(3), i]
ae0, ae1, ae2 = α[e0], α[e1], α[e2]
be0, be1, be2 = β[e0], β[e1], β[e2]

c1, c2, c3 = coeffs[Int32(1), i], coeffs[Int32(2), i], coeffs[Int32(3), i]

wedge_terms[i] = (c1 * (ae2 * be1 - ae1 * be2)
+ c2 * (ae2 * be0 - ae0 * be2)
+ c3 * (ae1 * be0 - ae0 * be1))
res[i] =
(c1 * (ae2 * be1 - ae1 * be2) +
c2 * (ae2 * be0 - ae0 * be2) +
c3 * (ae1 * be0 - ae0 * be1))
i += stride
end

return nothing
nothing
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same performance questions as the other kernel.

@GeorgeR227
Copy link
Contributor

Please also let me know if the CUDA tests are passing after these changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants