
GPU Batch 10 #1193

Merged
merged 1 commit into from
Sep 9, 2024

Conversation

mborland
Member

@mborland mborland commented Sep 6, 2024

The focus of this PR was to round out support for the distributions, now that many of the fundamental special functions have GPU support. It adds support to the following distributions: inverse Gaussian, lognormal, non-central beta, non-central F, non-central chi-squared, negative binomial, normal, Pareto, Poisson, Rayleigh, Student's t, triangular, and uniform.

The three remaining unsupported distributions are the hyperexponential, hypergeometric, and non-central t. The first two need support for related special functions. The non-central t needs GPU support for quadrature to be built out, specifically exp-sinh. My initial thought is to restrict quadrature to NVCC and NVRTC based on the available APIs for memory allocation, but we will see.

On-device CI runs can be found at cppalliance/cuda-math#21.

CC: @dschmitz89, @izaid, @steppi.

Add SYCL testing of normal dist

Add CUDA testing of normal dist

Add NVRTC testing of normal dist

NVRTC fixes

Move headers for NVRTC support

Add GPU support to inverse gaussian dist

Add NVRTC testing of inverse Gaussian dist

Add CUDA testing of inverse gaussian dist

Add SYCL testing of inverse gaussian dist

Add GPU support to lognormal dist

Add SYCL testing of lognormal dist

Add CUDA testing of lognormal dist

Add NVRTC testing of lognormal dist

Add GPU support to negative binomial dist

Avoid float_prior on GPU platform

Add NVRTC testing of negative binomial dist

Fix ambiguous use of nextafter

Add CUDA testing of negative binomial dist

Fix float_prior workaround

Add SYCL testing of negative binomial dist

Add GPU support to non_central_beta dist

Add SYCL testing of nc beta dist

Add CUDA testing of nc beta dist

Enable generic dist handling on GPU

Add GPU support to brent_find_minima

Add NVRTC testing of nc beta dist

Add utility header

Replace non-functional macro with new function

Add GPU support to non central chi squared dist

Add SYCL testing of non central chi squared dist

Add missing macro definition

Markup generic quantile finder

Add CUDA testing of non central chi squared dist

Add NVRTC testing of non central chi squared dist

Add GPU support to the non-central f dist

Add SYCL testing of ncf

Add CUDA testing of ncf dist

Add NVRTC testing of ncf dist

Add GPU support to students_t dist

Add SYCL testing of students_t dist

Add CUDA testing of students_t

Add NVRTC testing of students_t dist

Workaround for header cycle

Add GPU support to pareto dist

Add SYCL testing of pareto dist

Add CUDA testing of pareto dist

Add NVRTC testing of pareto dist

Add missing header

Add GPU support to poisson dist

Add SYCL testing of poisson dist

Add CUDA testing of poisson dist

Add NVRTC testing of poisson dist

Add forward decl for NVRTC platform

Add GPU support to rayleigh dist

Add CUDA testing of rayleigh dist

Add SYCL testing of rayleigh dist

Add NVRTC testing of rayleigh dist

Add GPU support to triangular dist

Add SYCL testing of triangular dist

Add NVRTC testing of triangular dist

Add CUDA testing of triangular dist

Add GPU support to the uniform dist

Add CUDA testing of uniform dist

Add SYCL testing of uniform dist

Add NVRTC testing of uniform dist

Fix missing header

Add markers to docs
@jzmaddock
Collaborator

The use of quadrature in the non-central t is a new addition to improve accuracy in the tails: IMO it would not be unreasonable to disable this on the GPU. Likewise, there may be other functions which need slimming down for GPU usage, even though that may mean a lower quality of implementation. The trick is deciding where to draw the line ;)

@mborland
Member Author

mborland commented Sep 6, 2024

The use of quadrature in the non-central t is a new addition to improve accuracy in the tails: IMO it would not be unreasonable to disable this on the GPU. Likewise, there may be other functions which need slimming down for GPU usage, even though that may mean a lower quality of implementation. The trick is deciding where to draw the line ;)

That makes sense. Now that we have more substantial testing, some clear outliers are appearing in test run time: https://github.com/cppalliance/cuda-math/actions/runs/10741618717/job/29792448169?pr=21#step:9:262. Once things are correct we can take optimization steps. @NAThompson is also interested in the prospect of the quadrature algorithms being used by others on device.


codecov bot commented Sep 6, 2024

Codecov Report

Attention: Patch coverage is 97.26316% with 13 lines in your changes missing coverage. Please review.

Project coverage is 93.79%. Comparing base (1e9b2cc) to head (e9cd6c9).
Report is 6 commits behind head on develop.

Files with missing lines Patch % Lines
include/boost/math/distributions/rayleigh.hpp 89.18% 4 Missing ⚠️
...lude/boost/math/distributions/non_central_beta.hpp 93.75% 3 Missing ⚠️
include/boost/math/distributions/students_t.hpp 91.30% 2 Missing ⚠️
...e/boost/math/distributions/detail/generic_mode.hpp 83.33% 1 Missing ⚠️
...lude/boost/math/distributions/inverse_gaussian.hpp 97.22% 1 Missing ⚠️
include/boost/math/distributions/pareto.hpp 97.50% 1 Missing ⚠️
include/boost/math/distributions/poisson.hpp 96.00% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff            @@
##           develop    #1193   +/-   ##
========================================
  Coverage    93.78%   93.79%           
========================================
  Files          654      655    +1     
  Lines        53774    53851   +77     
========================================
+ Hits         50431    50508   +77     
  Misses        3343     3343           
Files with missing lines Coverage Δ
...ath/distributions/detail/common_error_handling.hpp 82.14% <100.00%> (ø)
...ost/math/distributions/detail/generic_quantile.hpp 83.87% <100.00%> (ø)
...ath/distributions/detail/inv_discrete_quantile.hpp 88.55% <100.00%> (+0.05%) ⬆️
include/boost/math/distributions/lognormal.hpp 81.81% <100.00%> (+0.61%) ⬆️
...ude/boost/math/distributions/negative_binomial.hpp 89.74% <100.00%> (+0.49%) ⬆️
...ost/math/distributions/non_central_chi_squared.hpp 95.53% <100.00%> (+0.03%) ⬆️
include/boost/math/distributions/non_central_f.hpp 85.50% <100.00%> (+0.07%) ⬆️
include/boost/math/distributions/normal.hpp 85.92% <100.00%> (+0.76%) ⬆️
include/boost/math/distributions/triangular.hpp 89.04% <100.00%> (+0.63%) ⬆️
include/boost/math/distributions/uniform.hpp 84.82% <100.00%> (ø)
... and 27 more

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1e9b2cc...e9cd6c9. Read the comment docs.

@jzmaddock
Collaborator

@NAThompson is also interested in the prospect of the quadrature algorithms being used by others on device.

Nod, that's actually a good use case given that the algorithms are in principle massively parallel. In that case it probably makes more sense for the integrand to be on the device, with the quadrature routine as the glue that pulls the results together?

@mborland
Member Author

mborland commented Sep 6, 2024

@NAThompson is also interested in the prospect of the quadrature algorithms being used by others on device.

Nod, that's actually a good use case given that the algorithms are in principle massively parallel. In that case it probably makes more sense for the integrand to be on the device, with the quadrature routine as the glue that pulls the results together?

I think so. I have to work through how to make a shared memory region on the device that won't be leaked. Right now Nick uses a shared_ptr to vectors holding all the data. Neither of those facilities is available on the device. CUDA has versions of malloc and free, which make it easy enough to implement a vector, but a shared pointer is more involved.

@mborland mborland merged commit 937107a into develop Sep 9, 2024
79 checks passed
@mborland mborland deleted the GPU10 branch September 9, 2024 16:55