More performance for (cu)finufft #4
Hi @paquiteau, in finufft, ducc is faster than fftw for 2D and 3D, so one can try building finufft from source with the ducc FFT enabled, and reinstalling cufinufft from source as well; a sketch follows below. (Uninstalling finufft and cufinufft first, or reinstalling with pip's --force-reinstall, might be required.)
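A minimal sketch of those commands, assuming the PyPI sdists use finufft's CMake build (scikit-build-core forwards `CMAKE_ARGS` to CMake) and that `FINUFFT_USE_DUCC0` is the relevant switch in your finufft version:

```sh
# Remove any previously installed wheels first.
pip uninstall -y finufft cufinufft

# Build finufft from source with the ducc0 FFT instead of FFTW.
CMAKE_ARGS="-DFINUFFT_USE_DUCC0=ON" pip install --no-binary finufft finufft

# Likewise build cufinufft from source, against the local CUDA toolkit.
pip install --no-binary cufinufft cufinufft
```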
I assume you have a C++17 compiler (gcc-13 recommended) and nvcc (12 is best) installed. Thanks,
Hello @DiamonDinoia, our engineer @Lenoush worked on a revised edition of the benchmark. The results are in #5. It seems that the --no-binary installation has worse performance than the classic one for cufinufft.
Hi, could you provide the images here? From the PR it is not clear to me what I should be looking at. cufinufft being slower but using less VRAM is possible. May I ask the exact version of CUDA used?
The CUDA allocation reported for finufft is probably a bug on our side; the process probably had some leftover GPU memory before running finufft. The results for the updated benchmark can be browsed here: https://github.com/mind-inria/mri-nufft-benchmark/tree/52a38f328124070a955b62bf51ec48adfdd27af2/results
Can I ask the CUDA version?
Yes, we are using CUDA 12.2.
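(For anyone reproducing this, the toolkit and driver versions can be checked with the standard CUDA tools:)

```sh
# CUDA toolkit (nvcc) version, i.e. the compiler that builds cufinufft:
nvcc --version
# Driver version and the highest CUDA runtime it supports:
nvidia-smi
```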
With 12.2 the use of alloca is enabled. I think that, differently from my tests, using alloca makes things a bit slower here, but by keeping data on the stack as much as possible it reduces VRAM consumption. I will test this further once I have time; the use of alloca in cufinufft might need to be made optional.
The difference between the binary and non-binary installations of finufft is not clear to me. Did you also test with the ducc0 FFT?
Following flatironinstitute/finufft#564 (comment). (Let's continue here to avoid flooding the PR review.)
We installed cufinufft with pip. A manual installation with optimized flags may bring even better results; a sketch follows below.
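A minimal sketch of what such a manual build could look like, assuming the cufinufft sdist uses a CMake-based build that honors `CMAKE_ARGS` and CMake's standard `CUDAARCHS` environment variable; exact option names may differ across versions:

```sh
# Compile only for the local GPU architecture (e.g. 8.0 for A100);
# CUDAARCHS sets CMAKE_CUDA_ARCHITECTURES in CMake 3.20+.
export CUDAARCHS="80"  # adjust to the local GPU

# Release build with host-side -march=native, forced to compile locally.
CMAKE_ARGS="-DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS=-march=native" \
    pip install --force-reinstall --no-binary cufinufft cufinufft
```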
@DiamonDinoia