More performance for (cu)finufft #4

Open
paquiteau opened this issue Sep 23, 2024 · 8 comments

paquiteau commented Sep 23, 2024

Following flatironinstitute/finufft#564 (comment). (Let's continue here to avoid flooding the PR review.)

We installed cufinufft with pip. Manual installation with optimized flags may bring even better results.

@DiamonDinoia

paquiteau self-assigned this Sep 23, 2024
DiamonDinoia commented Sep 23, 2024

Hi @paquiteau,

You could do as shown in the most recent docs: https://finufft.readthedocs.io/en/latest/python.html (uninstalling finufft and cufinufft first, or using --force-reinstall, might be required):

pip install --no-binary finufft finufft

In finufft, DUCC0 is faster than FFTW for 2D and 3D, so one can try:

pip install --no-binary finufft --config-settings=cmake.define.FINUFFT_USE_DUCC0=ON finufft

and

pip install --no-binary cufinufft cufinufft

I assume you have a C++17 compiler (gcc-13 recommended) and nvcc (CUDA 12 is best) installed.
I am not sure whether you also need CMake or other build tools, as pip might bring these in.
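
To double-check that the rebuilt packages are the ones Python actually loads, something along these lines may help (a minimal sketch; assumes both installs above succeeded):

# Sanity check that the rebuilt packages are the ones in use
# (a sketch; importlib.metadata is in the standard library from Python 3.8).
from importlib.metadata import version
import finufft
import cufinufft

print("finufft", version("finufft"), "loaded from", finufft.__file__)
print("cufinufft", version("cufinufft"), "loaded from", cufinufft.__file__)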

Thanks,
Marco

paquiteau commented

Hello @DiamonDinoia, our engineer @Lenoush has been working on a revised version of the benchmark; the results are in #5. It seems that the --no-binary installation performs worse than the standard binary one for cufinufft.
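
For context, the kind of single-call timing involved looks roughly like this (a hypothetical sketch with illustrative sizes, not the actual benchmark code):

# Time one 2D type-1 NUFFT on the CPU (sizes and tolerance are illustrative).
import time
import numpy as np
import finufft

M, N = 100_000, 256                      # nonuniform points, modes per dim
rng = np.random.default_rng(0)
x = rng.uniform(-np.pi, np.pi, M)        # nonuniform coordinates
y = rng.uniform(-np.pi, np.pi, M)
c = rng.standard_normal(M) + 1j * rng.standard_normal(M)   # strengths

t0 = time.perf_counter()
f = finufft.nufft2d1(x, y, c, (N, N), eps=1e-6)  # 2D type-1 transform
print(f"nufft2d1: {time.perf_counter() - t0:.3f} s, output shape {f.shape}")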

DiamonDinoia commented

Hi, could you provide the images here? From the PR it is not clear to me what I should be looking at.

cufinufft being slower but using less VRAM is possible. May I ask the exact version of CUDA used?
I cannot explain finufft using GPU memory; there is no cudaMalloc in that code. @janden, could you have a look in case there is some Python-side CUDA allocation in finufft that we missed?

paquiteau commented

The CUDA allocation seen with finufft is probably a bug on our side: the process probably had some leftover GPU memory allocated before running finufft.
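
One way to rule that out in future runs is to log the device memory counters just before the finufft timing starts, e.g. with pynvml (a sketch; assumes an NVIDIA GPU at index 0):

# Report GPU memory already in use before any finufft call,
# to catch leftover allocations that would skew the measurement.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
info = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"used before run: {info.used / 2**20:.1f} MiB "
      f"of {info.total / 2**20:.1f} MiB")
pynvml.nvmlShutdown()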

The results for the updated benchmark are here to browse: https://github.com/mind-inria/mri-nufft-benchmark/tree/52a38f328124070a955b62bf51ec48adfdd27af2/results

DiamonDinoia commented

Can I ask the CUDA version?

paquiteau commented

Yes, we are using CUDA 12.2
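
For completeness, the CUDA version the Python stack actually sees can be queried like this (a sketch assuming CuPy is installed alongside cufinufft):

# Confirm which CUDA runtime/driver the Python stack sees
# (version ints encode major*1000 + minor*10, e.g. 12020 -> 12.2).
import cupy

rt = cupy.cuda.runtime.runtimeGetVersion()
drv = cupy.cuda.runtime.driverGetVersion()
print(f"CUDA runtime {rt // 1000}.{(rt % 1000) // 10}, "
      f"driver supports {drv // 1000}.{(drv % 1000) // 10}")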

DiamonDinoia commented

With 12.2, the use of alloca is enabled. Contrary to my tests, using alloca seems to make things a bit slower here, but by keeping data on the stack as much as possible it reduces VRAM consumption. I will test this further once I have time; the use of alloca in cufinufft might need to be made optional.

DiamonDinoia commented

The difference between the binary and non-binary installations of finufft is not clear to me. Did you also test with the DUCC0 FFT?
