More performance for (cu)finufft #4
Hi @paquiteau, in finufft, ducc is faster than fftw for 2D and 3D, so one can try building finufft from source with the ducc FFT enabled, and reinstalling cufinufft from source as well; a sketch follows below. (Uninstalling finufft and cufinufft first, or reinstalling with pip's --force-reinstall, might be required.)
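A minimal sketch of those commands, assuming the PyPI sdists use finufft's CMake build (scikit-build-core forwards `CMAKE_ARGS` to CMake) and that `FINUFFT_USE_DUCC0` is the relevant switch in your finufft version:

```sh
# Remove any previously installed wheels first.
pip uninstall -y finufft cufinufft

# Build finufft from source with the ducc0 FFT instead of FFTW.
CMAKE_ARGS="-DFINUFFT_USE_DUCC0=ON" pip install --no-binary finufft finufft

# Likewise build cufinufft from source, against the local CUDA toolkit.
pip install --no-binary cufinufft cufinufft
```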
I assume you have a C++17 compiler (gcc-13 recommended) and nvcc (12 is best) installed. Thanks,
Hello @DiamonDinoia, our engineer @Lenoush worked on a revised edition of the benchmark. The results are in #5. It seems that the --no-binary installation has worse performance than the classic one for cufinufft.
Hi, could you provide the images here? From the PR it is not clear to me what I should be looking at. cufinufft being slower but using less VRAM is possible. May I ask the exact version of CUDA used?
The CUDA allocation reported for finufft is probably a bug on our side; the process probably had some leftover GPU memory before running finufft. The results for the updated benchmark can be browsed here: https://github.com/mind-inria/mri-nufft-benchmark/tree/52a38f328124070a955b62bf51ec48adfdd27af2/results
Can I ask the CUDA version?
Yes, we are using CUDA 12.2.
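(For anyone reproducing this, the toolkit and driver versions can be checked with the standard CUDA tools:)

```sh
# CUDA toolkit (nvcc) version, i.e. the compiler that builds cufinufft:
nvcc --version
# Driver version and the highest CUDA runtime it supports:
nvidia-smi
```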
With 12.2 the use of alloca is enabled. I think that, differently from my tests, using alloca makes things a bit slower here, but by keeping data on the stack as much as possible it reduces VRAM consumption. I will test this further once I have time; the use of alloca in cufinufft might need to be made optional.
The difference between the binary and non-binary installations of finufft is not clear to me. Did you also test with the ducc0 FFT?
Following flatironinstitute/finufft#564 (comment). (Let's continue here to avoid flooding the PR review.)
We installed cufinufft with pip. A manual installation with optimized flags may bring even better results; a sketch follows below.
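A minimal sketch of what such a manual build could look like, assuming the cufinufft sdist uses a CMake-based build that honors `CMAKE_ARGS` and CMake's standard `CUDAARCHS` environment variable; exact option names may differ across versions:

```sh
# Compile only for the local GPU architecture (e.g. 8.0 for A100);
# CUDAARCHS sets CMAKE_CUDA_ARCHITECTURES in CMake 3.20+.
export CUDAARCHS="80"  # adjust to the local GPU

# Release build with host-side -march=native, forced to compile locally.
CMAKE_ARGS="-DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS=-march=native" \
    pip install --force-reinstall --no-binary cufinufft cufinufft
```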
@DiamonDinoia