Please tell us about your use of FINUFFT! #398
Replies: 9 comments 3 replies
-
As an example of a GPU use case with all of the parameters listed: #323 (reply in thread)
-
Hello (again)! We (the team doing Compressed Sensing for MRI at NeuroSpin, France) use both finufft and cufinufft in our pipeline for iterative reconstruction of non-Cartesian MRI. In practice, we work with data that is:
NB: a key aspect of the non-Cartesian trajectories is that they are composed of "shots", each of which passes through the center point of k-space. Due to the number of coils and the potential use of extra interpolators (to correct for static field inhomogeneities [3]), the number of calls to the type 1 (adjoint) and type 2 (forward) NUFFT is in the hundreds if not thousands. As there are many NUFFT implementations available, we have developed mri-nufft (in Python, with numpy, cupy, and torch backends):
In terms of parametrization:
finufft and cufinufft are the most stable implementations we could find. There are some performance challengers (well, just gpuNUFFT in some cases [4], but the recent merging of #330 calls for a new study). In terms of feature requests, the discussion at #306 and #308 is a first point: getting preconditioning weights (similar to what is done in astronomy) helps for iterative reconstruction, and is essential for deep-learning-based approaches [5][6]. @chaithyagr, @philouc, @matthieutrs
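The role of the preconditioning / density-compensation weights mentioned here can be illustrated without finufft at all. Below is a minimal numpy sketch (all sizes and names are illustrative, not from mri-nufft): a dense 1D NDFT matrix stands in for the NUFFT, and periodic trapezoid weights stand in for proper density compensation. The weighted adjoint recovers the Fourier coefficients far better than the plain adjoint.

```python
import numpy as np

rng = np.random.default_rng(0)
M, K = 400, 8                        # nonuniform samples, modes |k| <= K
x = np.sort(rng.uniform(-np.pi, np.pi, M))
k = np.arange(-K, K + 1)
A = np.exp(1j * np.outer(x, k))      # type-2 "NDFT" matrix, shape (M, 2K+1)

f = rng.standard_normal(2 * K + 1) + 1j * rng.standard_normal(2 * K + 1)
b = A @ f                            # noiseless nonuniform samples

# Density-compensation weights: periodic trapezoid rule on the sample spacing
w = ((np.roll(x, -1) - np.roll(x, 1)) % (2 * np.pi)) / 2

rec_w = A.conj().T @ (w * b) / (2 * np.pi)   # weighted adjoint ("gridding") recon
rec_u = A.conj().T @ b / M                   # naive adjoint with uniform weights

err_w = np.linalg.norm(rec_w - f) / np.linalg.norm(f)
err_u = np.linalg.norm(rec_u - f) / np.linalg.norm(f)
assert err_w < err_u                 # compensation sharply reduces the error
```

The weights sum to 2*pi, so the weighted adjoint is a quadrature approximation of the Fourier-coefficient integral; with irregular sampling the uniform-weight adjoint is badly biased, which is exactly why iterative and learned reconstructions want these weights exposed.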
-
I missed this discussion post and therefore accidentally created an issue explaining how finufft is used in MRI! I also see @paquiteau has already answered quite thoroughly. Many of the things he describes are also the case here, where I do my research in Compressed Sensing at Umeå University. We mainly work with 4D-flow MRI data (3D + time) rather than 4D functional MRI, but I imagine this library is used in much the same way. We also, due to multiple coils and multiple velocity/phase encodings, have to do potentially hundreds or thousands of NUFFTs every iteration. As I didn't take the time to rewrite my issue post for this discussion, I have simply pasted what I wrote there, with some extra information added.

ISSUE POST: I created this issue as a discussion on what other extensions the finufft library could provide that are still within the grasp of what it tries to achieve. Mainly, I hope it can shed some light on the reply @ahbarnett provided in #306. The listed points are written as if in answer to that comment, so it is probably best to read that comment first, and perhaps also the expertly written inverse NUFFT tutorial.

First, I see how exposing the spreading/interpolation operations is problematic. I agree that the NUFFT is the crucial feature of the library, and if other spreaders/interpolators were found that would make the NUFFT faster, they should be prioritized! I think exposing just the spreader and interpolator in C/C++ would still be useful, but users would have to be careful, or at least aware, that they may change between versions and so on. If you ask me, by no means make this a priority. However, as mentioned in @ahbarnett's reply, the sinc^2 weights of Greengard, Inati, et al. would be very useful! So if it is feasible to implement an interface to sinc^2 quadrature weights, that would be great, and if exposing the fast sinc^2 in the process seems reasonable, that would also be nice. As pointed out.
In MRI applications one often tries to minimize ||Ax - b||, where A represents a NUFFT; A may be a 2D or a 3D NUFFT depending on the particular problem. The iterative solvers have to repeatedly apply A^H A, the normal operator. Sometimes, to improve convergence, one instead solves ||W(Ax - b)||, where W is a diagonal matrix, so that the iterations compute A^H W A instead, trading the best likelihood estimate for convergence speed. Preconditioning might seem like a better alternative, and sometimes it is, but not always: preconditioning changes some of the proximal algorithms and makes them more difficult. The W used for convergence speedup is also often the same as the "density compensation weights" mentioned.

Sometimes the density compensation weights are also used to perform a "gridding reconstruction": one finds W such that, instead of solving ||Ax - b|| with an iterative solver, one just computes x = A^H(Wb). This is approximate, but fast!

Also mentioned in the inverse tutorial (and often used in MRI), the Toeplitz approach can greatly increase the speed of applying A^H A. But this approach is sometimes not viable. For some problems, several different A^H A (with differing coordinates) have to be applied in each iteration, and storing the Toeplitz kernels for all of them takes too much memory, so computing A^H A via NUFFTs has to be used instead. Other modifications to the minimization problem, for example taking off-resonance effects into account, also make the Toeplitz approach less feasible. These problems are actually discussed in the Fessler article referenced in the inversion tutorial.

After all this text, I would like to list some features that would be really useful for MRI users and that hopefully are still within the finufft grasp. I list them in order of importance.
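The Toeplitz trick mentioned above can be checked in a few lines. A minimal numpy sketch (1D, with a dense NDFT matrix as a stand-in for the NUFFT, toy sizes): the Gram matrix A^H A has entries T_{k-k'} with T_m = sum_j e^{-i m x_j}, so applying it is a discrete convolution, computable by embedding T in a circulant and using FFTs.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 300, 16                       # nonuniform samples, Fourier modes k = 0..N-1
x = rng.uniform(-np.pi, np.pi, M)
k = np.arange(N)
A = np.exp(1j * np.outer(x, k))      # (M, N) type-2 "NDFT" matrix
f = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# Direct normal operator: two "NUFFT" applications
direct = A.conj().T @ (A @ f)

# Toeplitz kernel T_m = sum_j e^{-i m x_j}, m = -(N-1)..(N-1)
m = np.arange(-(N - 1), N)
T = np.exp(-1j * np.outer(m, x)).sum(axis=1)

# Embed in a circulant of length L >= 2N-1 and apply by FFT
L = 2 * N
c = np.zeros(L, dtype=complex)
c[:N] = T[N - 1:]                    # T_0 .. T_{N-1}
c[L - (N - 1):] = T[:N - 1]          # T_{-(N-1)} .. T_{-1}
fast = np.fft.ifft(np.fft.fft(c) * np.fft.fft(f, L))[:N]

assert np.allclose(direct, fast)
```

Once T is precomputed (one adjoint NUFFT of all-ones data), each A^H A application costs two FFTs of length ~2N instead of two NUFFTs; the memory objection in the post is that one such kernel must be stored per distinct coordinate set.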
As I see it, these features are all within the scope of what finufft strives to achieve, especially the first one, at least if one considers the inverse NUFFT a goal. I also think that changes to the inner workings of how the NUFFT is performed could be made without this interface becoming unstable. EXTRA INFO
Another feature I didn't list, but that would be appreciated, is the 1.25 upsampling factor mentioned in the cufinufft repo (issue 126).
-
The way we use it is quite different. Our entire 4D-MRI reconstruction is based on XD-GRASP [1]. We implemented everything in C++. Given the structure of the algorithm, we do some pre-processing and then split the reconstruction into independent conjugate-gradient problems along the Z-dimension. This has two advantages:
We further split the problem into multiple respiration phases, which lets us perform multiple 2D transforms. Since these transforms do not saturate the GPU, we queue several of them in parallel. At each reconstruction we do thousands of 2D reconstructions. To further minimize overhead, we use only the guru interface with FFTW_MEASURE, and we create the plans once per thread, re-using them until the end of the program. Details:
-
Our use case is CMB lensing, and we work on iterative lensing reconstruction [1]. The Wiener filtering is done with conjugate-gradient (CG) descent. For each CG iteration, we have to perform the lensing operation. To do the lensing operation using non-uniform FFTs, we:
This approach is tremendously better and faster than previous approaches, as discussed in our lenspyx paper [2], which implements a CPU code. We are currently exploring how much faster the lensing operation is on a GPU using cuFINUFFT. All the enumerated steps above have to be put on the GPU. We use SHTns for the GPU SHT calculation. The spin-0 implementation of the lensing operation is almost done, and the speed-ups look quite good at the moment! The plan is to also include spin-2 (or spin-n, really), and eventually integrate this into delensalot.
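The NUFFT step of a lensing operation, evaluating a band-limited field at deflected positions, can be sketched in 1D. A toy numpy example (the deflection field and sizes are made up; a naive O(MN) sum stands in for a type-2 finufft call): for f(x) = cos(3x) the deflected samples f(x + d(x)) come out exactly.

```python
import numpy as np

# Band-limited field f(x) = cos(3x): Fourier coeffs F_{+-3} = 1/2
N = 9
k = np.arange(-(N // 2), N // 2 + 1)
F = np.zeros(N, dtype=complex)
F[k == 3] = 0.5
F[k == -3] = 0.5

# Deflected ("lensed") evaluation points y_j = x_j + d(x_j)
M = 50
x = 2 * np.pi * np.arange(M) / M
d = 0.1 * np.sin(x)                  # toy deflection field
y = x + d

# Type-2 step: evaluate the Fourier series at the nonuniform points y_j
g = (F[None, :] * np.exp(1j * np.outer(y, k))).sum(axis=1)

assert np.allclose(g.real, np.cos(3 * y))   # equals f(x + d(x)) exactly
assert np.allclose(g.imag, 0)
```

In the real pipeline the synthesis onto deflected angles is done on the sphere (SHTs via SHTns plus cuFINUFFT on iso-latitude rings), but the evaluate-at-displaced-points structure is the same.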
-
We are using FINUFFT for a Poisson solver and I have a basic question. When the samples are non-equally spaced, is there aliasing when taking the type 1 and type 2 NUFFT? Is it possible to eliminate the aliasing? If not, how best to deal with it?
-
Hello Nikos,
Aliasing can be interpreted as quadrature error. To take a simple case: applying the FFT to a regular grid of samples to estimate the Fourier series coefficients of a function induces aliasing error (shifted copies of the spectrum), which can be completely understood as the quadrature error of the periodic trapezoid rule. The exact answer would be the Euler-Fourier formula, an integral, which the grid approximates (with equal weights).

Now, if you are using a NUFFT (e.g. type 1 to get to Fourier series coefficients), then you must have used non-constant quadrature weights. And yes, that would induce aliasing, but not straightforward shifted copies. Usually a similar Nyquist principle for the maximum quadrature node spacing ensures small aliasing errors.

I suggest going through my tutorial examples to play with this:
https://finufft.readthedocs.io/en/latest/tut.html
Best, Alex
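The quadrature-error view of aliasing in this answer can be verified numerically. A small numpy sketch (toy function, not from the thread) using f(x) = 1/(1 - a e^{ix}), whose true Fourier coefficients are c_k = a^k for k >= 0: the trapezoid-rule/FFT estimate equals the true coefficient plus all its shifted copies c_{k+n}, c_{k+2n}, ..., exactly.

```python
import numpy as np

a, n = 0.8, 16
x = 2 * np.pi * np.arange(n) / n
f = 1.0 / (1.0 - a * np.exp(1j * x))   # exact Fourier coeffs: c_k = a^k (k >= 0)

c_hat = np.fft.fft(f) / n              # periodic trapezoid estimate of c_0..c_{n-1}
k = np.arange(n)
aliased = a**k / (1 - a**n)            # = a^k + a^(k+n) + a^(k+2n) + ...

assert np.allclose(c_hat, aliased)     # aliasing == trapezoid quadrature error
assert abs(c_hat[0] - 1.0) > 0.01      # the error is visibly nonzero for a = 0.8
```

Increasing n shrinks the aliasing term a^n geometrically, the Nyquist-type principle mentioned above; with nonuniform nodes and non-constant weights the error is no longer a clean sum of shifted copies, but the same quadrature interpretation applies.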
-
Hi, I made a Python package that relies heavily on FINUFFT. The package is named PyEPRI and is dedicated to Electron Paramagnetic Resonance Imaging (EPRI). In a few words, EPRI is an imaging technique for paramagnetic species, with applications in biomedical sciences and chemistry. Many demonstration examples are available in the PyEPRI documentation.

In terms of modeling, a standard EPRI acquisition corresponds to a sequence of 1D projections (called a sinogram); each projection roughly corresponds to the convolution between the spectral signature of the paramagnetic sample (called the reference spectrum) and the Radon transform of the image to be reconstructed (whose direction depends on the projection and is controlled during the acquisition).

The typical use case is 3D, although 2D is sometimes considered to reduce the acquisition time. The input and output data sizes depend on the experimental constraints (mainly the acquisition time). For instance, in one particular in vitro experiment, the input dataset contains 8836 projections, each with 360 measurement points.

Modern image reconstruction techniques rely on variational models and iterative algorithms, and most standard optimization schemes require repeated evaluation of the direct operator and its adjoint.

Thanks to FINUFFT, the PyEPRI package is CPU & GPU compatible.

I would like to thank you for developing and maintaining such a great library 🙏. Its impressive speed and low memory consumption, as well as its GPU compatibility, greatly motivated me to create the PyEPRI package. Thank you also for your support and the fix related to #454. My roadmap for the next release of PyEPRI contains many items; I may post several suggestions for additional features I would like to add to PyEPRI in the near future. Best,
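The link between projections and NUFFTs comes from the Fourier slice theorem: the 1D Fourier transform of a projection equals a central slice of the 2D Fourier transform of the image. At angle zero the slice lands on the uniform grid and a plain FFT suffices; at other angles the slice samples are nonuniform, which is where FINUFFT enters. A minimal numpy check of the angle-zero case (toy random image, not from PyEPRI):

```python
import numpy as np

rng = np.random.default_rng(1)
img = rng.standard_normal((32, 32))

# Projection onto the x-axis: sum over y
proj = img.sum(axis=0)

# Fourier slice theorem: the 1D DFT of the projection equals the
# ky = 0 line of the 2D DFT of the image
F2 = np.fft.fft2(img)
assert np.allclose(np.fft.fft(proj), F2[0, :])
```

For a projection at angle theta, the corresponding slice samples the 2D spectrum at frequencies (r cos(theta), r sin(theta)), which is exactly a type-2 NUFFT evaluation; the adjoint (type 1) grids all sinogram slices back, giving the direct/adjoint operator pair the iterative schemes need.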
-
Hi everyone, I am Sriram, a researcher in the Math department at the Jülich Supercomputing Centre, Germany. I use FINUFFT, or specifically cuFINUFFT, for Particle-In-Fourier simulations of kinetic plasma. Here are my answers to the questions:
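A Particle-In-Fourier step pairs a type 1 NUFFT (deposit particle charges onto Fourier modes) with a Fourier-space field solve and a type 2 NUFFT (evaluate the field back at the particle positions). A toy 1D numpy sketch with naive O(MN) sums in place of the cuFINUFFT calls (all sizes illustrative), using a cos(x) charge density whose Poisson solution -phi'' = rho is known exactly:

```python
import numpy as np

# "Particles" carrying a cos(x) charge density on [0, 2*pi)
M = 64
xp = 2 * np.pi * np.arange(M) / M
q = np.cos(xp) * (2 * np.pi / M)         # charge = density * cell size

# Type-1 step: deposit charges onto Fourier modes rho_k = sum_j q_j e^{-i k x_j}
K = 8
k = np.arange(-K, K + 1)
rho_k = (q[None, :] * np.exp(-1j * np.outer(k, xp))).sum(axis=1)

# Field solve in Fourier space: -phi'' = rho  =>  phi_k = rho_k / k^2 (k != 0)
phi_k = np.zeros_like(rho_k)
nz = k != 0
phi_k[nz] = rho_k[nz] / k[nz] ** 2

# Type-2 step: evaluate phi at (nonuniform) positions
xe = np.array([0.3, 1.7, 4.1])
phi = (phi_k[None, :] * np.exp(1j * np.outer(xe, k))).sum(axis=1) / (2 * np.pi)

assert np.allclose(phi.real, np.cos(xe))  # exact solution of -phi'' = cos(x)
```

In a real PIC/PIF loop the particle positions change every step, so the deposit/evaluate pair runs at every iteration, which is why NUFFT throughput dominates.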
-
This is a discussion thread for users to post how they use FINUFFT. This will help us optimize and extend the software.
The main questions to answer are:
what type (1,2,3) and dimension?
what number of modes N = (N1,N2,...), i.e. uniform grid shape? (if type 1 or 2)
what number of nonuniform pts M? And are these points clustered or quasi-uniform?
what tolerance? (and do you use single- or double-precision library?)
what ntransf (if you use vectorized interface)?
Do you use CPU, GPU, or both?
For CPU, how many threads do you use? Do you use single-threaded calls in parallel?
Do you use the guru or the simple interfaces?
What language wrappers do you use?
What is your application area? (possibly link to paper or your package)
What feature requests do you have?
If you want to give a link to the part of your code that uses FINUFFT, that might also be useful.
Thanks so much! Alex