
Gpu type 3 #517

Merged: 82 commits, Sep 12, 2024
Changes from 76 commits
45333fa
basic benchmarks
DiamonDinoia Jul 3, 2024
b95a082
added plotting script
DiamonDinoia Jul 4, 2024
ae55ca5
optimised plotting
DiamonDinoia Jul 8, 2024
16e27f0
fixed plotting and metrics
DiamonDinoia Jul 8, 2024
49d1f21
fixed the plot script
DiamonDinoia Jul 8, 2024
2fdae68
bin_size_x is as function of the shared memory available
DiamonDinoia Jul 8, 2024
c0d9923
bin_size_x is as function of the shared memory available
DiamonDinoia Jul 8, 2024
907797c
minor optimizations in 1D
DiamonDinoia Jul 9, 2024
60f4780
otpimized nupts driven
DiamonDinoia Jul 12, 2024
35dcc66
Optimized 1D and 2D
DiamonDinoia Jul 15, 2024
e1ad9bb
Merge branch 'master' into gpu-optimizations
DiamonDinoia Jul 15, 2024
366295d
3D integer operations
DiamonDinoia Jul 18, 2024
24bf6be
3D SM and GM optimized
DiamonDinoia Jul 18, 2024
960117a
bump cuda version
DiamonDinoia Jul 18, 2024
4295a86
Merge remote-tracking branch 'flatiron/master' into gpu-optimizations
DiamonDinoia Jul 23, 2024
c1b14c6
changed matlab to generate necessary cuda upsampfact files
DiamonDinoia Jul 23, 2024
f300d2d
added new coeffs
DiamonDinoia Jul 23, 2024
e86c762
Merge remote-tracking branch 'refs/remotes/origin/gpu-optimizations' …
DiamonDinoia Jul 23, 2024
db0457a
restoring .m from master
DiamonDinoia Jul 23, 2024
d0ce11e
updated hook
DiamonDinoia Jul 23, 2024
513ce4b
updated matlab upsampfact
DiamonDinoia Jul 23, 2024
798717d
updated coefficients
DiamonDinoia Jul 23, 2024
282baf5
new coeffs
DiamonDinoia Jul 23, 2024
12822a2
updated cufinufft to new coeff
DiamonDinoia Jul 23, 2024
badf22f
Merge remote-tracking branch 'flatiron/master' into gpu-optimizations
DiamonDinoia Jul 23, 2024
bf6328b
Merge remote-tracking branch 'flatiron/master' into gpu-optimizations
DiamonDinoia Jul 23, 2024
ae783da
picked good defaults for method
DiamonDinoia Jul 24, 2024
d29fcf5
update configuration
DiamonDinoia Jul 24, 2024
73f937b
upated build system
DiamonDinoia Jul 25, 2024
0724866
fixing jenkins
DiamonDinoia Jul 25, 2024
8cd50fc
using cuda 11.2
DiamonDinoia Jul 25, 2024
49a9d7e
using sm90 atomics
DiamonDinoia Jul 25, 2024
041a536
updated script
DiamonDinoia Jul 25, 2024
54683c3
fixed bin sizes
DiamonDinoia Jul 26, 2024
4f19103
Merge branch 'master' into gpu-optimizations
DiamonDinoia Jul 26, 2024
dc3a628
using floor in fold_rescale updated changelog
DiamonDinoia Jul 26, 2024
b3237f7
fixed a mistake
DiamonDinoia Jul 26, 2024
db80aad
added comments for review
DiamonDinoia Jul 26, 2024
c225fb5
fixing review comments
DiamonDinoia Jul 31, 2024
394550f
Merge remote-tracking branch 'flatiron/master' into gpu-optimizations
DiamonDinoia Jul 31, 2024
5606aa0
merged master
DiamonDinoia Jul 31, 2024
74ccd71
fixed cmake
DiamonDinoia Jul 31, 2024
ee28d05
Gcc-9 fixes; Ker size fixed too
DiamonDinoia Aug 1, 2024
466ddff
windows compatibility tweak; unit testing the 1.25 upsampfact
DiamonDinoia Aug 1, 2024
3f60ca4
Merge remote-tracking branch 'flatiron/master' into gpu-optimizations
DiamonDinoia Aug 1, 2024
fb48ff8
added forgotten c++17 flag
DiamonDinoia Aug 1, 2024
8c42061
Merge remote-tracking branch 'flatiron/master' into gpu-type-3
DiamonDinoia Aug 2, 2024
b64f68e
Preliminary type 3 commit. Incomplete setpts but greatly simplifies t…
DiamonDinoia Aug 8, 2024
7c810a5
Merge remote-tracking branch 'flatiron/master' into gpu-type-3
DiamonDinoia Aug 8, 2024
9d44993
testing
DiamonDinoia Aug 13, 2024
074dda5
Adding prephase and deconv with tests
DiamonDinoia Aug 14, 2024
332b5b7
first 3D working version
DiamonDinoia Aug 15, 2024
53a7c63
First working version, Horner breaks
DiamonDinoia Aug 15, 2024
9f517e3
Type 3 working
DiamonDinoia Aug 16, 2024
096cf1e
added 1D&2d type 3
DiamonDinoia Aug 20, 2024
3cfe406
fixed tests for type3
DiamonDinoia Aug 20, 2024
1842f68
fixed possible memory leaks
DiamonDinoia Aug 21, 2024
c13a6a9
minor changes, mainly for debug
DiamonDinoia Aug 21, 2024
f0a0fa4
small fixes
DiamonDinoia Aug 22, 2024
066906e
adding debug prints
DiamonDinoia Aug 23, 2024
6da956b
testing inner plan2 & using cudamemcpyasync
DiamonDinoia Aug 23, 2024
6098edc
testing the intter type 2 completely
DiamonDinoia Aug 26, 2024
e89a4f9
fixed type 3 without horner
DiamonDinoia Aug 27, 2024
fe1da53
type3 many support
DiamonDinoia Aug 27, 2024
d415f0d
type3 many tests for one target
DiamonDinoia Aug 28, 2024
289fb4f
updated docstring
DiamonDinoia Aug 28, 2024
bca0a73
removed small transf tests
DiamonDinoia Aug 28, 2024
d29cbba
XMerge remote-tracking branch 'flatiron/master' into gpu-type-3
DiamonDinoia Aug 28, 2024
71ad464
added extended lambda flag to tests
DiamonDinoia Aug 28, 2024
a494518
CleanUP
DiamonDinoia Aug 28, 2024
5788320
Updated changelog
DiamonDinoia Aug 28, 2024
4c7388e
fixed printf warning
DiamonDinoia Aug 28, 2024
46eb1d4
restored fftw behaviour
DiamonDinoia Aug 28, 2024
0ada7a0
Added devnotes on the issue
DiamonDinoia Aug 28, 2024
671e4ac
removed sprurious changes
DiamonDinoia Aug 28, 2024
7a7cff5
Minor cleanup
DiamonDinoia Aug 28, 2024
9b0da66
fixed math test
DiamonDinoia Aug 29, 2024
d3d4d34
Addressed review comments
DiamonDinoia Sep 4, 2024
52cd6cc
Merge remote-tracking branch 'flatiron/master' into gpu-type-3
DiamonDinoia Sep 11, 2024
1355818
splitting onedim_f_series in two functions
DiamonDinoia Sep 11, 2024
bc64a92
GPU flipwind type 1-2; fseries and nuft renaming to match CPU code
DiamonDinoia Sep 12, 2024
96980d3
fixed complex math test
DiamonDinoia Sep 12, 2024
9 changes: 9 additions & 0 deletions CHANGELOG
Original file line number Diff line number Diff line change
@@ -1,6 +1,15 @@
List of features / changes made / release notes, in reverse chronological order.
If not stated, FINUFFT is assumed (cuFINUFFT <=1.3 is listed separately).

V 2.4.0 (08/28/24)
* Support for type 3 in 1D, 2D, and 3D in the GPU library cufinufft (PR #517).
* Removed the CPU fseries computation (it was only used for benchmarking and is no longer needed).
Collaborator:
using a separate * for each line here implies they are distinct features. Suggest use hyphen as below to group a sublist for GPU.

* Added complex arithmetic support for cuda_complex type
* Added tests for type 3 in 1D, 2D, and 3D and cuda_complex arithmetic
* Minor fixes on the GPU code:
- removed memory leaks in case of errors
- renamed maxbatchsize to batchsize

V 2.3.0-rc1 (8/6/24)

* Switched C++ standards from C++14 to C++17, allowing various templating
Expand Down
2 changes: 2 additions & 0 deletions docs/devnotes.rst
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,8 @@ Developer notes

* CMake compiling on linux at Flatiron Institute (Rusty cluster): We have had a report that if you want to use LLVM, you need to ``module load llvm/16.0.3`` otherwise the default ``llvm/14.0.6`` does not find ``OpenMP_CXX``.

* Note to nvcc developers: compiling with debug symbols causes a stack overflow that goes undetected at both compile time and runtime. The issue only manifests for ns>=10; for ns<10 one can use ``-G`` and debug the code with cuda-gdb. The workaround is to avoid debug symbols; using ``--generate-line-info`` instead might work (not tested). As a side note, compute-sanitizers do not detect the issue.
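
A minimal sketch of the suggested workaround, assuming a hypothetical source file name and typical flags (the exact FINUFFT build invocation differs, and this has not been tested against the issue):

```shell
# Sketch only: keep line-level debug info without full device debug symbols (-G),
# which is what triggers the undetected stack overflow described above.
# spreadinterp3d.cu is a placeholder file name, not from the FINUFFT tree.
nvcc -O2 -std=c++17 --generate-line-info -c spreadinterp3d.cu -o spreadinterp3d.o
```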
Collaborator:
great. Maybe mention only d=3?


* Testing cufinufft (for FI, mostly):

.. code-block:: sh
Expand Down
28 changes: 14 additions & 14 deletions docs/opts.rst
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@ to the simple, vectorized, or guru makeplan routines.
Recall how to do this from C++:

.. code-block:: C++

// (... set up M,x,c,tol,N, and allocate F here...)
finufft_opts opts;
finufft_default_opts(&opts);
Expand All @@ -30,7 +30,7 @@ Recall how to do this from C++:
This setting produces more timing output to ``stdout``.

.. warning::

In C/C++ and Fortran, don't forget to call the command which sets default options
(``finufft_default_opts`` or ``finufftf_default_opts``)
before you start changing them and passing them to FINUFFT.
Expand All @@ -51,9 +51,9 @@ Here are their default settings (from ``src/finufft.cpp:finufft_default_opts``):
.. literalinclude:: ../src/finufft.cpp
:start-after: @defopts_start
:end-before: @defopts_end

As for quick advice, the main options you'll want to play with are:

- ``modeord`` to flip ("fftshift") the Fourier mode ordering
- ``debug`` to look at timing output (to determine if your problem is spread/interpolation dominated, vs FFT dominated)
- ``nthreads`` to run with a different number of threads than the current maximum available through OpenMP (a large number can sometimes be detrimental, and very small problems can sometimes run faster on 1 thread)
Expand Down Expand Up @@ -92,15 +92,15 @@ Data handling options
.. note:: The index *sets* are the same in the two ``modeord`` choices; their ordering differs only by a cyclic shift. The FFT ordering cyclically shifts the CMCL indices $\mbox{floor}(N/2)$ to the left (often called an "fftshift").

**chkbnds**: [DEPRECATED] has no effect.


Diagnostic options
~~~~~~~~~~~~~~~~~~~~~~~

**debug**: Controls the amount of overall debug/timing output to stdout.

* ``debug=0`` : silent

* ``debug=1`` : prints some information

* ``debug=2`` : prints more information
Expand All @@ -113,11 +113,11 @@ Diagnostic options

* ``spread_debug=2`` : prints lots. This can print thousands of lines since it includes one line per *subproblem*.


**showwarn**: Whether to print warnings (these go to stderr).

* ``showwarn=0`` : suppresses such warnings

* ``showwarn=1`` : prints warnings


Expand Down Expand Up @@ -173,17 +173,17 @@ for only two settings, as follows. Otherwise, setting it to zero chooses a good
**spread_thread**: in the case of multiple transforms per call (``ntr>1``, or the "many" interfaces), controls how multithreading is used to spread/interpolate each batch of data.

* ``spread_thread=0`` : makes an automatic choice between the below. Recommended.

* ``spread_thread=1`` : acts on each vector in the batch in sequence, using multithreaded spread/interpolate on that vector. It can be slightly better than ``2`` for large problems.

* ``spread_thread=2`` : acts on all vectors in a batch (of size chosen typically to be the number of threads) simultaneously, assigning each a thread which performs a single-threaded spread/interpolate. It is much better than ``1`` for all but large problems. (Historical note: this was used by Melody Shih for the original "2dmany" interface in 2018.)

.. note::

Historical note: A former option ``3`` has been removed. This was like ``2`` except allowing nested OMP parallelism, so multi-threaded spread-interpolate was used for each of the vectors in a batch in parallel. This was used by Andrea Malleo in 2019. We have not yet found a case where this beats both ``1`` and ``2``, hence removed it due to complications with changing the OMP nesting state in both old and new OMP versions.

**maxbatchsize**: in the case of multiple transforms per call (``ntr>1``, or the "many" interfaces), set the largest batch size of data vectors.

**batchsize**: in the case of multiple transforms per call (``ntr>1``, or the "many" interfaces), set the largest batch size of data vectors.
Collaborator:
I'm confused -this seems to rename a CPU option? We don't want to change them. I thought we were changing maxbatchsize -> batchsize only internally and only in the GPU code ?

Here ``0`` makes an automatic choice. If you are unhappy with this, then for small problems it should equal the number of threads, while for large problems it appears that ``1`` is often better (since otherwise too much simultaneous RAM movement occurs). Some further work is needed to optimize this parameter.

**spread_nthr_atomic**: if non-negative: for numbers of threads up to this value, an OMP critical block for ``add_wrapped_subgrid`` is used in spreading (type 1 transforms). Above this value, instead OMP atomic writes are used, which scale better for large thread numbers. If negative, the heuristic default in the spreader is used, set in ``src/spreadinterp.cpp:setup_spreader()``.
Expand Down
28 changes: 19 additions & 9 deletions include/cufinufft/common.h
Original file line number Diff line number Diff line change
Expand Up @@ -7,27 +7,37 @@
#include <finufft_errors.h>
#include <finufft_spread_opts.h>

#include <complex.h>
#include <complex>
#include <optional>
Collaborator:
Used?


namespace cufinufft {
namespace common {
template<typename T>
__global__ void fseries_kernel_compute(int nf1, int nf2, int nf3, T *f,
cuDoubleComplex *a, T *fwkerhalf1, T *fwkerhalf2,
__global__ void fseries_kernel_compute(int nf1, int nf2, int nf3, T *f, T *a,
T *fwkerhalf1, T *fwkerhalf2, T *fwkerhalf3,
int ns);
template<typename T>
__global__ void fseries_kernel_compute(int nf1, int nf2, int nf3, T *f, T *a, T *kx,
T *ky, T *kz, T *fwkerhalf1, T *fwkerhalf2,
T *fwkerhalf3, int ns);
template<typename T>
int cufserieskernelcompute(int dim, int nf1, int nf2, int nf3, T *d_f,
cuDoubleComplex *d_a, T *d_fwkerhalf1, T *d_fwkerhalf2,
int cufserieskernelcompute(int dim, int nf1, int nf2, int nf3, T *d_f, T *d_a,
T *d_fwkerhalf1, T *d_fwkerhalf2, T *d_fwkerhalf3, int ns,
cudaStream_t stream);
template<typename T>
int cufserieskernelcompute(int dim, int nf1, int nf2, int nf3, T *d_f, T *d_a, T *d_kx,
T *d_ky, T *d_kz, T *d_fwkerhalf1, T *d_fwkerhalf2,
T *d_fwkerhalf3, int ns, cudaStream_t stream);
template<typename T>
int setup_spreader_for_nufft(finufft_spread_opts &spopts, T eps, cufinufft_opts opts);

void set_nf_type12(CUFINUFFT_BIGINT ms, cufinufft_opts opts, finufft_spread_opts spopts,
CUFINUFFT_BIGINT *nf, CUFINUFFT_BIGINT b);
template<typename T>
void onedim_fseries_kernel(CUFINUFFT_BIGINT nf, T *fwkerhalf, finufft_spread_opts opts);
template<typename T>
void onedim_fseries_kernel_precomp(CUFINUFFT_BIGINT nf, T *f, std::complex<double> *a,
// template<typename T>
// void onedim_fseries_kernel(CUFINUFFT_BIGINT nf, T *fwkerhalf, finufft_spread_opts
// opts);
template<typename T, bool>
void onedim_fseries_kernel_precomp(CUFINUFFT_BIGINT nf, T *f, T *a,
finufft_spread_opts opts);
template<typename T>
void onedim_fseries_kernel_compute(CUFINUFFT_BIGINT nf, T *f, std::complex<double> *a,
Expand Down
168 changes: 168 additions & 0 deletions include/cufinufft/contrib/helper_math.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,168 @@
#ifndef FINUFFT_INCLUDE_CUFINUFFT_CONTRIB_HELPER_MATH_H
#define FINUFFT_INCLUDE_CUFINUFFT_CONTRIB_HELPER_MATH_H

#include <cuComplex.h>

// Addition for cuDoubleComplex (double) with cuDoubleComplex (double)
__host__ __device__ __forceinline__ cuDoubleComplex operator+(
const cuDoubleComplex &a, const cuDoubleComplex &b) noexcept {
return cuCadd(a, b);
}

// Subtraction for cuDoubleComplex (double) with cuDoubleComplex (double)
__host__ __device__ __forceinline__ cuDoubleComplex operator-(
const cuDoubleComplex &a, const cuDoubleComplex &b) noexcept {
return cuCsub(a, b);
}

// Multiplication for cuDoubleComplex (double) with cuDoubleComplex (double)
__host__ __device__ __forceinline__ cuDoubleComplex operator*(
const cuDoubleComplex &a, const cuDoubleComplex &b) noexcept {
return cuCmul(a, b);
}

// Division for cuDoubleComplex (double) with cuDoubleComplex (double)
__host__ __device__ __forceinline__ cuDoubleComplex operator/(
const cuDoubleComplex &a, const cuDoubleComplex &b) noexcept {
return cuCdiv(a, b);
}

// Equality for cuDoubleComplex (double) with cuDoubleComplex (double)
__host__ __device__ __forceinline__ bool operator==(const cuDoubleComplex &a,
const cuDoubleComplex &b) noexcept {
return cuCreal(a) == cuCreal(b) && cuCimag(a) == cuCimag(b);
}

// Inequality for cuDoubleComplex (double) with cuDoubleComplex (double)
__host__ __device__ __forceinline__ bool operator!=(const cuDoubleComplex &a,
const cuDoubleComplex &b) noexcept {
return !(a == b);
}

// Addition for cuDoubleComplex (double) with double
__host__ __device__ __forceinline__ cuDoubleComplex operator+(const cuDoubleComplex &a,
double b) noexcept {
return make_cuDoubleComplex(cuCreal(a) + b, cuCimag(a));
}

__host__ __device__ __forceinline__ cuDoubleComplex operator+(
double a, const cuDoubleComplex &b) noexcept {
return make_cuDoubleComplex(a + cuCreal(b), cuCimag(b));
}

// Subtraction for cuDoubleComplex (double) with double
__host__ __device__ __forceinline__ cuDoubleComplex operator-(const cuDoubleComplex &a,
double b) noexcept {
return make_cuDoubleComplex(cuCreal(a) - b, cuCimag(a));
}

__host__ __device__ __forceinline__ cuDoubleComplex operator-(
double a, const cuDoubleComplex &b) noexcept {
return make_cuDoubleComplex(a - cuCreal(b), -cuCimag(b));
}

// Multiplication for cuDoubleComplex (double) with double
__host__ __device__ __forceinline__ cuDoubleComplex operator*(const cuDoubleComplex &a,
double b) noexcept {
return make_cuDoubleComplex(cuCreal(a) * b, cuCimag(a) * b);
}

__host__ __device__ __forceinline__ cuDoubleComplex operator*(
double a, const cuDoubleComplex &b) noexcept {
return make_cuDoubleComplex(a * cuCreal(b), a * cuCimag(b));
}

// Division for cuDoubleComplex (double) with double
__host__ __device__ __forceinline__ cuDoubleComplex operator/(const cuDoubleComplex &a,
double b) noexcept {
return make_cuDoubleComplex(cuCreal(a) / b, cuCimag(a) / b);
}

__host__ __device__ __forceinline__ cuDoubleComplex operator/(
double a, const cuDoubleComplex &b) noexcept {
double denom = cuCreal(b) * cuCreal(b) + cuCimag(b) * cuCimag(b);
return make_cuDoubleComplex((a * cuCreal(b)) / denom, (-a * cuCimag(b)) / denom);
}

// Addition for cuFloatComplex (float) with cuFloatComplex (float)
__host__ __device__ __forceinline__ cuFloatComplex operator+(
const cuFloatComplex &a, const cuFloatComplex &b) noexcept {
return cuCaddf(a, b);
}

// Subtraction for cuFloatComplex (float) with cuFloatComplex (float)
__host__ __device__ __forceinline__ cuFloatComplex operator-(
const cuFloatComplex &a, const cuFloatComplex &b) noexcept {
return cuCsubf(a, b);
}

// Multiplication for cuFloatComplex (float) with cuFloatComplex (float)
__host__ __device__ __forceinline__ cuFloatComplex operator*(
const cuFloatComplex &a, const cuFloatComplex &b) noexcept {
return cuCmulf(a, b);
}

// Division for cuFloatComplex (float) with cuFloatComplex (float)
__host__ __device__ __forceinline__ cuFloatComplex operator/(
const cuFloatComplex &a, const cuFloatComplex &b) noexcept {
return cuCdivf(a, b);
}

// Equality for cuFloatComplex (float) with cuFloatComplex (float)
__host__ __device__ __forceinline__ bool operator==(const cuFloatComplex &a,
const cuFloatComplex &b) noexcept {
return cuCrealf(a) == cuCrealf(b) && cuCimagf(a) == cuCimagf(b);
}

// Inequality for cuFloatComplex (float) with cuFloatComplex (float)
__host__ __device__ __forceinline__ bool operator!=(const cuFloatComplex &a,
const cuFloatComplex &b) noexcept {
return !(a == b);
}

// Addition for cuFloatComplex (float) with float
__host__ __device__ __forceinline__ cuFloatComplex operator+(const cuFloatComplex &a,
float b) noexcept {
return make_cuFloatComplex(cuCrealf(a) + b, cuCimagf(a));
}

__host__ __device__ __forceinline__ cuFloatComplex operator+(
float a, const cuFloatComplex &b) noexcept {
return make_cuFloatComplex(a + cuCrealf(b), cuCimagf(b));
}

// Subtraction for cuFloatComplex (float) with float
__host__ __device__ __forceinline__ cuFloatComplex operator-(const cuFloatComplex &a,
float b) noexcept {
return make_cuFloatComplex(cuCrealf(a) - b, cuCimagf(a));
}

__host__ __device__ __forceinline__ cuFloatComplex operator-(
float a, const cuFloatComplex &b) noexcept {
return make_cuFloatComplex(a - cuCrealf(b), -cuCimagf(b));
}

// Multiplication for cuFloatComplex (float) with float
__host__ __device__ __forceinline__ cuFloatComplex operator*(const cuFloatComplex &a,
float b) noexcept {
return make_cuFloatComplex(cuCrealf(a) * b, cuCimagf(a) * b);
}

__host__ __device__ __forceinline__ cuFloatComplex operator*(
float a, const cuFloatComplex &b) noexcept {
return make_cuFloatComplex(a * cuCrealf(b), a * cuCimagf(b));
}

// Division for cuFloatComplex (float) with float
__host__ __device__ __forceinline__ cuFloatComplex operator/(const cuFloatComplex &a,
float b) noexcept {
return make_cuFloatComplex(cuCrealf(a) / b, cuCimagf(a) / b);
}

__host__ __device__ __forceinline__ cuFloatComplex operator/(
float a, const cuFloatComplex &b) noexcept {
float denom = cuCrealf(b) * cuCrealf(b) + cuCimagf(b) * cuCimagf(b);
return make_cuFloatComplex((a * cuCrealf(b)) / denom, (-a * cuCimagf(b)) / denom);
}

#endif // FINUFFT_INCLUDE_CUFINUFFT_CONTRIB_HELPER_MATH_H
9 changes: 6 additions & 3 deletions include/cufinufft/defs.h
Original file line number Diff line number Diff line change
@@ -1,15 +1,18 @@
#ifndef CUFINUFFT_DEFS_H
#define CUFINUFFT_DEFS_H

#include <complex>
Collaborator:
Used?

#include <limits>

// constants needed within common
// upper bound on w, ie nspread, even when padded (see evaluate_kernel_vector); also for
// common
#define MAX_NSPREAD 16
#define MAX_NSPREAD 16

// max number of positive quadr nodes
#define MAX_NQUAD 100
#define MAX_NQUAD 100

// Fraction growth cut-off in utils:arraywidcen, sets when translate in type-3
#define ARRAYWIDCEN_GROWFRAC 0.1

// FIXME: If cufft ever takes N > INT_MAX...
constexpr int32_t MAX_NF = std::numeric_limits<int32_t>::max();
Expand Down