-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimising foldrescale #440
Optimising foldrescale #440
Conversation
Performance:
Using the new FOLDRESCALE, force inlining:
According to this test the change made the code slower. However, it is not a fair evaluation as the test now evaluates with pirange=1 instead of 0, which is slower: #436
|
CMakeLists.txt
Outdated
@@ -31,6 +31,7 @@ option(FINUFFT_USE_OPENMP "Whether to use OpenMP for parallelization. If disable | |||
option(FINUFFT_USE_CUDA "Whether to build CUDA accelerated FINUFFT library (libcufinufft). This is completely independent of the main FINUFFT library" OFF) | |||
option(FINUFFT_USE_CPU "Whether to build the ordinary FINUFFT library (libfinufft)." ON) | |||
option(FINUFFT_STATIC_LINKING "Whether to link the static FINUFFT library (libfinufft_static)." ON) | |||
option(FINUFTT_BUILD_DEVEL "Whether to build developement executables" OFF) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo FINUFFT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also typo development
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some minor tweaks. The main one I'd prefer to revert is the introduction of all the DEFAULT macros - it is a matter of taste (I can see potential advnatages were we to test if user has changed from default), but the disadvantage of unreadability is worse, to my mind. Also it has broken the sphinx doc system which actually serves that piece of source code to the docs. Currently the only help is in testing if a single deprecated opt is changed from default (and see below). I would prefer to leave that default hard-coded as 1, and test if it changed from 1 where the warning is given. In general if we as the devs want to insert code to test if the user has changed an opt from default, we can create a new struct, call finufft_default_opts() on it, and then compare that field. Would you be able to revert just the DEFAULT macros aspect (if no-one else has major feelings)? Rest is great, and you can proceed with killing pirange (I haven't checked docs/*.rst yet - we will all have to) Thanks! ALex
include/finufft_opts.h
Outdated
@@ -5,6 +5,30 @@ | |||
#ifndef FINUFFT_OPTS_H | |||
#define FINUFFT_OPTS_H | |||
|
|||
// Marco Barbone: 5.8.2024 | |||
// These are user-facing to that the user can reset to the default value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The user would have to recompile finufft lib to change these, correct? Couldn't they also just change finufft_default_opts() function themselves instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To change the MACROS? They are not supposed to be changed.
I think having values lying around instead of a named macro or constant is error prone. If we start adding more or removing options this can get out of have. I would like to pay the price now and have something like this now that delay to the future and let the dust pile. @blackwer what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm on the fence. The macros are a lot of pollution, but it's nice having everything in one place -- especially when you need it multiple times. My inclination is to make a constexpr finufft_opts FINUFFT_DEFAULT_OPTS{...};
in a header so that it can be autodocced and essentially namespace itself, grouping all relevant opts under the same header (alternatively prefixing FINUFFT_DEFAULT_OPTS_{OPTION}
is reasonable, if we fix the doc issue). with the constexpr
, finufft_default_opts()
can just copy it. when you need a comparison it won't pay a cost of lookup (as compared to a static variable somewhere). there might be a problem I'm overlooking, but thoughts?
src/finufft.cpp
Outdated
@@ -136,6 +136,13 @@ int setup_spreader_for_nufft(finufft_spread_opts &spopts, FLT eps, finufft_opts | |||
spopts.atomic_threshold = opts.spread_nthr_atomic; | |||
if (opts.spread_max_sp_size>0) // overrides | |||
spopts.max_subproblem_size = opts.spread_max_sp_size; | |||
if (opts.chkbnds != FINUFFT_CHKBND_DEFAULT) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I see why default macros needed in this case.
src/finufft.cpp
Outdated
o->maxbatchsize = 0; | ||
o->spread_nthr_atomic = -1; | ||
o->spread_max_sp_size = 0; | ||
o->modeord = FINUFFT_MODEORD_DEFAULT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this now too abstract and hard to track down for the user reading the code - also the defaults cannot be viewed in the sphinx docs (note tags). Let's decide about this...
@@ -178,11 +160,9 @@ int spreadcheck(BIGINT N1, BIGINT N2, BIGINT N3, BIGINT M, FLT *kx, FLT *ky, | |||
/* This does just the input checking and reporting for the spreader. | |||
See spreadinterp() for input arguments and meaning of returned value. | |||
Split out by Melody Shih, Jun 2018. Finiteness chk Barnett 7/30/18. | |||
Bypass FOLDRESCALE macro which has inevitable rounding err even nr +pi, | |||
giving fake invalids well inside the [-3pi,3pi] domain, 4/9/21. | |||
Marco Barbone 5.8.24 removed bounds check as new foldrescale is not limited to [-3pi,3pi) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good thks for the docs
src/spreadinterp.cpp
Outdated
kx,ky,kz - length-M arrays of real coords of NU pts, in the domain | ||
for FOLDRESCALE, which includes [0,N1], [0,N2], [0,N3] | ||
respectively, if opts.pirange=0; or [-pi,pi] if opts.pirange=1. | ||
kx,ky,kz - length-M arrays of real coords of NU pts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for this PR we'll want to keep that doc comment (pirange not gone yet:)
@@ -21,6 +21,7 @@ | |||
Either precision with dual-prec lib funcs 7/3/20. | |||
Added a chkbnds case to 1d1, 4/9/21. | |||
Made pass-fail, obviating results/dumbinputs.refout. Barnett 6/16/23. | |||
Removed the chkbnds case to 1d1, 05/08/2024. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good, nice catch.
src/spreadinterp.cpp
Outdated
|
||
} // namespace | ||
/* local NU coord fold+rescale macro: does the following affine transform to x: | ||
when p=true: x mod PI each to [0,N) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean (x+PI) mod 2PI
Re docs, I saw docs/math.rst still has 3pi in it. Maybe there are other places to remove from the docs (matlab, etc)? Did you findgrep on 3pi or 3\pi or 3 pi or 3 \pi ? :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
minor typo in .m files and matlabhelp.doc, outsied->outside
de9ae8f
to
634f163
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks! almost done! just find some cleanup in the comments.
docs/opts.rst
Outdated
<<<<<<< Updated upstream | ||
**chkbnds**: [DEPRECATED] has no effect. | ||
|
||
======= | ||
**chkbnds**: [DEPRECATED] It does nothing now. | ||
>>>>>>> Stashed changes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this from merge conflict?
docs/cguru.doc
Outdated
type 2, "targets". In contrast, for type 3 there are no restrictions on | ||
them, or on s, t, u, other than the resulting size of the internal fine | ||
* For type 1 and 2, the values in x (and if nonempty, y and z) can be in any | ||
interval, they will be folded to [-pi, pi]. Note: for large numbers outside |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are some places using [-pi, pi], and some using [-pi, pi), maybe stick to [-pi, pi)?
@@ -9,6 +9,7 @@ If not stated, FINUFFT is assumed (cuFINUFFT <=1.3 is listed separately). | |||
* MAX_NF increased from 1e11 to 1e12, since machines grow. | |||
* improved GPU python docs: migration guide; usage from cupy, numba, torch, | |||
pycuda. PyPI pkg still at 2.2.0beta. | |||
* Used new foldrescale and removed tests for the range |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also pirange is removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
src/spreadinterp.cpp
Outdated
<<<<<<< Updated upstream | ||
kx,ky,kz - length-M arrays of real coords of NU pts, in the domain | ||
for FOLDRESCALE, which includes [0,N1], [0,N2], [0,N3] | ||
respectively, if opts.pirange=0; or [-pi,pi] if opts.pirange=1. | ||
======= | ||
kx,ky,kz - length-M arrays of real coords of NU pts. Domain is [-pi, pi], | ||
points outside are folded in. | ||
>>>>>>> Stashed changes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems also from git
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may tweak some docs once in master
@@ -146,10 +144,10 @@ int main(int argc, char* argv[]) | |||
unsigned int se=MY_OMP_GET_THREAD_NUM(); // needed for parallel random #s | |||
#pragma omp for schedule(dynamic,1000000) reduction(+:strre,strim) | |||
for (BIGINT i=0; i<M; ++i) { | |||
kx[i]=rand01r(&se)*N; | |||
kx[i]=randm11r(&se)*3*M_PI; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be randm11r * M_PI/2 here, unless you explicitly want to test folding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
etc same 5x belwo. I can fix this minor tweak after merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, Marco says this is to test folding, deliberately.
@@ -53,7 +53,7 @@ int main() { | |||
c = (float _Complex *)malloc(M * sizeof(float _Complex)); | |||
f = (float _Complex *)malloc(N * sizeof(float _Complex)); | |||
|
|||
// Fill with random numbers. Frequencies must be in the interval [-pi, pi] | |||
// Fill with random numbers. Frequencies must be in the interval [-pi, pi) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
freqs can be anything due to folding. Just a minor tweak I can fix after merging.
@@ -11,7 +11,9 @@ | |||
f(k1) = SUM c[j] exp(+/-i k1 x(j)) for -ms/2 <= k1 <= (ms-1)/2 | |||
j=1 | |||
Inputs: | |||
x locations of nonuniform sources on interval [-3pi,3pi), length nj | |||
x locations of nonuniform sources on interval [-pi, pi) length nj. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you read docs/README (or makefile) you'll see this file docs/matlabhelp.doc is overwritten in the make docs process. So, will go away.
PR #440 tests on AMD laptop 5700U CPU (8-core)
|
Optimized foldrescale using @mreineck suggestions. This removed the range limitation and made the code faster. Results in the comments.
List of changes: