-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MNT] - Optimizations for curve fit #299
Conversation
This is awesome! Really cool to see nice optimizations here. In terms of:
It may be nice to agree on a set of simulations (I attempted something similar here) that can be used to benchmark and validate performance of various optimizations, so we have some objective measure of choosing one setting over another. I can rerun these simulations to see how this PR compares to main, or we can update/improve the set of simulations first (maybe something like random uniforms from [lower_bound, upper_bound] for each param?). For bounded optimization, I've had some luck using scipy's LBFGSB. I wonder how it compares to TRF. |
@ryanhammonds - your work with Rust and comparisons are what got me thinking more about why we are so slow and trying this out! In terms of having a basic set of comparisons for profiling, I totally agree! Basic idea is to check {fixed, knee} fits across {one-peak, multi-peak, high noise} examples. Having cleaned this notebook up a bit, the one / multi peak cases have what seem like useful but fairly modest improvements (order of 5-50% speed up), where as the high noise can be several times faster - this is, I think, because without setting a peak bound we can have many peaks, and estimating the Jacobian for such a large function gets really slow (note that this probably exaggerates the benefit in real cases, since in real cases one would want to bound the number of peaks anyways). What you have for comparing is more detailed / better than my super simple things - so perhaps we should go with something more like that! |
First of all this is great work, Tom! But ugh what I think this means is that we should not only do what Ryan suggests of having some good simulations for benchmarking, but we should take it even further:
Maybe not for this implementation, but if you two agree, please convert this comment to an issue and I'll save it for when we hire a full-time dev (hopefully this year!). |
@voytek & @ryanhammonds - I do agree with the broader points here, though I'd like to suggest we don't miss the trees for the forest in terms of this PR as compared to bigger & broader plans :). For this PR, if we can move forward with fairly standard code review, as long as we are confident that it offers some speedup, and is consistently accurate (the only more notably difference is everyone being on board with the tolerance changes noted above), then I hope we can merge this in, and I can merge it through to 2.0 and continue updates there. For the broader perspective, I think much if not all of this is best organized around 2.0, since various things will change by then anyways (for example, benchmarking the broader set of fit functions, rather than the limited set in 1.X, and using the updated organization in relation to fully describing algorithm decision points). I think these two things (common benchmarking & more detailed algorithm description) are within scope to happen as we work towards 2.0. In terms of Brad's third point (full audit + explore new approaches), and Ryan's comment about LBFGSB then this is something we can keep in mind and consider after 2.0 if/when someone has time (and can describe in a new issue). Does this all sound good? If so, the practical outcomes are to check the details of this PR, I'll keep track of some notes and plans here for anything at the 2.0 level, and we can open issues for longer term plans. |
Oh for sure. I'm happy with this as is now, I just think we should—for 2.0—actually implement some of the above suggestions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I left a comment for gaussian jacobian function that reduces redundant computation.
Also, I noticed curve_fit has weird behavior for me when using timeit in a notebook (run %%timeit -n 1 -r 1
multiple times vs %%timeit
). A similar thing occurs when using a loop to track runtime withtime.time()
values across random seeds. The first iteration is slow relative to the subsequent iterations, similar to using %%timeit -n 1 -r 1
vs multilpe runs with %%timeit
. If a time.sleep(1.)
is inserted at the end of the loop, after timing for the iteration is complete, the effect goes away (all iterations are slowish). Below shows the same 10 simulations and fitting parameters, the only difference being a 1s sleep at the end of the loop after runtime has been logged into an array.
I'm not sure if this is just my computer/hardware or not. The instability in timing has made me question it's accuracy and maybe %%timeit
is giving underestimated results.
Thanks for the deep dive, and making things even faster @ryanhammonds! In terms of the refactoring, it looks greats - and I left replies to the comments above! In terms of the timeit quirks - that's a great catch, and I have basically no idea what's going on here lol. My first thought was that maybe curve_fit specifically does some kind of caching (?). To poke at this, using FWIW - I was using That aside, I think regardless of some fuzziness of exactly which timeit reports to exactly believe, in relative terms, it's consistent that the current version with Jacobians is faster than main, and your update of the Jacobians is faster than what's currently on this branch, right? If so, I think we can move forward with this PR, and just accept that profiling is confusing and difficult lol. |
This comment was marked as resolved.
This comment was marked as resolved.
I think the reason why it's slow for the first iteration is due to the use of memoization to reduce redundant computation, particularly at the guess params and bounds since the curve needs to evaluated at those points every time curve_fit is called with the same settings and it seems stores those results until garbage collection frees it (or maybe until the minpack or openblas process drops). Adding noise to guess and bounds removes the effect. This may be a consideration for future benchmarks since related reductions in runtime may not be present with real data if guess and bounds are expected to change. Everything else looks good to go here though! |
had a quick scan, I have nothing insightful to add, I didn't even know you could speed up Somewhat unrelated so not sure if this is the best place to discuss this, but since we're on the topic of many peaks slowing things down: I think the default With a lower default, the tangential benefit would be a speedup in the common use cases when less resources are dedicated to fitting ghost peaks (which of course wouldn't manifest in the speed tests here when actually setting |
@rdgao's comment goes back to my data-driven refactoring idea: let's just try all combos and choose the fastest among the subset that returns the correct fit. But anyway, I agree about max_n_peaks being a sticking point. I understand the desire to force a default, but that's motivated by our a priori knowledge about neural data. Now that other fields (nuclear fusion, materials science) are adapting specparam for their needs, forcing n=3 or whatever probably doesn't make sense. |
In terms of the code updates here - following @ryanhammonds work on optimizing the Gaussian Jacobian, I updated the implementation of the aperiodic function Jacobians - they do get ~ twice as fast, but as far as I can tell, using them still does not seem to give any speedup (curve_fit must be able to estimate these very well), so I still have not set them to be used (but they are available in the module). In terms of @rdgao's comment on tweaking default parameters, I've opened an issue for discussion of any default parameter changes (#303). My suggestion is that we should consider this for the 2.0 release, as it's a key time to change any defaults (I don't want to change any defaults here, in 1.1, as the idea is that running 1.0 vs 1.1 shouldn't change anything). Let's move / continue any discussion there (I've added some comments on In terms of the previous discussion of developing a more formal / systematic approach for benchmarking and trying out some of the other ideas that have come up, I've started this issue for that: #304 Otherwise, back to this PR, I think we can call this done on code changes, and merge in from here - sound good @ryanhammonds , @voytek, @rdgao ? |
sounds good to me! |
Yep, I'm good with this as well. |
This is a series of small tweaks to the fit algorithm that should result in minor improvements in run speed.
Updates include:
check_finite
to False in curve_fit (since we do our own data checking)curve_fit
tolerance settings as a private setting, and set new default value that is a bit more liberal*I did add the Jacobians for the AP functions as well, but there was more mixed results whether this provided any speedup - notably these calls are already pretty fast, and so based on this post, I think for AP funcs we are often in a case in which the checks on the passed in Jacobian supercede any benefit as compared to computing it. Because of this, I did not actually set the Jacobian functions to be used in the aperiodic component curve_fit calls.
Results !?
Without being particularly rigorous about profiling, pretty quick / simple tests suggest this update provides something in the range of at least 5-20% speed up even in the simplest fitting cases (which is what I was expecting), but it has an unexpectedly outsized effects on more complex cases - and so in cases with knees and/or more peaks, noise etc - I'm seeing what looks like 2-3X speed increases! I didn't profile in too much detail exactly where this comes from, but I think a bit part is that when we start having multiple Gaussians estimating the Jacobian is slow, but it's actually quick and easy to pass in, and also the tolerance shift can have a notable impact (the other changes are less impactful).
In terms of outputs everything other than the tolerance levels change should have zero effect (meaning should get numerically exactly the same results out). The tolerance change can (though does not necessarily) change results - I've only seen it change beyond what should be notable resolution (eg. changing a peak bandwidth value by ~0.005 or so). If it's indeed limited to the this magnitude of impact, I think it's fine / worth the trade off. Note that the actual value I set (0.00001, replacing the otherwise default of 1e-8) is arbitrary - the higher this gets, more speed increase, but more chance of having more notably impacts. Also, this was added as a private setting, so one can also set it back to match previous default (
fm._tol = 1e-8
) to return to exactly the same as 1.0, or set a different value as needed.Notes
Reviewing
There's not too much changed in the code to check - what would be good is if someone (@ryanhammonds perhaps) could do a quick check to replicate that a) this branch is at least a bit quicker than main, and b) you see no real change in output parameters. I'm also tagging @rdgao because you might be interested in these changes, and/or have some more knowledgeable insight over and above my heuristic updates here.
Addendum: what I learned about
curve_fit
Doing some quick profiling, we spend basically all of our time in
curve_fit
, meaning it's the only real target for meaningful optimization, and within that, we spend the vast majority of our time doing the Gaussian fit (not really the guess procedure, but the curve_fit call itself). Why here? Well, it turns out with no bounds,curve_fit
estimation method ends up as the Levenberg-Marquardt algorithm which is very fast - but does not support bounds. For peaks, we need bounds - adding those pushes the method to default to the Trust Region Reflective ('trf') which is way slower. Notably, there is another option for bounded estimations ('dogbox') - which is sometimes a bit faster, but not consistently enough to try and change it (both methods give the exact same results as far as I can see).Implications of this:
curve_fit
function, with no other meaningful targets for optimization. This PR does what I could figure out to optimize how we callcurve_fit
- but my sense beyond this is that we are hitting a limit of what we can speed up without looking for different / faster implementations of the fitting procedure itself (in particular with bounds).