-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improve performance with ties #74
Conversation
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## master #74 +/- ##
==========================================
- Coverage 91.98% 85.77% -6.21%
==========================================
Files 2 2
Lines 212 218 +6
==========================================
- Hits 195 187 -8
- Misses 17 31 +14
☔ View full report in Codecov by Sentry. |
@andreasnoack playing around a bit I discovered that things are dramatically faster if you presort instead of repeatedly calling |
When you timed, did you compare to current master or #76? |
The released versions -- I did |
…_circuit_small_number_of_ties
…_circuit_small_number_of_ties
…_circuit_small_number_of_ties
Benchmark Report for /home/runner/work/Loess.jl/Loess.jlJob Properties
ResultsA ratio greater than
Benchmark Group ListHere's a list of all the benchmark groups executed by this job:
Julia versioninfoTarget
Baseline
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ran the timings in #76 (comment). Looks like the two versions are comparable in the random case but that this PR is faster when there are ties so let's go with this one. Might be worth leaving a comment in the code with a link to this issue in case somebody in the future compares the implementation to the paper and thinks there is a potential for speedup via partialsort!
.
src/kd.jl
Outdated
mid = (length(perm) + 1) ÷ 2 | ||
@debug "Candidate median index and median value" mid xs[perm[mid], j] | ||
mid = (length(xjs) + 1) ÷ 2 | ||
@debug "Candidate median index and median value" mid xs[mid, j] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be xjs[mid]
instead of xs[mid, j]
and likewise below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, yes
done! |
@andreasnoack if you have no further concerns, I'll merge this in about an hour |
Building on #73 (comment),
I tried doing an initial equality check to short-circuit the linear search. For small n, this seems to improve performance (data from #73):and further discussion in this PR, I tried just pre-sorting all values ahead of time. If we assume thatsort
is O(n log n) (and with radix sort, which I believe is the default for floats in recent releases, that's a loose upper bound!), then the sort penalty for the entire array isn't that high. I think part of the motivation for the original piecewisepartialsort!
was that n_piecewise << n_total and so with a bunch of runs, you still have better performance. That didn't turn out to be the case. I don't know if this is due topartialsort!
using a lower performing algorithm or just the need to pass through the array multiple times and thus doing multiple sorting passes or some mixture of the two. It doesn't matter. Pre-sorting seems to greatly speed things up for datasets with ties. I also benchmark random data (so hopefully very few ties) and saw no performance penalty for this approach.Setup
all done in a clean temporary environment
Plotting the big dataset
0.5.4
this PR
Plotting the sinusoid with lots of ties
0.5.4
0.6.1
this PR
version info
Benchmarks on 1000 elements
0.5.4
v0.6.1
this PR
Entire dataset
0.5.4
0.6.1
N/A
this PR
Benchmarks on random data
(i.e. without many ties)
0.5.4
0.6.1
this PR