improve performance with ties #74

Merged: 12 commits, Aug 23, 2023

palday (Member) commented Aug 13, 2023

Building on #73 (comment), I first tried an initial equality check to short-circuit the linear search; for small n, this seemed to improve performance (data from #73). After further discussion in this PR, I tried just pre-sorting all values ahead of time. If we assume that sort is O(n log n) (and with radix sort, which I believe is the default for floats in recent releases, that's a loose upper bound!), then the sort penalty for the entire array isn't that high.

I think part of the motivation for the original piecewise partialsort! was that n_piecewise << n_total, so that even with a bunch of runs you would still get better performance. That didn't turn out to be the case. I don't know whether this is due to partialsort! using a lower-performing algorithm, to the need to pass through the array multiple times and thus do multiple sorting passes, or to some mixture of the two. Either way, pre-sorting seems to greatly speed things up for datasets with ties. I also benchmarked random data (so hopefully very few ties) and saw no performance penalty for this approach.
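To make the comparison concrete, here is a minimal one-dimensional sketch of the two strategies. It is not the code in src/kd.jl: the function names, the leaf size, and the copy-based recursion are all assumptions made for illustration. Strategy A calls partialsort! again at every node of a median-split recursion; strategy B sorts once, after which every node is a contiguous index range and its median is a plain lookup.

```julia
using Random

# Strategy A (hypothetical sketch): run partialsort! again at every node.
# Each node works on its own copy, so the array is passed over many times.
function splits_partialsort!(out, x::Vector, leafsize::Int)
    n = length(x)
    n <= leafsize && return out
    mid = (n + 1) ÷ 2
    partialsort!(x, mid)            # places the median at position mid
    push!(out, x[mid])
    splits_partialsort!(out, x[1:mid-1], leafsize)   # values <= median
    splits_partialsort!(out, x[mid+1:n], leafsize)   # values >= median
    return out
end

# Strategy B (hypothetical sketch): sort once up front; every node is then
# a contiguous range lo:hi of the sorted vector.
function splits_presorted!(out, xs::Vector, lo::Int, hi::Int, leafsize::Int)
    n = hi - lo + 1
    n <= leafsize && return out
    mid = lo + (n + 1) ÷ 2 - 1
    push!(out, xs[mid])
    splits_presorted!(out, xs, lo, mid - 1, leafsize)
    splits_presorted!(out, xs, mid + 1, hi, leafsize)
    return out
end

# Data with many ties: 41 distinct values, each repeated 101 times
x = repeat(rand(MersenneTwister(42), 41), inner = 101)
shuffle!(MersenneTwister(1), x)

a = splits_partialsort!(Float64[], copy(x), 16)
xs = sort(x)
b = splits_presorted!(Float64[], xs, 1, length(xs), 16)
```

Both strategies visit the same nodes and should collect the same split values; the difference is only in how much sorting work is repeated along the way.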

Setup

all done in a clean temporary environment

using Arrow
using BenchmarkTools
using Loess
using Random

tbl = Arrow.Table("loess.arrow")

Plotting the big dataset

using AlgebraOfGraphics
using CairoMakie
tbl = Arrow.Table("loess.arrow")
plt = data(tbl) * mapping(:x, :y) * smooth()
save("loess.png", draw(plt))

0.5.4

[Figure: LOESS smooth of the big dataset, 0.5.4]

this PR

[Figure: LOESS smooth of the big dataset, this PR]

Plotting the sinusoid with lots of ties

using AlgebraOfGraphics
using CairoMakie
using Loess

x = repeat([π/4*i for i in -20:20], inner=101)
y = sin.(x)
model = loess(x,y; span=0.2)

let fig = Figure(), x = unique(x)
    ax = Axis(fig[1, 1])
    predx = range(minimum(x), stop = maximum(x), length = 500)
    scatter!(ax, predx, predict(model, predx); label="fitted")
    scatter!(ax, x, sin.(x); label="observed")
    axislegend(ax)
    save("sine_0.6.1.png", fig)
    fig
end
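
0.5.4

[Figure: fitted vs. observed sinusoid, 0.5.4]

0.6.1

[Figure: fitted vs. observed sinusoid, 0.6.1]

this PR

[Figure: fitted vs. observed sinusoid, this PR]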

version info

julia> versioninfo()
Julia Version 1.9.2
Commit e4ee485e909 (2023-07-05 09:39 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 16 × Intel(R) Xeon(R) E-2288G CPU @ 3.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
  Threads: 1 on 16 virtual cores

Benchmarks on 1000 elements

0.5.4

julia> n = 1000; @benchmark loess($(first(tbl.x, n)), $(first(tbl.y, n)))
BenchmarkTools.Trial: 3056 samples with 1 evaluation.
 Range (min … max):  1.458 ms …   6.231 ms  ┊ GC (min … max): 0.00% … 72.58%
 Time  (median):     1.531 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.634 ms ± 586.090 μs  ┊ GC (mean ± σ):  5.93% ± 11.27%

  ██▃▁▁                                                       ▁
  █████▄▅▆▅▆▅▄▄▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁██▇ █
  1.46 ms      Histogram: log(frequency) by time         5 ms <

 Memory estimate: 2.05 MiB, allocs estimate: 59301.

0.6.1

julia> n = 1000; @benchmark loess($(first(tbl.x, n)), $(first(tbl.y, n)))
BenchmarkTools.Trial: 102 samples with 1 evaluation.
 Range (min … max):  48.012 ms …  56.795 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     49.252 ms               ┊ GC (median):    1.50%
 Time  (mean ± σ):   49.401 ms ± 951.264 μs  ┊ GC (mean ± σ):  1.36% ± 0.56%

             ▂      ▄  ▄▂█  ▂▂▄ ▂▂                              
  ▄▁▁▁▄▄▄▁▄▁▁█▆▆███▄██▆███▆▁███▆██▄▆▄▁▄▆▄▁▆▁▆█▄▁█▄▆▁█▁▁▁▁▄▁▁▁▄ ▄
  48 ms           Histogram: frequency by time         50.9 ms <

 Memory estimate: 38.05 MiB, allocs estimate: 2340200.

this PR

julia> n = 1000; @benchmark loess($(first(tbl.x, n)), $(first(tbl.y, n)))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  355.205 μs …   2.749 ms  ┊ GC (min … max): 0.00% … 83.59%
 Time  (median):     368.688 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   383.268 μs ± 158.739 μs  ┊ GC (mean ± σ):  2.98% ±  6.12%

     ▁▃▇██▇▇██▆▄▃▂▂                  ▁▁▁▁                       ▂
  ▇█████████████████▇▇▇█████▇▇▇▇▇█████████▇███▇▆▆▄▄▅▄▄▄▄▄▄▂▅▃▄▅ █
  355 μs        Histogram: log(frequency) by time        440 μs <

 Memory estimate: 412.83 KiB, allocs estimate: 453.

Entire dataset

0.5.4

julia> ll = @time loess(tbl.x, tbl.y);
  6.950204 seconds (218.62 M allocations: 6.195 GiB, 7.18% gc time)

julia> sort!(collect(ll.kdtree.verts))
12-element Vector{Vector{Float64}}:
 [1.0]
 [4.0]
 [5.0]
 [6.0]
 [7.0]
 [8.0]
 [9.0]
 [10.0]
 [11.0]
 [12.0]
 [13.0]
 [21.0]

0.6.1

N/A

this PR

julia> ll = @time loess(tbl.x, tbl.y);
  2.995456 seconds (120.77 k allocations: 1.184 GiB, 1.97% gc time, 3.45% compilation time)

julia> sort!(collect(ll.kdtree.verts))
10-element Vector{Vector{Float64}}:
 [1.0]
 [4.0]
 [5.0]
 [6.0]
 [7.0]
 [8.0]
 [9.0]
 [10.0]
 [11.0]
 [21.0]

Benchmarks on random data

(i.e. without many ties)

0.5.4

julia> for i in 2:6
           n = 10^i
           x = rand(MersenneTwister(42), n)
           y = sqrt.(x)
           b = @benchmark loess($x, $y)
           @info "" n
           display(b)
       end

┌ Info: 
└   n = 100
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  320.761 μs …   8.552 ms  ┊ GC (min … max): 0.00% … 94.90%
 Time  (median):     334.907 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   389.049 μs ± 474.765 μs  ┊ GC (mean ± σ):  7.90% ±  6.17%

  ▁▃▅██▇▆▅▃▂▁▁     ▁ ▁▁                    ▂▃▅▅▃▂▂▂             ▂
  █████████████▇█████████▇██▇▇▆▅▅▆▄▅▅▅▅▆▆▆█████████▇▆▅▆▅▅▅▅▇▆▆▆ █
  321 μs        Histogram: log(frequency) by time        490 μs <

 Memory estimate: 519.67 KiB, allocs estimate: 7455.
┌ Info: 
└   n = 1000
BenchmarkTools.Trial: 1846 samples with 1 evaluation.
 Range (min … max):  2.390 ms … 10.500 ms  ┊ GC (min … max): 0.00% … 69.27%
 Time  (median):     2.508 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.706 ms ±  1.032 ms  ┊ GC (mean ± σ):  7.04% ± 12.38%

  ▇█▂                                                      ▁  
  ████▆▁▇▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁██ █
  2.39 ms      Histogram: log(frequency) by time      8.1 ms <

 Memory estimate: 4.15 MiB, allocs estimate: 80600.
┌ Info: 
└   n = 10000
BenchmarkTools.Trial: 193 samples with 1 evaluation.
 Range (min … max):  23.413 ms … 34.036 ms  ┊ GC (min … max): 0.00% … 15.06%
 Time  (median):     24.412 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   26.005 ms ±  2.485 ms  ┊ GC (mean ± σ):  6.96% ±  8.32%

     ▃▂▆█▁▅                                     ▂              
  ▃▄▅███████▅▁▁▃▃▃▃▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▇██▅▆▅▃▆▃▅▃▃▁▃ ▃
  23.4 ms         Histogram: frequency by time        30.3 ms <

 Memory estimate: 42.53 MiB, allocs estimate: 941293.
┌ Info: 
└   n = 100000
BenchmarkTools.Trial: 16 samples with 1 evaluation.
 Range (min … max):  308.744 ms … 326.230 ms  ┊ GC (min … max): 9.58% … 8.12%
 Time  (median):     318.931 ms               ┊ GC (median):    9.43%
 Time  (mean ± σ):   318.033 ms ±   5.482 ms  ┊ GC (mean ± σ):  9.40% ± 1.12%

  ▁   ▁ ▁  ▁             ▁   ▁  ▁    █  ▁  ▁      ▁▁▁    ▁    ▁  
  █▁▁▁█▁█▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁█▁▁█▁▁▁▁█▁▁█▁▁█▁▁▁▁▁▁███▁▁▁▁█▁▁▁▁█ ▁
  309 ms           Histogram: frequency by time          326 ms <

 Memory estimate: 430.37 MiB, allocs estimate: 9810248.
┌ Info: 
└   n = 1000000
BenchmarkTools.Trial: 2 samples with 1 evaluation.
 Range (min … max):  3.572 s …    3.817 s  ┊ GC (min … max): 8.26% … 7.90%
 Time  (median):     3.694 s               ┊ GC (median):    8.08%
 Time  (mean ± σ):   3.694 s ± 173.271 ms  ┊ GC (mean ± σ):  8.08% ± 0.26%

  █                                                        █  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  3.57 s         Histogram: frequency by time         3.82 s <

 Memory estimate: 4.25 GiB, allocs estimate: 101252486.

0.6.1

julia> for i in 2:6
           n = 10^i
           x = rand(MersenneTwister(42), n)
           y = sqrt.(x)
           b = @benchmark loess($x, $y)
           @info "" n
           display(b)
       end
┌ Info: 
└   n = 100
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  115.501 μs …  2.125 ms  ┊ GC (min … max): 0.00% … 93.09%
 Time  (median):     117.672 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   124.172 μs ± 86.753 μs  ┊ GC (mean ± σ):  3.42% ±  4.59%

   ▆█▇▅▂▁                                                      ▁
  ████████▇▇▆▆▄▄▄▄▄▄▅▅▅▅▆▆▄▄▇▇█▇████▇▇▇▇▇▆▄▄▄▃▄▃▄▃▅▃▅▅▄▄▃▃▄▅▄▅ █
  116 μs        Histogram: log(frequency) by time       162 μs <

 Memory estimate: 114.06 KiB, allocs estimate: 2478.
┌ Info: 
└   n = 1000
BenchmarkTools.Trial: 4777 samples with 1 evaluation.
 Range (min … max):  978.490 μs …   2.443 ms  ┊ GC (min … max): 0.00% … 55.42%
 Time  (median):       1.007 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.044 ms ± 179.361 μs  ┊ GC (mean ± σ):  2.32% ±  7.44%

  ▄█▅▄▁▁                                                        ▁
  ████████▇▇█▅▆▁▄▁▄▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▇█ █
  978 μs        Histogram: log(frequency) by time       2.25 ms <

 Memory estimate: 918.16 KiB, allocs estimate: 29710.
┌ Info: 
└   n = 10000
BenchmarkTools.Trial: 467 samples with 1 evaluation.
 Range (min … max):  10.211 ms …  12.664 ms  ┊ GC (min … max): 0.00% … 7.23%
 Time  (median):     10.558 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   10.723 ms ± 415.142 μs  ┊ GC (mean ± σ):  1.87% ± 3.34%

        ▂▂ ▄▇▂▄▃▁▃█                                             
  ▂▂▁▃▄▇████████████▅▅▅▄▃▃▂▂▁▁▂▁▁▁▁▁▁▁▁▁▁▁▂▁▁▄▄▆▃▃▃▄▅▄▅▅▅▄▄▃▄▄ ▃
  10.2 ms         Histogram: frequency by time         11.6 ms <

 Memory estimate: 9.52 MiB, allocs estimate: 353432.
┌ Info: 
└   n = 100000
BenchmarkTools.Trial: 39 samples with 1 evaluation.
 Range (min … max):  124.520 ms … 142.043 ms  ┊ GC (min … max): 1.45% … 2.12%
 Time  (median):     129.131 ms               ┊ GC (median):    2.15%
 Time  (mean ± σ):   129.727 ms ±   3.977 ms  ┊ GC (mean ± σ):  2.09% ± 0.39%

  ▁  ▁ █      ▁ ▁    ▁▁  ▄            ▁                          
  █▆▆█▆█▆▆▁▁▁▆█▁█▆▆▁▆██▁▆█▁▁▁▆▆▁▁▆▆▆▁▆█▁▁▁▁▁▁▁▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▆ ▁
  125 ms           Histogram: frequency by time          142 ms <

 Memory estimate: 97.23 MiB, allocs estimate: 3682381.
┌ Info: 
└   n = 1000000
BenchmarkTools.Trial: 3 samples with 1 evaluation.
 Range (min … max):  1.882 s …   1.916 s  ┊ GC (min … max): 1.03% … 1.62%
 Time  (median):     1.891 s              ┊ GC (median):    1.58%
 Time  (mean ± σ):   1.896 s ± 17.801 ms  ┊ GC (mean ± σ):  1.41% ± 0.33%

  █             █                                         █  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  1.88 s         Histogram: frequency by time        1.92 s <

 Memory estimate: 1021.80 MiB, allocs estimate: 40087669.

this PR

julia> for i in 2:6
           n = 10^i
           x = rand(MersenneTwister(42), n)
           y = sqrt.(x)
           b = @benchmark loess($x, $y)
           @info "" n
           display(b)
       end
┌ Info: 
└   n = 100
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  62.760 μs …   5.021 ms  ┊ GC (min … max): 0.00% … 97.23%
 Time  (median):     65.091 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   73.761 μs ± 129.748 μs  ┊ GC (mean ± σ):  4.92% ±  2.76%

  ▂▇█▇▆▄▂▁             ▂▄▅▅▅▅▄▃▂▂▁▁                            ▂
  █████████▇▆▇▇▇▇▇▇▇▆▇███████████████▇▇▆▆▆▆▇▅▅▆▅▅▆▅▅▄▄▆▅▅▅▅▄▂▆ █
  62.8 μs       Histogram: log(frequency) by time      98.7 μs <

 Memory estimate: 79.91 KiB, allocs estimate: 467.
┌ Info: 
└   n = 1000
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  449.035 μs …   3.317 ms  ┊ GC (min … max): 0.00% … 82.10%
 Time  (median):     467.707 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   482.946 μs ± 176.674 μs  ┊ GC (mean ± σ):  2.42% ±  5.62%

   ▃▂▂▂▆█▇▆▅▆▇▆▆▅▄▃▂▁▁▁   ▁▁▁▁▁▁                                ▂
  █████████████████████████████████▇▆▆▆▆▅▅▅▅▅▆▅▄▄▅▅▄▅▄▃▄▃▂▄▂▃▂▂ █
  449 μs        Histogram: log(frequency) by time        564 μs <

 Memory estimate: 437.19 KiB, allocs estimate: 477.
┌ Info: 
└   n = 10000
BenchmarkTools.Trial: 917 samples with 1 evaluation.
 Range (min … max):  5.181 ms …   9.019 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     5.351 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.452 ms ± 295.416 μs  ┊ GC (mean ± σ):  0.84% ± 3.41%

       ▁█▅ ▁                                                   
  ▃▄▇▇▇█████▇▄▃▃▃▃▃▄▄▃▄▃▄▄▄▃▃▂▂▁▁▂▂▁▂▂▂▁▁▂▂▁▁▁▁▂▁▁▁▁▂▁▂▂▂▂▃▃▂ ▃
  5.18 ms         Histogram: frequency by time        6.46 ms <

 Memory estimate: 3.91 MiB, allocs estimate: 510.
┌ Info: 
└   n = 100000
BenchmarkTools.Trial: 63 samples with 1 evaluation.
 Range (min … max):  72.733 ms … 93.711 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     79.146 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   80.258 ms ±  4.298 ms  ┊ GC (mean ± σ):  0.52% ± 0.61%

               █▂▄▄                                            
  ▄▄▄▄▁▁▁▁▁▄▆▄▄████▄█▆▆▆▆▁▄█▄▆▄▆▁▄▁▄▄▆▄▁▄▁▁▁▁▁▁▁▁▄▁▄▁▁▁▁▁▁▁▁▄ ▁
  72.7 ms         Histogram: frequency by time        93.5 ms <

 Memory estimate: 38.76 MiB, allocs estimate: 510.
┌ Info: 
└   n = 1000000
BenchmarkTools.Trial: 4 samples with 1 evaluation.
 Range (min … max):  1.305 s …   1.370 s  ┊ GC (min … max): 0.22% … 0.37%
 Time  (median):     1.340 s              ┊ GC (median):    0.26%
 Time  (mean ± σ):   1.338 s ± 30.105 ms  ┊ GC (mean ± σ):  0.26% ± 0.10%

  █             █                               █         █  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁█ ▁
  1.3 s          Histogram: frequency by time        1.37 s <

codecov-commenter commented Aug 13, 2023

Codecov Report

Patch coverage: 54.83% and project coverage change: -6.21% ⚠️

Comparison is base (5126a74) 91.98% compared to head (35202d2) 85.77%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master      #74      +/-   ##
==========================================
- Coverage   91.98%   85.77%   -6.21%     
==========================================
  Files           2        2              
  Lines         212      218       +6     
==========================================
- Hits          195      187       -8     
- Misses         17       31      +14     
Files Changed    Coverage Δ
src/kd.jl        83.48% <54.83%> (-12.64%) ⬇️


palday requested a review from andreasnoack August 13, 2023 20:27
palday (Member Author) commented Aug 15, 2023

@andreasnoack Playing around a bit, I discovered that things are dramatically faster if you presort instead of repeatedly calling partialsort!. Doing this, I can leave the original linear search in place (for now) and still see a dramatic performance increase. I'll update the benchmarks later, but it passes the sinusoid example you proposed and gives a result much closer to 0.5.4 in the plot.
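The thread does not show the linear search itself, so the following is only a guess at its general shape: assuming it scans outward from the median position to find the run of values tied with the median, a presorted vector keeps that scan simple. The helper name tie_run is hypothetical; searchsorted is the Base function that finds the same range by binary search on sorted input.

```julia
# Hypothetical helper: index range of the values tied with xs[mid],
# found by scanning linearly in both directions. Assumes xs is sorted.
function tie_run(xs::AbstractVector, mid::Integer)
    lo = hi = mid
    while lo > firstindex(xs) && xs[lo - 1] == xs[mid]
        lo -= 1
    end
    while hi < lastindex(xs) && xs[hi + 1] == xs[mid]
        hi += 1
    end
    return lo:hi
end

xs = sort(repeat([0.0, 1.0, 2.0], inner = 4))   # 12 values in three tied runs
mid = (length(xs) + 1) ÷ 2                      # lands inside the run of 1.0s

run_linear = tie_run(xs, mid)
run_binary = searchsorted(xs, xs[mid])          # same range via binary search
```

With many ties the linear scan can be long, which is one reason a binary search on the presorted data might be worth trying later; that is a suggestion, not something measured in this PR.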

palday requested a review from andreasnoack August 16, 2023 03:45
andreasnoack (Member) commented

When you timed, did you compare to current master or #76?

palday (Member Author) commented Aug 21, 2023

> When you timed, did you compare to current master or #76?

The released versions; I installed each one with add Loess@version.

github-actions bot commented Aug 22, 2023

Benchmark Report for /home/runner/work/Loess.jl/Loess.jl

Job Properties

  • Time of benchmarks:
    • Target: 23 Aug 2023 - 20:43
    • Baseline: 23 Aug 2023 - 20:44
  • Package commits:
    • Target: a03a45
    • Baseline: 5126a7
  • Julia commits:
    • Target: e4ee48
    • Baseline: e4ee48
  • Julia command flags:
    • Target: None
    • Baseline: -Cnative,-J/opt/hostedtoolcache/julia/1.9.2/x64/lib/julia/sys.so,-g1,-O3,-e,using Pkg; Pkg.update()
  • Environment variables:
    • Target: None
    • Baseline: None

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).

ID                       time ratio    memory ratio
["random", "100"]        0.52 (5%) ✅  0.95 (1%) ✅
["random", "1000"]       0.44 (5%) ✅  0.68 (1%) ✅
["random", "10000"]      0.49 (5%) ✅  0.44 (1%) ✅
["random", "100000"]     0.53 (5%) ✅  0.40 (1%) ✅
["random", "1000000"]    0.57 (5%) ✅  0.38 (1%) ✅
["ties", "sine"]         0.01 (5%) ✅  0.02 (1%) ✅
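As a rough cross-check on how such a time ratio is formed, the same arithmetic can be applied to the median times from the manual random-data benchmarks earlier in this thread. This is only a sketch: it compares this PR against the 0.6.1 release on a different machine, whereas the report above compares against the master baseline on a CI runner, so the numbers are not expected to match.

```julia
# Median times in seconds, copied from the manual benchmarks in this thread
n        = [10^2, 10^3, 10^4, 10^5, 10^6]
t_v061   = [117.672e-6, 1.007e-3, 10.558e-3, 129.131e-3, 1.891]
t_thispr = [65.091e-6, 467.707e-6, 5.351e-3, 79.146e-3, 1.340]

# Ratio below 1.0 means this PR is faster than the reference
ratios = t_thispr ./ t_v061

for (ni, r) in zip(n, ratios)
    println("n = ", ni, ": time ratio ≈ ", round(r; digits = 2))
end
```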

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["random"]
  • ["ties"]

Julia versioninfo

Target

Julia Version 1.9.2
Commit e4ee485e909 (2023-07-05 09:39 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
      Ubuntu 22.04.3 LTS
  uname: Linux 5.15.0-1041-azure #48-Ubuntu SMP Tue Jun 20 20:34:08 UTC 2023 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz: 
              speed         user         nice          sys         idle          irq
       #1  2593 MHz       1084 s          0 s        100 s       1112 s          0 s
       #2  2593 MHz        504 s          0 s        120 s       1654 s          0 s
  Memory: 6.7694854736328125 GB (5095.03515625 MB free)
  Uptime: 234.43 sec
  Load Avg:  1.0  0.69  0.31
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake-avx512)
  Threads: 1 on 2 virtual cores

Baseline

Julia Version 1.9.2
Commit e4ee485e909 (2023-07-05 09:39 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
      Ubuntu 22.04.3 LTS
  uname: Linux 5.15.0-1041-azure #48-Ubuntu SMP Tue Jun 20 20:34:08 UTC 2023 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz: 
              speed         user         nice          sys         idle          irq
       #1  2593 MHz       1442 s          0 s        121 s       1517 s          0 s
       #2  2593 MHz        944 s          0 s        139 s       1979 s          0 s
  Memory: 6.7694854736328125 GB (5510.23828125 MB free)
  Uptime: 312.99 sec
  Load Avg:  1.05  0.8  0.38
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake-avx512)
  Threads: 1 on 2 virtual cores

andreasnoack (Member) left a comment

Ran the timings in #76 (comment). Looks like the two versions are comparable in the random case but that this PR is faster when there are ties so let's go with this one. Might be worth leaving a comment in the code with a link to this issue in case somebody in the future compares the implementation to the paper and thinks there is a potential for speedup via partialsort!.

src/kd.jl (outdated)
- mid = (length(perm) + 1) ÷ 2
- @debug "Candidate median index and median value" mid xs[perm[mid], j]
+ mid = (length(xjs) + 1) ÷ 2
+ @debug "Candidate median index and median value" mid xs[mid, j]
Member

Should this be xjs[mid] instead of xs[mid, j] and likewise below?

Member Author

good catch, yes
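A small illustration of why the index matters, assuming (the diff suggests it but does not show it) that xjs is a sorted copy of column j of xs. The position mid is a rank in the sorted vector, so using it as a row index into the unsorted matrix picks an unrelated value:

```julia
# Assumption for illustration: xjs is the sorted copy of column j of xs
xs  = reshape([5.0, 1.0, 4.0, 2.0, 3.0], :, 1)   # 5×1 matrix, unsorted column
j   = 1
xjs = sort(xs[:, j])

mid = (length(xjs) + 1) ÷ 2   # rank of the median in the sorted vector

median_value = xjs[mid]       # the actual median of column j
wrong_value  = xs[mid, j]     # whatever happens to sit in row mid of the raw data
```

With the old permutation-based code, xs[perm[mid], j] was correct because perm mapped ranks back to rows; once the values themselves are sorted, the lookup has to go through xjs.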

palday (Member Author) commented Aug 23, 2023

> Ran the timings in #76 (comment). Looks like the two versions are comparable in the random case but that this PR is faster when there are ties so let's go with this one. Might be worth leaving a comment in the code with a link to this issue in case somebody in the future compares the implementation to the paper and thinks there is a potential for speedup via partialsort!.

done!

palday (Member Author) commented Aug 23, 2023

@andreasnoack if you have no further concerns, I'll merge this in about an hour

palday merged commit 33a4cd3 into master Aug 23, 2023
palday deleted the pa/short_circuit_small_number_of_ties branch August 23, 2023 21:44