warm up at the beginning of every measurement #330

Merged 17 commits into JuliaCI:master on Feb 27, 2024

Conversation

@willow-ahrens (Collaborator)

fixes #291

@willow-ahrens (Collaborator Author)

@gdalle I think this one is ready for review!

@gdalle (Collaborator) commented Sep 5, 2023

Thanks @willow-ahrens! I'm a bit swamped at the moment, but I haven't forgotten this; it's on my to-do list, and the 2.0 release will probably follow.

@gdalle added this to the v2.0 milestone on Sep 18, 2023
Excerpt from the test diff discussed in a resolved review thread:

@test p.evals == 1
@test p.gctrial == false
@test p.gcsample == false
@test (@belapsed needs_warm() seconds = 1) < 1

@Zentrik (Contributor) commented Dec 30, 2023

Looks good to me

@Zentrik (Contributor) left a comment

LGTM

@willow-ahrens (Collaborator Author)

@gdalle Hi there! Given this is on the release milestone and has an approving review, I'm sending a friendly ping your way. Thanks for all your work on this repo 😊

@gdalle (Collaborator) commented Jan 3, 2024

Hi @willow-ahrens and thanks for the ping.
Since this is a breaking change I don't want to take the merge responsibility alone, especially because

  • I have decided to step back from this maintenance task (which I had never really asked for ^^)
  • I don't understand the internals enough to make an informed decision

However, I do want to point out that this would affect the behavior of mutating functions, i.e. those for which we set evals=1 and actually want that to hold. If we make warmup the default, isn't a second eval unavoidable?

@willow-ahrens (Collaborator Author)

Thanks for the prompt response! I believe not warming up is a correctness issue, especially since the package usually warms up but unexpectedly doesn't in certain circumstances. This is an important package in the Julia ecosystem so I'd like to see some fix to the issue. Perhaps we can ask someone else for help reviewing?

To answer your specific question:

evals is the number of times we run the function in each sample. https://github.com/JuliaCI/BenchmarkTools.jl/blob/42155f726e8da464baa5d917938a6d38fe20f7ae/src/execution.jl#L557C1-L559C20

When we call @btime, the function is first warmed up, then tuned, then run, so it would be executed at least 3 times. When we do b = @benchmarkable ...; run(b), b is executed at least once. You're correct that I would be changing this behavior to execute b at least twice, once to warm up and once to measure. That's the behavior the documentation would lead one to believe is already in place, and best practice for benchmarking.
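For concreteness, here is a sketch of the two workflows being compared (hypothetical expressions; the tuned evals and the timings will differ per machine):

using BenchmarkTools

# @btime warms up, tunes, and then measures, so the expression runs several times:
@btime sum($(rand(100)))

# @benchmarkable + run measures directly; with this PR, run would execute one extra
# warmup sample before the measured samples:
b = @benchmarkable sum($(rand(100)))
tune!(b)  # optional: picks how many evals to use per sample
run(b)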

@gdalle (Collaborator) commented Jan 3, 2024

You're correct that I would be changing this behavior to execute b at least twice, once to warm up and once to measure.

My issue is the case where the function destroys its input, so that only one evaluation is possible. Consider the following example:

julia> using BenchmarkTools

julia> @benchmark pop!(x) setup=(x=[0]) evals=1
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  34.000 ns … 425.000 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     42.000 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   42.534 ns ±   7.141 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

     ▃ ▄▅ ▆ ▇▇ █ ▇ ▅▄ ▃ ▁▁ ▁                  ▁ ▁              ▂
  ▄█▁█▁██▁█▁██▁█▁█▁██▁█▁██▁█▁▇▁▆▆▁▄▁▆▇▁█▁▇▁██▁█▁██▁█▁▇▁▇▇▁▇▁▇▆ █
  34 ns         Histogram: log(frequency) by time        69 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Until my PR #318 this used to error. With BenchmarkTools 1.4 it doesn't. With automatic warmup... it would again?

I agree this is a rather niche case, but there were 4 issues closed by that PR, so I assume people have run into it.

@Zentrik (Contributor) commented Jan 3, 2024

Your example works with this PR, as we run a full sample to warm up (which includes the setup and teardown), not a single eval.
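For example (with a hypothetical teardown added purely for illustration), the mutating benchmark above still works because the warmup sample also runs setup and teardown before any measured sample:

# hypothetical illustration: the warmup sample rebuilds `x` via setup, so pop! always has an element
@benchmark pop!(x) setup=(x=[0]) teardown=(@assert isempty(x)) evals=1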

As for this being a breaking change, wouldn't keeping warmup (rather than removing it) fix that? I don't see any pressing reason to remove it.

Thanks for all your work on this package.

@willow-ahrens (Collaborator Author)

I understand your concern. However, I have tested this PR with your function and it does not error:

(@v1.9) pkg> dev /Users/willow/Projects/BenchmarkTools.jl/
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.9/Project.toml`
  No Changes to `~/.julia/environments/v1.9/Manifest.toml`

julia> using BenchmarkTools

julia> @benchmark pop!(x) setup=(x=[0]) evals=1
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):   0.001 ns … 167.000 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):      0.001 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   12.841 ns ±  19.438 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █                                                             
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▆ ▂
  0.001 ns        Histogram: frequency by time           42 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

This is because my "warmup" step just re-uses the existing BenchmarkTools samplefunc that gets generated to benchmark the function, and samplefunc calls setup.

It's somewhat opaque because there is metaprogramming involved, but you can see setup pasted into the samplefunc definition here:

In BenchmarkTools.jl, we can understand the current run(@benchmarkable ...) behavior with the following pseudocode:

function samplefunc()
    setup()                  # the user-provided setup runs once per sample
    t = @elapsed begin
        for _ = 1:evals      # the benchmarked expression runs `evals` times
            f()
        end
    end
    return t
end

for _ = 1:samples
    push!(measurements, samplefunc())
end

This PR proposes changing the last three lines to:

samplefunc() # warm up once and discard the result
for _ = 1:samples
    push!(measurements, samplefunc())
end

@willow-ahrens (Collaborator Author) commented Jan 3, 2024

@Zentrik I agree, we don't need to remove warmup. We may want to issue a deprecation though? Are deprecations breaking?

@gdalle (Collaborator) commented Jan 3, 2024

Thanks for the clarifications; I had missed the setup phase in samplefunc. In that case, I think that if we keep (but possibly deprecate) warmup, the PR is non-breaking and a strict improvement. If @Zentrik is confident in their ability to review the changes, I'll do a surface inspection myself and then probably merge.

@gdalle (Collaborator) commented Jan 3, 2024

We may want to issue a deprecation though? Are deprecations breaking?

According to SciML ColPrac they are not

https://docs.sciml.ai/ColPrac/stable/#Incrementing-the-package-version
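For illustration only, a soft deprecation of warmup could look something like the sketch below. This is an assumption about the shape it might take, not the PR's actual code; it assumes warmup keeps running a single GC-free sample and simply forwards to run:

# hypothetical sketch, not the actual BenchmarkTools implementation
function warmup(item; verbose::Bool=false)
    Base.depwarn("explicit `warmup` is no longer needed; `run` now warms up automatically.", :warmup)
    # one sample, one eval, no GC trial/sample, matching the current warmup behavior
    return run(item; verbose=verbose, samples=1, evals=1, gctrial=false, gcsample=false)
end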

@willow-ahrens (Collaborator Author)

Okay, looks like this is all set then!

@Zentrik (Contributor) commented Jan 4, 2024

Cool, I'll try and test this out tomorrow.

@willow-ahrens (Collaborator Author) commented Jan 9, 2024

 Cool, I might make a pr to skip the second warmup then if we know tune! has been called.

It may be difficult to know that it has been called recently. The reason we always warm up is that running other code in between makes the benchmark go cold.

@Zentrik (Contributor) commented Jan 9, 2024

I figure for @benchmark we could just pass a boolean to skip the warmup or something when calling run. But I'll have a think.

@willow-ahrens (Collaborator Author) commented Jan 9, 2024

@gdalle okay, I think we're ready for your review, whenever you find time. Thanks for your help with this!

@willow-ahrens (Collaborator Author)

I figure for @benchmark we could just pass a boolean to skip the warmup or something when calling run. But I'll have a think.

@Zentrik I think that to maintain the semantics of how @bprofile used to work, we may want to add that boolean in this PR.

@Zentrik (Contributor) commented Jan 18, 2024

I'm not sure this PR currently changes the semantics of @bprofile, as it doesn't seem to me that it should matter if there's an extra run in the profile.
If you think it's problematic, feel free to fix it. I'm happy to do it myself, or to remove all redundant warmups in a separate PR, or in this one if you wish.

@willow-ahrens (Collaborator Author)

@gdalle would it be breaking to add a boolean warmup field to the Parameters struct with a default value of true? What about adding the boolean to the run function as a kwarg?

@gdalle (Collaborator) commented Jan 23, 2024

Neither of those is breaking as long as they don't change the semantics of existing code that does not use them

@willow-ahrens (Collaborator Author)

In that case, @Zentrik, if you want to give it a go adding a warmup boolean to the run kwargs, we could use it here to maintain the semantics of @bprofile.

@willow-ahrens (Collaborator Author) commented Jan 29, 2024

Okay, I added a warmup flag to run that allows us to skip warmups in certain limited scenarios, meaning that this PR preserves as much of the old execution semantics as possible. This PR is non-breaking and ready for review.
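Assuming the flag is exposed as a run keyword as discussed above (the exact name and signature may differ in the merged code), usage would look roughly like:

b = @benchmarkable sum($(rand(100)))
tune!(b)
run(b)                 # default: one warmup sample, then the measured samples
run(b; warmup=false)   # skip the warmup, e.g. to keep the old @bprofile behavior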

@willow-ahrens requested a review from gdalle on January 29, 2024 at 19:14
@gdalle (Collaborator) commented Jan 30, 2024

gotta love the commit messages ;) ping me when this is ready!

@willow-ahrens (Collaborator Author)

@gdalle this one's ready! It's passing tests, but the status icon shows a failure because the changes only reached 90% coverage rather than 91%.

@Zentrik (Contributor) commented Jan 30, 2024

Thanks, sorry I've just been a bit busy and hadn't got around to this.

@gdalle (Collaborator) left a comment

Thanks for the contribution! I think most of the remarks stem from my own lack of understanding, but once we fix that it looks good to go

@willow-ahrens (Collaborator Author)

@gdalle I responded to your comments. I don't think anything needs to change, but I'm happy to change anything you would like, just let me know.

@willow-ahrens requested a review from gdalle on February 15, 2024 at 13:58
@gdalle (Collaborator) commented Feb 21, 2024

Sorry this has stalled; I'm trying to get it merged and released this week.

@gdalle (Collaborator) commented Feb 27, 2024

Thank you so much for your work and patience, and sorry for the delay. This is a really important change and I reviewed it to the best of my ability.
Julia users of the future, if everything collapsed, this is where it started

@gdalle merged commit d564ee7 into JuliaCI:master on Feb 27, 2024
7 of 9 checks passed
Successfully merging this pull request may close these issues:

Warmups are skipped when running a benchmark suite