warm up at the beginning of every measurement #330

Merged 17 commits into JuliaCI:master on Feb 27, 2024

Conversation

@willow-ahrens (Collaborator)

fixes #291

@willow-ahrens (Collaborator Author)

@gdalle I think this one is ready for review!

@gdalle (Collaborator) commented Sep 5, 2023

Thanks @willow-ahrens! I'm a bit swamped at the moment, but I haven't forgotten this; it's on my to-do list, and the 2.0 release will probably follow.

@gdalle added this to the v2.0 milestone on Sep 18, 2023
Excerpt from the test diff discussed in a resolved review thread:

@test p.evals == 1
@test p.gctrial == false
@test p.gcsample == false
@test (@belapsed needs_warm() seconds = 1) < 1

@Zentrik (Contributor) commented Dec 30, 2023

Looks good to me

@Zentrik (Contributor) left a comment

LGTM

@willow-ahrens (Collaborator Author)

@gdalle Hi there! Given this is on the release milestone and has an approving review, I'm sending a friendly ping your way. Thanks for all your work on this repo 😊

@gdalle (Collaborator) commented Jan 3, 2024

Hi @willow-ahrens and thanks for the ping.
Since this is a breaking change I don't want to take the merge responsibility alone, especially because

  • I have decided to step back from this maintenance task (which I had never really asked for ^^)
  • I don't understand the internals enough to make an informed decision

However, I do want to point out that this would affect the behavior of mutating functions, i.e. those for which we set evals=1 and actually want that to hold. If we make warmup the default, isn't a second eval unavoidable?

@willow-ahrens (Collaborator Author)

Thanks for the prompt response! I believe not warming up is a correctness issue, especially since the package usually warms up but unexpectedly doesn't in certain circumstances. This is an important package in the Julia ecosystem so I'd like to see some fix to the issue. Perhaps we can ask someone else for help reviewing?

To answer your specific question:

evals is the number of times we run the function in each sample. https://github.com/JuliaCI/BenchmarkTools.jl/blob/42155f726e8da464baa5d917938a6d38fe20f7ae/src/execution.jl#L557C1-L559C20

When we call @btime, the function is first warmed up, then tuned, then run, so it would be executed at least 3 times. When we do b = @benchmarkable ...; run(b), b is executed at least once. You're correct that I would be changing this behavior to execute b at least twice, once to warm up and once to measure. That's the behavior the documentation would lead one to believe is already in place, and best practice for benchmarking.
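For concreteness, here is a sketch of the two workflows being compared (hypothetical expressions; the tuned evals and the timings will differ per machine):

using BenchmarkTools

# @btime warms up, tunes, and then measures, so the expression runs several times:
@btime sum($(rand(100)))

# @benchmarkable + run measures directly; with this PR, run would execute one extra
# warmup sample before the measured samples:
b = @benchmarkable sum($(rand(100)))
tune!(b)  # optional: picks how many evals to use per sample
run(b)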

@gdalle (Collaborator) commented Jan 3, 2024

You're correct that I would be changing this behavior to execute b at least twice, once to warm up and once to measure.

My issue is the case where the function destroys its input, so that only one evaluation is possible. Consider the following example:

julia> using BenchmarkTools

julia> @benchmark pop!(x) setup=(x=[0]) evals=1
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  34.000 ns … 425.000 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     42.000 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   42.534 ns ±   7.141 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

     ▃ ▄▅ ▆ ▇▇ █ ▇ ▅▄ ▃ ▁▁ ▁                  ▁ ▁              ▂
  ▄█▁█▁██▁█▁██▁█▁█▁██▁█▁██▁█▁▇▁▆▆▁▄▁▆▇▁█▁▇▁██▁█▁██▁█▁▇▁▇▇▁▇▁▇▆ █
  34 ns         Histogram: log(frequency) by time        69 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Until my PR #318 this used to error. With BenchmarkTools 1.4 it doesn't. With automatic warmup... it would again?

I agree this is a rather niche case, but there were 4 issues closed by that PR, so I assume people have run into it.

@Zentrik (Contributor) commented Jan 3, 2024

Your example works with this PR, as we run a full sample to warm up (which includes the setup and teardown), not a single eval.
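For example (with a hypothetical teardown added purely for illustration), the mutating benchmark above still works because the warmup sample also runs setup and teardown before any measured sample:

# hypothetical illustration: the warmup sample rebuilds `x` via setup, so pop! always has an element
@benchmark pop!(x) setup=(x=[0]) teardown=(@assert isempty(x)) evals=1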

As for this being a breaking change, wouldn't keeping warmup (rather than removing it) fix that? I don't see any pressing reason to remove it.

Thanks for all your work on this package.

@willow-ahrens (Collaborator Author)

I understand your concern. However, I have tested this PR with your function and it does not error:

(@v1.9) pkg> dev /Users/willow/Projects/BenchmarkTools.jl/
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.9/Project.toml`
  No Changes to `~/.julia/environments/v1.9/Manifest.toml`

julia> using BenchmarkTools

julia> @benchmark pop!(x) setup=(x=[0]) evals=1
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):   0.001 ns … 167.000 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):      0.001 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   12.841 ns ±  19.438 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █                                                             
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▆ ▂
  0.001 ns        Histogram: frequency by time           42 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

This is because my "warmup" step just re-uses the existing BenchmarkTools samplefunc that gets generated to benchmark the function, and samplefunc calls setup.

It's somewhat opaque because there is metaprogramming involved, but you can see setup pasted into the samplefunc definition here:

In BenchmarkTools.jl, we can understand the current run(@benchmarkable ...) behavior with the following pseudocode:

function samplefunc()
    setup()                  # the user-provided setup runs once per sample
    t = @elapsed begin
        for _ = 1:evals      # the benchmarked expression runs `evals` times
            f()
        end
    end
    return t
end

for _ = 1:samples
    push!(measurements, samplefunc())
end

This PR proposes changing the last three lines to:

samplefunc() # warm up once and discard the result
for _ = 1:samples
    push!(measurements, samplefunc())
end

@willow-ahrens (Collaborator Author) commented Jan 3, 2024

@Zentrik I agree, we don't need to remove warmup. We may want to issue a deprecation though? Are deprecations breaking?

@gdalle (Collaborator) commented Jan 3, 2024

Thanks for the clarifications; I had missed the setup phase in samplefunc. In that case, I think that if we keep (but possibly deprecate) warmup, the PR is non-breaking and a strict improvement. If @Zentrik is confident in their ability to review the changes, I'll do a surface inspection myself and then probably merge.

@gdalle (Collaborator) commented Jan 3, 2024

We may want to issue a deprecation though? Are deprecations breaking?

According to SciML ColPrac they are not

https://docs.sciml.ai/ColPrac/stable/#Incrementing-the-package-version
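For illustration only, a soft deprecation of warmup could look something like the sketch below. This is an assumption about the shape it might take, not the PR's actual code; it assumes warmup keeps running a single GC-free sample and simply forwards to run:

# hypothetical sketch, not the actual BenchmarkTools implementation
function warmup(item; verbose::Bool=false)
    Base.depwarn("explicit `warmup` is no longer needed; `run` now warms up automatically.", :warmup)
    # one sample, one eval, no GC trial/sample, matching the current warmup behavior
    return run(item; verbose=verbose, samples=1, evals=1, gctrial=false, gcsample=false)
end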

@willow-ahrens (Collaborator Author)

Okay, looks like this is all set then!

@Zentrik (Contributor) commented Jan 4, 2024

Cool, I'll try and test this out tomorrow.

@willow-ahrens (Collaborator Author) commented Jan 9, 2024

 Cool, I might make a pr to skip the second warmup then if we know tune! has been called.

It may be difficult to know that it has been called recently. The reason we always warm up is that running other code in between makes the benchmark go cold.

@Zentrik (Contributor) commented Jan 9, 2024

I figure for @benchmark we could just pass a boolean to skip the warmup or something when calling run. But I'll have a think.

@willow-ahrens (Collaborator Author) commented Jan 9, 2024

@gdalle okay, I think we're ready for your review, whenever you find time. Thanks for your help with this!

@willow-ahrens (Collaborator Author)

I figure for @benchmark we could just pass a boolean to skip the warmup or something when calling run. But I'll have a think.

@Zentrik I think that to maintain the semantics of how @bprofile used to work, we may want to add that boolean in this PR.

@Zentrik (Contributor) commented Jan 18, 2024

I'm not sure this PR currently changes the semantics of @bprofile, as it doesn't seem to me that it should matter if there's an extra run in the profile.
If you think it's problematic, feel free to fix it. I'm happy to do it myself, or to remove all redundant warmups in a separate PR, or in this one if you wish.

@willow-ahrens (Collaborator Author)

@gdalle would it be breaking to add a boolean warmup field to the Parameters struct with a default value of true? What about adding the boolean to the run function as a kwarg?

@gdalle (Collaborator) commented Jan 23, 2024

Neither of those is breaking as long as they don't change the semantics of existing code that does not use them

@willow-ahrens (Collaborator Author)

In that case, @Zentrik, if you want to give it a go adding a warmup boolean to the run kwargs, we could use it here to maintain the semantics of @bprofile.

@willow-ahrens (Collaborator Author) commented Jan 29, 2024

Okay, I added a warmup flag to run that allows us to skip warmups in certain limited scenarios, meaning that this PR preserves as much of the old execution semantics as possible. This PR is non-breaking and ready for review.
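Assuming the flag is exposed as a run keyword as discussed above (the exact name and signature may differ in the merged code), usage would look roughly like:

b = @benchmarkable sum($(rand(100)))
tune!(b)
run(b)                 # default: one warmup sample, then the measured samples
run(b; warmup=false)   # skip the warmup, e.g. to keep the old @bprofile behavior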

@willow-ahrens requested a review from gdalle on January 29, 2024 at 19:14
@gdalle (Collaborator) commented Jan 30, 2024

gotta love the commit messages ;) ping me when this is ready!

@willow-ahrens (Collaborator Author)

@gdalle this one's ready! It's passing tests, but the status icon shows a failure because the changes only reached 90% coverage rather than 91%.

@Zentrik (Contributor) commented Jan 30, 2024

Thanks, sorry I've just been a bit busy and hadn't got around to this.

@gdalle (Collaborator) left a comment

Thanks for the contribution! I think most of the remarks stem from my own lack of understanding, but once we fix that it looks good to go

@willow-ahrens (Collaborator Author)

@gdalle I responded to your comments. I don't think anything needs to change, but I'm happy to change anything you would like, just let me know.

@willow-ahrens requested a review from gdalle on February 15, 2024 at 13:58
@gdalle (Collaborator) commented Feb 21, 2024

Sorry this has stalled; I'm trying to get it merged and released this week.

@gdalle (Collaborator) commented Feb 27, 2024

Thank you so much for your work and patience, and sorry for the delay. This is a really important change and I reviewed it to the best of my ability.
Julia users of the future, if everything collapsed, this is where it started

@gdalle merged commit d564ee7 into JuliaCI:master on Feb 27, 2024
7 of 9 checks passed
Successfully merging this pull request may close these issues:

Warmups are skipped when running a benchmark suite