BumbleBench and HumbleBench
BumbleBench is a microbenchmark tool intended to make it as easy as possible to avoid common pitfalls when microbenchmarking Java. It aims to ensure that test runs spend most of their time running the desired piece of code, compiled at the highest possible level of quality. This is surprisingly tricky in an environment with dynamic compilation employing aggressive speculative optimizations.
The name "BumbleBench" derives from the manner in which the tool varies the iteration count of the benchmark's main loop in order to determine the highest iteration count that can be completed within a given target duration. The target score vacillates around the estimated maximum achievable score, alternating between low and high targets in an attempt to converge on the actual achievable score, while remaining sensitive to variations in performance that can occur due to effects like JIT compilation occurring during the run.
What: BumbleBench is a JAR file containing two classes (MicroBench and MiniBench) that you can extend to implement your microbenchmark by writing its inner loop. BumbleBench automatically adjusts the iteration count of your loop to measure how many iterations can complete within a given target duration.
When: BumbleBench is appropriate any time you have a workload you would like to run repeatedly in order to gauge its speed.
Where: BumbleBench is on GitHub.
Why: The priority for BumbleBench is ease of use. The intent is that you can read this Quick Start guide in five minutes, then write a microbenchmark that behaves the way you want.
How to write tests: Write a benchmark class that extends `MicroBench` or `MiniBench`, add it to the jar file, and run `java -jar BumbleBench.jar [Benchmark]`. Additional options can be set using the `java -D` option, or by providing a `.properties` file alongside your benchmark class.
How to build your jar: You can build the jar from source in the RTC client; use its build.xml to run an Ant build.
Who: Message the #testing Slack channel if you need help.
You can list the available benchmarks with this command:
java -jar BumbleBench.jar
Then you can choose one of the benchmarks, and run it with this command:
java -jar BumbleBench.jar [Benchmark name]
BumbleBench options are set using `-DBumbleBench.xxx=yyy`. To see a list of all available options for your benchmark, set the `listOptions` option:
java -DBumbleBench.listOptions -jar BumbleBench.jar [Benchmark name]
(For boolean options, the `=yyy` part can be omitted.)
BumbleBench has a facility to finish its run with a single extra-long batch, during which you can run a profiling tool like tprof. To enable this feature, use `-DBumbleBench.longBatchSeconds=nnn`. This will cause BumbleBench to wait for you to hit Enter, and then to perform one batch of the specified duration:
```
-= BumbleBench series 2 version 3.2 running net.adoptopenjdk.bumblebench.examples.TrigBench Mon Sep 29 11:22:26 EDT 2014 =-
         Target      Est  Uncert%  MaxPeak     Peak    Peak%  %paused
  0.0s: >!   110    120.0    24.0      110      110    470.0
  0.0s: >! 134.4    148.8    28.8    134.4    134.4    490.1
  0.0s: >! 170.2    191.7    34.6    170.2    170.2    513.7
  ⋮
TrigBench score: 1.1853785E7 (11.85M) uncertainty: 0.8%
-- LONG BATCH --
Press <Enter> to begin a 30-second batch...
Running for 30 seconds...
...done.
```
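For example, a run like the one above, ending with a 30-second long batch, could be launched like this:
java -DBumbleBench.longBatchSeconds=30 -jar BumbleBench.jar TrigBench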
BumbleBench offers a way to run multiple copies of the same benchmark in multiple threads.
The basic technique uses `-DBumbleBench.parallelInstances=N`. This causes BumbleBench to create N instances of your benchmark class, and then run them in N threads. Each instance of your benchmark is controlled using a pair of `BlockingQueue`s to communicate target and result scores for each batch to an instance of `ParallelBench`, which manages the threads and aggregates the results.
There are three settings for the `aggregationStyle` option, which controls how the benchmark score is computed:
- AVERAGE: The score is the arithmetic mean of all scores achieved by the threads. This is the default. If the benchmark scales perfectly, then you should get the same score with `parallelInstances` as without it.
- SUM: The score is the total of all scores achieved by the threads.
- MIN: The score is the lowest score achieved by any of the benchmark threads. This is effectively the score achieved by the longest-running thread.
If your benchmark class has mutable static fields, you may still be able to run parallel instances. If you specify the option flag `classPerInstance`, BumbleBench will load and initialize your benchmark class N times, each with its own separate copy of the static fields.
This mode is not recommended, because it can cause the JIT to do weird things like compile the exact same code repeatedly for each instance. It's meant as a workaround to get easy parallel runs of a benchmark that was not designed with parallelism in mind. If you care about parallel performance measurement, it's better to write the benchmark to use instance variables instead of statics.
Note that only the main benchmark class is loaded multiple times. If your benchmark uses other global data, it probably needs to be modified in order to run in parallel.
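For example, a hypothetical four-way run that scores by the total throughput of all threads (both options are described above):
java -DBumbleBench.parallelInstances=4 -DBumbleBench.aggregationStyle=SUM -jar BumbleBench.jar TrigBench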
The basic MicroBench looks like this:
```java
protected long doBatch(long numIterations) throws InterruptedException {
    for (long i = 0; i < numIterations; i++) {
        // WORKLOAD GOES HERE
    }
    return numIterations;
}
```
The basic MiniBench is a little more complex, but a surprising number of benchmarks make good use of its nested loop structure:
```java
protected int maxIterationsPerLoop() { return 1234567; } // Max allowable value of numIterationsPerLoop

protected long doBatch(long numLoops, int numIterationsPerLoop) throws InterruptedException {
    // YOU CAN ALLOCATE SOME RESOURCES OR DO OTHER SETUP HERE ...
    for (long i = 0; i < numLoops; i++) {
        // ... AND HERE.
        for (int j = 0; j < numIterationsPerLoop; j++) {
            startTimer();
            // WORKLOAD GOES HERE - preferably just a call to a method that does the actual work
            pauseTimer();
        }
    }
    return numLoops * numIterationsPerLoop;
}
```
Briefly:
- For a MicroBench:
  - Implement `long doBatch(long numIterations)` to do the requested number of iterations and return the number of iterations performed.
  - By default, the timer is always running. If you have portions of your workload you don't want timed, you'll need to call `pauseTimer()` and `startTimer()` around those portions.
- For a MiniBench:
  - Implement `long doBatch(long numLoops, int numIterationsPerLoop)` to do the logic for one batch and return the number of iterations performed.
  - Implement `int maxIterationsPerLoop()` to indicate the limit on the `numIterationsPerLoop` parameter for `doBatch`.
  - By default, the timer is not running when `doBatch` begins. Call `startTimer()` and `pauseTimer()` around the portions of the batch logic that you want to measure.
- Make your benchmark class final, and make as many as possible of its fields final.
- Implement your setup parameters as static final fields initialized using the `option` method.
- Try to set up your test so it scales properly, i.e. the score should be largely invariant with respect to the setup parameters, including `targetDuration` and the number of iterations per batch.
- Put the timed portion of a MiniBench in its own method. That makes it easier to study in isolation. The size of the inner loop should be arranged to make the call overhead insignificant, and anyway, it can be measured by disabling inlining.
- Use locals instead of fields wherever convenient. You can even use a local to privatize a field before `startTimer()` if you want.
- Naming:
  - static final fields should use UPPERCASE_NAMES
  - other fields (static or instance) should begin with an underscore to distinguish them from locals, and otherwise use camelCase
- Import classes, not whole packages.
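Putting several of these guidelines together, here is a hypothetical sketch (class name and workload invented). For the setup parameter it uses plain `Integer.getInteger` so the sketch stays self-contained; in a real BumbleBench benchmark you would use the inherited `option` method described above, whose exact signature is not shown in this guide:

```java
// Hypothetical benchmark illustrating the guidelines above.
public final class StringBuilderBench extends MicroBench {
    // Setup parameter: a static final field the JIT can treat as a constant.
    // A real benchmark would initialize this with the option method instead.
    static final int STRING_LENGTH = Integer.getInteger("BumbleBench.stringLength", 64);

    String _seed = "x"; // instance field: underscore prefix, camelCase

    protected long doBatch(long numIterations) throws InterruptedException {
        String seed = _seed; // privatize the field into a local before the timed loop
        long keepAlive = 0;
        for (long i = 0; i < numIterations; i++)
            keepAlive += workload(seed).length(); // keep the result live
        if (keepAlive == 42) System.out.print(" ");
        return numIterations;
    }

    // The timed kernel lives in its own small method so it can be studied in isolation.
    private static String workload(String seed) {
        StringBuilder sb = new StringBuilder(STRING_LENGTH);
        while (sb.length() < STRING_LENGTH)
            sb.append(seed);
        return sb.toString();
    }
}
```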
`HumbleBench` is a subclass of `MiniBench` intended to facilitate benchmarking at low optimization levels by throttling the real workload. By default, HumbleBench will try to run `doBatch` for a third of a percent of the time. That should be enough to compile at warm, but not at hot. However, it's important to confirm with a verbose log that methods have been compiled at the expected level.
```java
@Override
protected final void setup(int numIterations) { // optional
    // YOU CAN ALLOCATE SOME RESOURCES OR DO OTHER SETUP HERE
}

public static class Workload extends AbstractWorkload {
    @Override
    public void doBatch(HumbleBench bench, int numIterations) {
        for (int i = 0; i < numIterations; i++) {
            // WORKLOAD GOES HERE
        }
    }
}
```
See `net.adoptopenjdk.bumblebench.humble.SameStringsEqualsBench` for an example.
HumbleBench has two main options:
- `loadFactor=N`: Spend 1/N of the time in the workload method (default 300)
- `fanout=N`: Load N copies of the workload to allow shorter batches (default 12)
Suggested `loadFactor` settings:

| Optimization level | Setting |
| --- | --- |
| warm | `loadFactor=300` |
| hot | `loadFactor=50` |
| scorching | `loadFactor=1` |
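For example, a hypothetical HumbleBench run throttled for hot-level compilation (assuming these options use the same `-DBumbleBench.` prefix as the others in this guide):
java -DBumbleBench.loadFactor=50 -jar BumbleBench.jar SameStringsEqualsBench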
If a workload makes calls that are not inlined, fanout will be ineffective at reducing the time spent in the callee. For this reason, fanout must not be set too high in workloads where inlining is expected at warm, or else the callee's high fan-in will prevent inlining.
There is a heuristic for setting the default `batchTargetDuration` to avoid overlong batches. If overriding this default, note that `targetIncludesPauses` defaults to false under HumbleBench.
Java microbenchmarks are notoriously hard to write, and can easily end up measuring the wrong thing. BumbleBench is maintained by people experienced in Java performance tuning, and is designed to make microbenchmarking less error-prone.
The goal is to allow people to write microbenchmarks as they would in a static language, with one or two little loops that just run the desired workload. In Java that typically does not work well, but BumbleBench is designed to make it work as well as possible.
BumbleBench allows people to write microbenchmarks by implementing a single method called `doBatch`:
```java
protected long doBatch(long numIterations) throws InterruptedException {
    for (long i = 0; i < numIterations; i++) {
        // WORKLOAD GOES HERE
    }
    return numIterations;
}
```
BumbleBench calls this method repeatedly with various values of numIterations, with the objective of making each doBatch call last for some specified target duration (one second by default). It alternates between passing numIterations values that it believes are first too low, and then too high, to finish within the target duration. These are called "lowball" and "highball" guesses. Whenever it's right, the "uncertainty" gets smaller, causing BumbleBench to attempt lowball and highball values closer and closer together. When it's wrong, the uncertainty increases. (Uncertainty can also increase if a lowball guess is very low, or a highball guess is very high.)
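To make the search concrete, here is a simplified, hypothetical sketch of the idea in plain Java. All names and adjustment factors are invented for illustration; this is not BumbleBench's actual implementation:

```java
// Simplified illustration of the lowball/highball search (not BumbleBench's real code).
public final class LowballHighballSketch {
    static final long TARGET_NANOS = 1_000_000_000L; // one-second target duration

    // Stand-in workload: spin for the requested number of iterations.
    static long doBatch(long numIterations) {
        double x = 0.1;
        for (long i = 0; i < numIterations; i++) x = 4.6 * Math.sin(x);
        return numIterations;
    }

    public static void main(String[] args) {
        long estimate = 1000;     // current estimate of iterations per target duration
        double uncertainty = 0.4; // fractional gap between lowball and highball guesses
        boolean lowball = true;   // alternate between lowball and highball batches
        for (int batch = 0; batch < 20; batch++) {
            String kind = lowball ? "lowball" : "highball";
            long guess = Math.max(1, (long) (estimate *
                    (lowball ? 1 - uncertainty / 2 : 1 + uncertainty / 2)));
            long start = System.nanoTime();
            long done = doBatch(guess);
            long elapsed = Math.max(1, System.nanoTime() - start);
            boolean finishedInTime = elapsed <= TARGET_NANOS;
            estimate = done * TARGET_NANOS / elapsed; // throughput-based re-estimate
            // A lowball guess should finish in time; a highball guess should not.
            // When the prediction holds, tighten the bracket; a surprise (the "!"
            // marker in BumbleBench's output) widens it instead.
            uncertainty *= (finishedInTime == lowball) ? 0.8 : 1.2;
            lowball = !lowball;
            System.out.printf("%8s guess=%-12d est=%-12d uncert=%.1f%%%n",
                    kind, guess, estimate, uncertainty * 100);
        }
    }
}
```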
The output of a run looks like this:
```
         Target      Est  Uncert%  MaxPeak     Peak    Peak%  %paused
  0.0s: >!   110   110.0K    24.0      110      110    470.0
  0.0s: >! 123.2K  3.850M    28.8   123.2K   123.2K   1172.2
  0.4s: >! 4.404M  11.29M    34.6   4.404M   4.404M   1529.8
  1.6s: <  13.24M  11.73M    20.7   4.404M   4.404M   1529.8
  2.5s: >  10.52M  11.74M    12.4   10.52M   10.52M   1616.8
  3.5s: <  12.47M  11.57M     7.5   10.52M   10.52M   1616.8
  4.5s: >  11.14M  11.70M     4.5   11.14M   11.14M   1622.6
  5.5s: <  11.97M  11.80M     2.7   11.14M   11.14M   1622.6
  6.5s: >  11.64M  11.86M     1.6   11.64M   11.64M   1627.0
  7.5s: <  11.95M  11.80M     1.0   11.64M   11.64M   1627.0
  8.5s: >  11.74M  11.78M     0.6   11.74M   11.74M   1627.9
  9.5s: <  11.81M  11.75M     0.3   11.74M   11.74M   1627.9
 10.5s: >  11.73M  11.79M     0.2   11.74M   11.74M   1627.9
-- ballpark --
 11.5s: <  11.80M  11.70M     0.3   11.74M   11.74M   1627.9
 12.5s: >  11.68M  11.77M     0.3   11.74M   11.74M   1627.9
 13.5s: >! 11.78M  11.84M     0.4   11.78M   11.78M   1628.2
 14.5s: >! 11.86M  11.89M     0.4   11.86M   11.86M   1628.9
 15.5s: <  11.91M  11.81M     0.5   11.86M   11.86M   1628.9
 16.5s: <! 11.78M  11.69M     0.6   11.86M       -∞       --
 17.5s: >  11.66M  11.84M     0.7   11.86M   11.66M   1627.1
 18.6s: <  11.88M  11.80M     0.4   11.86M   11.66M   1627.1
 19.6s: <! 11.77M  11.45M     0.5   11.86M   11.66M   1627.1
 20.6s: >  11.42M  11.76M     0.6   11.86M   11.66M   1627.1
 21.6s: <  11.80M  11.79M     0.4   11.86M   11.66M   1627.1
 22.6s: <! 11.76M  11.69M     0.5   11.86M   11.66M   1627.1
 23.6s: >  11.67M  11.77M     0.6   11.86M   11.67M   1627.2
 24.6s: <  11.80M  11.66M     0.7   11.86M   11.67M   1627.2
 25.6s: <! 11.63M  11.59M     0.8   11.86M       -∞       --
 26.6s: >  11.54M  11.59M     0.5   11.86M   11.54M   1626.2
 27.6s: >! 11.62M  11.75M     0.6   11.86M   11.62M   1626.8
 28.6s: >! 11.78M  11.78M     0.7   11.86M   11.78M   1628.2
 29.6s: >! 11.82M  11.83M     0.8   11.86M   11.82M   1628.6
 30.6s: <  11.88M  11.85M     0.5   11.86M   11.82M   1628.6
 31.6s: <! 11.82M  11.81M     0.6   11.86M       -∞       --
 32.6s: >  11.77M  11.84M     0.4   11.86M   11.77M   1628.1
 33.6s: <  11.86M  11.78M     0.4   11.86M   11.77M   1628.1
 34.6s: >  11.76M  11.79M     0.3   11.86M   11.77M   1628.1
 35.6s: >! 11.81M  11.82M     0.3   11.86M   11.81M   1628.4
 36.6s: <  11.84M  11.81M     0.2   11.86M   11.81M   1628.4
 37.6s: >  11.80M  11.86M     0.2   11.86M   11.81M   1628.4
 38.6s: <  11.87M  11.86M     0.1   11.86M   11.81M   1628.4
 39.6s: >  11.86M  11.86M     0.1   11.86M   11.86M   1628.8
 40.6s: >! 11.86M  11.91M     0.1   11.86M   11.86M   1628.9
 41.6s: <  11.91M  11.77M     0.1   11.86M   11.86M   1628.9
 42.6s: >  11.77M  11.81M     0.1   11.86M   11.86M   1628.9
-- finale --
 43.6s: >! 11.82M  11.83M     0.2   11.86M   11.86M   1628.9
 44.6s: <  11.84M  11.83M     0.1   11.86M       -∞       --
 45.6s: <! 11.82M  11.80M     0.1   11.86M       -∞       --
 46.6s: <! 11.79M  11.69M     0.1   11.86M       -∞       --
 47.6s: >  11.68M  11.88M     0.2   11.86M   11.68M   1627.3
 48.6s: <  11.89M  11.77M     0.2   11.86M   11.68M   1627.3
 49.6s: >  11.76M  11.77M     0.1   11.86M   11.76M   1628.0
 50.6s: <  11.78M  11.65M     0.1   11.86M   11.76M   1628.0
 51.6s: >  11.64M  11.70M     0.2   11.86M   11.76M   1628.0
 52.6s: >! 11.71M  11.83M     0.2   11.86M   11.76M   1628.0
 53.6s: <  11.84M  11.80M     0.3   11.86M   11.76M   1628.0
 54.6s: >  11.78M  11.88M     0.3   11.86M   11.78M   1628.2
 55.6s: <  11.90M  11.75M     0.4   11.86M   11.78M   1628.2
 56.6s: <! 11.73M  11.67M     0.4   11.86M       -∞       --
 57.6s: >  11.65M  11.78M     0.5   11.86M   11.65M   1627.1

TrigBench score: 1.1859834E7 (11.86M 1628.9%) uncertainty: 0.5%
```
There's one line per batch (literally a call to `doBatch`). The columns are:
- Target: The number of iterations requested for this batch
- Est: The estimated throughput achieved, based on the actual number of iterations performed and the actual duration
- Uncert%: The current uncertainty, i.e. the difference between the lowball and highball estimates
- MaxPeak: The largest successful numIterations value ever recorded
- Peak: The largest successful numIterations value in the current run, where a "run" ends at an unsuccessful lowball batch
- Peak%: The "Peak" value in log points, to help with mental math
- %paused: The fraction of the run during which the timer was paused. If the timer was never paused, this column is blank.
In practical terms:
- "Target" isn't terribly useful under normal circumstances.
- "Est" usually best characterizes the current instantaneous performance of the benchmark.
- "Uncert%" indicates the precision of the measurements.
- "MaxPeak" is usually the best "overall score" indicator for the benchmark.
- "Peak" is like "MaxPeak", but if performance suffers a drop during the run, "Peak" will reflect that while "MaxPeak" will ignore it.
Each line of output is marked with one of these symbols:
- `<`: failure: the doBatch call finished after the target duration had elapsed
- `>`: success: the doBatch call finished before the target duration had elapsed
- `!`: the result was surprising (a lowball run failed, or a highball run succeeded)
- `?`: the workload has declined to provide throughput estimates, opting to give just "pass" and "fail" results
There are two additional lines that indicate different "phases" of BumbleBench:
-- ballpark --: At this point, BumbleBench has some confidence the score is near its eventual final value. It could be wrong, of course, so it runs the benchmark for some additional time to make sure the score is stable.
-- finale --: At this point, BumbleBench is ready to finish up. However, if the current score has regressed from a prior max-peak value, then the benchmark may actually be unable to sustain the measured max-peak score. To avoid reporting a max-peak score that was only briefly attainable, BumbleBench reduces the max-peak score to the current peak score, and runs for a small amount of additional time to allow the score to regain its prior max-peak if it can.
Some Design Highlights
- Configuration is all through -D property settings, instead of command line arguments, allowing the use of static final fields (which are readily optimized) that can be altered without recompiling the program. Try -DBumbleBench.listOptions to see all the options available.
- All microbenchmark kernel logic can go into the single doBatch method. This encourages the use of local variables, which are readily optimized. In particular, BumbleBench offers `pauseTimer()` and `startTimer()` methods that make it easy to record the timing of just a portion of the doBatch logic, further encouraging people to put the whole kernel in one method.
- BumbleBench's design encourages a style of one benchmark per program. Running multiple benchmarks in succession in a single Java program can cause phase changes in common infrastructural classes, distorting the performance of the later benchmarks.
- doBatch can return the number of iterations actually performed. If it is awkward to perform the exact requested number of iterations, doBatch can perform a few more or fewer without harming the accuracy of the timing results. For instance, if the kernel has two nested loops of M and N iterations respectively, then it can return M*N even if that is not exactly equal to numIterations (see the sketch after this list).
- Benchmark classes are instantiated, so a benchmark can record state in the benchmark class instance. This allows for state that outlives the doBatch call, yet does not require mutable static variables. This is the preferred way to achieve multi-threaded runs (see above).
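As a sketch of the M*N case (the inner trip count of 1000 is invented for illustration):

```java
// Hypothetical doBatch with nested loops of m and N iterations. It may perform
// slightly more or fewer iterations than requested, and simply reports the truth.
protected long doBatch(long numIterations) throws InterruptedException {
    final int N = 1000;                       // fixed inner-loop trip count
    long m = Math.max(1, numIterations / N);  // outer-loop trip count
    for (long i = 0; i < m; i++) {
        for (int j = 0; j < N; j++) {
            // WORKLOAD GOES HERE
        }
    }
    return m * N; // not exactly numIterations, and that's fine
}
```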
This section is confusing. Don't read it unless you want a deep understanding of exactly what BumbleBench measures. It refers to a graph of a sample BumbleBench run (not reproduced here).
The blue dashed lines indicate the range in which BumbleBench believes the benchmark score lies at any given time, and the orange line shows the estimated score reported by the workload. This is a highly variable workload, so sometimes the score is between the dashed lines, and sometimes it is not. Each batch that runs is represented by an asterisk.
The key to understanding BumbleBench's approach is that BumbleBench does not trust estimated scores; that is, the orange line does not directly affect the final benchmark score. Rather, BumbleBench is only interested in successful batches, which are those that were able to finish the requested number of iterations before the deadline elapsed. Thus, graphically speaking, the final score reported always corresponds to the location of some asterisk that lies under the orange line.
The distance between the dashed lines is the "uncertainty". BumbleBench continually tries to reduce its uncertainty about the score, which means it wants to minimize the distance between the dashed lines while keeping the orange line in between them. To bring the dashed lines closer together, BumbleBench alternately attempts to raise the bottom line and lower the top line. It does so by requesting batches (represented by the asterisks) at various scores. An asterisk on the bottom line is a "lowball" batch, and an asterisk on the top line is a "highball" batch.
The yellow line is the so-called "peak" score. That indicates the highest successful batch, meaning the highest asterisk that was under the orange line. For example, right before 34 seconds, there is a blue asterisk under the orange line. This indicates a successful batch, and since that batch was above the most recent peak score, this established a new peak score, moving the yellow line up to the asterisk. This happens again at 46.6, 49, and 50 seconds.
This "peak" score, however is designed to be sensitive to slowdowns in the benchmark. When a batch fails, and the target score of that batch was below the peak score, then BumbleBench concludes that the performance may have dropped, and attempts to measure this by resetting the peak score. You can see this occurring just after 44 seconds: the orange line came in under the asterisk, meaning the batch failed, and the asterisk was under the most recent location of the yellow line. In response, BumbleBench reset the peak score, causing the yellow line to fall from its previous location to -infinity. It recovered again just after 46 seconds, when a batch succeeded (the orange line rose above the asterisk), causing the yellow line to jump back up to the location of the asterisk.
The green line is the so-called "maxPeak" score. It maintains the highest observed value of the peak score, and so it never decreases. The only exception to this is when the "finale" phase begins, which occurred at 42.6 seconds in the graph. To guard against the case in which the benchmark peaks early and then never again achieves its highest score, BumbleBench adjusts maxPeak so it equals the peak score at the start of the finale; that is, it brings the green line down to meet the yellow line. Aside from this one exception, the green line never decreases; it can be seen rising at 50 seconds, for example.
It is the final value of the green line, the maxPeak score, that is eventually reported as the benchmark score.
THIS IS A WORK IN PROGRESS
The simplest "hello world" example is a tiny microbenchmark that measures Math.sin
. It can be found in net/adoptopenjdk/BumbleBench/examples/TrigBench.java
.
```java
public final class TrigBench extends MicroBench {
    protected long doBatch(long numIterations) throws InterruptedException {
        double argument = 0.1;
        for (long i = 0; i < numIterations; i++)
            argument = 4.6 * Math.sin(argument); // Chaos!
        return numIterations;
    }
}
```
You can build this file, then add it to `BumbleBench.jar` using this command:
jar -uf BumbleBench.jar net/adoptopenjdk/bumblebench/examples/TrigBench.class
Then run it like this:
java -jar BumbleBench.jar TrigBench
You should see output like this:
```
> java -jar BumbleBench.jar TrigBench
         Target      Est  Uncert%  MaxPeak     Peak    Peak%  %paused
  0.0s: >!   110   110.0K    24.0      110      110    470.0
  0.0s: >! 123.2K  3.850M    28.8   123.2K   123.2K   1172.2
  0.4s: >! 4.404M  11.29M    34.6   4.404M   4.404M   1529.8
  1.6s: <  13.24M  11.73M    20.7   4.404M   4.404M   1529.8
  2.5s: >  10.52M  11.74M    12.4   10.52M   10.52M   1616.8
  3.5s: <  12.47M  11.57M     7.5   10.52M   10.52M   1616.8
  4.5s: >  11.14M  11.70M     4.5   11.14M   11.14M   1622.6
  5.5s: <  11.97M  11.80M     2.7   11.14M   11.14M   1622.6
  6.5s: >  11.64M  11.86M     1.6   11.64M   11.64M   1627.0
  7.5s: <  11.95M  11.80M     1.0   11.64M   11.64M   1627.0
  8.5s: >  11.74M  11.78M     0.6   11.74M   11.74M   1627.9
  9.5s: <  11.81M  11.75M     0.3   11.74M   11.74M   1627.9
 10.5s: >  11.73M  11.79M     0.2   11.74M   11.74M   1627.9
-- ballpark --
 11.5s: <  11.80M  11.70M     0.3   11.74M   11.74M   1627.9
 12.5s: >  11.68M  11.77M     0.3   11.74M   11.74M   1627.9
 13.5s: >! 11.78M  11.84M     0.4   11.78M   11.78M   1628.2
 14.5s: >! 11.86M  11.89M     0.4   11.86M   11.86M   1628.9
 15.5s: <  11.91M  11.81M     0.5   11.86M   11.86M   1628.9
 16.5s: <! 11.78M  11.69M     0.6   11.86M       -∞       --
 17.5s: >  11.66M  11.84M     0.7   11.86M   11.66M   1627.1
 18.6s: <  11.88M  11.80M     0.4   11.86M   11.66M   1627.1
 19.6s: <! 11.77M  11.45M     0.5   11.86M   11.66M   1627.1
 20.6s: >  11.42M  11.76M     0.6   11.86M   11.66M   1627.1
 21.6s: <  11.80M  11.79M     0.4   11.86M   11.66M   1627.1
 22.6s: <! 11.76M  11.69M     0.5   11.86M   11.66M   1627.1
 23.6s: >  11.67M  11.77M     0.6   11.86M   11.67M   1627.2
 24.6s: <  11.80M  11.66M     0.7   11.86M   11.67M   1627.2
 25.6s: <! 11.63M  11.59M     0.8   11.86M       -∞       --
 26.6s: >  11.54M  11.59M     0.5   11.86M   11.54M   1626.2
 27.6s: >! 11.62M  11.75M     0.6   11.86M   11.62M   1626.8
 28.6s: >! 11.78M  11.78M     0.7   11.86M   11.78M   1628.2
 29.6s: >! 11.82M  11.83M     0.8   11.86M   11.82M   1628.6
 30.6s: <  11.88M  11.85M     0.5   11.86M   11.82M   1628.6
 31.6s: <! 11.82M  11.81M     0.6   11.86M       -∞       --
 32.6s: >  11.77M  11.84M     0.4   11.86M   11.77M   1628.1
 33.6s: <  11.86M  11.78M     0.4   11.86M   11.77M   1628.1
 34.6s: >  11.76M  11.79M     0.3   11.86M   11.77M   1628.1
 35.6s: >! 11.81M  11.82M     0.3   11.86M   11.81M   1628.4
 36.6s: <  11.84M  11.81M     0.2   11.86M   11.81M   1628.4
 37.6s: >  11.80M  11.86M     0.2   11.86M   11.81M   1628.4
 38.6s: <  11.87M  11.86M     0.1   11.86M   11.81M   1628.4
 39.6s: >  11.86M  11.86M     0.1   11.86M   11.86M   1628.8
 40.6s: >! 11.86M  11.91M     0.1   11.86M   11.86M   1628.9
 41.6s: <  11.91M  11.77M     0.1   11.86M   11.86M   1628.9
 42.6s: >  11.77M  11.81M     0.1   11.86M   11.86M   1628.9
-- finale --
 43.6s: >! 11.82M  11.83M     0.2   11.86M   11.86M   1628.9
 44.6s: <  11.84M  11.83M     0.1   11.86M       -∞       --
 45.6s: <! 11.82M  11.80M     0.1   11.86M       -∞       --
 46.6s: <! 11.79M  11.69M     0.1   11.86M       -∞       --
 47.6s: >  11.68M  11.88M     0.2   11.86M   11.68M   1627.3
 48.6s: <  11.89M  11.77M     0.2   11.86M   11.68M   1627.3
 49.6s: >  11.76M  11.77M     0.1   11.86M   11.76M   1628.0
 50.6s: <  11.78M  11.65M     0.1   11.86M   11.76M   1628.0
 51.6s: >  11.64M  11.70M     0.2   11.86M   11.76M   1628.0
 52.6s: >! 11.71M  11.83M     0.2   11.86M   11.76M   1628.0
 53.6s: <  11.84M  11.80M     0.3   11.86M   11.76M   1628.0
 54.6s: >  11.78M  11.88M     0.3   11.86M   11.78M   1628.2
 55.6s: <  11.90M  11.75M     0.4   11.86M   11.78M   1628.2
 56.6s: <! 11.73M  11.67M     0.4   11.86M       -∞       --
 57.6s: >  11.65M  11.78M     0.5   11.86M   11.65M   1627.1

TrigBench score: 1.1859834E7 (11.86M 1628.9%) uncertainty: 0.5%
```
This output contains a succession of lines, each representing a single batch (literally a call to `TrigBench.doBatch`).
For contrast, here is a run of `EmptyBench`, whose doBatch does no work at all:

```
-= BumbleBench series 2 version 3.1 running net.adoptopenjdk.bumblebench.examples.EmptyBench Sun Sep 14 23:36:39 EDT 2014 =-
         Target      Est  Uncert%  MaxPeak     Peak    Peak%  %paused
  0.0s: >!   110    120.0    24.0      110      110    470.0
  0.0s: >! 134.4    148.8    28.8    134.4    134.4    490.1
  0.0s: >! 170.2    191.7    34.6    170.2    170.2    513.7
  0.0s: >! 224.8    257.9    40.0    224.8    224.8    541.5
  0.0s: >! 309.5    361.0    40.0    309.5    309.5    573.5
  0.0s: >! 433.3    505.5    40.0    433.3    433.3    607.1
  0.0s: >! 606.6    707.7    40.0    606.6    606.6    640.8
  0.0s: >! 849.2    990.7    40.0    849.2    849.2    674.4
  0.0s: >!   1189     1387    40.0     1189     1189    708.1
  0.0s: >!   1664     1942    40.0     1664     1664    741.7
  0.0s: >!   2330     2719    40.0     2330     2330    775.4
  0.0s: >!   3262     3806    40.0     3262     3262    809.0
  0.0s: >!   4567     5328    40.0     4567     4567    842.7
  0.0s: >!   6394     7460    40.0     6394     6394    876.3
  0.0s: >!   8952   10.44K    40.0     8952     8952    910.0
  0.0s: >! 12.53K   14.62K    40.0   12.53K   12.53K    943.6
  0.0s: >! 17.54K   20.47K    40.0   17.54K   17.54K    977.3
  0.0s: >! 24.56K   28.66K    40.0   24.56K   24.56K   1010.9
  0.0s: >! 34.39K   40.12K    40.0   34.39K   34.39K   1044.5
  ⋮
EmptyBench score: Infinity (∞) uncertainty: 0.0%
```
BumbleBench correctly reports that this benchmark runs infinitely fast: no matter how many iterations are requested, a batch finishes long before the target time has elapsed.
MORE TO COME
Put your custom settings in a `.properties` file with the same name as your benchmark's class, and put it in the benchmark directory inside `BumbleBench.jar`. (See `net/adoptopenjdk/bumblebench/examples/TardyBench.properties` for an example.)
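A minimal sketch of such a file, assuming the keys use the same `BumbleBench.xxx` names as the `-D` options (verify the exact format against `TardyBench.properties`):

```
# TrigBench.properties (hypothetical; compare with TardyBench.properties)
BumbleBench.longBatchSeconds=30
BumbleBench.parallelInstances=2
```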
You probably don't want to. If you want the timer paused, just call `pauseTimer`, and if you want it running, call `startTimer`. These methods are idempotent, so it's harmless to call them if the timer is already in the desired state. Adding code to check the current state would be overly complicated, unnecessary, and could distort performance measurements.
If you really must check, then call `isTimerPaused`, but first you must promise that you have read the preceding paragraph.
No. This is intentional. Having initialization in another method would require information to be passed to `doBatch` in fields. Locals are faster than fields, so having `doBatch` load data from fields could distort your measurements.
Instead of an initialization method, BumbleBench encourages you to call `pauseTimer` and `startTimer` around any logic in `doBatch` that you do not wish to measure. Of course, these methods can also distort measurements if you call them too often, but they do so in a very predictable way (each acts like a call to `System.currentTimeMillis`), and they are normally amortized into insignificance as long as you don't call them in your workload's inner loop.
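For example, a hypothetical MicroBench batch that excludes a per-batch allocation from the timing (recall that under MicroBench the timer starts out running):

```java
// Hypothetical doBatch that excludes setup from the measurement. The timer
// calls sit outside the inner loop, so their cost is amortized into insignificance.
protected long doBatch(long numIterations) throws InterruptedException {
    pauseTimer();                          // don't time the setup
    int[] scratch = new int[100_000];      // per-batch scratch buffer (invented)
    startTimer();                          // time only the loop below
    for (long i = 0; i < numIterations; i++) {
        scratch[(int) (i % scratch.length)]++; // stand-in workload
    }
    return numIterations;
}
```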
Initialization to be performed just once, however, is another matter. For adjustable settings, use the `option` method to set a static final field. To acquire and release resources (like opening and closing a network socket), you can override `bumbleMain`, adding your own logic before and after calling `super.bumbleMain()`.
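As a sketch of the resource case: the exact signature of `bumbleMain` is not shown in this guide, so the override below is an assumption, and the class and endpoint are invented. Check the BumbleBench source for the real form:

```java
import java.net.Socket;

// Hypothetical benchmark that holds a socket open for the whole run.
public final class EchoBench extends MicroBench {
    Socket _socket; // outlives every doBatch call

    // The signature of bumbleMain is assumed here; verify it against the real class.
    @Override
    public void bumbleMain() throws Exception {
        try (Socket s = new Socket("localhost", 7)) { // hypothetical echo server
            _socket = s;
            super.bumbleMain(); // run the normal benchmark loop
        }                       // socket closed after the run completes
    }

    protected long doBatch(long numIterations) throws InterruptedException {
        for (long i = 0; i < numIterations; i++) {
            // WORKLOAD USING _socket GOES HERE
        }
        return numIterations;
    }
}
```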
No. This is intentional. Final static fields can be folded away by the JIT compiler as though their values had been hard-coded into the program. In contrast, variables (even local variables) with values derived from command line options cannot be folded away, and are more likely to distort your measurements.
Adjustable benchmark parameters should use static final fields initialized using the BumbleBench `option` method.
Variations on a benchmark can be written as subclasses. BumbleBench even goes to some effort to support a dot syntax `A.B`, letting you implement a benchmark variation as an inner subclass `B` of a common superclass `A` if that's how you'd like to organize your code. (See `net/adoptopenjdk/bumblebench/lambda/DispatchBench.java` for an example.) We recommend that all classes in the `MicroBench` and `MiniBench` hierarchy be either abstract or final. This can help the JIT optimize the benchmark harness so the measurement is focused on your workload.
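For illustration, here is a hypothetical layout (class and method names invented) following this advice; the variations would be run as, e.g., `java -jar BumbleBench.jar AddBench.Ints`:

```java
// Hypothetical benchmark family: an abstract common superclass with final inner
// subclasses, selectable with the dot syntax described above.
public abstract class AddBench extends MicroBench {
    abstract long work(long i); // the part that varies between benchmarks

    protected long doBatch(long numIterations) throws InterruptedException {
        long acc = 0;
        for (long i = 0; i < numIterations; i++)
            acc += work(i);
        if (acc == 42) System.out.print(" "); // keep the result live (invented guard)
        return numIterations;
    }

    public static final class Ints extends AddBench {
        long work(long i) { return (int) i + 1; }
    }

    public static final class Longs extends AddBench {
        long work(long i) { return i + 1L; }
    }
}
```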