Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(leap): add performance article #760

Merged
merged 4 commits into from
Jan 13, 2024
Merged

Conversation

vaeng
Copy link
Contributor

@vaeng vaeng commented Jan 8, 2024

To make the links work, this should be merged after #759 ✔️
To make the image work, exercism/images#24 should be merged beforehand. ✔️

@vaeng vaeng added the x:type/docs Work on Documentation label Jan 8, 2024
@vaeng vaeng self-assigned this Jan 8, 2024
@siebenschlaefer
Copy link
Contributor

siebenschlaefer commented Jan 9, 2024

Hej!

When I ran the tests on my computer I got significantly different results:

  • I compiled with GCC-13.2.0 and -O3
  • I ran cpupower frequency-set --governor performance to avoid any interference from the processor
  • I ran benchmark_leap --benchmark_repetitions=100

On my computer (with a AMD Ryzen 1920X) I got these results (slightly shortened):

Benchmark                                     Time             CPU   Iterations
-------------------------------------------------------------------------------
BM_Leap_boolean_chain_mean                 1925 ns         1924 ns          100
BM_Leap_boolean_chain_median               1952 ns         1952 ns          100
BM_Leap_boolean_chain_stddev               37.9 ns         37.9 ns          100
BM_Leap_boolean_chain_cv                   1.97 %          1.97 %           100
BM_Leap_boolean_chain_inverse_mean         1213 ns         1213 ns          100
BM_Leap_boolean_chain_inverse_median       1210 ns         1210 ns          100
BM_Leap_boolean_chain_inverse_stddev       20.2 ns         20.2 ns          100
BM_Leap_boolean_chain_inverse_cv           1.67 %          1.67 %           100
BM_Leap_ternary_mean                        976 ns          976 ns          100
BM_Leap_ternary_median                      968 ns          968 ns          100
BM_Leap_ternary_stddev                     19.6 ns         19.6 ns          100
BM_Leap_ternary_cv                         2.01 %          2.01 %           100
BM_Leap_chrono_mean                         949 ns          949 ns          100
BM_Leap_chrono_median                       947 ns          947 ns          100
BM_Leap_chrono_stddev                      7.14 ns         7.11 ns          100
BM_Leap_chrono_cv                          0.75 %          0.75 %           100
BM_Leap_boost_mean                         1685 ns         1685 ns          100
BM_Leap_boost_median                       1683 ns         1683 ns          100
BM_Leap_boost_stddev                       9.90 ns         9.93 ns          100
BM_Leap_boost_cv                           0.59 %          0.59 %           100
BM_empty_read_mean                        0.000 ns        0.000 ns          100
BM_empty_read_median                      0.000 ns        0.000 ns          100
BM_empty_read_stddev                      0.000 ns        0.000 ns          100
BM_empty_read_cv                           1.78 %          1.20 %           100

It looks like "boolean_chain" is among the slowest approaches, "boolean_chain_inverse" is significantly faster but "ternary" beats both. That's surprising. Or am I misinterpreting the numbers?

@vaeng
Copy link
Contributor Author

vaeng commented Jan 10, 2024

BM_Leap_boolean_chain_median 1952 ns
BM_Leap_boolean_chain_inverse_mean 1213 ns

That is strange. I used g++ leap_benchmark.cpp -std=c++20 -isystem benchmark/include -Lbenchmark/build/src -lbenchmark -lpthread -o leap_benchmark, compiled with g++ version 14.10.0. So the article's values are without -O3.

I can't change my CPU to performance, as the VM does not support it, but that should not change things so drastically. I reran the tests many times and always got those results. Even with -O3.

Another go:

❯ gcc leap_benchmark.cpp -std=gnu++2a -O3 -isystem benchmark/include  -Lbenchmark/build/src -lbenchmark -lpthread -o leap_benchmark -lstdc++ -lm
❯ ./leap_benchmark --benchmark_report_aggregates_only=true --benchmark_repetitions=10
BM_Leap_boolean_chain_mean                  937 ns          937 ns           10
BM_Leap_boolean_chain_median                937 ns          937 ns           10
BM_Leap_boolean_chain_stddev               4.00 ns         4.01 ns           10
BM_Leap_boolean_chain_cv                   0.43 %          0.43 %            10
BM_Leap_boolean_chain_inverse_mean         1334 ns         1334 ns           10
BM_Leap_boolean_chain_inverse_median       1333 ns         1333 ns           10
BM_Leap_boolean_chain_inverse_stddev       3.36 ns         3.32 ns           10
BM_Leap_boolean_chain_inverse_cv           0.25 %          0.25 %            10
BM_Leap_ternary_mean                       1060 ns         1060 ns           10
BM_Leap_ternary_median                     1062 ns         1061 ns           10
BM_Leap_ternary_stddev                     2.95 ns         2.90 ns           10
BM_Leap_ternary_cv                         0.28 %          0.27 %            10
BM_Leap_chrono_mean                        1144 ns         1144 ns           10
BM_Leap_chrono_median                      1143 ns         1143 ns           10
BM_Leap_chrono_stddev                      4.75 ns         4.75 ns           10
BM_Leap_chrono_cv                          0.42 %          0.42 %            10
BM_Leap_boost_mean                         1018 ns         1018 ns           10
BM_Leap_boost_median                       1018 ns         1018 ns           10
BM_Leap_boost_stddev                       2.51 ns         2.45 ns           10
BM_Leap_boost_cv                           0.25 %          0.24 %            10
BM_empty_read_mean                        0.614 ns        0.614 ns           10
BM_empty_read_median                      0.614 ns        0.614 ns           10
BM_empty_read_stddev                      0.001 ns        0.001 ns           10
BM_empty_read_cv                           0.13 %          0.12 %            10
```text

@clechasseur
Copy link

Finally managed to run benchmarks on my old, slow computer 😉

$ ./leap_benchmark --benchmark_report_aggregates_only=true --benchmark_repetitions=10
2024-01-11T07:30:31-05:00
Running ./leap_benchmark
Run on (4 X 2712 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x2)
  L1 Instruction 32 KiB (x2)
  L2 Unified 256 KiB (x2)
  L3 Unified 3072 KiB (x1)
Load Average: 0.85, 0.63, 0.62
-------------------------------------------------------------------------------
Benchmark                                     Time             CPU   Iterations
-------------------------------------------------------------------------------
BM_Leap_boolean_chain_mean                17235 ns        17235 ns           10
BM_Leap_boolean_chain_median              17157 ns        17157 ns           10
BM_Leap_boolean_chain_stddev                193 ns          193 ns           10
BM_Leap_boolean_chain_cv                   1.12 %          1.12 %            10
BM_Leap_boolean_chain_inverse_mean        35652 ns        35652 ns           10
BM_Leap_boolean_chain_inverse_median      34359 ns        34359 ns           10
BM_Leap_boolean_chain_inverse_stddev       2747 ns         2747 ns           10
BM_Leap_boolean_chain_inverse_cv           7.70 %          7.70 %            10
BM_Leap_ternary_mean                      21083 ns        21084 ns           10
BM_Leap_ternary_median                    20425 ns        20425 ns           10
BM_Leap_ternary_stddev                     1350 ns         1350 ns           10
BM_Leap_ternary_cv                         6.40 %          6.40 %            10
BM_Leap_chrono_mean                       20505 ns        20506 ns           10
BM_Leap_chrono_median                     19862 ns        19863 ns           10
BM_Leap_chrono_stddev                      1577 ns         1577 ns           10
BM_Leap_chrono_cv                          7.69 %          7.69 %            10
BM_Leap_boost_mean                        36426 ns        36427 ns           10
BM_Leap_boost_median                      35927 ns        35927 ns           10
BM_Leap_boost_stddev                        824 ns          824 ns           10
BM_Leap_boost_cv                           2.26 %          2.26 %            10
BM_empty_read_mean                         1834 ns         1834 ns           10
BM_empty_read_median                       1833 ns         1833 ns           10
BM_empty_read_stddev                       5.73 ns         5.71 ns           10
BM_empty_read_cv                           0.31 %          0.31 %            10

Looks like on my computer, boost is much slower than chrono, which differs from @vaeng's results. Aside from that it seems to be similar (slower but proportional).

@vaeng
Copy link
Contributor Author

vaeng commented Jan 11, 2024

@clechasseur at least chain and inverse chain are in a "sensible" order.

What do you and @siebenschlaefer think? Should I add or edit some of the article, or let it stand as it is for my machine?

@vaeng
Copy link
Contributor Author

vaeng commented Jan 12, 2024

@ErikSchierboom have you had diverging benchmarks for an approach article before?

@clechasseur
Copy link

When I read the different approach documents for the Leap exercise, I wondered whether results could vary depending on how the CPU handles pipelining. The "inverse chain" is an example; on paper, it looks worse because the edge case is checked fist. But what if instructions are reordered in some way? I am wondering whether @siebenschlaefer's results can be explained by the fact that they're using an AMD processor (I am on Intel).

I will also admit to not being a fan of such very low-level optimizations in general. Optimizing a method to determine if a year is a leap year seems a bit extreme: is that really the performance bottleneck of your program? Let's say that it's true that the ternary approach is 27% slower; it's 27% of such an infinitesimal amount of time, does it really matter? But I will leave it to you to determine whether the article should be posted: in itself, it seems fine.

@vaeng
Copy link
Contributor Author

vaeng commented Jan 12, 2024

AMD processor (I am on Intel)

My results are AMD as well.

Let's say that it's true that the ternary approach is 27% slower; it's 27% of such an infinitesimal amount of time, does it really matter?

I should include that into the article.

@ErikSchierboom
Copy link
Member

@ErikSchierboom have you had diverging benchmarks for an approach article before?

Nope, but I've been the only one doing them on my articles :D

@vaeng
Copy link
Contributor Author

vaeng commented Jan 13, 2024

I think the wording in the article is a good summary of our discussion here @siebenschlaefer @ErikSchierboom.
I'd like to merge the article to see the new invert mechanism :)

@iHiD iHiD merged commit 77b2998 into main Jan 13, 2024
8 checks passed
@iHiD iHiD deleted the feat(leap)--add-performance-article branch January 13, 2024 17:29
@iHiD
Copy link
Member

iHiD commented Jan 13, 2024

Merged. Everyone feel free to do follow ups if appropriate but it's good for us to test this -invertible functionality before Tues :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
x:type/docs Work on Documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants