Add SIMD benches #1553

Open
wants to merge 2 commits into
base: master

@dhardy dhardy commented Jan 14, 2025

I accidentally closed #1552 with a force push, so...

Summary

Add benchmarks for uniform distribution / single-sample variates for Simd types: u8x8, u8x16, u8x32, u8x64, i16x8, i16x16, i16x32.

Also, rename the non-SIMD benchmarks with an "x1" suffix, e.g. sample_i16x1/SmallRng/distr.
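
For readers unfamiliar with the "single" and "distr" suffixes in the benchmark names, the following is a minimal std-only sketch of the two code paths, assuming "distr" means sampling from a pre-built distribution object and "single" means a one-off sample straight from a range. `XorShift64`, `UniformU32` and `sample_single` are hypothetical stand-ins written for this illustration; they are not the rand crate's types or algorithms.

```rust
// Hypothetical stand-ins, not the rand crate's API: a tiny xorshift generator
// and a range sampler, used only to show the two code paths being compared.

/// Minimal xorshift64 generator (illustrative; this is not SmallRng).
struct XorShift64(u64);

impl XorShift64 {
    fn next_u32(&mut self) -> u32 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        (x >> 32) as u32
    }
}

/// "distr" path: the range is validated and stored once, then reused.
struct UniformU32 {
    low: u32,
    range: u32, // high - low, computed at construction
}

impl UniformU32 {
    fn new(low: u32, high: u32) -> Self {
        assert!(low < high, "empty range");
        UniformU32 { low, range: high - low }
    }

    fn sample(&self, rng: &mut XorShift64) -> u32 {
        // A widening multiply maps a 32-bit word into [0, range).
        // This simple form is slightly biased; it is a sketch only.
        let wide = u64::from(rng.next_u32()) * u64::from(self.range);
        self.low + (wide >> 32) as u32
    }
}

/// "single" path: the set-up work is repeated on every call.
fn sample_single(low: u32, high: u32, rng: &mut XorShift64) -> u32 {
    UniformU32::new(low, high).sample(rng)
}

fn main() {
    let mut rng = XorShift64(0x9E37_79B9_7F4A_7C15);
    let distr = UniformU32::new(10, 20);
    let a = distr.sample(&mut rng);
    let b = sample_single(10, 20, &mut rng);
    println!("distr: {a}, single: {b}");
}
```

In this sketch the only difference between the two paths is whether the range set-up is amortised across calls, which is one plausible reason for "distr" timings to come out lower than "single" timings.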

Motivation

This is a prerequisite for any type of SIMD optimisation.

Details

Sample output (5800X):

$ cargo +nightly bench --bench uniform --features simd_support -- SmallRng
    Finished `bench` profile [optimized] target(s) in 0.03s
     Running benches/uniform.rs (target/release/deps/uniform-fff5b09bd763a405)
sample_i8x1/SmallRng/single
                        time:   [1.5923 ns 1.5929 ns 1.5936 ns]
Found 8408 outliers among 100000 measurements (8.41%)
  278 (0.28%) low severe
  303 (0.30%) low mild
  3554 (3.55%) high mild
  4273 (4.27%) high severe
Criterion.rs ERROR: Error in Gnuplot:          line 0: Can't plot with an empty x range!


Benchmarking sample_i8x1/SmallRng/distr: Warming up for 1.0000 s
Warning: Unable to complete 100000 samples in 3.0s. You may wish to increase target time to 5.3s, enable flat sampling, or reduce sample count to 53020.
sample_i8x1/SmallRng/distr
                        time:   [1.0419 ns 1.0422 ns 1.0425 ns]
Found 12826 outliers among 100000 measurements (12.83%)
  4043 (4.04%) high mild
  8783 (8.78%) high severe

sample_i16x1/SmallRng/single
                        time:   [1.4809 ns 1.4812 ns 1.4817 ns]
Found 22841 outliers among 100000 measurements (22.84%)
  105 (0.10%) high mild
  22736 (22.74%) high severe
Criterion.rs ERROR: Error in Gnuplot:          line 0: Can't plot with an empty x range!


Criterion.rs ERROR: Error in Gnuplot:          line 0: Can't plot with an empty x range!


Criterion.rs ERROR: Error in Gnuplot:          line 0: Can't plot with an empty x range!


Benchmarking sample_i16x1/SmallRng/distr: Warming up for 1.0000 s
Warning: Unable to complete 100000 samples in 3.0s. You may wish to increase target time to 5.3s, enable flat sampling, or reduce sample count to 52990.
sample_i16x1/SmallRng/distr
                        time:   [1.0454 ns 1.0458 ns 1.0461 ns]
Found 12165 outliers among 100000 measurements (12.16%)
  3140 (3.14%) high mild
  9025 (9.03%) high severe

sample_i32x1/SmallRng/single
                        time:   [3.1251 ns 3.1331 ns 3.1411 ns]
Found 12 outliers among 100000 measurements (0.01%)
  5 (0.01%) high mild
  7 (0.01%) high severe
sample_i32x1/SmallRng/distr
                        time:   [1.9073 ns 1.9139 ns 1.9206 ns]
Found 8875 outliers among 100000 measurements (8.88%)
  5546 (5.55%) high mild
  3329 (3.33%) high severe

sample_i64x1/SmallRng/single
                        time:   [4.3892 ns 4.3953 ns 4.4018 ns]
Found 73 outliers among 100000 measurements (0.07%)
  64 (0.06%) high mild
  9 (0.01%) high severe
sample_i64x1/SmallRng/distr
                        time:   [1.7550 ns 1.7616 ns 1.7681 ns]
Found 9194 outliers among 100000 measurements (9.19%)
  5378 (5.38%) high mild
  3816 (3.82%) high severe

sample_i128x1/SmallRng/single
                        time:   [9.7639 ns 9.7734 ns 9.7839 ns]
Found 162 outliers among 100000 measurements (0.16%)
  135 (0.14%) high mild
  27 (0.03%) high severe
sample_i128x1/SmallRng/distr
                        time:   [3.8971 ns 3.9066 ns 3.9166 ns]
Found 8601 outliers among 100000 measurements (8.60%)
  6231 (6.23%) high mild
  2370 (2.37%) high severe

sample_u8x8/SmallRng/single
                        time:   [23.973 ns 24.000 ns 24.027 ns]
Found 1602 outliers among 100000 measurements (1.60%)
  1209 (1.21%) low mild
  300 (0.30%) high mild
  93 (0.09%) high severe
sample_u8x8/SmallRng/distr
                        time:   [12.358 ns 12.379 ns 12.400 ns]
Found 300 outliers among 100000 measurements (0.30%)
  264 (0.26%) high mild
  36 (0.04%) high severe

sample_u8x16/SmallRng/single
                        time:   [43.275 ns 43.308 ns 43.344 ns]
Found 407 outliers among 100000 measurements (0.41%)
  2 (0.00%) low mild
  342 (0.34%) high mild
  63 (0.06%) high severe
sample_u8x16/SmallRng/distr
                        time:   [26.532 ns 26.560 ns 26.587 ns]
Found 289 outliers among 100000 measurements (0.29%)
  6 (0.01%) low mild
  243 (0.24%) high mild
  40 (0.04%) high severe

sample_u8x32/SmallRng/single
                        time:   [71.023 ns 71.052 ns 71.083 ns]
Found 1010 outliers among 100000 measurements (1.01%)
  350 (0.35%) low mild
  625 (0.62%) high mild
  35 (0.04%) high severe
sample_u8x32/SmallRng/distr
                        time:   [35.570 ns 35.598 ns 35.625 ns]
Found 906 outliers among 100000 measurements (0.91%)
  397 (0.40%) low mild
  402 (0.40%) high mild
  107 (0.11%) high severe

sample_u8x64/SmallRng/single
                        time:   [121.40 ns 121.44 ns 121.49 ns]
Found 3279 outliers among 100000 measurements (3.28%)
  206 (0.21%) low mild
  1918 (1.92%) high mild
  1155 (1.16%) high severe
sample_u8x64/SmallRng/distr
                        time:   [54.650 ns 54.680 ns 54.711 ns]
Found 845 outliers among 100000 measurements (0.84%)
  358 (0.36%) low mild
  363 (0.36%) high mild
  124 (0.12%) high severe

sample_i16x8/SmallRng/single
                        time:   [31.671 ns 31.700 ns 31.730 ns]
Found 1196 outliers among 100000 measurements (1.20%)
  475 (0.47%) low mild
  637 (0.64%) high mild
  84 (0.08%) high severe
sample_i16x8/SmallRng/distr
                        time:   [21.576 ns 21.602 ns 21.628 ns]
Found 335 outliers among 100000 measurements (0.34%)
  31 (0.03%) low mild
  283 (0.28%) high mild
  21 (0.02%) high severe

sample_i16x16/SmallRng/single
                        time:   [50.998 ns 51.044 ns 51.093 ns]
Found 1362 outliers among 100000 measurements (1.36%)
  755 (0.76%) high mild
  607 (0.61%) high severe
sample_i16x16/SmallRng/distr
                        time:   [29.717 ns 29.745 ns 29.773 ns]
Found 228 outliers among 100000 measurements (0.23%)
  1 (0.00%) low mild
  179 (0.18%) high mild
  48 (0.05%) high severe

sample_i16x32/SmallRng/single
                        time:   [83.029 ns 83.066 ns 83.106 ns]
Found 2168 outliers among 100000 measurements (2.17%)
  534 (0.53%) low mild
  1000 (1.00%) high mild
  634 (0.63%) high severe
sample_i16x32/SmallRng/distr
                        time:   [43.454 ns 43.482 ns 43.511 ns]
Found 1070 outliers among 100000 measurements (1.07%)
  516 (0.52%) low mild
  497 (0.50%) high mild
  57 (0.06%) high severe
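
To compare the SIMD and x1 rows on an equal footing, it can help to divide each median by the lane count. The sketch below does that arithmetic for the "distr" medians copied from the output above; `ns_per_lane` is a helper introduced here for illustration. Note that the x1 rows are for i8 and i16 while the SIMD rows include u8, so the comparison is only indicative.

```rust
/// Median time per call divided by the number of lanes produced.
fn ns_per_lane(total_ns: f64, lanes: u32) -> f64 {
    total_ns / f64::from(lanes)
}

fn main() {
    // (benchmark, median ns per call, lanes), copied from the "distr" rows above.
    let rows = [
        ("sample_i8x1", 1.0422, 1),
        ("sample_u8x8", 12.379, 8),
        ("sample_u8x16", 26.560, 16),
        ("sample_u8x32", 35.598, 32),
        ("sample_u8x64", 54.680, 64),
        ("sample_i16x1", 1.0458, 1),
        ("sample_i16x8", 21.602, 8),
        ("sample_i16x16", 29.745, 16),
        ("sample_i16x32", 43.482, 32),
    ];
    for (name, ns, lanes) in rows {
        println!("{name}/SmallRng/distr: {:.3} ns per lane", ns_per_lane(ns, lanes));
    }
}
```

On these figures, only u8x64 (about 0.85 ns per lane) comes in under the corresponding x1 median of about 1.04 ns; the other SIMD rows cost more per lane than the scalar benchmark.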

Further motivation

In particular, I wanted to know whether the target_feature optimisations in src/distr/utils.rs are useful. Without the sse2 and avx2 features on my CPU (which doesn't support AVX512), I get very similar results, implying that they may not be useful:

sample_i16x8/SmallRng/single
                        time:   [32.139 ns 32.168 ns 32.197 ns]
                        change: [+1.3338% +1.4755% +1.6156%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1234 outliers among 100000 measurements (1.23%)
  491 (0.49%) low mild
  635 (0.64%) high mild
  108 (0.11%) high severe
sample_i16x8/SmallRng/distr
                        time:   [22.544 ns 22.570 ns 22.596 ns]
                        change: [+4.3081% +4.4816% +4.6666%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 510 outliers among 100000 measurements (0.51%)
  13 (0.01%) low mild
  437 (0.44%) high mild
  60 (0.06%) high severe
sample_i16x16/SmallRng/single
                        time:   [50.056 ns 50.089 ns 50.123 ns]
                        change: [-1.9848% -1.8715% -1.7608%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 639 outliers among 100000 measurements (0.64%)
  455 (0.46%) high mild
  184 (0.18%) high severe
sample_i16x16/SmallRng/distr
                        time:   [29.633 ns 29.662 ns 29.690 ns]
                        change: [-0.4125% -0.2789% -0.1469%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 135 outliers among 100000 measurements (0.14%)
  2 (0.00%) low mild
  99 (0.10%) high mild
  34 (0.03%) high severe

I had planned to test next whether run-time detection of CPU features is viable, but given the above results it may not even be worth asking.
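
For context, here is a minimal std-only sketch of the difference between compile-time feature gating and run-time detection, assuming an x86 target for the detection macro. The function names and the `pick_path` dispatcher are hypothetical and do not reflect the code in src/distr/utils.rs.

```rust
/// True only if the binary was compiled with AVX2 enabled
/// (e.g. via `-C target-feature=+avx2` or `-C target-cpu=native`).
fn compile_time_has_avx2() -> bool {
    cfg!(target_feature = "avx2")
}

/// True if the CPU we are running on reports AVX2 support.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn run_time_has_avx2() -> bool {
    std::arch::is_x86_feature_detected!("avx2")
}

/// On other architectures there is nothing to detect in this sketch.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn run_time_has_avx2() -> bool {
    false
}

/// Hypothetical dispatcher: names the code path that would be selected.
fn pick_path() -> &'static str {
    if compile_time_has_avx2() {
        "compile-time avx2"
    } else if run_time_has_avx2() {
        "run-time avx2"
    } else {
        "portable fallback"
    }
}

fn main() {
    println!("compile-time avx2: {}", compile_time_has_avx2());
    println!("run-time avx2:     {}", run_time_has_avx2());
    println!("selected path:     {}", pick_path());
}
```

A run-time check adds a branch (or an indirect call) per dispatch, so it only pays off when the feature-specific path is measurably faster than the fallback.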

@dhardy dhardy requested a review from josephlr January 14, 2025 16:16