Add SIMD benches #1553

dhardy · 2025-01-14T16:16:22Z

I accidentally closed #1552 with a force push, so...

Summary

Add benchmarks of uniform sampling (both the distribution-object "distr" and single-sample "single" variants) for SIMD types: u8x8, u8x16, u8x32, u8x64, i16x8, i16x16, i16x32.

Also rename the non-SIMD benchmarks with an "x1" suffix, e.g. sample_i16x1/SmallRng/distr.

Motivation

This is a prerequisite for any kind of SIMD optimisation.

Details

Sample output (5800X):

$ cargo +nightly bench --bench uniform --features simd_support -- SmallRng
    Finished `bench` profile [optimized] target(s) in 0.03s
     Running benches/uniform.rs (target/release/deps/uniform-fff5b09bd763a405)
sample_i8x1/SmallRng/single
                        time:   [1.5923 ns 1.5929 ns 1.5936 ns]
Found 8408 outliers among 100000 measurements (8.41%)
  278 (0.28%) low severe
  303 (0.30%) low mild
  3554 (3.55%) high mild
  4273 (4.27%) high severe
Criterion.rs ERROR: Error in Gnuplot:          line 0: Can't plot with an empty x range!


Benchmarking sample_i8x1/SmallRng/distr: Warming up for 1.0000 s
Warning: Unable to complete 100000 samples in 3.0s. You may wish to increase target time to 5.3s, enable flat sampling, or reduce sample count to 53020.
sample_i8x1/SmallRng/distr
                        time:   [1.0419 ns 1.0422 ns 1.0425 ns]
Found 12826 outliers among 100000 measurements (12.83%)
  4043 (4.04%) high mild
  8783 (8.78%) high severe

sample_i16x1/SmallRng/single
                        time:   [1.4809 ns 1.4812 ns 1.4817 ns]
Found 22841 outliers among 100000 measurements (22.84%)
  105 (0.10%) high mild
  22736 (22.74%) high severe
Criterion.rs ERROR: Error in Gnuplot:          line 0: Can't plot with an empty x range!


Criterion.rs ERROR: Error in Gnuplot:          line 0: Can't plot with an empty x range!


Criterion.rs ERROR: Error in Gnuplot:          line 0: Can't plot with an empty x range!


Benchmarking sample_i16x1/SmallRng/distr: Warming up for 1.0000 s
Warning: Unable to complete 100000 samples in 3.0s. You may wish to increase target time to 5.3s, enable flat sampling, or reduce sample count to 52990.
sample_i16x1/SmallRng/distr
                        time:   [1.0454 ns 1.0458 ns 1.0461 ns]
Found 12165 outliers among 100000 measurements (12.16%)
  3140 (3.14%) high mild
  9025 (9.03%) high severe

sample_i32x1/SmallRng/single
                        time:   [3.1251 ns 3.1331 ns 3.1411 ns]
Found 12 outliers among 100000 measurements (0.01%)
  5 (0.01%) high mild
  7 (0.01%) high severe
sample_i32x1/SmallRng/distr
                        time:   [1.9073 ns 1.9139 ns 1.9206 ns]
Found 8875 outliers among 100000 measurements (8.88%)
  5546 (5.55%) high mild
  3329 (3.33%) high severe

sample_i64x1/SmallRng/single
                        time:   [4.3892 ns 4.3953 ns 4.4018 ns]
Found 73 outliers among 100000 measurements (0.07%)
  64 (0.06%) high mild
  9 (0.01%) high severe
sample_i64x1/SmallRng/distr
                        time:   [1.7550 ns 1.7616 ns 1.7681 ns]
Found 9194 outliers among 100000 measurements (9.19%)
  5378 (5.38%) high mild
  3816 (3.82%) high severe

sample_i128x1/SmallRng/single
                        time:   [9.7639 ns 9.7734 ns 9.7839 ns]
Found 162 outliers among 100000 measurements (0.16%)
  135 (0.14%) high mild
  27 (0.03%) high severe
sample_i128x1/SmallRng/distr
                        time:   [3.8971 ns 3.9066 ns 3.9166 ns]
Found 8601 outliers among 100000 measurements (8.60%)
  6231 (6.23%) high mild
  2370 (2.37%) high severe

sample_u8x8/SmallRng/single
                        time:   [23.973 ns 24.000 ns 24.027 ns]
Found 1602 outliers among 100000 measurements (1.60%)
  1209 (1.21%) low mild
  300 (0.30%) high mild
  93 (0.09%) high severe
sample_u8x8/SmallRng/distr
                        time:   [12.358 ns 12.379 ns 12.400 ns]
Found 300 outliers among 100000 measurements (0.30%)
  264 (0.26%) high mild
  36 (0.04%) high severe

sample_u8x16/SmallRng/single
                        time:   [43.275 ns 43.308 ns 43.344 ns]
Found 407 outliers among 100000 measurements (0.41%)
  2 (0.00%) low mild
  342 (0.34%) high mild
  63 (0.06%) high severe
sample_u8x16/SmallRng/distr
                        time:   [26.532 ns 26.560 ns 26.587 ns]
Found 289 outliers among 100000 measurements (0.29%)
  6 (0.01%) low mild
  243 (0.24%) high mild
  40 (0.04%) high severe

sample_u8x32/SmallRng/single
                        time:   [71.023 ns 71.052 ns 71.083 ns]
Found 1010 outliers among 100000 measurements (1.01%)
  350 (0.35%) low mild
  625 (0.62%) high mild
  35 (0.04%) high severe
sample_u8x32/SmallRng/distr
                        time:   [35.570 ns 35.598 ns 35.625 ns]
Found 906 outliers among 100000 measurements (0.91%)
  397 (0.40%) low mild
  402 (0.40%) high mild
  107 (0.11%) high severe

sample_u8x64/SmallRng/single
                        time:   [121.40 ns 121.44 ns 121.49 ns]
Found 3279 outliers among 100000 measurements (3.28%)
  206 (0.21%) low mild
  1918 (1.92%) high mild
  1155 (1.16%) high severe
sample_u8x64/SmallRng/distr
                        time:   [54.650 ns 54.680 ns 54.711 ns]
Found 845 outliers among 100000 measurements (0.84%)
  358 (0.36%) low mild
  363 (0.36%) high mild
  124 (0.12%) high severe

sample_i16x8/SmallRng/single
                        time:   [31.671 ns 31.700 ns 31.730 ns]
Found 1196 outliers among 100000 measurements (1.20%)
  475 (0.47%) low mild
  637 (0.64%) high mild
  84 (0.08%) high severe
sample_i16x8/SmallRng/distr
                        time:   [21.576 ns 21.602 ns 21.628 ns]
Found 335 outliers among 100000 measurements (0.34%)
  31 (0.03%) low mild
  283 (0.28%) high mild
  21 (0.02%) high severe

sample_i16x16/SmallRng/single
                        time:   [50.998 ns 51.044 ns 51.093 ns]
Found 1362 outliers among 100000 measurements (1.36%)
  755 (0.76%) high mild
  607 (0.61%) high severe
sample_i16x16/SmallRng/distr
                        time:   [29.717 ns 29.745 ns 29.773 ns]
Found 228 outliers among 100000 measurements (0.23%)
  1 (0.00%) low mild
  179 (0.18%) high mild
  48 (0.05%) high severe

sample_i16x32/SmallRng/single
                        time:   [83.029 ns 83.066 ns 83.106 ns]
Found 2168 outliers among 100000 measurements (2.17%)
  534 (0.53%) low mild
  1000 (1.00%) high mild
  634 (0.63%) high severe
sample_i16x32/SmallRng/distr
                        time:   [43.454 ns 43.482 ns 43.511 ns]
Found 1070 outliers among 100000 measurements (1.07%)
  516 (0.52%) low mild
  497 (0.50%) high mild
  57 (0.06%) high severe
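
To compare the SIMD timings with the "x1" ones, it can help to divide each time by the lane count. The sketch below does that for the `distr` point estimates (the middle value of each `time:` triple) copied from the output above. The helper name `ns_per_lane` is hypothetical and not part of the benchmark code.

```rust
/// Per-lane cost in nanoseconds: time for one SIMD sample divided by the lane count.
/// (Hypothetical helper for reading the numbers; not part of benches/uniform.rs.)
fn ns_per_lane(total_ns: f64, lanes: u32) -> f64 {
    total_ns / f64::from(lanes)
}

fn main() {
    // (type, lanes, `distr` point estimate in ns), copied from the sample output.
    let results = [
        ("u8x8", 8, 12.379),
        ("u8x16", 16, 26.560),
        ("u8x32", 32, 35.598),
        ("u8x64", 64, 54.680),
        ("i16x8", 8, 21.602),
        ("i16x16", 16, 29.745),
        ("i16x32", 32, 43.482),
    ];
    for (name, lanes, ns) in results {
        println!("{name}: {:.3} ns/lane", ns_per_lane(ns, lanes));
    }
}
```

For reference, the scalar sample_i8x1 and sample_i16x1 `distr` estimates above are roughly 1.04 ns each.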

Further motivation

In particular, I wanted to know whether the target_feature optimisations in src/distr/utils.rs are useful. Without the sse2 and avx2 features on my CPU (which doesn't support AVX512), I get very similar results (every change is under 5% in either direction), which suggests they may not be useful:

sample_i16x8/SmallRng/single
                        time:   [32.139 ns 32.168 ns 32.197 ns]
                        change: [+1.3338% +1.4755% +1.6156%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1234 outliers among 100000 measurements (1.23%)
  491 (0.49%) low mild
  635 (0.64%) high mild
  108 (0.11%) high severe
sample_i16x8/SmallRng/distr
                        time:   [22.544 ns 22.570 ns 22.596 ns]
                        change: [+4.3081% +4.4816% +4.6666%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 510 outliers among 100000 measurements (0.51%)
  13 (0.01%) low mild
  437 (0.44%) high mild
  60 (0.06%) high severe
sample_i16x16/SmallRng/single
                        time:   [50.056 ns 50.089 ns 50.123 ns]
                        change: [-1.9848% -1.8715% -1.7608%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 639 outliers among 100000 measurements (0.64%)
  455 (0.46%) high mild
  184 (0.18%) high severe
sample_i16x16/SmallRng/distr
                        time:   [29.633 ns 29.662 ns 29.690 ns]
                        change: [-0.4125% -0.2789% -0.1469%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 135 outliers among 100000 measurements (0.14%)
  2 (0.00%) low mild
  99 (0.10%) high mild
  34 (0.03%) high severe
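
As a sanity check on the `change:` lines, the relative difference can be recomputed from the point estimates of the two runs. This is a rough sketch that assumes the baseline is the run shown under Details. Criterion derives its figure by bootstrapping, so the values only agree approximately. The helper name `percent_change` is hypothetical.

```rust
/// Relative change of `new_ns` versus `baseline_ns`, in percent.
/// (Hypothetical helper; Criterion computes its own estimate differently.)
fn percent_change(baseline_ns: f64, new_ns: f64) -> f64 {
    (new_ns - baseline_ns) / baseline_ns * 100.0
}

fn main() {
    // (benchmark, point estimate with the features, point estimate without them)
    let pairs = [
        ("sample_i16x8/SmallRng/single", 31.700, 32.168),
        ("sample_i16x8/SmallRng/distr", 21.602, 22.570),
        ("sample_i16x16/SmallRng/single", 51.044, 50.089),
        ("sample_i16x16/SmallRng/distr", 29.745, 29.662),
    ];
    for (name, base, new) in pairs {
        println!("{name}: {:+.2}%", percent_change(base, new));
    }
}
```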

I was planning to then test whether or not run-time detection of CPU features was viable, but with the above results it may not even be worth asking.
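
For context, run-time detection would look roughly like the sketch below, which uses the standard library's `is_x86_feature_detected!` macro. The function `pick_simd_path` and the three path labels are hypothetical and only illustrate the dispatch shape. Nothing like this exists in the PR.

```rust
/// Returns a label for the widest code path the running CPU supports.
/// The labels are hypothetical; a real implementation would dispatch to
/// functions compiled with the matching `#[target_feature]` attribute.
fn pick_simd_path() -> &'static str {
    // Feature detection is only available on x86 / x86_64 via this macro.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            return "avx2";
        }
        if std::arch::is_x86_feature_detected!("sse2") {
            return "sse2";
        }
    }
    // Fallback for other architectures or CPUs without these features.
    "portable"
}

fn main() {
    println!("selected path: {}", pick_simd_path());
}
```

The check itself costs something on each call unless the result is cached, which is part of why the benchmark results above matter before going down this route.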

dhardy added 2 commits January 14, 2025 10:11

benches/uniform: revise sample! macro for SIMD

039e740

benches/uniform: add SIMD benches

5ec6390

dhardy requested a review from josephlr January 14, 2025 16:16
