Population count comparison for Core i5 M540 @ 2.53GHz
Generated on: 2016-03-26
CPU: Core i5 M540 @ 2.53GHz
Compiler: GCC 4.9.2 (Debian 4.9.2-10)
Instruction set: SSE
Number of runs: 5
All times are given in seconds.
procedure                              description
-------------------------------------  ------------------------------------------------------------
lookup-8                               lookup in std::uint8_t[256] LUT
lookup-64                              lookup in std::uint64_t[256] LUT
bit-parallel                           naive bit parallel method
bit-parallel-optimized                 a bit better bit parallel
bit-parallel-mul                       bit-parallel with fewer instructions
harley-seal                            Harley-Seal popcount (4th iteration)
sse-bit-parallel                       SSE implementation of bit-parallel-optimized (unrolled)
sse-bit-parallel-original              SSE implementation of bit-parallel-optimized
sse-bit-parallel-better                SSE implementation of bit-parallel with fewer instructions
sse-harley-seal                        SSE implementation of Harley-Seal
sse-lookup                             SSSE3 variant using pshufb instruction (unrolled)
sse-lookup-original                    SSSE3 variant using pshufb instruction
cpu                                    CPU instruction popcnt (64-bit variant)
sse-cpu                                load data with SSE, then count bits using popcnt
builtin-popcnt                         builtin for popcnt
builtin-popcnt32                       builtin for popcnt (32-bit variant)
builtin-popcnt-unrolled                unrolled builtin-popcnt
builtin-popcnt-unrolled32              unrolled builtin-popcnt32
builtin-popcnt-unrolled-errata         unrolled builtin-popcnt avoiding false-dependency
builtin-popcnt-unrolled-errata-manual  unrolled builtin-popcnt avoiding false-dependency (assembly code)
builtin-popcnt-movdq                   builtin-popcnt where data is loaded via SSE registers
builtin-popcnt-movdq-unrolled          builtin-popcnt-movdq unrolled
builtin-popcnt-movdq-unrolled_manual   builtin-popcnt-movdq unrolled (assembly code)
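For orientation, three of the scalar procedures listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the benchmarked source code: all function names (make_lut, popcnt_lookup8, popcnt_word_mul, popcnt_bit_parallel_mul, popcnt_builtin) are hypothetical, and treating the multiplication-based reduction as the "bit-parallel with fewer instructions" variant is a guess based only on its name and description. __builtin_popcountll is a GCC/Clang builtin.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: builds a 256-entry table of per-byte bit counts.
inline std::array<std::uint8_t, 256> make_lut() {
    std::array<std::uint8_t, 256> lut{};  // lut[0] == 0
    for (int i = 1; i < 256; ++i) {
        // popcount(i) = popcount(i / 2) + lowest bit of i
        lut[i] = static_cast<std::uint8_t>(lut[i >> 1] + (i & 1));
    }
    return lut;
}

// lookup-8 style: one table lookup per input byte.
inline std::uint64_t popcnt_lookup8(const std::uint8_t* data, std::size_t n) {
    static const std::array<std::uint8_t, 256> lut = make_lut();
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < n; ++i) total += lut[data[i]];
    return total;
}

// Bit-parallel count of one 64-bit word; the final multiplication
// sums the eight per-byte counters into the top byte.
inline std::uint64_t popcnt_word_mul(std::uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555ULL);                            // 2-bit sums
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);  // 4-bit sums
    x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;                            // 8-bit sums
    return (x * 0x0101010101010101ULL) >> 56;
}

// Applies popcnt_word_mul to 8-byte chunks, then to any leftover bytes.
inline std::uint64_t popcnt_bit_parallel_mul(const std::uint8_t* data, std::size_t n) {
    std::uint64_t total = 0;
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        std::uint64_t w;
        std::memcpy(&w, data + i, sizeof w);  // avoids unaligned-access issues
        total += popcnt_word_mul(w);
    }
    for (; i < n; ++i) total += popcnt_word_mul(data[i]);
    return total;
}

// builtin-popcnt style: relies on the compiler builtin (GCC/Clang only).
inline std::uint64_t popcnt_builtin(const std::uint8_t* data, std::size_t n) {
    std::uint64_t total = 0;
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        std::uint64_t w;
        std::memcpy(&w, data + i, sizeof w);
        total += static_cast<std::uint64_t>(__builtin_popcountll(w));
    }
    for (; i < n; ++i) total += static_cast<std::uint64_t>(__builtin_popcountll(data[i]));
    return total;
}
```

With GCC, the builtin is only compiled to the popcnt instruction when the target allows it (for example with -mpopcnt or a suitable -march); otherwise the compiler falls back to a software routine.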
procedure                                 32 B     64 B    128 B    256 B    512 B   1024 B   2048 B   4096 B
-------------------------------------  -------  -------  -------  -------  -------  -------  -------  -------
lookup-8                               2.30682  2.19949  2.15913  2.12274  3.40312  3.37968  3.36830  3.36327
lookup-64                              2.30520  2.19460  2.15665  2.12098  3.39359  3.37388  3.36426  3.35894
bit-parallel                           2.13734  1.99894  1.94310  1.90146  3.00652  2.98865  2.98472  2.97496
bit-parallel-optimized                 1.37613  1.23683  1.16308  1.13311  1.79759  1.76732  1.75583  1.75727
bit-parallel-mul                       1.27090  1.18854  1.14315  1.12855  1.77835  1.78635  1.76195  1.74768
harley-seal                            1.48025  1.29651  0.79610  0.63324  0.90016  0.85752  0.83508  0.82458
sse-bit-parallel                       2.69392  2.39464  1.41028  0.95430  1.16728  0.99899  0.91217  0.86662
sse-bit-parallel-original              2.36376  1.44147  1.03644  0.84391  1.17614  1.19961  1.14922  1.10145
sse-bit-parallel-better                2.72236  2.40755  1.36711  0.91569  1.10007  0.92206  0.83866  0.78519
sse-harley-seal                        1.97896  1.27526  0.90073  0.43162  0.56059  0.47240  0.43234  0.41591
sse-lookup                             0.75539  0.54069  0.35012  0.31090  0.46971  0.45597  0.44554  0.44298
sse-lookup-original                    1.80431  1.12350  0.77243  0.59811  0.82280  0.89240  0.81698  0.75707
cpu                                    0.49250  0.37698  0.32133  0.29132  0.44240  0.43117  0.42524  0.34746
sse-cpu                                2.46184  0.45881  0.37843  0.33627  0.51150  0.48877  0.48091  0.47667
builtin-popcnt                         0.49263  0.44286  0.46112  0.42607  0.65241  0.66710  0.64707  0.63798
builtin-popcnt32                       0.85318  0.81902  0.80633  0.79473  1.32992  1.30050  1.27972  1.26759
builtin-popcnt-unrolled                0.49270  0.37709  0.32117  0.29111  0.44242  0.43123  0.42522  0.28776
builtin-popcnt-unrolled32              0.91986  0.78649  0.72367  0.68804  1.07508  1.06164  0.74972  0.71987
builtin-popcnt-unrolled-errata         0.59095  0.44246  0.37058  0.33233  0.50194  0.48662  0.47922  0.28935
builtin-popcnt-unrolled-errata-manual  0.62275  0.45880  0.37864  0.33610  0.50461  0.48852  0.48039  0.29710
builtin-popcnt-movdq                   0.39942  0.37706  0.35306  0.34013  0.53455  0.52919  0.45700  0.44145
builtin-popcnt-movdq-unrolled          0.52712  0.39337  0.33021  0.29545  0.44566  0.43309  0.42648  0.38394
builtin-popcnt-movdq-unrolled_manual   0.60784  0.46728  0.39911  0.36264  0.55238  0.53818  0.53127  0.39940
Input size: 32 B

procedure                              time [s]  relative time (less is better)
-------------------------------------  --------  --------------------------------------------------
lookup-8                               2.30682   ██████████████████████████████████████████▎
lookup-64                              2.30520   ██████████████████████████████████████████▎
bit-parallel                           2.13734   ███████████████████████████████████████▎
bit-parallel-optimized                 1.37613   █████████████████████████▎
bit-parallel-mul                       1.27090   ███████████████████████▎
harley-seal                            1.48025   ███████████████████████████▏
sse-bit-parallel                       2.69392   █████████████████████████████████████████████████▍
sse-bit-parallel-original              2.36376   ███████████████████████████████████████████▍
sse-bit-parallel-better                2.72236   ██████████████████████████████████████████████████
sse-harley-seal                        1.97896   ████████████████████████████████████▎
sse-lookup                             0.75539   █████████████▊
sse-lookup-original                    1.80431   █████████████████████████████████▏
cpu                                    0.49250   █████████
sse-cpu                                2.46184   █████████████████████████████████████████████▏
builtin-popcnt                         0.49263   █████████
builtin-popcnt32                       0.85318   ███████████████▋
builtin-popcnt-unrolled                0.49270   █████████
builtin-popcnt-unrolled32              0.91986   ████████████████▉
builtin-popcnt-unrolled-errata         0.59095   ██████████▊
builtin-popcnt-unrolled-errata-manual  0.62275   ███████████▍
builtin-popcnt-movdq                   0.39942   ███████▎
builtin-popcnt-movdq-unrolled          0.52712   █████████▋
builtin-popcnt-movdq-unrolled_manual   0.60784   ███████████▏
Input size: 64 B

procedure                              time [s]  relative time (less is better)
-------------------------------------  --------  --------------------------------------------------
lookup-8                               2.19949   █████████████████████████████████████████████▋
lookup-64                              2.19460   █████████████████████████████████████████████▌
bit-parallel                           1.99894   █████████████████████████████████████████▌
bit-parallel-optimized                 1.23683   █████████████████████████▋
bit-parallel-mul                       1.18854   ████████████████████████▋
harley-seal                            1.29651   ██████████████████████████▉
sse-bit-parallel                       2.39464   █████████████████████████████████████████████████▋
sse-bit-parallel-original              1.44147   █████████████████████████████▉
sse-bit-parallel-better                2.40755   ██████████████████████████████████████████████████
sse-harley-seal                        1.27526   ██████████████████████████▍
sse-lookup                             0.54069   ███████████▏
sse-lookup-original                    1.12350   ███████████████████████▎
cpu                                    0.37698   ███████▊
sse-cpu                                0.45881   █████████▌
builtin-popcnt                         0.44286   █████████▏
builtin-popcnt32                       0.81902   █████████████████
builtin-popcnt-unrolled                0.37709   ███████▊
builtin-popcnt-unrolled32              0.78649   ████████████████▎
builtin-popcnt-unrolled-errata         0.44246   █████████▏
builtin-popcnt-unrolled-errata-manual  0.45880   █████████▌
builtin-popcnt-movdq                   0.37706   ███████▊
builtin-popcnt-movdq-unrolled          0.39337   ████████▏
builtin-popcnt-movdq-unrolled_manual   0.46728   █████████▋
Input size: 128 B

procedure                              time [s]  relative time (less is better)
-------------------------------------  --------  --------------------------------------------------
lookup-8                               2.15913   ██████████████████████████████████████████████████
lookup-64                              2.15665   █████████████████████████████████████████████████▉
bit-parallel                           1.94310   ████████████████████████████████████████████▉
bit-parallel-optimized                 1.16308   ██████████████████████████▉
bit-parallel-mul                       1.14315   ██████████████████████████▍
harley-seal                            0.79610   ██████████████████▍
sse-bit-parallel                       1.41028   ████████████████████████████████▋
sse-bit-parallel-original              1.03644   ████████████████████████
sse-bit-parallel-better                1.36711   ███████████████████████████████▋
sse-harley-seal                        0.90073   ████████████████████▊
sse-lookup                             0.35012   ████████
sse-lookup-original                    0.77243   █████████████████▉
cpu                                    0.32133   ███████▍
sse-cpu                                0.37843   ████████▊
builtin-popcnt                         0.46112   ██████████▋
builtin-popcnt32                       0.80633   ██████████████████▋
builtin-popcnt-unrolled                0.32117   ███████▍
builtin-popcnt-unrolled32              0.72367   ████████████████▊
builtin-popcnt-unrolled-errata         0.37058   ████████▌
builtin-popcnt-unrolled-errata-manual  0.37864   ████████▊
builtin-popcnt-movdq                   0.35306   ████████▏
builtin-popcnt-movdq-unrolled          0.33021   ███████▋
builtin-popcnt-movdq-unrolled_manual   0.39911   █████████▏
Input size: 256 B

procedure                              time [s]  relative time (less is better)
-------------------------------------  --------  --------------------------------------------------
lookup-8                               2.12274   ██████████████████████████████████████████████████
lookup-64                              2.12098   █████████████████████████████████████████████████▉
bit-parallel                           1.90146   ████████████████████████████████████████████▊
bit-parallel-optimized                 1.13311   ██████████████████████████▋
bit-parallel-mul                       1.12855   ██████████████████████████▌
harley-seal                            0.63324   ██████████████▉
sse-bit-parallel                       0.95430   ██████████████████████▍
sse-bit-parallel-original              0.84391   ███████████████████▉
sse-bit-parallel-better                0.91569   █████████████████████▌
sse-harley-seal                        0.43162   ██████████▏
sse-lookup                             0.31090   ███████▎
sse-lookup-original                    0.59811   ██████████████
cpu                                    0.29132   ██████▊
sse-cpu                                0.33627   ███████▉
builtin-popcnt                         0.42607   ██████████
builtin-popcnt32                       0.79473   ██████████████████▋
builtin-popcnt-unrolled                0.29111   ██████▊
builtin-popcnt-unrolled32              0.68804   ████████████████▏
builtin-popcnt-unrolled-errata         0.33233   ███████▊
builtin-popcnt-unrolled-errata-manual  0.33610   ███████▉
builtin-popcnt-movdq                   0.34013   ████████
builtin-popcnt-movdq-unrolled          0.29545   ██████▉
builtin-popcnt-movdq-unrolled_manual   0.36264   ████████▌
Input size: 512 B

procedure                              time [s]  relative time (less is better)
-------------------------------------  --------  --------------------------------------------------
lookup-8                               3.40312   ██████████████████████████████████████████████████
lookup-64                              3.39359   █████████████████████████████████████████████████▊
bit-parallel                           3.00652   ████████████████████████████████████████████▏
bit-parallel-optimized                 1.79759   ██████████████████████████▍
bit-parallel-mul                       1.77835   ██████████████████████████▏
harley-seal                            0.90016   █████████████▏
sse-bit-parallel                       1.16728   █████████████████▏
sse-bit-parallel-original              1.17614   █████████████████▎
sse-bit-parallel-better                1.10007   ████████████████▏
sse-harley-seal                        0.56059   ████████▏
sse-lookup                             0.46971   ██████▉
sse-lookup-original                    0.82280   ████████████
cpu                                    0.44240   ██████▍
sse-cpu                                0.51150   ███████▌
builtin-popcnt                         0.65241   █████████▌
builtin-popcnt32                       1.32992   ███████████████████▌
builtin-popcnt-unrolled                0.44242   ██████▌
builtin-popcnt-unrolled32              1.07508   ███████████████▊
builtin-popcnt-unrolled-errata         0.50194   ███████▎
builtin-popcnt-unrolled-errata-manual  0.50461   ███████▍
builtin-popcnt-movdq                   0.53455   ███████▊
builtin-popcnt-movdq-unrolled          0.44566   ██████▌
builtin-popcnt-movdq-unrolled_manual   0.55238   ████████
Input size: 1024 B

procedure                              time [s]  relative time (less is better)
-------------------------------------  --------  --------------------------------------------------
lookup-8                               3.37968   ██████████████████████████████████████████████████
lookup-64                              3.37388   █████████████████████████████████████████████████▉
bit-parallel                           2.98865   ████████████████████████████████████████████▏
bit-parallel-optimized                 1.76732   ██████████████████████████▏
bit-parallel-mul                       1.78635   ██████████████████████████▍
harley-seal                            0.85752   ████████████▋
sse-bit-parallel                       0.99899   ██████████████▊
sse-bit-parallel-original              1.19961   █████████████████▋
sse-bit-parallel-better                0.92206   █████████████▋
sse-harley-seal                        0.47240   ██████▉
sse-lookup                             0.45597   ██████▋
sse-lookup-original                    0.89240   █████████████▏
cpu                                    0.43117   ██████▍
sse-cpu                                0.48877   ███████▏
builtin-popcnt                         0.66710   █████████▊
builtin-popcnt32                       1.30050   ███████████████████▏
builtin-popcnt-unrolled                0.43123   ██████▍
builtin-popcnt-unrolled32              1.06164   ███████████████▋
builtin-popcnt-unrolled-errata         0.48662   ███████▏
builtin-popcnt-unrolled-errata-manual  0.48852   ███████▏
builtin-popcnt-movdq                   0.52919   ███████▊
builtin-popcnt-movdq-unrolled          0.43309   ██████▍
builtin-popcnt-movdq-unrolled_manual   0.53818   ███████▉
Input size: 2048 B

procedure                              time [s]  relative time (less is better)
-------------------------------------  --------  --------------------------------------------------
lookup-8                               3.36830   ██████████████████████████████████████████████████
lookup-64                              3.36426   █████████████████████████████████████████████████▉
bit-parallel                           2.98472   ████████████████████████████████████████████▎
bit-parallel-optimized                 1.75583   ██████████████████████████
bit-parallel-mul                       1.76195   ██████████████████████████▏
harley-seal                            0.83508   ████████████▍
sse-bit-parallel                       0.91217   █████████████▌
sse-bit-parallel-original              1.14922   █████████████████
sse-bit-parallel-better                0.83866   ████████████▍
sse-harley-seal                        0.43234   ██████▍
sse-lookup                             0.44554   ██████▌
sse-lookup-original                    0.81698   ████████████▏
cpu                                    0.42524   ██████▎
sse-cpu                                0.48091   ███████▏
builtin-popcnt                         0.64707   █████████▌
builtin-popcnt32                       1.27972   ██████████████████▉
builtin-popcnt-unrolled                0.42522   ██████▎
builtin-popcnt-unrolled32              0.74972   ███████████▏
builtin-popcnt-unrolled-errata         0.47922   ███████
builtin-popcnt-unrolled-errata-manual  0.48039   ███████▏
builtin-popcnt-movdq                   0.45700   ██████▊
builtin-popcnt-movdq-unrolled          0.42648   ██████▎
builtin-popcnt-movdq-unrolled_manual   0.53127   ███████▉
Input size: 4096 B

procedure                              time [s]  relative time (less is better)
-------------------------------------  --------  --------------------------------------------------
lookup-8                               3.36327   ██████████████████████████████████████████████████
lookup-64                              3.35894   █████████████████████████████████████████████████▉
bit-parallel                           2.97496   ████████████████████████████████████████████▏
bit-parallel-optimized                 1.75727   ██████████████████████████
bit-parallel-mul                       1.74768   █████████████████████████▉
harley-seal                            0.82458   ████████████▎
sse-bit-parallel                       0.86662   ████████████▉
sse-bit-parallel-original              1.10145   ████████████████▎
sse-bit-parallel-better                0.78519   ███████████▋
sse-harley-seal                        0.41591   ██████▏
sse-lookup                             0.44298   ██████▌
sse-lookup-original                    0.75707   ███████████▎
cpu                                    0.34746   █████▏
sse-cpu                                0.47667   ███████
builtin-popcnt                         0.63798   █████████▍
builtin-popcnt32                       1.26759   ██████████████████▊
builtin-popcnt-unrolled                0.28776   ████▎
builtin-popcnt-unrolled32              0.71987   ██████████▋
builtin-popcnt-unrolled-errata         0.28935   ████▎
builtin-popcnt-unrolled-errata-manual  0.29710   ████▍
builtin-popcnt-movdq                   0.44145   ██████▌
builtin-popcnt-movdq-unrolled          0.38394   █████▋
builtin-popcnt-movdq-unrolled_manual   0.39940   █████▉
procedure                                32 B    64 B   128 B   256 B   512 B  1024 B  2048 B  4096 B
-------------------------------------  ------  ------  ------  ------  ------  ------  ------  ------
lookup-8                                 1.00    1.00    1.00    1.00    1.00    1.00    1.00    1.00
lookup-64                                1.00    1.00    1.00    1.00    1.00    1.00    1.00    1.00
bit-parallel                             1.08    1.10    1.11    1.12    1.13    1.13    1.13    1.13
bit-parallel-optimized                   1.68    1.78    1.86    1.87    1.89    1.91    1.92    1.91
bit-parallel-mul                         1.82    1.85    1.89    1.88    1.91    1.89    1.91    1.92
harley-seal                              1.56    1.70    2.71    3.35    3.78    3.94    4.03    4.08
sse-bit-parallel                         0.86    0.92    1.53    2.22    2.92    3.38    3.69    3.88
sse-bit-parallel-original                0.98    1.53    2.08    2.52    2.89    2.82    2.93    3.05
sse-bit-parallel-better                  0.85    0.91    1.58    2.32    3.09    3.67    4.02    4.28
sse-harley-seal                          1.17    1.72    2.40    4.92    6.07    7.15    7.79    8.09
sse-lookup                               3.05    4.07    6.17    6.83    7.25    7.41    7.56    7.59
sse-lookup-original                      1.28    1.96    2.80    3.55    4.14    3.79    4.12    4.44
cpu                                      4.68    5.83    6.72    7.29    7.69    7.84    7.92    9.68
sse-cpu                                  0.94    4.79    5.71    6.31    6.65    6.91    7.00    7.06
builtin-popcnt                           4.68    4.97    4.68    4.98    5.22    5.07    5.21    5.27
builtin-popcnt32                         2.70    2.69    2.68    2.67    2.56    2.60    2.63    2.65
builtin-popcnt-unrolled                  4.68    5.83    6.72    7.29    7.69    7.84    7.92   11.69
builtin-popcnt-unrolled32                2.51    2.80    2.98    3.09    3.17    3.18    4.49    4.67
builtin-popcnt-unrolled-errata           3.90    4.97    5.83    6.39    6.78    6.95    7.03   11.62
builtin-popcnt-unrolled-errata-manual    3.70    4.79    5.70    6.32    6.74    6.92    7.01   11.32
builtin-popcnt-movdq                     5.78    5.83    6.12    6.24    6.37    6.39    7.37    7.62
builtin-popcnt-movdq-unrolled            4.38    5.59    6.54    7.18    7.64    7.80    7.90    8.76
builtin-popcnt-movdq-unrolled_manual     3.80    4.71    5.41    5.85    6.16    6.28    6.34    8.42
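
The values in the last table appear to be speedups relative to lookup-8 for the same input size, that is, the lookup-8 time divided by the procedure's time, rounded to two decimals (higher is better). The report does not state this explicitly; it is inferred from the numbers, which match the timing table. A minimal sketch of that arithmetic, with hypothetical function names speedup and round2:

```cpp
#include <cmath>

// Speedup of a procedure over the baseline: baseline time / procedure time.
// Assumes the baseline is lookup-8 at the same input size (inferred, not stated).
inline double speedup(double baseline_seconds, double procedure_seconds) {
    return baseline_seconds / procedure_seconds;
}

// Rounds to two decimal places, as the table entries are printed.
inline double round2(double value) {
    return std::round(value * 100.0) / 100.0;
}
```

For example, round2(speedup(2.30682, 0.49250)) gives 4.68, which is the cpu entry at 32 B, and round2(speedup(3.36327, 0.28776)) gives 11.69, the builtin-popcnt-unrolled entry at 4096 B.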
Download westmere-m540-gcc4.9.2-sse.csv