Generated on: 2019-12-08
CPU: Xeon W-2104 CPU @ 3.20GHz
Compiler: gcc version 8.1.0 (Ubuntu 8.1.0-5ubuntu1~16.04)
Instruction set: AVX512BW
Number of runs: 5
All times are given in seconds.
procedure | description |
---|---|
lookup-8 | lookup in std::uint8_t[256] LUT |
lookup-64 | lookup in std::uint64_t[256] LUT |
bit-parallel | naive bit-parallel method |
bit-parallel-optimized | slightly improved bit-parallel method |
bit-parallel-mul | bit-parallel with fewer instructions |
bit-parallel32 | naive bit-parallel method (32-bit) |
bit-parallel-optimized32 | slightly improved bit-parallel method (32-bit) |
harley-seal | Harley-Seal popcount (4th iteration) |
sse-bit-parallel | SSE implementation of bit-parallel-optimized (unrolled) |
sse-bit-parallel-original | SSE implementation of bit-parallel-optimized |
sse-bit-parallel-better | SSE implementation of bit-parallel with fewer instructions |
sse-harley-seal | SSE implementation of Harley-Seal |
sse-lookup | SSSE3 variant using pshufb instruction (unrolled) |
sse-lookup-original | SSSE3 variant using pshufb instruction |
avx2-lookup | AVX2 variant using pshufb instruction (unrolled) |
avx2-lookup-original | AVX2 variant using pshufb instruction |
avx2-harley-seal | AVX2 implementation of Harley-Seal |
cpu | CPU instruction popcnt (64-bit variant) |
sse-cpu | load data with SSE, then count bits using popcnt |
avx2-cpu | load data with AVX2, then count bits using popcnt |
avx512-harley-seal | AVX512 implementation of Harley-Seal |
avx512bw-shuf | AVX512BW implementation using the shuffle instruction |
builtin-popcnt | builtin for popcnt |
builtin-popcnt32 | builtin for popcnt (32-bit variant) |
builtin-popcnt-unrolled | unrolled builtin-popcnt |
builtin-popcnt-unrolled32 | unrolled builtin-popcnt32 |
builtin-popcnt-unrolled-errata | unrolled builtin-popcnt avoiding false-dependency |
builtin-popcnt-unrolled-errata-manual | unrolled builtin-popcnt avoiding false-dependency (assembly code) |
builtin-popcnt-movdq | builtin-popcnt where data is loaded via SSE registers |
builtin-popcnt-movdq-unrolled | builtin-popcnt-movdq unrolled |
builtin-popcnt-movdq-unrolled_manual | builtin-popcnt-movdq unrolled (assembly code) |
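
For orientation, three of the scalar approaches named above can be sketched as follows. This is a minimal illustration and not the benchmarked code: the function names (`popcnt_lookup8`, `popcnt_bit_parallel_mul`, `popcnt_builtin`, `check_all`) are hypothetical. The bit-parallel variant is the well-known multiply-based word count, which may differ in detail from what `bit-parallel-mul` actually measures. `__builtin_popcountll` is assumed to be available (GCC and Clang).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Build a 256-entry table holding the bit count of every byte value.
static std::array<std::uint8_t, 256> make_byte_lut() {
    std::array<std::uint8_t, 256> lut{};
    for (int i = 1; i < 256; ++i) {
        lut[i] = static_cast<std::uint8_t>(lut[i >> 1] + (i & 1));
    }
    return lut;
}

// Table-driven count, one byte per step (the idea behind lookup-8).
std::uint64_t popcnt_lookup8(const std::uint8_t* data, std::size_t n) {
    static const std::array<std::uint8_t, 256> lut = make_byte_lut();
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += lut[data[i]];
    }
    return total;
}

// Bit-parallel count of one 64-bit word; the final multiply sums the
// eight per-byte counts into the top byte.
static std::uint64_t popcnt_word_mul(std::uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return (x * 0x0101010101010101ULL) >> 56;
}

// Load up to 8 bytes into a word; missing bytes are zero and add no bits.
static std::uint64_t load_word(const std::uint8_t* p, std::size_t bytes) {
    std::uint64_t word = 0;
    std::memcpy(&word, p, bytes);  // alignment-safe load
    return word;
}

// Bit-parallel count over a buffer, 8 bytes per step.
std::uint64_t popcnt_bit_parallel_mul(const std::uint8_t* data, std::size_t n) {
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < n; i += 8) {
        const std::size_t bytes = (n - i < 8) ? (n - i) : 8;
        total += popcnt_word_mul(load_word(data + i, bytes));
    }
    return total;
}

// Compiler builtin (assumption: GCC or Clang), 8 bytes per step.
std::uint64_t popcnt_builtin(const std::uint8_t* data, std::size_t n) {
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < n; i += 8) {
        const std::size_t bytes = (n - i < 8) ? (n - i) : 8;
        total += static_cast<std::uint64_t>(
            __builtin_popcountll(load_word(data + i, bytes)));
    }
    return total;
}

// Convenience check: all three sketches must agree with the expected count.
void check_all(const std::uint8_t* data, std::size_t n, std::uint64_t expected) {
    assert(popcnt_lookup8(data, n) == expected);
    assert(popcnt_bit_parallel_mul(data, n) == expected);
    assert(popcnt_builtin(data, n) == expected);
}
```

All three functions take a byte buffer and its length and return the total number of set bits, so they can be cross-checked against each other on the same input, as `check_all` does.
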
procedure | 32 B | 64 B | 128 B | 256 B | 512 B | 1024 B | 2048 B | 4096 B |
---|---|---|---|---|---|---|---|---|
lookup-8 | 1.19116 | 1.09751 | 1.05118 | 1.02817 | 1.68133 | 1.64420 | 1.62533 | 1.61539 |
lookup-64 | 1.16511 | 1.09198 | 1.05227 | 1.03253 | 1.69700 | 1.65446 | 1.63190 | 1.62115 |
bit-parallel | 1.26917 | 1.14385 | 1.08548 | 1.05825 | 1.67150 | 1.66064 | 1.65513 | 1.66514 |
bit-parallel-optimized | 0.90882 | 0.78403 | 0.73039 | 0.70171 | 1.09967 | 1.08818 | 1.08248 | 1.09080 |
bit-parallel-mul | 0.75216 | 0.67385 | 0.64054 | 0.62674 | 0.99193 | 1.02415 | 1.00252 | 0.99264 |
bit-parallel32 | 1.81757 | 1.74194 | 1.71687 | 1.70441 | 2.71705 | 2.71217 | 2.73145 | 2.71965 |
bit-parallel-optimized32 | 1.40133 | 1.32995 | 1.29086 | 1.27213 | 2.02035 | 2.01282 | 2.02735 | 2.01658 |
harley-seal | 1.01330 | 0.83291 | 0.50931 | 0.39572 | 0.53908 | 0.49207 | 0.46857 | 0.46524 |
sse-bit-parallel | 2.00714 | 1.61196 | 1.29326 | 0.78731 | 0.90954 | 0.73791 | 0.64884 | 0.60464 |
sse-bit-parallel-original | 1.21799 | 0.78844 | 0.58476 | 0.49625 | 0.72684 | 0.68865 | 0.67598 | 0.67192 |
sse-bit-parallel-better | 1.64924 | 1.55477 | 0.93940 | 0.62167 | 0.73995 | 0.61430 | 0.55213 | 0.52073 |
sse-harley-seal | 1.22938 | 0.78968 | 0.56908 | 0.27129 | 0.33717 | 0.28957 | 0.26622 | 0.25450 |
sse-lookup | 0.50139 | 0.35814 | 0.24174 | 0.20464 | 0.31130 | 0.30164 | 0.29683 | 0.29483 |
sse-lookup-original | 1.64531 | 0.95388 | 0.60114 | 0.43871 | 0.58401 | 0.53005 | 0.52198 | 0.49898 |
avx2-lookup | 0.47421 | 0.30170 | 0.20555 | 0.14685 | 0.19706 | 0.16914 | 0.15924 | 0.15487 |
avx2-lookup-original | 1.50636 | 0.88887 | 0.52544 | 0.55798 | 0.43556 | 0.36215 | 0.33397 | 0.32436 |
avx2-harley-seal | 1.03406 | 0.58683 | 0.37282 | 0.26332 | 0.20285 | 0.15388 | 0.13064 | 0.11857 |
cpu | 0.34469 | 0.23502 | 0.16451 | 0.13317 | 0.20055 | 0.20061 | 0.20058 | 0.20631 |
sse-cpu | 1.72117 | 0.25092 | 0.21348 | 0.19193 | 0.29147 | 0.28359 | 0.27968 | 0.27772 |
avx2-cpu | 2.80064 | 2.11516 | 0.28004 | 0.23874 | 0.35099 | 0.33524 | 0.32587 | 0.32200 |
avx512-harley-seal | 3.94491 | 0.81799 | 0.46683 | 0.29606 | 0.33146 | 0.12317 | 0.08344 | 0.06327 |
avx512bw-shuf | 2.10369 | 1.78203 | 1.01979 | 0.66235 | 0.63044 | 0.39010 | 0.22794 | 0.18797 |
builtin-popcnt | 0.34478 | 0.29778 | 0.27428 | 0.26253 | 0.41058 | 0.44010 | 0.42057 | 0.41129 |
builtin-popcnt32 | 0.50161 | 0.50126 | 0.50292 | 0.50593 | 0.90008 | 0.87112 | 0.84699 | 0.84165 |
builtin-popcnt-unrolled | 0.31336 | 0.25068 | 0.21934 | 0.20368 | 0.31310 | 0.30703 | 0.30345 | 0.30967 |
builtin-popcnt-unrolled32 | 0.43750 | 0.40176 | 0.32173 | 0.31296 | 0.48960 | 0.48478 | 0.50331 | 0.49151 |
builtin-popcnt-unrolled-errata | 0.28202 | 0.20368 | 0.15673 | 0.13317 | 0.20368 | 0.20211 | 0.20133 | 0.20848 |
builtin-popcnt-unrolled-errata-manual | 0.45735 | 0.30761 | 0.23215 | 0.19540 | 0.28193 | 0.26632 | 0.25850 | 0.26378 |
builtin-popcnt-movdq | 0.21151 | 0.18420 | 0.17814 | 0.17976 | 0.29001 | 0.29146 | 0.30206 | 0.29324 |
builtin-popcnt-movdq-unrolled | 0.32505 | 0.23502 | 0.18807 | 0.16586 | 0.24785 | 0.23762 | 0.23254 | 0.23797 |
builtin-popcnt-movdq-unrolled_manual | 0.40737 | 0.25910 | 0.20416 | 0.18313 | 0.28186 | 0.26956 | 0.26400 | 0.28840 |

Input size: 32 B

procedure | time [s] | relative time (less is better) |
---|---|---|
lookup-8 | 1.19116 | ███████████████ |
lookup-64 | 1.16511 | ██████████████▊ |
bit-parallel | 1.26917 | ████████████████ |
bit-parallel-optimized | 0.90882 | ███████████▌ |
bit-parallel-mul | 0.75216 | █████████▌ |
bit-parallel32 | 1.81757 | ███████████████████████ |
bit-parallel-optimized32 | 1.40133 | █████████████████▊ |
harley-seal | 1.01330 | ████████████▊ |
sse-bit-parallel | 2.00714 | █████████████████████████▍ |
sse-bit-parallel-original | 1.21799 | ███████████████▍ |
sse-bit-parallel-better | 1.64924 | ████████████████████▉ |
sse-harley-seal | 1.22938 | ███████████████▌ |
sse-lookup | 0.50139 | ██████▎ |
sse-lookup-original | 1.64531 | ████████████████████▊ |
avx2-lookup | 0.47421 | ██████ |
avx2-lookup-original | 1.50636 | ███████████████████ |
avx2-harley-seal | 1.03406 | █████████████ |
cpu | 0.34469 | ████▎ |
sse-cpu | 1.72117 | █████████████████████▊ |
avx2-cpu | 2.80064 | ███████████████████████████████████▍ |
avx512-harley-seal | 3.94491 | ██████████████████████████████████████████████████ |
avx512bw-shuf | 2.10369 | ██████████████████████████▋ |
builtin-popcnt | 0.34478 | ████▎ |
builtin-popcnt32 | 0.50161 | ██████▎ |
builtin-popcnt-unrolled | 0.31336 | ███▉ |
builtin-popcnt-unrolled32 | 0.43750 | █████▌ |
builtin-popcnt-unrolled-errata | 0.28202 | ███▌ |
builtin-popcnt-unrolled-errata-manual | 0.45735 | █████▊ |
builtin-popcnt-movdq | 0.21151 | ██▋ |
builtin-popcnt-movdq-unrolled | 0.32505 | ████ |
builtin-popcnt-movdq-unrolled_manual | 0.40737 | █████▏ |

Input size: 64 B

procedure | time [s] | relative time (less is better) |
---|---|---|
lookup-8 | 1.09751 | █████████████████████████▉ |
lookup-64 | 1.09198 | █████████████████████████▊ |
bit-parallel | 1.14385 | ███████████████████████████ |
bit-parallel-optimized | 0.78403 | ██████████████████▌ |
bit-parallel-mul | 0.67385 | ███████████████▉ |
bit-parallel32 | 1.74194 | █████████████████████████████████████████▏ |
bit-parallel-optimized32 | 1.32995 | ███████████████████████████████▍ |
harley-seal | 0.83291 | ███████████████████▋ |
sse-bit-parallel | 1.61196 | ██████████████████████████████████████ |
sse-bit-parallel-original | 0.78844 | ██████████████████▋ |
sse-bit-parallel-better | 1.55477 | ████████████████████████████████████▊ |
sse-harley-seal | 0.78968 | ██████████████████▋ |
sse-lookup | 0.35814 | ████████▍ |
sse-lookup-original | 0.95388 | ██████████████████████▌ |
avx2-lookup | 0.30170 | ███████▏ |
avx2-lookup-original | 0.88887 | █████████████████████ |
avx2-harley-seal | 0.58683 | █████████████▊ |
cpu | 0.23502 | █████▌ |
sse-cpu | 0.25092 | █████▉ |
avx2-cpu | 2.11516 | ██████████████████████████████████████████████████ |
avx512-harley-seal | 0.81799 | ███████████████████▎ |
avx512bw-shuf | 1.78203 | ██████████████████████████████████████████▏ |
builtin-popcnt | 0.29778 | ███████ |
builtin-popcnt32 | 0.50126 | ███████████▊ |
builtin-popcnt-unrolled | 0.25068 | █████▉ |
builtin-popcnt-unrolled32 | 0.40176 | █████████▍ |
builtin-popcnt-unrolled-errata | 0.20368 | ████▊ |
builtin-popcnt-unrolled-errata-manual | 0.30761 | ███████▎ |
builtin-popcnt-movdq | 0.18420 | ████▎ |
builtin-popcnt-movdq-unrolled | 0.23502 | █████▌ |
builtin-popcnt-movdq-unrolled_manual | 0.25910 | ██████ |

Input size: 128 B

procedure | time [s] | relative time (less is better) |
---|---|---|
lookup-8 | 1.05118 | ██████████████████████████████▌ |
lookup-64 | 1.05227 | ██████████████████████████████▋ |
bit-parallel | 1.08548 | ███████████████████████████████▌ |
bit-parallel-optimized | 0.73039 | █████████████████████▎ |
bit-parallel-mul | 0.64054 | ██████████████████▋ |
bit-parallel32 | 1.71687 | ██████████████████████████████████████████████████ |
bit-parallel-optimized32 | 1.29086 | █████████████████████████████████████▌ |
harley-seal | 0.50931 | ██████████████▊ |
sse-bit-parallel | 1.29326 | █████████████████████████████████████▋ |
sse-bit-parallel-original | 0.58476 | █████████████████ |
sse-bit-parallel-better | 0.93940 | ███████████████████████████▎ |
sse-harley-seal | 0.56908 | ████████████████▌ |
sse-lookup | 0.24174 | ███████ |
sse-lookup-original | 0.60114 | █████████████████▌ |
avx2-lookup | 0.20555 | █████▉ |
avx2-lookup-original | 0.52544 | ███████████████▎ |
avx2-harley-seal | 0.37282 | ██████████▊ |
cpu | 0.16451 | ████▊ |
sse-cpu | 0.21348 | ██████▏ |
avx2-cpu | 0.28004 | ████████▏ |
avx512-harley-seal | 0.46683 | █████████████▌ |
avx512bw-shuf | 1.01979 | █████████████████████████████▋ |
builtin-popcnt | 0.27428 | ███████▉ |
builtin-popcnt32 | 0.50292 | ██████████████▋ |
builtin-popcnt-unrolled | 0.21934 | ██████▍ |
builtin-popcnt-unrolled32 | 0.32173 | █████████▎ |
builtin-popcnt-unrolled-errata | 0.15673 | ████▌ |
builtin-popcnt-unrolled-errata-manual | 0.23215 | ██████▊ |
builtin-popcnt-movdq | 0.17814 | █████▏ |
builtin-popcnt-movdq-unrolled | 0.18807 | █████▍ |
builtin-popcnt-movdq-unrolled_manual | 0.20416 | █████▉ |

Input size: 256 B

procedure | time [s] | relative time (less is better) |
---|---|---|
lookup-8 | 1.02817 | ██████████████████████████████▏ |
lookup-64 | 1.03253 | ██████████████████████████████▎ |
bit-parallel | 1.05825 | ███████████████████████████████ |
bit-parallel-optimized | 0.70171 | ████████████████████▌ |
bit-parallel-mul | 0.62674 | ██████████████████▍ |
bit-parallel32 | 1.70441 | ██████████████████████████████████████████████████ |
bit-parallel-optimized32 | 1.27213 | █████████████████████████████████████▎ |
harley-seal | 0.39572 | ███████████▌ |
sse-bit-parallel | 0.78731 | ███████████████████████ |
sse-bit-parallel-original | 0.49625 | ██████████████▌ |
sse-bit-parallel-better | 0.62167 | ██████████████████▏ |
sse-harley-seal | 0.27129 | ███████▉ |
sse-lookup | 0.20464 | ██████ |
sse-lookup-original | 0.43871 | ████████████▊ |
avx2-lookup | 0.14685 | ████▎ |
avx2-lookup-original | 0.55798 | ████████████████▎ |
avx2-harley-seal | 0.26332 | ███████▋ |
cpu | 0.13317 | ███▉ |
sse-cpu | 0.19193 | █████▋ |
avx2-cpu | 0.23874 | ███████ |
avx512-harley-seal | 0.29606 | ████████▋ |
avx512bw-shuf | 0.66235 | ███████████████████▍ |
builtin-popcnt | 0.26253 | ███████▋ |
builtin-popcnt32 | 0.50593 | ██████████████▊ |
builtin-popcnt-unrolled | 0.20368 | █████▉ |
builtin-popcnt-unrolled32 | 0.31296 | █████████▏ |
builtin-popcnt-unrolled-errata | 0.13317 | ███▉ |
builtin-popcnt-unrolled-errata-manual | 0.19540 | █████▋ |
builtin-popcnt-movdq | 0.17976 | █████▎ |
builtin-popcnt-movdq-unrolled | 0.16586 | ████▊ |
builtin-popcnt-movdq-unrolled_manual | 0.18313 | █████▎ |

Input size: 512 B

procedure | time [s] | relative time (less is better) |
---|---|---|
lookup-8 | 1.68133 | ██████████████████████████████▉ |
lookup-64 | 1.69700 | ███████████████████████████████▏ |
bit-parallel | 1.67150 | ██████████████████████████████▊ |
bit-parallel-optimized | 1.09967 | ████████████████████▏ |
bit-parallel-mul | 0.99193 | ██████████████████▎ |
bit-parallel32 | 2.71705 | ██████████████████████████████████████████████████ |
bit-parallel-optimized32 | 2.02035 | █████████████████████████████████████▏ |
harley-seal | 0.53908 | █████████▉ |
sse-bit-parallel | 0.90954 | ████████████████▋ |
sse-bit-parallel-original | 0.72684 | █████████████▍ |
sse-bit-parallel-better | 0.73995 | █████████████▌ |
sse-harley-seal | 0.33717 | ██████▏ |
sse-lookup | 0.31130 | █████▋ |
sse-lookup-original | 0.58401 | ██████████▋ |
avx2-lookup | 0.19706 | ███▋ |
avx2-lookup-original | 0.43556 | ████████ |
avx2-harley-seal | 0.20285 | ███▋ |
cpu | 0.20055 | ███▋ |
sse-cpu | 0.29147 | █████▎ |
avx2-cpu | 0.35099 | ██████▍ |
avx512-harley-seal | 0.33146 | ██████ |
avx512bw-shuf | 0.63044 | ███████████▌ |
builtin-popcnt | 0.41058 | ███████▌ |
builtin-popcnt32 | 0.90008 | ████████████████▌ |
builtin-popcnt-unrolled | 0.31310 | █████▊ |
builtin-popcnt-unrolled32 | 0.48960 | █████████ |
builtin-popcnt-unrolled-errata | 0.20368 | ███▋ |
builtin-popcnt-unrolled-errata-manual | 0.28193 | █████▏ |
builtin-popcnt-movdq | 0.29001 | █████▎ |
builtin-popcnt-movdq-unrolled | 0.24785 | ████▌ |
builtin-popcnt-movdq-unrolled_manual | 0.28186 | █████▏ |

Input size: 1024 B

procedure | time [s] | relative time (less is better) |
---|---|---|
lookup-8 | 1.64420 | ██████████████████████████████▎ |
lookup-64 | 1.65446 | ██████████████████████████████▌ |
bit-parallel | 1.66064 | ██████████████████████████████▌ |
bit-parallel-optimized | 1.08818 | ████████████████████ |
bit-parallel-mul | 1.02415 | ██████████████████▉ |
bit-parallel32 | 2.71217 | ██████████████████████████████████████████████████ |
bit-parallel-optimized32 | 2.01282 | █████████████████████████████████████ |
harley-seal | 0.49207 | █████████ |
sse-bit-parallel | 0.73791 | █████████████▌ |
sse-bit-parallel-original | 0.68865 | ████████████▋ |
sse-bit-parallel-better | 0.61430 | ███████████▎ |
sse-harley-seal | 0.28957 | █████▎ |
sse-lookup | 0.30164 | █████▌ |
sse-lookup-original | 0.53005 | █████████▊ |
avx2-lookup | 0.16914 | ███ |
avx2-lookup-original | 0.36215 | ██████▋ |
avx2-harley-seal | 0.15388 | ██▊ |
cpu | 0.20061 | ███▋ |
sse-cpu | 0.28359 | █████▏ |
avx2-cpu | 0.33524 | ██████▏ |
avx512-harley-seal | 0.12317 | ██▎ |
avx512bw-shuf | 0.39010 | ███████▏ |
builtin-popcnt | 0.44010 | ████████ |
builtin-popcnt32 | 0.87112 | ████████████████ |
builtin-popcnt-unrolled | 0.30703 | █████▋ |
builtin-popcnt-unrolled32 | 0.48478 | ████████▉ |
builtin-popcnt-unrolled-errata | 0.20211 | ███▋ |
builtin-popcnt-unrolled-errata-manual | 0.26632 | ████▉ |
builtin-popcnt-movdq | 0.29146 | █████▎ |
builtin-popcnt-movdq-unrolled | 0.23762 | ████▍ |
builtin-popcnt-movdq-unrolled_manual | 0.26956 | ████▉ |

Input size: 2048 B

procedure | time [s] | relative time (less is better) |
---|---|---|
lookup-8 | 1.62533 | █████████████████████████████▊ |
lookup-64 | 1.63190 | █████████████████████████████▊ |
bit-parallel | 1.65513 | ██████████████████████████████▎ |
bit-parallel-optimized | 1.08248 | ███████████████████▊ |
bit-parallel-mul | 1.00252 | ██████████████████▎ |
bit-parallel32 | 2.73145 | ██████████████████████████████████████████████████ |
bit-parallel-optimized32 | 2.02735 | █████████████████████████████████████ |
harley-seal | 0.46857 | ████████▌ |
sse-bit-parallel | 0.64884 | ███████████▉ |
sse-bit-parallel-original | 0.67598 | ████████████▎ |
sse-bit-parallel-better | 0.55213 | ██████████ |
sse-harley-seal | 0.26622 | ████▊ |
sse-lookup | 0.29683 | █████▍ |
sse-lookup-original | 0.52198 | █████████▌ |
avx2-lookup | 0.15924 | ██▉ |
avx2-lookup-original | 0.33397 | ██████ |
avx2-harley-seal | 0.13064 | ██▍ |
cpu | 0.20058 | ███▋ |
sse-cpu | 0.27968 | █████ |
avx2-cpu | 0.32587 | █████▉ |
avx512-harley-seal | 0.08344 | █▌ |
avx512bw-shuf | 0.22794 | ████▏ |
builtin-popcnt | 0.42057 | ███████▋ |
builtin-popcnt32 | 0.84699 | ███████████████▌ |
builtin-popcnt-unrolled | 0.30345 | █████▌ |
builtin-popcnt-unrolled32 | 0.50331 | █████████▏ |
builtin-popcnt-unrolled-errata | 0.20133 | ███▋ |
builtin-popcnt-unrolled-errata-manual | 0.25850 | ████▋ |
builtin-popcnt-movdq | 0.30206 | █████▌ |
builtin-popcnt-movdq-unrolled | 0.23254 | ████▎ |
builtin-popcnt-movdq-unrolled_manual | 0.26400 | ████▊ |

Input size: 4096 B

procedure | time [s] | relative time (less is better) |
---|---|---|
lookup-8 | 1.61539 | █████████████████████████████▋ |
lookup-64 | 1.62115 | █████████████████████████████▊ |
bit-parallel | 1.66514 | ██████████████████████████████▌ |
bit-parallel-optimized | 1.09080 | ████████████████████ |
bit-parallel-mul | 0.99264 | ██████████████████▏ |
bit-parallel32 | 2.71965 | ██████████████████████████████████████████████████ |
bit-parallel-optimized32 | 2.01658 | █████████████████████████████████████ |
harley-seal | 0.46524 | ████████▌ |
sse-bit-parallel | 0.60464 | ███████████ |
sse-bit-parallel-original | 0.67192 | ████████████▎ |
sse-bit-parallel-better | 0.52073 | █████████▌ |
sse-harley-seal | 0.25450 | ████▋ |
sse-lookup | 0.29483 | █████▍ |
sse-lookup-original | 0.49898 | █████████▏ |
avx2-lookup | 0.15487 | ██▊ |
avx2-lookup-original | 0.32436 | █████▉ |
avx2-harley-seal | 0.11857 | ██▏ |
cpu | 0.20631 | ███▊ |
sse-cpu | 0.27772 | █████ |
avx2-cpu | 0.32200 | █████▉ |
avx512-harley-seal | 0.06327 | █▏ |
avx512bw-shuf | 0.18797 | ███▍ |
builtin-popcnt | 0.41129 | ███████▌ |
builtin-popcnt32 | 0.84165 | ███████████████▍ |
builtin-popcnt-unrolled | 0.30967 | █████▋ |
builtin-popcnt-unrolled32 | 0.49151 | █████████ |
builtin-popcnt-unrolled-errata | 0.20848 | ███▊ |
builtin-popcnt-unrolled-errata-manual | 0.26378 | ████▊ |
builtin-popcnt-movdq | 0.29324 | █████▍ |
builtin-popcnt-movdq-unrolled | 0.23797 | ████▎ |
builtin-popcnt-movdq-unrolled_manual | 0.28840 | █████▎ |

Speedup relative to lookup-8 (higher is better)

procedure | 32 B | 64 B | 128 B | 256 B | 512 B | 1024 B | 2048 B | 4096 B |
---|---|---|---|---|---|---|---|---|
lookup-8 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
lookup-64 | 1.02 | 1.01 | 1.00 | 1.00 | 0.99 | 0.99 | 1.00 | 1.00 |
bit-parallel | 0.94 | 0.96 | 0.97 | 0.97 | 1.01 | 0.99 | 0.98 | 0.97 |
bit-parallel-optimized | 1.31 | 1.40 | 1.44 | 1.47 | 1.53 | 1.51 | 1.50 | 1.48 |
bit-parallel-mul | 1.58 | 1.63 | 1.64 | 1.64 | 1.70 | 1.61 | 1.62 | 1.63 |
bit-parallel32 | 0.66 | 0.63 | 0.61 | 0.60 | 0.62 | 0.61 | 0.60 | 0.59 |
bit-parallel-optimized32 | 0.85 | 0.83 | 0.81 | 0.81 | 0.83 | 0.82 | 0.80 | 0.80 |
harley-seal | 1.18 | 1.32 | 2.06 | 2.60 | 3.12 | 3.34 | 3.47 | 3.47 |
sse-bit-parallel | 0.59 | 0.68 | 0.81 | 1.31 | 1.85 | 2.23 | 2.50 | 2.67 |
sse-bit-parallel-original | 0.98 | 1.39 | 1.80 | 2.07 | 2.31 | 2.39 | 2.40 | 2.40 |
sse-bit-parallel-better | 0.72 | 0.71 | 1.12 | 1.65 | 2.27 | 2.68 | 2.94 | 3.10 |
sse-harley-seal | 0.97 | 1.39 | 1.85 | 3.79 | 4.99 | 5.68 | 6.11 | 6.35 |
sse-lookup | 2.38 | 3.06 | 4.35 | 5.02 | 5.40 | 5.45 | 5.48 | 5.48 |
sse-lookup-original | 0.72 | 1.15 | 1.75 | 2.34 | 2.88 | 3.10 | 3.11 | 3.24 |
avx2-lookup | 2.51 | 3.64 | 5.11 | 7.00 | 8.53 | 9.72 | 10.21 | 10.43 |
avx2-lookup-original | 0.79 | 1.23 | 2.00 | 1.84 | 3.86 | 4.54 | 4.87 | 4.98 |
avx2-harley-seal | 1.15 | 1.87 | 2.82 | 3.90 | 8.29 | 10.68 | 12.44 | 13.62 |
cpu | 3.46 | 4.67 | 6.39 | 7.72 | 8.38 | 8.20 | 8.10 | 7.83 |
sse-cpu | 0.69 | 4.37 | 4.92 | 5.36 | 5.77 | 5.80 | 5.81 | 5.82 |
avx2-cpu | 0.43 | 0.52 | 3.75 | 4.31 | 4.79 | 4.90 | 4.99 | 5.02 |
avx512-harley-seal | 0.30 | 1.34 | 2.25 | 3.47 | 5.07 | 13.35 | 19.48 | 25.53 |
avx512bw-shuf | 0.57 | 0.62 | 1.03 | 1.55 | 2.67 | 4.21 | 7.13 | 8.59 |
builtin-popcnt | 3.45 | 3.69 | 3.83 | 3.92 | 4.09 | 3.74 | 3.86 | 3.93 |
builtin-popcnt32 | 2.37 | 2.19 | 2.09 | 2.03 | 1.87 | 1.89 | 1.92 | 1.92 |
builtin-popcnt-unrolled | 3.80 | 4.38 | 4.79 | 5.05 | 5.37 | 5.36 | 5.36 | 5.22 |
builtin-popcnt-unrolled32 | 2.72 | 2.73 | 3.27 | 3.29 | 3.43 | 3.39 | 3.23 | 3.29 |
builtin-popcnt-unrolled-errata | 4.22 | 5.39 | 6.71 | 7.72 | 8.25 | 8.14 | 8.07 | 7.75 |
builtin-popcnt-unrolled-errata-manual | 2.60 | 3.57 | 4.53 | 5.26 | 5.96 | 6.17 | 6.29 | 6.12 |
builtin-popcnt-movdq | 5.63 | 5.96 | 5.90 | 5.72 | 5.80 | 5.64 | 5.38 | 5.51 |
builtin-popcnt-movdq-unrolled | 3.66 | 4.67 | 5.59 | 6.20 | 6.78 | 6.92 | 6.99 | 6.79 |
builtin-popcnt-movdq-unrolled_manual | 2.92 | 4.24 | 5.15 | 5.61 | 5.97 | 6.10 | 6.16 | 5.60 |
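
The values in the table above appear to be the `lookup-8` time divided by each procedure's time for the same input size, rounded to two decimals. The sketch below reproduces three entries from the timings listed earlier. The helper names (`speedup`, `round2`, `check_against_table`) are hypothetical, and the rounding rule is an assumption inferred from the printed numbers.

```cpp
#include <cassert>
#include <cmath>

// Speedup of a procedure relative to a reference procedure:
// values above 1 mean the procedure is faster than the reference.
double speedup(double reference_time, double procedure_time) {
    return reference_time / procedure_time;
}

// Round to two decimal places, matching the precision of the table.
double round2(double value) {
    return std::round(value * 100.0) / 100.0;
}

// True if two values agree to well within the table's printed precision.
bool close(double a, double b) {
    return std::fabs(a - b) < 1e-9;
}

// Cross-check against entries in the tables (times in seconds).
void check_against_table() {
    // cpu at 32 B: lookup-8 took 1.19116 s, cpu took 0.34469 s.
    assert(close(round2(speedup(1.19116, 0.34469)), 3.46));
    // avx512-harley-seal at 4096 B: 1.61539 s versus 0.06327 s.
    assert(close(round2(speedup(1.61539, 0.06327)), 25.53));
    // bit-parallel32 at 32 B is slower than the reference (value below 1).
    assert(close(round2(speedup(1.19116, 1.81757)), 0.66));
}
```

Values below 1.00, such as `bit-parallel32`, therefore mark procedures that are slower than `lookup-8` at that input size.
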
Download skylake-x-w-2104-gcc8.1.0.csv