This is a CPU tool for benchmarking the peak performance of floating-point and AI ISAs.
It automatically detects the SIMD and DSA ISAs supported by the local machine at compile time.
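
How the build scripts perform this detection is not described here. As a rough illustration only, one common approach on Linux is to read the feature list the kernel exposes in `/proc/cpuinfo`. The helper name `has_flag` below is hypothetical and is not part of cpufp; the flag names are the lowercase spellings the kernel uses on x86-64.

```shell
#!/bin/sh
# Hypothetical helper (not part of cpufp): succeed if the word $1
# appears in the space-separated feature list $2.
has_flag() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# x86-64 kernels list CPU features on a "flags" line, arm64 kernels on a
# "Features" line. If neither line exists, the list is simply empty.
flags=$(grep -m1 -E '^(flags|Features)' /proc/cpuinfo 2>/dev/null | cut -d: -f2)

# Report which of a few x86-64 ISAs from the table below are present.
for isa in sse sse2 avx fma avx512f avx512_vnni avx_vnni amx_int8 amx_bf16; do
    if has_flag "$isa" "$flags"; then
        echo "$isa: supported"
    fi
done
```

Matching on whole words matters here: a plain substring search for `avx` would also match `avx2` or `avx512f`.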
OS | x86-64 | arm64 | riscv64 | loongarch64 |
---|---|---|---|---|
Linux | yes | yes | yes | yes |
macOS | no | no | no | no |
Windows | no | no | no | no |
Arch | ISA | Feature | Data Type | Description |
---|---|---|---|---|
SIMD | SSE | Vector | fp32 | Before Sandy Bridge |
SIMD | SSE2 | Vector | fp64 | Before Sandy Bridge |
SIMD | AVX | Vector | fp32/fp64 | From Sandy Bridge |
SIMD | FMA | Vector | fp32/fp64 | From Haswell/Zen |
SIMD | AVX512f | Vector | fp32/fp64 | From Skylake X/Zen4 |
SIMD | AVX512_VNNI | Vector | int8/int16 | From IceLake |
SIMD | AVX_VNNI | Vector | int8/int16 | From Alder Lake |
SIMD | AVX512_FP16 | Vector | fp16 | From Intel Sapphire Rapids |
SIMD | AVX512_BF16 | Vector | bf16 | From AMD Zen4 |
SIMD | AVX_VNNI_INT8 | Vector | int8 | Unknown |
DSA | AMX_INT8 | Matrix | int8 | From Intel Sapphire Rapids |
DSA | AMX_BF16 | Matrix | bf16 | From Intel Sapphire Rapids |
Arch | ISA | Feature | Data Type | Description |
---|---|---|---|---|
SIMD | asimd | Vector | fp32/fp64 | From Cortex-A57/A53 |
SIMD | asimd_hp | Vector | fp16 | From Cortex-A75/A55 |
SIMD | asimd_dp | Vector | int8 | From Cortex-A75/A55 |
SIMD | bf16 | Matrix | bf16 | From Cortex-X2/A710/A510 |
SIMD | i8mm | Matrix | int8 | From Cortex-X2/A710/A510 |
Arch | ISA | Feature | Data Type | Description |
---|---|---|---|---|
SIMD | V | Vector | fp16/fp32/fp64 | From RISC-V "V" vector extension. Version 1.0 |
DSA | ime | Matrix | int8 | From SpacemiT-X60 |
NOTE: ime is a SpacemiT custom vendor extension.
Arch | ISA | Feature | Data Type | Description |
---|---|---|---|---|
SIMD | LASX | Vector | fp32/fp64 | From Loongson 3A5000 |
SIMD | LSX | Vector | fp32/fp64 | From Loongson 3A5000 |
Scalar | FP | Scalar | fp32/fp64 | From Loongson 3A5000 |
Build the x86-64 version:
./build_x64.sh
Build the arm64 version:
./build_arm64.sh
Build the riscv64 version:
./build_riscv64.sh
Build the loongarch64 version:
./build_loongarch64.sh
Clean the build outputs:
./clean.sh
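
When building natively, the matching script can be picked from the machine name reported by `uname -m`. The following is only a sketch: the function `pick_build_script` is a hypothetical wrapper, not something shipped with cpufp, and it assumes the scripts are run from the repository root.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of cpufp): map a `uname -m` machine name
# to the build script listed above.
pick_build_script() {
    case $1 in
        x86_64)        echo ./build_x64.sh ;;
        aarch64|arm64) echo ./build_arm64.sh ;;
        riscv64)       echo ./build_riscv64.sh ;;
        loongarch64)   echo ./build_loongarch64.sh ;;
        *)             return 1 ;;
    esac
}

# Only print the choice here; replace `echo` with the call itself to build.
if script=$(pick_build_script "$(uname -m)"); then
    echo "build script for this machine: $script"
else
    echo "no build script for $(uname -m)" >&2
fi
```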
./cpufp --thread_pool=[xxx] --idle_time=yyy
--thread_pool: [xxx] is the list of CPU threads to benchmark; each benchmark thread is pinned to one of them by setting its affinity. Refer to the output of the lstopo command to choose the thread IDs. For example, [0,3,5-8,13-15].
--idle_time: the interval, in seconds, between any two adjacent benchmarks. The default is 0.
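
To make the list syntax concrete, the sketch below expands a `--thread_pool` value into individual thread IDs. It only illustrates the comma and range notation; `expand_thread_list` is a hypothetical helper, and cpufp's own parser may behave differently (for example, in how it handles malformed input).

```shell
#!/bin/sh
# Hypothetical helper (not part of cpufp): expand a list such as
# "[0,3,5-8,13-15]" into space-separated thread IDs.
expand_thread_list() {
    list=${1#\[}      # drop the leading "["
    list=${list%\]}   # drop the trailing "]"
    out=""
    old_ifs=$IFS
    IFS=,             # split the list on commas
    for item in $list; do
        case $item in
            *-*)
                # A range "lo-hi" covers both endpoints.
                lo=${item%-*}
                hi=${item#*-}
                i=$lo
                while [ "$i" -le "$hi" ]; do
                    out="$out $i"
                    i=$((i + 1))
                done
                ;;
            *)
                out="$out $item"
                ;;
        esac
    done
    IFS=$old_ifs
    echo "${out# }"
}

expand_thread_list '[0,3,5-8,13-15]'   # prints: 0 3 5 6 7 8 13 14 15
```

Square brackets are glob characters in most shells, so it is safer to quote the argument when invoking the tool, for example `./cpufp --thread_pool='[0,3,5-8,13-15]' --idle_time=1`.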
x86-64 cpufp benchmark results
riscv64 cpufp benchmark results
loongarch64 cpufp benchmark results
Add Armv9 (SVE, SVE2 and SME) support.