Skip to content

Latest commit

 

History

History
178 lines (154 loc) · 8.38 KB

README.md

File metadata and controls

178 lines (154 loc) · 8.38 KB

Benchmark

Benchmark 是评估目标硬件平台网络模型运行速度的简单途径,只依赖于网络结构(xxx_benchmark.tmfile)即可。

生成 benchmark 专用 tmfile

虽然可以直接使用完整的 tmfile 运行 benchmark 示例,但是我们建议采用 benchmark 专用 tmfile 模型,节省文件传输时间。

  1. 在使用模型转换工具 convert_model_to_tm 之前,设置环境变量 :

    $ export TM_FOR_BENCHMARK=1
  2. 将原始框架模型转换为 tmfile benchmark 专用模型,以 Caffe 框架的 mobilenet_v1 举例:

    $ ./comvert_tm_tool -f caffe -p mobilenet_v1.prototxt -m mobilenet_v1.caffemodel -o mobilenet_v1_benchmark.tmfile

    我们已经提前转换了一小部分评估模型在 benchmark/models 中。


编译

默认完成 Tengine Lite 编译,目标平台的 benchmark 可执行程序存放在 build-dir/install/bin/tm_benchmark

bug1989@DESKTOP-SGN0H2A:/mnt/d/ubuntu/gitlab/build-linux$ tree install
install
├── bin
│   ├── tm_benchmark
│   ├── tm_classification
│   └── tm_mobilenet_ssd
├── include
│   └── tengine_c_api.h
└── lib
    └── libtengine-lite.so

使用方法

$ ./tm_benchmark -h
[Usage]:  [-h] [-r repeat_count] [-t thread_count] [-p cpu affinity, 0:auto, 1:big, 2:middle, 3:little] [-s net]

例子,如何在 android 平台上运行 tm_benchmark

# for running on android device, upload to /data/local/tmp/ folder
$ adb push tm_benchmark /data/local/tmp/
$ adb push <tengine-lite-root-dir>/benchmark/models /data/local/tmp/
$ adb shell

# executed in android adb shell
$ cd /data/local/tmp/
$ ./tm_benchmark

Typical output (executed in linux)

Khadas VIM3 (Cortex-A73 2.2GHz x 4 + Cortex-A53 1.8GHz x 2)

khadas@Khadas:~/tengine-lite/benchmark$ ../build/benchmark/tm_benchmark -r 5 -t 1 -p 1
start to run register cpu allocator
loop_counts = 5
num_threads = 1
power       = 1
tengine-lite library version: 1.0-dev
     squeezenet_v1.1  min =   55.66 ms   max =   56.19 ms   avg =   56.04 ms
         mobilenetv1  min =  103.18 ms   max =  105.37 ms   avg =  104.26 ms
         mobilenetv2  min =   91.46 ms   max =   93.07 ms   avg =   91.92 ms
         mobilenetv3  min =   56.30 ms   max =   57.17 ms   avg =   56.64 ms
        shufflenetv2  min =   29.92 ms   max =   30.62 ms   avg =   30.29 ms
            resnet18  min =  162.31 ms   max =  162.74 ms   avg =  162.48 ms
            resnet50  min =  495.61 ms   max =  498.00 ms   avg =  496.99 ms
           googlenet  min =  199.16 ms   max =  200.32 ms   avg =  199.72 ms
         inceptionv3  min =  801.93 ms   max =  813.71 ms   avg =  807.08 ms
               vgg16  min =  866.41 ms   max =  877.53 ms   avg =  871.45 ms
                mssd  min =  204.10 ms   max =  208.92 ms   avg =  206.05 ms
          retinaface  min =   28.57 ms   max =   29.06 ms   avg =   28.86 ms
         yolov3_tiny  min =  233.68 ms   max =  235.12 ms   avg =  234.19 ms
      mobilefacenets  min =   44.32 ms   max =   44.82 ms   avg =   44.60 ms
ALL TEST DONE
khadas@Khadas:~/tengine-lite/benchmark$ ../build/benchmark/tm_benchmark -r 5 -t 4 -p 1
start to run register cpu allocator
loop_counts = 5
num_threads = 4
power       = 1
tengine-lite library version: 1.0-dev
     squeezenet_v1.1  min =   22.10 ms   max =   22.33 ms   avg =   22.24 ms
         mobilenetv1  min =   32.07 ms   max =   32.68 ms   avg =   32.49 ms
         mobilenetv2  min =   40.16 ms   max =   40.59 ms   avg =   40.32 ms
         mobilenetv3  min =   32.37 ms   max =   32.60 ms   avg =   32.49 ms
        shufflenetv2  min =   12.67 ms   max =   12.91 ms   avg =   12.76 ms
            resnet18  min =   69.67 ms   max =   70.34 ms   avg =   69.91 ms
            resnet50  min =  174.66 ms   max =  175.34 ms   avg =  174.94 ms
           googlenet  min =   84.43 ms   max =   85.01 ms   avg =   84.82 ms
         inceptionv3  min =  274.61 ms   max =  276.78 ms   avg =  275.74 ms
               vgg16  min =  379.63 ms   max =  385.95 ms   avg =  382.01 ms
                mssd  min =   66.67 ms   max =   67.28 ms   avg =   67.01 ms
          retinaface  min =   15.15 ms   max =   15.34 ms   avg =   15.24 ms
         yolov3_tiny  min =  110.07 ms   max =  110.81 ms   avg =  110.50 ms
      mobilefacenets  min =   16.97 ms   max =   17.16 ms   avg =   17.06 ms
ALL TEST DONE
khadas@Khadas:~/tengine-lite/benchmark$ ../build/benchmark/tm_benchmark -r 5 -t 1 -p 3
start to run register cpu allocator
loop_counts = 5
num_threads = 1
power       = 3
tengine-lite library version: 1.0-dev
     squeezenet_v1.1  min =  116.30 ms   max =  116.43 ms   avg =  116.34 ms
         mobilenetv1  min =  236.10 ms   max =  236.35 ms   avg =  236.21 ms
         mobilenetv2  min =  198.35 ms   max =  198.58 ms   avg =  198.42 ms
         mobilenetv3  min =  128.56 ms   max =  128.99 ms   avg =  128.76 ms
        shufflenetv2  min =   66.71 ms   max =   66.85 ms   avg =   66.75 ms
            resnet18  min =  358.30 ms   max =  358.49 ms   avg =  358.44 ms
            resnet50  min = 1094.14 ms   max = 1094.90 ms   avg = 1094.45 ms
           googlenet  min =  434.48 ms   max =  434.83 ms   avg =  434.61 ms
         inceptionv3  min = 1778.71 ms   max = 1779.36 ms   avg = 1779.03 ms
               vgg16  min = 1903.84 ms   max = 1932.26 ms   avg = 1909.85 ms
                mssd  min =  462.74 ms   max =  463.72 ms   avg =  463.13 ms
          retinaface  min =   59.83 ms   max =   59.94 ms   avg =   59.89 ms
         yolov3_tiny  min =  501.01 ms   max =  501.60 ms   avg =  501.32 ms
      mobilefacenets  min =   99.05 ms   max =   99.22 ms   avg =   99.13 ms
ALL TEST DONE
khadas@Khadas:~/tengine-lite/benchmark$ ../build/benchmark/tm_benchmark -r 5 -t 2 -p 3
start to run register cpu allocator
loop_counts = 5
num_threads = 2
power       = 3
tengine-lite library version: 1.0-dev
     squeezenet_v1.1  min =   63.93 ms   max =   64.02 ms   avg =   63.97 ms
         mobilenetv1  min =  115.33 ms   max =  115.47 ms   avg =  115.40 ms
         mobilenetv2  min =  105.52 ms   max =  105.74 ms   avg =  105.58 ms
         mobilenetv3  min =   83.13 ms   max =   84.02 ms   avg =   83.63 ms
        shufflenetv2  min =   40.04 ms   max =   40.13 ms   avg =   40.09 ms
            resnet18  min =  208.76 ms   max =  209.16 ms   avg =  208.88 ms
            resnet50  min =  600.78 ms   max =  607.13 ms   avg =  603.52 ms
           googlenet  min =  252.26 ms   max =  252.46 ms   avg =  252.34 ms
         inceptionv3  min =  949.61 ms   max =  960.68 ms   avg =  953.56 ms
               vgg16  min = 1105.32 ms   max = 1120.49 ms   avg = 1108.90 ms
                mssd  min =  237.19 ms   max =  237.38 ms   avg =  237.30 ms
          retinaface  min =   36.85 ms   max =   36.96 ms   avg =   36.89 ms
         yolov3_tiny  min =  297.31 ms   max =  298.04 ms   avg =  297.62 ms
      mobilefacenets  min =   53.09 ms   max =   53.18 ms   avg =   53.14 ms
ALL TEST DONE

EAIDK610 (Cortex-A72 1.8GHz x 2 + Cortex-A53 1.4GHz x 4)

[openailab@localhost benchmark]$ ../cmake-build-debug/benchmark/tm_benchmark -r 8
loop_counts  = 8
num_threads  = 1
power        = 0
     squeezenet_v1.1  min =   60.95 ms   max =   64.99 ms   avg =   61.91 ms
         mobilenetv1  min =  107.07 ms   max =  110.94 ms   avg =  108.07 ms
         mobilenetv2  min =  103.30 ms   max =  106.83 ms   avg =  104.08 ms
         mobilenetv3  min =   68.91 ms   max =   70.60 ms   avg =   69.44 ms
        shufflenetv2  min =   31.73 ms   max =   33.16 ms   avg =   32.14 ms
            resnet18  min =  209.66 ms   max =  211.33 ms   avg =  210.19 ms
            resnet50  min =  572.76 ms   max =  577.32 ms   avg =  575.06 ms
           googlenet  min =  253.46 ms   max =  256.21 ms   avg =  254.89 ms
         inceptionv3  min = 1014.39 ms   max = 1021.56 ms   avg = 1018.37 ms
               vgg16  min = 1165.28 ms   max = 1182.80 ms   avg = 1171.24 ms
                mssd  min =  219.30 ms   max =  225.62 ms   avg =  221.70 ms
          retinaface  min =   33.99 ms   max =   35.46 ms   avg =   34.41 ms
         yolov3_tiny  min =  309.41 ms   max =  317.77 ms   avg =  312.79 ms
      mobilefacenets  min =   46.79 ms   max =   49.18 ms   avg =   47.22 ms
ALL TEST DONE