results/3XzI.txt

sbc-bench v0.9.6 Firefly ITX-3588J HDMI(Linux) (Thu, 12 May 2022 08:32:25 +0000)

Distributor ID:	Ubuntu
Description:	Ubuntu 20.04.4 LTS
Release:	20.04
Codename:	focal
Architecture:	arm64

/usr/bin/gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0

Uptime: 08:32:25 up  1:50,  3 users,  load average: 0.08, 0.21, 1.07

Linux 5.10.66 (firefly) 	05/12/22 	_aarch64_	(8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.39    0.08    0.47    0.02    0.00   91.04

Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
mmcblk1           5.54       119.28       182.18         0.00     793195    1211460          0
mmcblk1boot0      0.02         0.08         0.00         0.00        548          0          0
mmcblk1boot1      0.02         0.08         0.00         0.00        548          0          0

              total        used        free      shared  buff/cache   available
Mem:          7.5Gi       661Mi       6.5Gi        41Mi       305Mi       6.7Gi
Swap:            0B          0B          0B

##########################################################################

Checking cpufreq OPP for cpu0-cpu3 (Cortex-A55):

No cpufreq support available. Measured on cpu1: 915 MHz (913.539/913.509/913.448)

Checking cpufreq OPP for cpu4-cpu5 (Cortex-A76):

No cpufreq support available. Measured on cpu5: 980 MHz (977.693/977.623/977.589)

Checking cpufreq OPP for cpu6-cpu7 (Cortex-A76):

No cpufreq support available. Measured on cpu7: 985 MHz (982.738/982.668/982.622)

##########################################################################

Executing benchmark on cpu0 (Cortex-A55):

tinymembench v0.4.9 (simple benchmark for memory throughput and latency)

==========================================================================
== Memory bandwidth tests                                               ==
==                                                                      ==
== Note 1: 1MB = 1000000 bytes                                          ==
== Note 2: Results for 'copy' tests show how many bytes can be          ==
==         copied per second (adding together read and written          ==
==         bytes would have provided numbers twice as high)             ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
==         to first fetch data into it, and only then write it to the   ==
==         destination (source -> L1 cache, L1 cache -> destination)    ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in    ==
==         brackets                                                     ==
==========================================================================

 C copy backwards                                     :   2699.2 MB/s (0.3%)
 C copy backwards (32 byte blocks)                    :   2648.8 MB/s (0.3%)
 C copy backwards (64 byte blocks)                    :   2690.0 MB/s (0.2%)
 C copy                                               :   3227.0 MB/s
 C copy prefetched (32 bytes step)                    :   1875.2 MB/s (0.3%)
 C copy prefetched (64 bytes step)                    :   3377.4 MB/s
 C 2-pass copy                                        :   1506.8 MB/s (0.2%)
 C 2-pass copy prefetched (32 bytes step)             :   1184.1 MB/s
 C 2-pass copy prefetched (64 bytes step)             :   1660.4 MB/s
 C fill                                               :   7061.7 MB/s
 C fill (shuffle within 16 byte blocks)               :   6311.5 MB/s
 C fill (shuffle within 32 byte blocks)               :   6310.6 MB/s
 C fill (shuffle within 64 byte blocks)               :   6156.5 MB/s
 ---
 standard memcpy                                      :   3670.4 MB/s
 standard memset                                      :  11118.8 MB/s
 ---
 NEON LDP/STP copy                                    :   3040.4 MB/s
 NEON LDP/STP copy pldl2strm (32 bytes step)          :   1367.8 MB/s
 NEON LDP/STP copy pldl2strm (64 bytes step)          :   2181.1 MB/s
 NEON LDP/STP copy pldl1keep (32 bytes step)          :   2218.3 MB/s
 NEON LDP/STP copy pldl1keep (64 bytes step)          :   2799.8 MB/s
 NEON LD1/ST1 copy                                    :   2935.5 MB/s
 NEON STP fill                                        :  11087.8 MB/s
 NEON STNP fill                                       :   7867.2 MB/s (0.4%)
 ARM LDP/STP copy                                     :   3041.3 MB/s
 ARM STP fill                                         :  11083.4 MB/s
 ARM STNP fill                                        :   7866.8 MB/s (0.3%)

==========================================================================
== Framebuffer read tests.                                              ==
==                                                                      ==
== Many ARM devices use a part of the system memory as the framebuffer, ==
== typically mapped as uncached but with write-combining enabled.       ==
== Writes to such framebuffers are quite fast, but reads are much       ==
== slower and very sensitive to the alignment and the selection of      ==
== CPU instructions which are used for accessing memory.                ==
==                                                                      ==
== Many x86 systems allocate the framebuffer in the GPU memory,         ==
== accessible for the CPU via a relatively slow PCI-E bus. Moreover,    ==
== PCI-E is asymmetric and handles reads a lot worse than writes.       ==
==                                                                      ==
== If uncached framebuffer reads are reasonably fast (at least 100 MB/s ==
== or preferably >300 MB/s), then using the shadow framebuffer layer    ==
== is not necessary in Xorg DDX drivers, resulting in a nice overall    ==
== performance improvement. For example, the xf86-video-fbturbo DDX     ==
== uses this trick.                                                     ==
==========================================================================

 NEON LDP/STP copy (from framebuffer)                 :    282.6 MB/s
 NEON LDP/STP 2-pass copy (from framebuffer)          :    260.1 MB/s
 NEON LD1/ST1 copy (from framebuffer)                 :     77.2 MB/s (0.1%)
 NEON LD1/ST1 2-pass copy (from framebuffer)          :     75.1 MB/s
 ARM LDP/STP copy (from framebuffer)                  :    148.7 MB/s
 ARM LDP/STP 2-pass copy (from framebuffer)           :    142.8 MB/s

==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger the buffer, the more significant      ==
== are the relative contributions of TLB, L1/L2 cache misses and SDRAM  ==
== accesses. For extremely large buffer sizes we expect to see a        ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers represent extra time, which needs to         ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can usually be found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. If the memory     ==
==         subsystem can't handle multiple outstanding requests, dual   ==
==         random read has the same timings as two single reads         ==
==         performed one after another.                                 ==
==========================================================================

block size : single random read / dual random read
      1024 :    0.0 ns          /     0.0 ns 
      2048 :    0.0 ns          /     0.0 ns 
      4096 :    0.0 ns          /     0.0 ns 
      8192 :    0.0 ns          /     0.0 ns 
     16384 :    0.1 ns          /     0.1 ns 
     32768 :    1.1 ns          /     1.9 ns 
     65536 :    2.9 ns          /     5.3 ns 
    131072 :    6.5 ns          /    10.9 ns 
    262144 :   15.7 ns          /    23.3 ns 
    524288 :   22.6 ns          /    29.6 ns 
   1048576 :   26.4 ns          /    31.6 ns 
   2097152 :   28.6 ns          /    32.9 ns 
   4194304 :   58.0 ns          /    81.4 ns 
   8388608 :  102.4 ns          /   135.9 ns 
  16777216 :  127.2 ns          /   155.6 ns 
  33554432 :  141.9 ns          /   167.7 ns 
  67108864 :  154.9 ns          /   184.2 ns 

Executing benchmark on cpu4 (Cortex-A76):

tinymembench v0.4.9 (simple benchmark for memory throughput and latency)

==========================================================================
== Memory bandwidth tests                                               ==
==========================================================================

 C copy backwards                                     :   8064.1 MB/s
 C copy backwards (32 byte blocks)                    :   7610.5 MB/s
 C copy backwards (64 byte blocks)                    :   7609.8 MB/s
 C copy                                               :   8054.7 MB/s
 C copy prefetched (32 bytes step)                    :   6880.0 MB/s
 C copy prefetched (64 bytes step)                    :   7278.1 MB/s
 C 2-pass copy                                        :   2584.1 MB/s (0.2%)
 C 2-pass copy prefetched (32 bytes step)             :   3203.2 MB/s
 C 2-pass copy prefetched (64 bytes step)             :   3370.7 MB/s
 C fill                                               :  15037.9 MB/s (0.3%)
 C fill (shuffle within 16 byte blocks)               :  15039.3 MB/s (0.3%)
 C fill (shuffle within 32 byte blocks)               :  15033.7 MB/s (0.4%)
 C fill (shuffle within 64 byte blocks)               :  14990.3 MB/s (0.3%)
 ---
 standard memcpy                                      :  10189.0 MB/s (0.1%)
 standard memset                                      :  15066.4 MB/s (0.5%)
 ---
 NEON LDP/STP copy                                    :  10153.3 MB/s (0.1%)
 NEON LDP/STP copy pldl2strm (32 bytes step)          :   7427.3 MB/s
 NEON LDP/STP copy pldl2strm (64 bytes step)          :   7768.2 MB/s
 NEON LDP/STP copy pldl1keep (32 bytes step)          :   7760.8 MB/s
 NEON LDP/STP copy pldl1keep (64 bytes step)          :   7760.2 MB/s
 NEON LD1/ST1 copy                                    :  10179.1 MB/s (0.2%)
 NEON STP fill                                        :  15073.8 MB/s (0.4%)
 NEON STNP fill                                       :  15075.4 MB/s (0.5%)
 ARM LDP/STP copy                                     :  10164.9 MB/s (0.1%)
 ARM STP fill                                         :  15071.5 MB/s (0.4%)
 ARM STNP fill                                        :  15070.2 MB/s (0.4%)

==========================================================================
== Framebuffer read tests.                                              ==
==========================================================================

 NEON LDP/STP copy (from framebuffer)                 :   1477.5 MB/s
 NEON LDP/STP 2-pass copy (from framebuffer)          :   1246.9 MB/s
 NEON LD1/ST1 copy (from framebuffer)                 :   1487.3 MB/s (0.1%)
 NEON LD1/ST1 2-pass copy (from framebuffer)          :   1249.3 MB/s (0.2%)
 ARM LDP/STP copy (from framebuffer)                  :   1429.1 MB/s
 ARM LDP/STP 2-pass copy (from framebuffer)           :   1243.7 MB/s (0.2%)

==========================================================================
== Memory latency test                                                  ==
==========================================================================

block size : single random read / dual random read
      1024 :    0.0 ns          /     0.0 ns 
      2048 :    0.0 ns          /     0.0 ns 
      4096 :    0.0 ns          /     0.0 ns 
      8192 :    0.0 ns          /     0.0 ns 
     16384 :    0.0 ns          /     0.0 ns 
     32768 :    0.0 ns          /     0.0 ns 
     65536 :    0.0 ns          /     0.0 ns 
    131072 :    2.6 ns          /     3.6 ns 
    262144 :    4.9 ns          /     6.5 ns 
    524288 :   10.5 ns          /    14.0 ns 
   1048576 :   21.2 ns          /    27.7 ns 
   2097152 :   27.3 ns          /    32.4 ns 
   4194304 :   52.7 ns          /    72.5 ns 
   8388608 :   93.7 ns          /   121.7 ns 
  16777216 :  122.9 ns          /   145.7 ns 
  33554432 :  138.2 ns          /   155.2 ns 
  67108864 :  149.3 ns          /   162.0 ns 

Executing benchmark on cpu6 (Cortex-A76):

tinymembench v0.4.9 (simple benchmark for memory throughput and latency)

==========================================================================
== Memory bandwidth tests                                               ==
==========================================================================

 C copy backwards                                     :   8104.3 MB/s
 C copy backwards (32 byte blocks)                    :   7651.2 MB/s
 C copy backwards (64 byte blocks)                    :   7648.7 MB/s
 C copy                                               :   8097.5 MB/s
 C copy prefetched (32 bytes step)                    :   6916.6 MB/s
 C copy prefetched (64 bytes step)                    :   7316.2 MB/s
 C 2-pass copy                                        :   2604.2 MB/s
 C 2-pass copy prefetched (32 bytes step)             :   3223.9 MB/s
 C 2-pass copy prefetched (64 bytes step)             :   3411.9 MB/s
 C fill                                               :  15061.2 MB/s (0.5%)
 C fill (shuffle within 16 byte blocks)               :  15062.9 MB/s (0.3%)
 C fill (shuffle within 32 byte blocks)               :  15066.5 MB/s (0.3%)
 C fill (shuffle within 64 byte blocks)               :  15002.4 MB/s (0.3%)
 ---
 standard memcpy                                      :  10188.3 MB/s (0.2%)
 standard memset                                      :  15078.5 MB/s (0.4%)
 ---
 NEON LDP/STP copy                                    :  10148.7 MB/s (0.1%)
 NEON LDP/STP copy pldl2strm (32 bytes step)          :   7469.0 MB/s
 NEON LDP/STP copy pldl2strm (64 bytes step)          :   7800.5 MB/s (0.1%)
 NEON LDP/STP copy pldl1keep (32 bytes step)          :   7791.3 MB/s
 NEON LDP/STP copy pldl1keep (64 bytes step)          :   7797.3 MB/s
 NEON LD1/ST1 copy                                    :  10170.2 MB/s (0.2%)
 NEON STP fill                                        :  15076.8 MB/s (0.5%)
 NEON STNP fill                                       :  15078.3 MB/s (0.5%)
 ARM LDP/STP copy                                     :  10160.6 MB/s (0.2%)
 ARM STP fill                                         :  15077.1 MB/s (0.5%)
 ARM STNP fill                                        :  15077.7 MB/s (0.5%)

==========================================================================
== Framebuffer read tests.                                              ==
==========================================================================

 NEON LDP/STP copy (from framebuffer)                 :   1479.1 MB/s
 NEON LDP/STP 2-pass copy (from framebuffer)          :   1249.9 MB/s
 NEON LD1/ST1 copy (from framebuffer)                 :   1487.6 MB/s
 NEON LD1/ST1 2-pass copy (from framebuffer)          :   1250.1 MB/s
 ARM LDP/STP copy (from framebuffer)                  :   1437.5 MB/s
 ARM LDP/STP 2-pass copy (from framebuffer)           :   1247.1 MB/s (0.2%)

==========================================================================
== Memory latency test                                                  ==
==========================================================================

block size : single random read / dual random read
      1024 :    0.0 ns          /     0.0 ns 
      2048 :    0.0 ns          /     0.0 ns 
      4096 :    0.0 ns          /     0.0 ns 
      8192 :    0.0 ns          /     0.0 ns 
     16384 :    0.0 ns          /     0.0 ns 
     32768 :    0.0 ns          /     0.0 ns 
     65536 :    0.0 ns          /     0.0 ns 
    131072 :    2.6 ns          /     3.6 ns 
    262144 :    5.2 ns          /     6.7 ns 
    524288 :    8.6 ns          /    10.8 ns 
   1048576 :   21.1 ns          /    27.7 ns 
   2097152 :   28.1 ns          /    33.1 ns 
   4194304 :   52.9 ns          /    73.6 ns 
   8388608 :   98.1 ns          /   127.7 ns 
  16777216 :  125.2 ns          /   148.3 ns 
  33554432 :  139.4 ns          /   155.9 ns 
  67108864 :  148.8 ns          /   161.5 ns 

##########################################################################

Executing ramlat on cpu0 (Cortex-A55), results in ns:

       size:  1x32  2x32  1x64  2x64 1xPTR 2xPTR 4xPTR
         4k: 10.13 13.62 8.074 13.62 5.817 7.050 13.62 
         8k: 10.13 13.62 8.074 13.62 5.817 7.052 13.62 
        16k: 10.13 13.62 8.075 13.62 5.817 7.051 13.63 
        32k: 10.29 13.89 8.198 13.86 5.894 7.178 13.91 
        64k: 23.10 30.14 21.18 30.15 20.05 27.42 40.67 
       128k: 32.64 41.00 30.31 40.98 29.55 39.45 67.35 
       256k: 37.88 61.04 35.70 61.04 33.95 54.66 102.0 
       512k: 40.03 70.56 37.71 70.47 35.48 62.81 120.6 
      1024k: 40.26 71.19 37.95 71.07 35.70 63.41 132.2 
      2048k: 42.39 74.62 39.75 80.05 37.51 66.89 139.0 
      4096k: 77.05 148.9 78.10 137.7 71.01 169.5 343.1 
      8192k: 121.3 207.7 119.2 207.7 138.4 219.4 425.9 
     16384k: 145.5 236.3 139.9 236.3 150.0 247.6 521.4 

Executing ramlat on cpu4 (Cortex-A76), results in ns:

       size:  1x32  2x32  1x64  2x64 1xPTR 2xPTR 4xPTR
         4k: 11.08 11.81 11.07 12.13 9.857 11.41 10.03 
         8k: 11.08 11.81 11.05 12.13 9.864 11.40 10.03 
        16k: 11.06 11.81 11.07 12.13 9.857 11.40 10.03 
        32k: 11.02 11.81 11.07 12.13 9.859 11.40 10.03 
        64k: 11.12 11.87 11.12 12.18 9.901 11.08 10.15 
       128k: 17.51 20.93 17.51 20.87 17.10 18.61 20.63 
       256k: 21.52 23.49 21.53 23.45 20.08 22.49 23.14 
       512k: 28.35 32.08 28.25 32.05 26.44 31.36 32.73 
      1024k: 43.91 47.31 43.88 47.21 41.79 47.09 53.15 
      2048k: 46.26 49.78 45.64 49.67 43.66 49.95 57.44 
      4096k: 89.25 89.21 80.80 86.07 93.58 92.23 87.47 
      8192k: 131.5 128.7 134.6 126.1 127.0 125.0 117.2 
     16384k: 153.5 146.3 161.9 150.9 151.0 146.3 150.7 

Executing ramlat on cpu6 (Cortex-A76), results in ns:

       size:  1x32  2x32  1x64  2x64 1xPTR 2xPTR 4xPTR
         4k: 10.97 11.75 11.02 12.07 9.810 11.35 9.975 
         8k: 10.99 11.75 11.02 12.07 9.812 11.35 9.983 
        16k: 10.98 11.75 11.02 12.07 9.810 11.35 9.981 
        32k: 10.99 11.75 11.02 12.07 9.812 11.35 9.979 
        64k: 11.05 11.82 11.07 12.13 9.866 11.07 10.15 
       128k: 17.31 20.68 17.31 20.54 16.92 18.40 20.45 
       256k: 21.49 23.09 21.50 23.11 20.02 22.48 23.05 
       512k: 27.93 30.94 28.03 30.94 26.00 30.43 32.75 
      1024k: 44.53 47.15 44.48 47.07 42.24 46.94 52.65 
      2048k: 47.55 51.82 47.05 51.75 44.89 51.66 57.85 
      4096k: 81.66 91.45 81.77 85.32 94.19 83.40 79.31 
      8192k: 130.7 126.1 145.9 127.0 129.3 125.1 117.5 
     16384k: 159.9 148.0 153.4 147.8 154.7 147.1 146.2 

##########################################################################

Executing benchmark on each cluster individually

OpenSSL 1.1.1f, built on 31 Mar 2020
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc      87990.15k   253062.91k   486757.63k   627099.99k   685861.55k   690350.76k
aes-128-cbc     293923.67k   553489.49k   708496.13k   758075.39k   777254.23k   779627.18k
aes-128-cbc     295437.31k   556587.65k   712456.96k   761737.56k   781170.01k   783510.19k
aes-192-cbc      83513.35k   223986.24k   391992.66k   481601.19k   516131.50k   518810.28k
aes-192-cbc     274177.48k   484878.02k   600051.63k   627351.55k   648481.45k   650035.20k
aes-192-cbc     275746.59k   487192.30k   603059.03k   630418.77k   651687.25k   653241.00k
aes-256-cbc      79842.01k   203770.41k   336984.83k   401778.69k   425866.58k   427753.47k
aes-256-cbc     256100.59k   430974.17k   520201.64k   545776.30k   556258.65k   557421.91k
aes-256-cbc     257136.12k   433054.95k   522670.85k   548495.70k   559035.73k   560196.27k

##########################################################################

Executing benchmark single-threaded on cpu0 (Cortex-A55)

7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C,Utf16=off,HugeFiles=on,64 bits,8 CPUs LE)

LE
CPU Freq: - - - - 128000000 256000000 - - -

RAM size:    7674 MB,  # CPU hardware threads:   8
RAM usage:    435 MB,  # Benchmark threads:      1

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:        788   100    767    767  |      11124   100    950    950
23:        743   100    757    757  |      10993   100    952    952
24:        731   100    786    786  |      10841   100    952    952
25:        715   100    817    817  |      10636   100    947    947
----------------------------------  | ------------------------------
Avr:             100    782    782  |              100    950    950
Tot:             100    866    866

Executing benchmark single-threaded on cpu4 (Cortex-A76)

7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C,Utf16=off,HugeFiles=on,64 bits,8 CPUs LE)

LE
CPU Freq: 64000000 - - - - - - - 2048000000

RAM size:    7674 MB,  # CPU hardware threads:   8
RAM usage:    435 MB,  # Benchmark threads:      1

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:       1673   100   1628   1628  |      16130   100   1377   1377
23:       1606   100   1638   1637  |      15941   100   1380   1380
24:       1564   100   1683   1682  |      15755   100   1383   1383
25:       1522   100   1739   1739  |      15512   100   1381   1381
----------------------------------  | ------------------------------
Avr:             100   1672   1672  |              100   1380   1380
Tot:             100   1526   1526

Executing benchmark single-threaded on cpu6 (Cortex-A76)

7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C,Utf16=off,HugeFiles=on,64 bits,8 CPUs LE)

LE
CPU Freq: - - - - - - - 1024000000 -

RAM size:    7674 MB,  # CPU hardware threads:   8
RAM usage:    435 MB,  # Benchmark threads:      1

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:       1690   100   1645   1645  |      16220   100   1385   1385
23:       1621   100   1652   1652  |      16077   100   1392   1392
24:       1570   100   1689   1689  |      15886   100   1395   1395
25:       1529   100   1747   1747  |      15634   100   1392   1392
----------------------------------  | ------------------------------
Avr:             100   1683   1683  |              100   1391   1391
Tot:             100   1537   1537

##########################################################################

Executing benchmark 3 times multi-threaded

7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C,Utf16=off,HugeFiles=on,64 bits,8 CPUs LE)

LE
CPU Freq: - - 64000000 - 128000000 - - - -

RAM size:    7674 MB,  # CPU hardware threads:   8
RAM usage:   1765 MB,  # Benchmark threads:      8

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:       9463   749   1228   9206  |      94633   685   1179   8072
23:       8861   732   1234   9029  |      94873   692   1186   8210
24:       8890   759   1260   9559  |      93571   693   1186   8213
25:       8564   758   1290   9779  |      91870   692   1181   8176
----------------------------------  | ------------------------------
Avr:             750   1253   9393  |              690   1183   8168
Tot:             720   1218   8780

7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C,Utf16=off,HugeFiles=on,64 bits,8 CPUs LE)

LE
CPU Freq: 64000000 64000000 - - - - - - -

RAM size:    7674 MB,  # CPU hardware threads:   8
RAM usage:   1765 MB,  # Benchmark threads:      8

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:       9481   748   1233   9224  |      96062   692   1184   8194
23:       8779   715   1251   8945  |      94904   693   1185   8213
24:       8628   736   1260   9277  |      93708   694   1185   8225
25:       8396   743   1290   9586  |      92749   699   1181   8254
----------------------------------  | ------------------------------
Avr:             736   1259   9258  |              694   1184   8221
Tot:             715   1221   8740

7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C,Utf16=off,HugeFiles=on,64 bits,8 CPUs LE)

LE
CPU Freq: - - - - 128000000 - - - -

RAM size:    7674 MB,  # CPU hardware threads:   8
RAM usage:   1765 MB,  # Benchmark threads:      8

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:       9510   754   1227   9251  |      96193   693   1184   8205
23:       8303   685   1235   8460  |      94683   691   1186   8194
24:       8903   765   1251   9573  |      93500   692   1186   8206
25:       8638   765   1289   9863  |      91817   691   1182   8171
----------------------------------  | ------------------------------
Avr:             742   1250   9287  |              692   1184   8194
Tot:             717   1217   8740

Compression: 9393,9258,9287
Decompression: 8168,8221,8194
Total: 8780,8740,8740

##########################################################################

Testing clockspeeds again. System health now:

Time      CPU n/a    load %cpu %sys %usr %nice %io %irq   Temp
09:03:27:   ---      7.95  96%   2%  94%   0%   0%   0%  34.2°C

Checking cpufreq OPP for cpu0-cpu3 (Cortex-A55):

No cpufreq support available. Measured on cpu1: 915 Mhz (911.102/910.931/910.801)

Checking cpufreq OPP for cpu4-cpu5 (Cortex-A76):

No cpufreq support available. Measured on cpu5: 980 Mhz (975.961/975.869/975.615)

Checking cpufreq OPP for cpu6-cpu7 (Cortex-A76):

No cpufreq support available. Measured on cpu7: 985 Mhz (980.721/980.372/978.839)

##########################################################################

Thermal source: /sys/devices/virtual/thermal/thermal_zone0/ (soc-thermal)

System health while running tinymembench:

Time      CPU n/a    load %cpu %sys %usr %nice %io %irq   Temp
08:32:30:   ---      0.15   8%   0%   8%   0%   0%   0%  31.5°C
08:34:30:   ---      0.99  13%   0%  12%   0%   0%   0%  31.5°C
08:36:31:   ---      1.00  12%   0%  12%   0%   0%   0%  31.5°C
08:38:31:   ---      1.00  12%   0%  12%   0%   0%   0%  33.3°C
08:40:31:   ---      1.01  12%   0%  12%   0%   0%   0%  32.4°C
08:42:31:   ---      1.00  12%   0%  12%   0%   0%   0%  31.5°C
08:44:31:   ---      1.04  13%   0%  12%   0%   0%   0%  34.2°C
08:46:31:   ---      1.05  13%   0%  12%   0%   0%   0%  32.4°C
08:48:31:   ---      1.03  12%   0%  12%   0%   0%   0%  31.5°C

System health while running ramlat:

Time      CPU n/a    load %cpu %sys %usr %nice %io %irq   Temp
08:48:52:   ---      1.09   9%   0%   8%   0%   0%   0%  31.5°C
08:49:01:   ---      1.08  12%   0%  12%   0%   0%   0%  31.5°C
08:49:10:   ---      1.07  13%   0%  12%   0%   0%   0%  31.5°C
08:49:19:   ---      1.06  13%   0%  12%   0%   0%   0%  31.5°C
08:49:28:   ---      1.05  12%   0%  12%   0%   0%   0%  31.5°C
08:49:37:   ---      1.04  13%   0%  12%   0%   0%   0%  31.5°C
08:49:46:   ---      1.04  13%   0%  12%   0%   0%   0%  31.5°C
08:49:55:   ---      1.03  12%   0%  12%   0%   0%   0%  31.5°C

System health while running OpenSSL benchmark:

Time      CPU n/a    load %cpu %sys %usr %nice %io %irq   Temp
08:50:02:   ---      1.03   9%   0%   8%   0%   0%   0%  32.4°C
08:50:18:   ---      1.02  12%   0%  12%   0%   0%   0%  31.5°C
08:50:34:   ---      1.02  12%   0%  12%   0%   0%   0%  31.5°C
08:50:50:   ---      1.01  12%   0%  12%   0%   0%   0%  31.5°C
08:51:06:   ---      1.01  12%   0%  12%   0%   0%   0%  31.5°C
08:51:22:   ---      1.00  12%   0%  12%   0%   0%   0%  31.5°C
08:51:38:   ---      1.00  12%   0%  12%   0%   0%   0%  31.5°C
08:51:54:   ---      1.00  12%   0%  12%   0%   0%   0%  31.5°C
08:52:10:   ---      1.00  12%   0%  12%   0%   0%   0%  31.5°C
08:52:26:   ---      1.00  12%   0%  12%   0%   0%   0%  31.5°C
08:52:42:   ---      1.00  12%   0%  12%   0%   0%   0%  31.5°C

System health while running 7-zip single core benchmark:

Time      CPU n/a    load %cpu %sys %usr %nice %io %irq   Temp
08:52:44:   ---      1.00   9%   0%   9%   0%   0%   0%  31.5°C
08:52:59:   ---      1.00  12%   0%  12%   0%   0%   0%  31.5°C
08:53:14:   ---      1.00  12%   0%  12%   0%   0%   0%  31.5°C
08:53:29:   ---      1.00  12%   0%  12%   0%   0%   0%  31.5°C
08:53:44:   ---      1.00  12%   0%  12%   0%   0%   0%  31.5°C
08:53:59:   ---      1.00  12%   0%  12%   0%   0%   0%  31.5°C
08:54:14:   ---      1.00  12%   0%  12%   0%   0%   0%  31.5°C
08:54:29:   ---      1.00  12%   0%  12%   0%   0%   0%  31.5°C
08:54:44:   ---      1.00  12%   0%  12%   0%   0%   0%  31.5°C
08:54:59:   ---      1.00  12%   0%  12%   0%   0%   0%  31.5°C
08:55:14:   ---      1.00  13%   0%  12%   0%   0%   0%  31.5°C
08:55:30:   ---      1.00  12%   0%  12%   0%   0%   0%  31.5°C
08:55:45:   ---      1.00  12%   0%  12%   0%   0%   0%  31.5°C
08:56:00:   ---      1.00  12%   0%  12%   0%   0%   0%  31.5°C
08:56:15:   ---      1.00  13%   0%  12%   0%   0%   0%  31.5°C
08:56:30:   ---      1.00  12%   0%  12%   0%   0%   0%  31.5°C
08:56:45:   ---      1.00  12%   0%  12%   0%   0%   0%  31.5°C
08:57:00:   ---      1.00  12%   0%  12%   0%   0%   0%  31.5°C
08:57:15:   ---      1.08  13%   0%  12%   0%   0%   0%  31.5°C
08:57:30:   ---      1.06  12%   0%  12%   0%   0%   0%  31.5°C
08:57:45:   ---      1.05  13%   0%  12%   0%   0%   0%  31.5°C
08:58:00:   ---      1.04  12%   0%  12%   0%   0%   0%  31.5°C
08:58:15:   ---      1.03  12%   0%  12%   0%   0%   0%  31.5°C
08:58:30:   ---      1.02  12%   0%  12%   0%   0%   0%  31.5°C

System health while running 7-zip multi core benchmark:

Time      CPU n/a    load %cpu %sys %usr %nice %io %irq   Temp
08:58:37:   ---      1.02   9%   0%   9%   0%   0%   0%  31.5°C
08:58:47:   ---      2.31  94%   1%  93%   0%   0%   0%  33.3°C
08:58:57:   ---      2.95  91%   0%  91%   0%   0%   0%  32.4°C
08:59:10:   ---      3.88  81%   1%  79%   0%   0%   0%  34.2°C
08:59:21:   ---      4.59  79%   0%  78%   0%   0%   0%  32.4°C
08:59:34:   ---      5.16  93%   2%  91%   0%   0%   0%  33.3°C
08:59:45:   ---      5.28  80%   0%  80%   0%   0%   0%  32.4°C
08:59:55:   ---      5.34  85%   3%  82%   0%   0%   0%  34.2°C
09:00:08:   ---      5.97  95%   2%  93%   0%   0%   0%  34.2°C
09:00:19:   ---      6.00  81%   1%  80%   0%   0%   0%  34.2°C
09:00:32:   ---      6.39  97%   0%  97%   0%   0%   0%  34.2°C
09:00:42:   ---      6.58  77%   1%  75%   0%   0%   0%  33.3°C
09:00:55:   ---      6.80  99%   0%  98%   0%   0%   0%  34.2°C
09:01:05:   ---      6.93  72%   1%  71%   0%   0%   0%  34.2°C
09:01:18:   ---      7.24  96%   1%  94%   0%   0%   0%  34.2°C
09:01:30:   ---      7.26  70%   1%  68%   0%   0%   0%  34.2°C
09:01:40:   ---      7.69  97%   2%  94%   0%   0%   0%  34.2°C
09:01:52:   ---      7.22  94%   1%  93%   0%   0%   0%  34.2°C
09:02:05:   ---      7.54  79%   1%  78%   0%   0%   0%  34.2°C
09:02:15:   ---      7.61  92%   0%  91%   0%   0%   0%  33.3°C
09:02:29:   ---      7.54  81%   1%  79%   0%   0%   0%  34.2°C
09:02:39:   ---      7.39  79%   0%  78%   0%   0%   0%  32.4°C
09:02:52:   ---      7.62  93%   2%  91%   0%   0%   0%  34.2°C
09:03:03:   ---      7.45  81%   0%  81%   0%   0%   0%  33.3°C
09:03:14:   ---      7.85  84%   2%  81%   0%   0%   0%  34.2°C
09:03:27:   ---      7.95  96%   2%  94%   0%   0%   0%  34.2°C

##########################################################################

Linux 5.10.66 (firefly) 	05/12/22 	_aarch64_	(8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          11.91    0.06    0.47    0.02    0.00   87.53

Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
mmcblk1           4.45        94.31       142.93         0.00     803915    1218352          0
mmcblk1boot0      0.02         0.06         0.00         0.00        548          0          0
mmcblk1boot1      0.02         0.06         0.00         0.00        548          0          0

              total        used        free      shared  buff/cache   available
Mem:          7.5Gi       872Mi       6.3Gi        41Mi       318Mi       6.5Gi
Swap:            0B          0B          0B

CPU sysfs topology (clusters, cpufreq members, clockspeeds)
                 cpufreq   min    max
 CPU    cluster  policy   speed  speed   core type
  0        0        0       -      -     Cortex-A55 / r2p0
  1        0        0       -      -     Cortex-A55 / r2p0
  2        0        0       -      -     Cortex-A55 / r2p0
  3        0        0       -      -     Cortex-A55 / r2p0
  4        1        0       -      -     Cortex-A76 / r4p0
  5        1        0       -      -     Cortex-A76 / r4p0
  6        2        0       -      -     Cortex-A76 / r4p0
  7        2        0       -      -     Cortex-A76 / r4p0

Architecture:                    aarch64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
CPU(s):                          8
On-line CPU(s) list:             0-7
Thread(s) per core:              1
Core(s) per socket:              2
Socket(s):                       3
Vendor ID:                       ARM
Model:                           0
Model name:                      Cortex-A55
Stepping:                        r2p0
BogoMIPS:                        48.00
L1d cache:                       256 KiB
L1i cache:                       256 KiB
L2 cache:                        1 MiB
L3 cache:                        3 MiB
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:        Mitigation; __user pointer sanitization
Vulnerability Spectre v2:        Not affected
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp

SoC guess: Rockchip RK3588 (35880000)
 Compiler: /usr/bin/gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1/aarch64-linux-gnu)
 Userland: arm64
   Kernel: 5.10.66/aarch64
           CONFIG_HZ=300
           CONFIG_HZ_300=y
           CONFIG_PREEMPT_VOLUNTARY=y
           raid6: neonx8   gen()  6846 MB/s
           raid6: neonx8   xor()  5276 MB/s
           raid6: neonx4   gen()  7042 MB/s
           raid6: neonx4   xor()  5353 MB/s
           raid6: neonx2   gen()  6462 MB/s
           raid6: neonx2   xor()  5075 MB/s
           raid6: neonx1   gen()  5293 MB/s
           raid6: neonx1   xor()  4380 MB/s
           raid6: int64x8  gen()  1632 MB/s
           raid6: int64x8  xor()  1031 MB/s
           raid6: int64x4  gen()  2048 MB/s
           raid6: int64x4  xor()  1126 MB/s
           raid6: int64x2  gen()  3019 MB/s
           raid6: int64x2  xor()  1660 MB/s
           raid6: int64x1  gen()  2477 MB/s
           raid6: int64x1  xor()  1184 MB/s
           raid6: using algorithm neonx4 gen() 7042 MB/s
           raid6: .... xor() 5353 MB/s, rmw enabled
           raid6: using neon recovery algorithm
           xor: measuring software checksum speed
           xor: using function: arm64_neon (12928 MB/sec)

cpu0/index2: 128K, level: 2, type: Unified
cpu0/index0: 32K, level: 1, type: Data
cpu0/index3: 3072K, level: 3, type: Unified
cpu0/index1: 32K, level: 1, type: Instruction
cpu1/index2: 128K, level: 2, type: Unified
cpu1/index0: 32K, level: 1, type: Data
cpu1/index3: 3072K, level: 3, type: Unified
cpu1/index1: 32K, level: 1, type: Instruction
cpu2/index2: 128K, level: 2, type: Unified
cpu2/index0: 32K, level: 1, type: Data
cpu2/index3: 3072K, level: 3, type: Unified
cpu2/index1: 32K, level: 1, type: Instruction
cpu3/index2: 128K, level: 2, type: Unified
cpu3/index0: 32K, level: 1, type: Data
cpu3/index3: 3072K, level: 3, type: Unified
cpu3/index1: 32K, level: 1, type: Instruction
cpu4/index2: 512K, level: 2, type: Unified
cpu4/index0: 64K, level: 1, type: Data
cpu4/index3: 3072K, level: 3, type: Unified
cpu4/index1: 64K, level: 1, type: Instruction
cpu5/index2: 512K, level: 2, type: Unified
cpu5/index0: 64K, level: 1, type: Data
cpu5/index3: 3072K, level: 3, type: Unified
cpu5/index1: 64K, level: 1, type: Instruction
cpu6/index2: 512K, level: 2, type: Unified
cpu6/index0: 64K, level: 1, type: Data
cpu6/index3: 3072K, level: 3, type: Unified
cpu6/index1: 64K, level: 1, type: Instruction
cpu7/index2: 512K, level: 2, type: Unified
cpu7/index0: 64K, level: 1, type: Data
cpu7/index3: 3072K, level: 3, type: Unified
cpu7/index1: 64K, level: 1, type: Instruction

| Firefly ITX-3588J HDMI(Linux) | no cpufreq support | 5.10 | Focal arm64 | 8750 | 295440 | 560200 | 10190 | 15080 | - |