Results: Raspberry Pi Compute Module 5 #106

Open
geerlingguy opened this issue Nov 27, 2024 · 6 comments

Comments

@geerlingguy

https://0x0.st/XRKg.txt — from geerlingguy/sbc-reviews#58

@ThomasKaiser
Owner

ThomasKaiser commented Nov 27, 2024

Memory performance (both bandwidth and latency) is slightly lower compared to my RPi 5B tested back in March.

But this would need a retest with my board, since the difference may come not from the hardware but from settings (a newer firmware release being more conservative with RAM timings).

Edit: the product page talks about '1GB, 2GB, 4GB, 8GB LPDDR4-4267 SDRAM with ECC' while the announcement blog post doesn't mention ECC at all.

@geerlingguy
Author

@ThomasKaiser - the ECC is just the standard LPDDR4x on-chip ECC that (IIRC) is necessary due to the speed/size of the chips, for consistent performance.

Regarding memory performance, I may need to test that on all my boards. I think that was on the 4GB RAM / 32GB eMMC module; maybe a different module SKU could perform differently?

@ThomasKaiser
Owner

> maybe a different module SKU could perform differently?

Sure. But I would have expected better memory performance by now, since several tweaks have been applied to the firmware/bootloader over time (it is no longer based on ThreadX but on something else they're not talking about).

@geerlingguy
Author

Updated results with the latest Pi OS update, which includes NUMA faking and the latest SDRAM tweaks. Surprisingly, these result in lower raw memory performance as measured by tinymembench, but all the practical benchmarks I'm running see speedups (sometimes dramatic).

Link: https://0x0.st/Xh2H.txt

@geerlingguy
Author

geerlingguy commented Dec 9, 2024

Some interesting diffs:

- libc memcpy copy                                 :   5705.0 MB/s (3, 0.2%)
- libc memchr scan                                 :  13722.6 MB/s (2)
- libc memset fill                                 :  12567.9 MB/s (3, 0.9%)
+ libc memcpy copy                                 :   5892.4 MB/s (2)
+ libc memchr scan                                 :  14188.5 MB/s (2)
+ libc memset fill                                 :   9675.1 MB/s (3, 1.4%)

-  * memcpy: 5705.0 MB/s, memchr: 13722.6 MB/s, memset: 12567.9 MB/s
-  * 16M latency: 119.2 118.6 118.8 117.8 120.8 134.6 130.2 139.9 
-  * 128M latency: 136.3 135.2 147.2 135.1 136.4 134.9 135.8 137.1 
-  * 7-zip MIPS (3 consecutive runs): 11250, 11265, 11217 (11240 avg), single-threaded: 3164
-  * `aes-256-cbc     540307.67k  1003613.67k  1256029.53k  1332866.39k  1365516.29k  1367834.62k`
-  * `aes-256-cbc     540608.23k  1003560.11k  1255918.42k  1332845.23k  1365215.91k  1368135.00k`
+  * memcpy: 5892.4 MB/s, memchr: 14188.5 MB/s, memset: 9675.1 MB/s
+  * 16M latency: 100.1 101.0 103.3 100.9 99.88 114.8 130.4 146.9 
+  * 128M latency: 116.7 115.2 116.6 115.2 119.5 115.8 116.6 118.4 
+  * 7-zip MIPS (3 consecutive runs): 11819, 11858, 11809 (11830 avg), single-threaded: 3306
+  * `aes-256-cbc     540521.66k  1003777.26k  1256054.36k  1332929.88k  1365549.06k  1368053.08k`
+  * `aes-256-cbc     540683.01k  1003568.53k  1256005.03k  1332878.68k  1365235.03k  1368211.46k`
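
To put the diff above into relative terms, here is a minimal sketch that computes the percentage change for each headline number. The values are copied from the diff; the dictionary layout and the `pct_change` helper are illustrative names, not part of sbc-bench or tinymembench.

```python
# (old, new) pairs copied from the diff above.
# Labels and structure are illustrative, not sbc-bench output.
results = {
    "memcpy MB/s": (5705.0, 5892.4),
    "memchr MB/s": (13722.6, 14188.5),
    "memset MB/s": (12567.9, 9675.1),
    "7-zip MIPS avg": (11240, 11830),
    "7-zip MIPS single": (3164, 3306),
}


def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


changes = {name: pct_change(old, new) for name, (old, new) in results.items()}

for name, delta in changes.items():
    print(f"{name:18s} {delta:+6.1f}%")
```

For the throughput and MIPS figures a positive value means the newer run is faster, so memset is the one entry that comes out negative here.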

@ThomasKaiser
Owner

ThomasKaiser commented Dec 9, 2024

> which surprisingly result in lower raw memory performance as measured by tinymembench

Well, tinymembench also reveals that latency has improved a lot (I still call the firmware ThreadX, since the RPi guys only say that what brings up the VideoCore is no longer an RTOS, but don't say what it is):

                ThreadX 26826259 from 2024/09/23         ThreadX 3858f977 from 2024/12/07
block size : single random read / dual random read    single random read / dual random read
    524288 :    4.0 ns          /     4.1 ns             3.9 ns          /     4.7 ns
   1048576 :   10.7 ns          /    11.2 ns             9.7 ns          /    11.2 ns
   2097152 :   18.8 ns          /    20.9 ns            16.9 ns          /    16.9 ns
   4194304 :   54.1 ns          /    82.0 ns            50.3 ns          /    74.4 ns
   8388608 :   85.1 ns          /   114.2 ns            78.8 ns          /   102.9 ns
  16777216 :  101.1 ns          /   127.9 ns            93.2 ns          /   112.3 ns
  33554432 :  111.3 ns          /   133.9 ns           101.7 ns          /   117.2 ns
  67108864 :  115.8 ns          /   138.3 ns           106.5 ns          /   120.3 ns

(starting with 512K since numbers prior to that are identical and most probably not DRAM but internal caches anyway)
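
The per-block-size change between the two firmware builds can be worked out with a short sketch like the one below. The numbers are copied from the table above; the tuple layout and the `pct_change` helper are illustrative assumptions, not tinymembench output.

```python
# Each row: block size in bytes, then latency in ns for
# (old single, old dual, new single, new dual) random reads.
# "old" = firmware from 2024/09/23, "new" = firmware from 2024/12/07.
rows = [
    (524288, 4.0, 4.1, 3.9, 4.7),
    (1048576, 10.7, 11.2, 9.7, 11.2),
    (2097152, 18.8, 20.9, 16.9, 16.9),
    (4194304, 54.1, 82.0, 50.3, 74.4),
    (8388608, 85.1, 114.2, 78.8, 102.9),
    (16777216, 101.1, 127.9, 93.2, 112.3),
    (33554432, 111.3, 133.9, 101.7, 117.2),
    (67108864, 115.8, 138.3, 106.5, 120.3),
]


def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


latency_changes = [
    (size, pct_change(s_old, s_new), pct_change(d_old, d_new))
    for size, s_old, d_old, s_new, d_new in rows
]

for size, single, dual in latency_changes:
    print(f"{size:>9d} : single {single:+6.1f}%   dual {dual:+6.1f}%")
```

Since these are latencies, a negative percentage means the newer firmware is faster at that block size.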
