SATA Performance #5
Comments
I just tried with all six disks. Here is the output; notice the 19 MB/s per drive. Are all the SATA ports multiplexed together onto one lane or something?
According to the MediaTek specs, the chip has 3x PCI Express lanes, and looking at the specs for the ASM1061 (and the commodity 1061 cards you can buy on eBay), they claim you can run 2 full SATA III 6 Gbps ports on one PCIe lane. Looking at the bottom of the board (GnuBee PC2) I can see 3 ASM1061 chips, each seemingly connected to a different lane directly on the MediaTek chip. So from a theoretical hardware perspective, that all seems to add up to the prospect of 6 full-speed SATA ports!

But as you can see above, the maximum throughput I can push through all the buses at the same time is less than 1 saturated SATA III port. Far, far less! I'm only getting around 100 MB/s, when "theoretically" even one SATA II port should be able to hit 300 MB/s. Even the original PCIe v1.0a has a theoretical throughput of 250 MB/s per lane, so even if these were 3 PCIe v1.0a lanes we should be able to see more than 100 MB/s across all devices. This suggests there's a kernel or driver issue somewhere that needs to be addressed, unless I'm misinterpreting the specs (or maybe there's a hardware bottleneck elsewhere?). I'd love to help work on this if you think it's an issue. (Sorry for the spam; this is my first time hacking at a kernel for an SBC device and I'm loving it.) I just need some direction on where to focus my efforts! http://www.asmedia.com.tw/eng/e_show_products.php?item=118
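One check that would help rule the link speed in or out is to see what each ASM1061 actually negotiated on its lane; a Gen1 x1 link tops out at 250 MB/s raw, before protocol overhead. Something like the following should show it (the 01:00.0 address is just a placeholder; the first command lists the real bus addresses):

# List the PCIe devices and find the three ASM1061 SATA controllers
lspci | grep -i sata
# Compare the advertised (LnkCap) and negotiated (LnkSta) link speed and width
sudo lspci -vv -s 01:00.0 | grep -E 'LnkCap|LnkSta'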
I own a GB-PC2, running the latest kernel provided by @neilbrown (thanks a LOT, by the way). I tested a parallel dd on 3 disks, without RAID, and I have similar results: around 37 MB/s on each of the 3, 110 MB/s overall. Same overall figure for 2 parallel dds.
I think that CPU0 handles all the interrupts. I wonder if distributing them would help (or hurt). These three files all contain 'f'; if you set them to different values, so that each controller's interrupts land on a different CPU, it might be an interesting experiment.
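A rough sketch of what I mean, assuming the three files are the /proc/irq/<N>/smp_affinity masks for the SATA controllers (the IRQ numbers 20-22 below are placeholders; check /proc/interrupts for the real ones):

# Find which IRQ lines the AHCI controllers are actually using
grep -i ahci /proc/interrupts
# Each mask is a hex bitmap of allowed CPUs; 'f' means any of CPU0-CPU3
cat /proc/irq/20/smp_affinity
# Pin each controller to a different CPU (1=CPU0, 2=CPU1, 4=CPU2, 8=CPU3);
# this must be done as root, and some interrupt controllers reject the write
echo 2 | sudo tee /proc/irq/20/smp_affinity
echo 4 | sudo tee /proc/irq/21/smp_affinity
echo 8 | sudo tee /proc/irq/22/smp_affinity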
I don't know how to change this. These "files" are read-only.
I'm convinced the MediaTek chip is using a PCIe "switch" internally to provide the three lanes, rather than genuinely having 3 lanes. I did find a datasheet somewhere which made me suspect this, but I can't remember where (I posted a link on the GnuBee Google group, I think): https://groups.google.com/forum/#!topic/gnubee/5_nKjgmKSoY
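If it is a switch, it ought to show up in the bus topology: three separate root ports with one ASM1061 each would point to three real lanes, while a single bridge fanning out to all three controllers would point to an internal switch. A quick way to look (assuming pciutils is installed):

# Print the PCI hierarchy as a tree
lspci -tv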
Here are some tests I did that can hopefully give people things to compare to. I'm running a RAID-5 array with bcache and btrfs. Bcache doesn't do anything for performance here, in fact it should do the opposite, but the main reason I'm using a GnuBee is that I'm working on a solar-powered off-grid solution and I had hoped to be able to minimize power usage with it.

#Copying dev-zero into a ramdisk
traverseda@storage:~$ dd if=/dev/zero of=/tmp/ramdisk/test.img bs=10k count=10k
10240+0 records in
10240+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.705581 s, 149 MB/s
#Copying dev-zero into dev-zero; this might be a no-op?
traverseda@storage:~$ dd if=/dev/zero of=/dev/zero bs=10k count=10k
[ lines omitted for brevity from now on ]
104857600 bytes (105 MB, 100 MiB) copied, 0.0727784 s, 1.4 GB/s
#Copying dev-zero into the ext4 root partition, which is not part of any raid array or cache
# and is directly on an SSD
traverseda@storage:~$ dd if=/dev/zero of=~/test.img bs=10k count=10k
104857600 bytes (105 MB, 100 MiB) copied, 1.72939 s, 60.6 MB/s
#Copying dev-zero onto a raid array with 5 drives, plus bcache.
traverseda@storage:~$ dd if=/dev/zero of=/mnt/array/traverseda/test.img bs=10k count=10k
104857600 bytes (105 MB, 100 MiB) copied, 4.33028 s, 24.2 MB/s
# Real on-disk size of the file, since it's /dev/zero
traverseda@storage:~$ sudo compsize /mnt/array/traverseda/test.img
Type Perc Disk Usage Uncompressed Referenced
TOTAL 3% 3.1M 100M 100M
zstd 3% 3.1M 100M 100M
# Copying a randomly-generated file to /dev/zero, to ensure we get good performance
traverseda@storage:~$ dd if=/tmp/ramdisk/random.img of=/dev/zero bs=10k count=10k
104857600 bytes (105 MB, 100 MiB) copied, 0.662688 s, 158 MB/s
# Copying same to btrfs
dd if=/dev/urandom of=/mnt/array/traverseda/test.img bs=10k count=10k
104857600 bytes (105 MB, 100 MiB) copied, 8.40688 s, 12.5 MB/s
# Comparing compressed sizes...
traverseda@storage:~$ sudo compsize /mnt/array/traverseda/random.img
Type Perc Disk Usage Uncompressed Referenced
TOTAL 100% 98M 98M 98M
none 100% 98M 98M 98M

So you can see a few things here. One is that more CPU-intensive operations like compression actually speed up the file transfer, but not as much as you'd think: I push the CPU pretty hard with multiple layers of indirection, and I actually get better performance on more compressible files. If I completely remove bcache I get exactly the same result, as near as I can tell. So, a few interesting points.
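One caveat that applies to these dd numbers (my own included): without a sync flag, writing a 100 MB file can measure the page cache as much as the disks. A fairer version would be something like this, using the same paths as above:

# conv=fdatasync makes dd wait until the data has actually reached the disks
dd if=/dev/zero of=/mnt/array/traverseda/test.img bs=10k count=10k conv=fdatasync
# oflag=direct additionally bypasses the page cache entirely
dd if=/dev/zero of=/mnt/array/traverseda/test.img bs=1M count=100 oflag=direct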
I get this on a Crucial 120 GB BX500 SATA SSD plugged into the first slot, on an XFS filesystem, latest image:
I roughly tried various block sizes; 16k looks like the sweet spot...
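If anyone wants to repeat the sweep, a rough loop like this works (the /mnt/ssd mount point is just an example; conv=fdatasync keeps the page cache from flattering the numbers):

# Write ~256 MiB at each block size and print dd's summary line
for bs in 4096 16384 65536 262144 1048576; do
    echo "block size: $bs bytes"
    dd if=/dev/zero of=/mnt/ssd/test.img bs=$bs count=$((268435456 / bs)) conv=fdatasync 2>&1 | tail -n 1
done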
This is probably more of a question, as you seem to have more experience with hardware than I do (I am a programmer). I have been messing around with all the various RAID combinations on the device (RAID 0, 1, and 10), and no matter which RAID combo I try I get the exact same write speed (about 72 MB/s), which I believe is the max write speed of the disks I am using (when used individually).
So I tried a dd if=/dev/zero of=test bs=1M count=1000 on two threads to two of the drives and got the same combined write speed of 72 MB/s (when you add the speeds together).
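(For reference, something like the following reproduces the two-drive test, assuming the drives are mounted at /mnt/disk1 and /mnt/disk2; conv=fdatasync makes each dd report the speed only once the data is actually on disk.)

# Run one dd per drive in the background and wait for both to finish
dd if=/dev/zero of=/mnt/disk1/test bs=1M count=1000 conv=fdatasync &
dd if=/dev/zero of=/mnt/disk2/test bs=1M count=1000 conv=fdatasync &
wait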
This got me scratching my head a bit. Is there only one 6 Gbps lane that all the ports are multiplexed through, or is there something strange going on? I should at least have been able to get above the write speed of a single drive.
I have tried older kernels too (like the original v3 kernel) and different Debian releases (jessie/stretch and buster), and I think I am getting the same results.
Is there a hardware limitation I am missing or something?
On a side note, thanks for all your hard work! It has been inspiring for me playing with your tools and building the kernels etc.