You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The optimized code combines data into 64-byte packets stored in temporary buffer, and then writes whole packet into output bucket. This improves TLB usage efficiency and avoids hardware write-combining, thus improving resulting speed (for my code and my computer) 5 times!
The text was updated successfully, but these errors were encountered:
Bulat-Ziganshin
changed the title
Make LSD radix sort faster by employing write-combining trick
Make LSD radix sort faster with software write-combining
Jul 2, 2018
Naive code: https://github.com/Bulat-Ziganshin/MT-LZ/blob/master/RadixSort.cpp#L38
Speed - 19 MKeys/s: https://github.com/Bulat-Ziganshin/MT-LZ/blob/master/results.txt#L377
Optimized code: https://github.com/Bulat-Ziganshin/MT-LZ/blob/master/RadixSort.cpp#L51
Speed - 94 MKeys/s: https://github.com/Bulat-Ziganshin/MT-LZ/blob/master/results.txt#L329
Speed measured on 100M 32-bit uniform keys, sorted by 4 passes of 256-bucket sort, using single core of i7-4770: https://github.com/Bulat-Ziganshin/MT-LZ/blob/master/TestRadixSort.cpp
The optimized code combines data into 64-byte packets stored in temporary buffer, and then writes whole packet into output bucket. This improves TLB usage efficiency and avoids hardware write-combining, thus improving resulting speed (for my code and my computer) 5 times!
The text was updated successfully, but these errors were encountered: