Pi-GEMM

This is a GPU-accelerated implementation of the GEMM matrix multiply function for the Raspberry Pi.

The core is an assembler loop for Broadcom's QPU processor, run as a custom program on the GPU. It produces a substantial speedup over an optimized CPU version: the included test runs in 500 ms on my overclocked Pi, compared with 8,000 ms using the official ATLAS library on Raspbian on the same device, a speedup of roughly 16x.

Getting Started

Download the repo, install the build dependencies with sudo apt-get install libatlas-dev m4, run make, and then run the test with sudo ./gemm.

Notes

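It always overwrites the output 'C' matrix with the product, rather than adding the product to 'beta' times the existing contents of 'C' as standard GEMM does. In effect, 'beta' is treated as zero.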
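
To make the difference concrete, here is a minimal pure-Python sketch of the two behaviours. It is only an illustration, not the library's code: the function names gemm_reference and gemm_overwrite are hypothetical, and it assumes that 'alpha' is still applied in the overwrite case, which the note above does not confirm.

```python
def gemm_reference(alpha, a, b, beta, c):
    """Standard BLAS-style GEMM: returns alpha * (a @ b) + beta * c."""
    m, k, n = len(a), len(b), len(b[0])
    return [
        [
            alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
            for j in range(n)
        ]
        for i in range(m)
    ]


def gemm_overwrite(alpha, a, b, beta, c):
    """Behaviour described in the note: the old contents of c are ignored.

    Assumption: alpha is still honoured; only beta is dropped.
    """
    return gemm_reference(alpha, a, b, 0.0, c)


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]

standard = gemm_reference(1.0, a, b, 1.0, c)   # product plus the old c
overwritten = gemm_overwrite(1.0, a, b, 1.0, c)  # product only
print(standard)
print(overwritten)
```

If your code relies on a non-zero 'beta' (for example, accumulating into 'C' across several calls), you would need to add the previous contents of 'C' yourself after the call.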

You have to run the program as root (for example, with sudo), so that the library can get direct access to the GPU.

License

All code is under the BSD three-clause license, included in this folder as LICENSE.

Credits

Written by Pete Warden at Jetpac Inc.

Thanks to eman on the Pi forums for the SHA-256 examples, Andrew Holme for creating the Fourier library, Herman Hermitage for his QPU documentation work, and Broadcom for releasing the hardware specifications of their GPU!