A Raspberry Pi GPU-accelerated implementation of the GEMM matrix-multiply function

jetpacapp/pi-gemm

Pi-GEMM

This is a GPU-accelerated implementation of the GEMM matrix multiply function for the Raspberry Pi.

The core is an assembler loop for Broadcom's QPU processor, run as a custom program on the GPU. It produces a substantial speedup over an optimized CPU version: the included test runs in 500 ms on my overclocked Pi, compared with 8,000 ms using the official ATLAS library on Raspbian on the same device, roughly a 16x improvement.

Getting Started

Download the repo, install the build dependencies with sudo apt-get install libatlas-dev m4, run make, and then run the included test with sudo ./gemm.

Notes

The output matrix 'C' is always overwritten. Its previous contents are not scaled by 'beta' and added to the result, as they would be in a standard GEMM call.
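
To make the difference concrete, here is a minimal pure-Python sketch comparing the standard GEMM definition, C = alpha * A * B + beta * C, with the overwrite-only behaviour described in this note. It is an illustration only, not the library's code: the function names gemm_reference and gemm_overwrite are hypothetical, and it assumes that 'alpha' is still applied when 'C' is overwritten, which this README does not state.

```python
def gemm_reference(alpha, a, b, beta, c):
    """Standard GEMM on row-major nested lists: alpha * (a @ b) + beta * c."""
    m, k, n = len(a), len(b), len(b[0])
    return [
        [
            alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
            for j in range(n)
        ]
        for i in range(m)
    ]


def gemm_overwrite(alpha, a, b, beta, c):
    """Overwrite-only variant (hypothetical name): the old contents of c are
    discarded, which is equivalent to calling standard GEMM with beta = 0.
    Assumption: alpha is still applied to the product."""
    return gemm_reference(alpha, a, b, 0.0, c)


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]

standard = gemm_reference(1.0, a, b, 2.0, c)   # old C contributes 2.0 per entry
overwrite = gemm_overwrite(1.0, a, b, 2.0, c)  # old C is ignored
```

If your code relies on accumulating into 'C' (a non-zero 'beta'), one workaround under the same assumptions is to keep a copy of the old 'C' and add beta times that copy to the result yourself after the call.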

You have to run the program as root (for example with sudo), so that the library can get direct access to the GPU.

License

All code is under the BSD three-clause license, included in this folder as LICENSE.

Credits

Written by Pete Warden at Jetpac Inc.

Thanks to eman on the Pi forums for the SHA-256 examples, Andrew Holme for creating the Fourier library, Herman Hermitage for his QPU documentation work, and Broadcom for releasing the hardware specifications of their GPU!
