This is a small hands-on project for practicing CUDA-accelerated GPU kernels. It contains a number of operations and complete AI layers that run directly on the GPU. Each implementation comes in both a GPU and a CPU version, so that the performance of the two can be compared. Currently implemented:
- Matrix Multiplication (naive method)
- Linear Regression
- Dense Layer
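To illustrate what a naive matrix multiplication with a matching CPU version looks like, here is a minimal sketch assuming row-major `float` matrices and one GPU thread per output element. The names `matmul_naive_kernel`, `matmul_gpu` and `matmul_cpu`, as well as the 16x16 block size, are hypothetical choices for this example and are not necessarily what this repository uses.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Naive kernel: each thread computes one element C[row][col] of C = A * B,
// reading a full row of A and a full column of B from global memory.
// A is M x K, B is K x N, C is M x N, all stored row-major.
__global__ void matmul_naive_kernel(const float* A, const float* B, float* C,
                                    int M, int N, int K) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < M && col < N) {  // skip threads that fall outside the matrix
        float acc = 0.0f;
        for (int k = 0; k < K; ++k) {
            acc += A[row * K + k] * B[k * N + col];
        }
        C[row * N + col] = acc;
    }
}

// CPU reference: the same triple loop on the host, used to check results
// and to time against the GPU version.
void matmul_cpu(const float* A, const float* B, float* C, int M, int N, int K) {
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k) {
                acc += A[i * K + k] * B[k * N + j];
            }
            C[i * N + j] = acc;
        }
    }
}

// Hypothetical host wrapper: allocates device buffers, copies A and B over,
// launches the kernel on a 2D grid of 16x16 blocks and copies C back.
// Returns false if any CUDA call fails (for example, when no GPU is present).
bool matmul_gpu(const float* A, const float* B, float* C, int M, int N, int K) {
    const size_t bytesA = sizeof(float) * M * K;
    const size_t bytesB = sizeof(float) * K * N;
    const size_t bytesC = sizeof(float) * M * N;
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    bool ok = cudaMalloc(&dA, bytesA) == cudaSuccess &&
              cudaMalloc(&dB, bytesB) == cudaSuccess &&
              cudaMalloc(&dC, bytesC) == cudaSuccess &&
              cudaMemcpy(dA, A, bytesA, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(dB, B, bytesB, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        dim3 block(16, 16);
        dim3 grid((N + block.x - 1) / block.x, (M + block.y - 1) / block.y);
        matmul_naive_kernel<<<grid, block>>>(dA, dB, dC, M, N, K);
        // cudaMemcpy waits for the kernel, so execution errors surface here too.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(C, dC, bytesC, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dA);  // freeing a null pointer is a no-op
    cudaFree(dB);
    cudaFree(dC);
    return ok;
}
```

The two results can be compared element-wise with a small tolerance, and timing each call gives the CPU-versus-GPU comparison. For small matrices the host-device transfers typically dominate, so the GPU version tends to pull ahead only at larger sizes.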
In the future, I intend to add a 2D convolutional layer and a Transformer encoder block.