Classification of GPU run time as high or low time-consuming using various classification algorithms.
In this project, we learn and implement the SVM, Decision Tree and XGBoost algorithms. The purpose of this project is to compare different classification algorithms and test their capabilities for classifying records in a dataset. The dataset contains information about GPU kernel performance on matrix multiplication (A * B = C), where A, B and C are matrices. Our goals are highlighted below:
• Implementing the SVM, Decision Tree and XGBoost algorithms to classify a GPU run as a high or low time-consuming process.
In this project, we performed various experiments: trying different kernels with the SVM algorithm, and pruning the Decision Tree and XGBoost models to avoid overfitting. The algorithms were evaluated on various performance measures and the best-performing one was selected.
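As a sketch of the pruning idea mentioned above, the snippet below applies scikit-learn's cost-complexity pruning to a decision tree on synthetic stand-in data (the project uses the SGEMM dataset; the data, the chosen `ccp_alpha`, and all variable names here are illustrative assumptions, not the report's actual settings):

```python
# Sketch: pruning a decision tree via cost-complexity pruning (ccp_alpha).
# Synthetic stand-in data; the project itself uses the SGEMM dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=14, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# An unpruned tree fits the training data closely and risks overfitting.
full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Pick a pruning strength from the tree's pruning path, then refit.
path = full.cost_complexity_pruning_path(X_tr, y_tr)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]  # illustrative choice
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_tr, y_tr)

print("leaves before/after pruning:", full.get_n_leaves(), pruned.get_n_leaves())
```

In practice the pruning strength would be chosen by cross-validation rather than taken from the middle of the path as done here.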
- General Info
- Variable Description
- Technologies and Methods
- Project Report
- Status
- Contact
For this project, we have used the SGEMM GPU kernel performance Data Set, available for download from the UCI ML Repository. The dataset measures the running time of various matrix multiplication processes where each matrix has shape 2048 × 2048. The dataset contains 241,600 observations; for each parameter combination, four independent runs were performed and their times recorded in the file.
For our project, we took the average of the four runs; the target/dependent variable was then created from the median of these averages, with values above the median classified as '1' and values at or below the median classified as '0'.
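The target construction described above can be sketched in pandas as follows (the miniature DataFrame and the column name `target` are illustrative assumptions standing in for the real 241,600-row dataset):

```python
import pandas as pd

# Hypothetical miniature frame standing in for the SGEMM dataset.
df = pd.DataFrame({
    "Run1 (ms)": [20.0, 400.0, 15.0, 900.0],
    "Run2 (ms)": [22.0, 410.0, 14.0, 880.0],
    "Run3 (ms)": [21.0, 395.0, 16.0, 910.0],
    "Run4 (ms)": [19.0, 405.0, 15.0, 905.0],
})

# Average the four runs per observation, then binarize at the median.
run_cols = ["Run1 (ms)", "Run2 (ms)", "Run3 (ms)", "Run4 (ms)"]
df["Avg Run Time"] = df[run_cols].mean(axis=1)
median = df["Avg Run Time"].median()
df["target"] = (df["Avg Run Time"] > median).astype(int)  # 1 = high, 0 = low
```

Binarizing at the median yields a balanced two-class target, which avoids class-imbalance issues in the classifiers that follow.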
The following table explains the variables in the dataset:
- MWG : per-matrix 2D tiling at workgroup level: {16, 32, 64, 128} (integer)
- NWG : per-matrix 2D tiling at workgroup level: {16, 32, 64, 128} (integer)
- KWG : inner dimension of 2D tiling at workgroup level: {16, 32} (integer)
- MDIMC : local workgroup size: {8, 16, 32} (integer)
- NDIMC : local workgroup size: {8, 16, 32} (integer)
- MDIMA : local memory shape: {8, 16, 32} (integer)
- NDIMB : local memory shape: {8, 16, 32} (integer)
- KWI : kernel loop unrolling factor: {2, 8} (integer)
- VWM : per-matrix vector widths for loading and storing: {1, 2, 4, 8} (integer)
- VWN : per-matrix vector widths for loading and storing: {1, 2, 4, 8} (integer)
- STRM : enable stride for accessing off-chip memory within a single thread: {0, 1} (categorical)
- STRN : enable stride for accessing off-chip memory within a single thread: {0, 1} (categorical)
- SA : per-matrix manual caching of the 2D workgroup tile: {0, 1} (categorical)
- SB : per-matrix manual caching of the 2D workgroup tile: {0, 1} (categorical)
- Run1 (ms) – Run4 (ms) : performance time in milliseconds for each of the 4 independent runs using the same parameters; times range between 13.25 and 3397.08.
- Avg Run Time : average run time of the 4 runs, in milliseconds
- Python (Pandas, Numpy, Matplotlib, Seaborn, Scikit-Learn)
- Microsoft Excel
- SVM (Linear, RBF and Polynomial kernel), Decision Tree, XGBoost, K-Fold Cross-validation
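The methods listed above can be sketched together as a K-fold cross-validation comparison; the snippet below uses synthetic stand-in data and scikit-learn defaults (the actual report's hyperparameters are not reproduced here, and xgboost's `XGBClassifier` would be added to the dictionary in the same way if the library is installed):

```python
# Sketch: comparing SVM kernels and a pruned decision tree with K-fold CV.
# Synthetic stand-in data; the project itself uses the SGEMM dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=14, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

models = {
    "SVM (linear)": make_pipeline(StandardScaler(), SVC(kernel="linear")),
    "SVM (RBF)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "SVM (poly)": make_pipeline(StandardScaler(), SVC(kernel="poly", degree=3)),
    "Decision Tree (pruned)": DecisionTreeClassifier(max_depth=5, random_state=0),
    # "XGBoost": xgboost.XGBClassifier(...)  # added the same way if available
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Scaling matters for SVMs but not for trees, hence the pipeline wraps only the SVM variants with `StandardScaler`.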
The project report is uploaded to the GitHub repository and can be referenced here: Project Report
Project is: finished
If you loved what you read here and feel like we can collaborate to produce some exciting stuff, or if you just want to shoot a question, please feel free to connect with me on LinkedIn.