Some of the CSV files could not be committed to this repo because they are too large.
Fraud detection dataset: https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud?resource=download
Second fraud detection dataset: https://www.kaggle.com/ealaxi/paysim1/download
The aim of this repo is to describe the different uses for AI/ML. For example, the fraud detection system uses models to detect outliers and abnormal behaviour and to classify data, whereas the Crypto Price Predictor trains and uses models to predict prices.
The data is imported. Before training, the labelled data is split: most of it is used to train the model, while a small portion is held out for testing.
The app.py script defines a test set size of test_size = 0.2, meaning 80% of the data is used to train the model. This is an important detail to note for model training.
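The 80/20 split can be sketched with scikit-learn's train_test_split. This is a minimal illustration, not the code from app.py: the random stand-in data, the stratify argument and the random_state value are all assumptions made for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 100 rows, 4 features, binary label (1 = fraud).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = np.array([0] * 90 + [1] * 10)

# test_size=0.2 holds out 20% of the rows for testing; the remaining 80%
# are used for training. stratify and random_state are assumptions here:
# they keep the class ratio similar in both parts and make the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

print(len(X_train), len(X_test))  # prints: 80 20
```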
In the app.py script, five models are trained for fraud detection: Decision Tree, K-Nearest Neighbors, Logistic Regression, SVM and Random Forest. These are all very common models, and which models are used is a key detail to note.
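As a rough sketch of what training these five models can look like with scikit-learn (the exact classes, hyperparameters and data loading in app.py may differ), the example below fits each one on a synthetic, imbalanced dataset. The dataset, the `models` dictionary and every parameter value are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the fraud data (assumption).
X, y = make_classification(
    n_samples=500, n_features=8, weights=[0.9, 0.1], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# One estimator per model family; hyperparameters are illustrative.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(random_state=0),
}

# Fit every model on the training portion only.
for name, model in models.items():
    model.fit(X_train, y_train)

# Predict on the held-out test portion for later evaluation.
predictions = {name: model.predict(X_test) for name, model in models.items()}
```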
After the model is trained, and before the model is deployed to production, it is evaluated using the 20% of the labelled dataset set aside for testing.
The confusion matrix is one of the most important tools for evaluating how a model performed against the test set. The app.py script generates this confusion matrix.
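For readers new to the confusion matrix, here is a small sketch using scikit-learn's confusion_matrix. The label lists are made up for the example and do not come from either dataset.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and model predictions (1 = fraud, 0 = legitimate).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

# Rows are true classes, columns are predicted classes:
# [[true negatives, false positives],
#  [false negatives, true positives]]
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

print(cm)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")  # prints: TN=5 FP=1 FN=1 TP=3
```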
Other evaluation metrics are calculated from the confusion matrix: accuracy, precision, recall, F1 and many more. Some data scientists even define their own evaluation methods.
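To make the link between the confusion matrix and these metrics concrete, the sketch below derives accuracy, precision, recall and F1 from the four cell counts using their standard definitions. The counts are hypothetical, and `metrics_from_counts` is a helper name invented for this example.

```python
def metrics_from_counts(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute common metrics from the four confusion-matrix cells."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for an imbalanced test set of 1000 transactions.
scores = metrics_from_counts(tp=30, fp=10, fn=20, tn=940)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, accuracy is 0.970 while F1 is about 0.667.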
This system calculates accuracy and F1 for each of the models and prints these metrics to the console.
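A per-model report of this kind could look roughly like the following, using scikit-learn's accuracy_score and f1_score. The test labels and the predictions for the two models shown are invented for illustration; the real script would use the predictions of its trained models, and its output format may differ.

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical test labels and per-model predictions (1 = fraud).
y_test = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
predictions = {
    "Decision Tree": [0, 0, 0, 0, 0, 1, 1, 1, 1, 0],
    "Logistic Regression": [0, 0, 0, 0, 0, 0, 1, 1, 0, 0],
}

results = {}
for name, y_pred in predictions.items():
    acc = accuracy_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
    results[name] = (acc, f1)
    print(f"{name}: accuracy={acc:.3f}, F1={f1:.3f}")
```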