Credit-Card-Fraud-Detection-

Credit Card Fraud Detection using Logistic Regression on credit card dataset

Because this is a binary classification problem, we use a Logistic Regression model for training.

Workflow of model

  • Collection of data
  • Data Preprocessing
  • Splitting test and training data
  • Model Training
  • Model Evaluation
  • Prediction System

Dependencies used :

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# importing data

transaction_dataset = pd.read_csv("/content/drive/MyDrive/google_collab/creditcard.csv")
transaction_dataset.head(10)

Data analysis

  • shape
  • info()
  • describe()
  • isnull()
  • value_counts()
  • dtypes
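
The inspection calls listed above can be sketched on a small synthetic frame. The frame `demo`, its `Amount` values, and its size are invented for illustration; only the `Class` label convention follows this README.

```python
import pandas as pd

# Hypothetical stand-in for transaction_dataset; all values are invented.
demo = pd.DataFrame({
    "Amount": [10.0, 250.5, 3.2, 99.9, 1200.0],
    "Class": [0, 0, 0, 1, 0],
})

print(demo.shape)                    # (rows, columns)
demo.info()                          # column dtypes and non-null counts
print(demo.describe())               # summary statistics per numeric column
print(demo.isnull().sum())           # missing values per column
print(demo["Class"].value_counts())  # number of transactions per class
print(demo.dtypes)                   # dtype of each column
```

On the real data, `value_counts()` on the `Class` column is the call that exposes how heavily the two classes are imbalanced.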

Sampling

  • 0 : Normal transaction
  • 1 : Fraudulent transaction

legit = transaction_dataset[transaction_dataset.Class == 0]
fraud = transaction_dataset[transaction_dataset.Class == 1]

Comparing the samples

# compare the mean feature values of normal and fraudulent transactions
transaction_dataset.groupby('Class').mean()

Under-Sampling

  • Build a sample dataset with a similar number of normal and fraudulent transactions.
  • The dataset contains 492 fraudulent transactions.
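
This README uses `new_transaction_dataset2` in later steps without showing how it is built. One common way to under-sample, shown here as a sketch and not necessarily the author's exact code, is to draw as many legitimate rows as there are fraudulent ones and concatenate the two groups. The frame `demo` and the names `legit_sample` and `balanced` are hypothetical, and the data is synthetic.

```python
import numpy as np
import pandas as pd

# Synthetic, imbalanced stand-in for transaction_dataset (values invented).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Amount": rng.uniform(1, 500, size=1000),
    "Class": [1] * 20 + [0] * 980,
})

legit = demo[demo.Class == 0]
fraud = demo[demo.Class == 1]

# Draw as many legitimate rows as there are fraudulent ones
# (492 in the real dataset, 20 in this demo).
legit_sample = legit.sample(n=len(fraud), random_state=2)

# Stack both groups into one balanced frame.
balanced = pd.concat([legit_sample, fraud], axis=0)
print(balanced["Class"].value_counts())
```

Fixing `random_state` keeps the drawn sample reproducible between runs.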

Visualization of data

# figure size, close to a 16:9 ratio
plt.figure(figsize = (20,11))

# correlation heatmap with annotated cell values
sns.heatmap(new_transaction_dataset2.corr(), annot = True, cmap = 'coolwarm')

plt.title("Heatmap for correlation matrix for credit card data", fontsize = 22)
plt.show()

Splitting data (features and target)

# features: every column except the label
X = new_transaction_dataset2.drop(columns = 'Class')
# target: the Class label (0 = normal, 1 = fraudulent)
Y = new_transaction_dataset2['Class']

Splitting into training and test

X_train, X_test, Y_train, Y_test = train_test_split(X,Y, test_size = 0.2, stratify = Y, random_state = 2)

Model Training

model = LogisticRegression()
model.fit(X_train, Y_train)

Model Evaluation

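training_data_accuracy = accuracy_score(Y_train, model.predict(X_train))
test_data_accuracy = accuracy_score(Y_test, model.predict(X_test))
print("\nAccuracy on Training data ", training_data_accuracy, "\n")
print("Accuracy on Test data ", test_data_accuracy)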
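
Prediction System

The workflow lists a prediction system as its last step, but this README does not show one. Below is a minimal sketch, assuming a fitted `LogisticRegression` and one new transaction supplied as a single-row frame with the same feature columns. The feature names `f1` and `f2` and all values are hypothetical, and the data is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic training data: two invented features, fraud rows shifted upward.
rng = np.random.default_rng(2)
X = pd.DataFrame({
    "f1": np.concatenate([rng.normal(0, 1, 50), rng.normal(4, 1, 50)]),
    "f2": np.concatenate([rng.normal(0, 1, 50), rng.normal(4, 1, 50)]),
})
Y = pd.Series([0] * 50 + [1] * 50, name="Class")

model = LogisticRegression()
model.fit(X, Y)

# One new transaction, passed as a single-row frame with the same columns.
new_transaction = pd.DataFrame({"f1": [4.2], "f2": [3.9]})
prediction = model.predict(new_transaction)[0]

if prediction == 0:
    print("Normal transaction")
else:
    print("Fraudulent transaction")
```

With the real model, the new row would need the same columns, in the same order, as the `X` used for training.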


Contributor : Ankit Nainwal

Other Models

Please ⭐⭐⭐⭐⭐