👉 Implementation of ML/DL Metrics in Python 👈

This repository implements a range of evaluation metrics for regression and classification problems. In data science and machine learning projects, it is important to understand the metrics used to evaluate a model's performance. The metrics are written from scratch in Python with NumPy, follow the formulae given on the Wikipedia page for each metric, and are available as a Python package. They are listed in the following order:

Regression Metrics

  1. R2 Score

The R2 score, also known as the coefficient of determination (or the coefficient of multiple determination in multiple regression), is a statistical measure of how close the data are to the fitted regression line.

$$R^2 = 1 - \frac{\sum_{i=1}^n (y_i - \hat{y}_i)^2}{\sum_{i=1}^n (y_i - \bar{y})^2}$$
  1. Mean Absolute Error
$$MAE = \frac{1}{n} \sum_{i=1}^n |y_i - \hat{y}_i|$$
  1. Mean Squared Error
$$MSE = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2$$
  1. Root Mean Squared Error
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2}$$
  1. Mean Absolute Percentage Error
$$MAPE = \frac{100}{n} \sum_{i=1}^n \frac{|y_i - \hat{y}_i|}{|y_i|}$$
  1. Mean Squared Logarithmic Error
$$MSLE = \frac{1}{n} \sum_{i=1}^n (\log(y_i + 1) - \log(\hat{y}_i + 1))^2$$
  1. Median Absolute Error
$$MdAE = \operatorname{median}(|y_i - \hat{y}_i|)$$
  1. Median Squared Error
$$MdSE = \operatorname{median}((y_i - \hat{y}_i)^2)$$
  1. Median Absolute Percentage Error
$$MdAPE = \operatorname{median}\left(\frac{|y_i - \hat{y}_i|}{|y_i|}\right)$$
  1. Median Squared Logarithmic Error
$$MdSLE = \operatorname{median}((\log(y_i + 1) - \log(\hat{y}_i + 1))^2)$$
  1. Explained Variance Score
$$EV = 1 - \frac{\operatorname{Var}(y - \hat{y})}{\operatorname{Var}(y)}$$
  1. Max Error
$$\text{Max Error} = \max(|y_i - \hat{y}_i|)$$
  1. Mean Bias Error
$$MBE = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)$$
  1. Mean Percentage Error
$$MPE = \frac{100}{n} \sum_{i=1}^n \frac{y_i - \hat{y}_i}{y_i}$$
  1. Mean Squared Percentage Error
$$MSPE = \frac{100}{n} \sum_{i=1}^n \frac{(y_i - \hat{y}_i)^2}{y_i^2}$$
  1. Median Bias Error
$$MdBE = \operatorname{median}(y_i - \hat{y}_i)$$
  1. Median Percentage Error
$$MdPE = \operatorname{median}\left(\frac{y_i - \hat{y}_i}{y_i}\right)$$
  1. Median Squared Percentage Error
$$MdSPE = \operatorname{median}\left(\frac{(y_i - \hat{y}_i)^2}{y_i^2}\right)$$
  1. Mean Absolute Scaled Error
$$MASE = \frac{1}{n} \sum_{i=1}^n \frac{|y_i - \hat{y}_i|}{\frac{1}{n-1} \sum_{j=2}^n |y_j - y_{j-1}|}$$
  1. Mean Squared Scaled Error
$$MSSE = \frac{1}{n} \sum_{i=1}^n \frac{(y_i - \hat{y}_i)^2}{\frac{1}{n-1} \sum_{j=2}^n (y_j - y_{j-1})^2}$$
  1. Median Absolute Scaled Error
$$MdASE = \operatorname{median}\left(\frac{|y_i - \hat{y}_i|}{\frac{1}{n-1} \sum_{j=2}^n |y_j - y_{j-1}|}\right)$$
  1. Median Squared Scaled Error
$$MdSSE = \operatorname{median}\left(\frac{(y_i - \hat{y}_i)^2}{\frac{1}{n-1} \sum_{j=2}^n (y_j - y_{j-1})^2}\right)$$
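
As a rough illustration of how several of the regression formulae above translate into NumPy, here is a minimal sketch. The function name `regression_metrics` and the dictionary keys are hypothetical and are not taken from this package's API. Explained variance is computed here from the variance of the residuals, and the percentage-based metric assumes no true value is zero.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Compute a few regression metrics from scratch (illustrative only)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    err = y_true - y_pred                  # signed residuals y_i - y_hat_i
    abs_err = np.abs(err)
    mse = np.mean(err ** 2)

    ss_res = np.sum(err ** 2)                        # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": np.mean(abs_err),
        "mse": mse,
        "rmse": np.sqrt(mse),
        # Assumes y_true contains no zeros; otherwise this divides by zero.
        "mape": 100.0 * np.mean(abs_err / np.abs(y_true)),
        "mdae": np.median(abs_err),
        "max_error": np.max(abs_err),
        "mbe": np.mean(err),
        # 1 - Var(residuals) / Var(y_true), using population variances.
        "explained_variance": 1.0 - np.var(err) / np.var(y_true),
    }


if __name__ == "__main__":
    metrics = regression_metrics([3, -0.5, 2, 7], [2.5, 0.0, 2, 8])
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

Computing the residuals once and reusing them keeps each metric a one-line expression that can be checked directly against its formula.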

Classification Metrics

  1. Accuracy
$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$
  1. Precision
$$Precision = \frac{TP}{TP + FP}$$
  1. Recall
$$Recall = \frac{TP}{TP + FN}$$
  1. F1 Score
$$F1 = 2 \times \frac{Precision \times Recall}{Precision + Recall}$$
  1. Matthews Correlation Coefficient
$$MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
  1. Cohen's Kappa
$$Kappa = \frac{p_o - p_e}{1 - p_e}$$

where

$$p_o = \frac{TP + TN}{TP + TN + FP + FN}$$
$$p_e = \frac{TP + FP}{TP + TN + FP + FN} \times \frac{TP + FN}{TP + TN + FP + FN} + \frac{TN + FP}{TP + TN + FP + FN} \times \frac{TN + FN}{TP + TN + FP + FN}$$
  1. Area Under the Receiver Operating Characteristic Curve (ROC AUC)
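$$\text{ROC AUC} = \frac{1}{2} \sum_{i=1}^{n-1} (FPR_i - FPR_{i+1}) \times (TPR_i + TPR_{i+1})$$

where the curve points are ordered by increasing decision threshold, so that $FPR_i \geq FPR_{i+1}$.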
  1. Area Under the Precision-Recall Curve (PR AUC)
$$\text{PR AUC} = \frac{1}{2} \sum_{i=1}^{n-1} (Recall_i - Recall_{i+1}) \times (Precision_i + Precision_{i+1})$$
  1. Hamming Loss
$$\text{Hamming Loss} = \frac{1}{n} \sum_{i=1}^n \frac{1}{m} \sum_{j=1}^m I(y_{ij} \neq \hat{y}_{ij})$$
  1. Zero-One Loss
$$\text{Zero-One Loss} = \frac{1}{n} \sum_{i=1}^n I(y_i \neq \hat{y}_i)$$
  1. Jaccard Similarity Score
$$Jaccard = \frac{TP}{TP + FP + FN}$$
  1. Fowlkes-Mallows Score
$$FM = \sqrt{\frac{TP}{TP + FP} \times \frac{TP}{TP + FN}}$$
  1. Log Loss
$$\text{Log Loss} = - \frac{1}{n} \sum_{i=1}^n \sum_{j=1}^m y_{ij} \times \log(\hat{y}_{ij})$$
  1. Cross-Entropy Loss
$$\text{Cross-Entropy Loss} = - \frac{1}{n} \sum_{i=1}^n \sum_{j=1}^m \left[ y_{ij} \times \log(\hat{y}_{ij}) + (1 - y_{ij}) \times \log(1 - \hat{y}_{ij}) \right]$$
  1. Hinge Loss
$$\text{Hinge Loss} = \frac{1}{n} \sum_{i=1}^n \sum_{j=1}^m \max(0, 1 - y_{ij} \times \hat{y}_{ij})$$
  1. Squared Hinge Loss
$$\text{Squared Hinge Loss} = \frac{1}{n} \sum_{i=1}^n \sum_{j=1}^m (\max(0, 1 - y_{ij} \times \hat{y}_{ij}))^2$$
  1. Classification Error
$$\text{Classification Error} = \frac{1}{n} \sum_{i=1}^n I(y_i \neq \hat{y}_i)$$
  1. Balanced Classification Error
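$$\text{Balanced Classification Error} = \frac{1}{m} \sum_{j=1}^m \frac{1}{n_j} \sum_{i : y_i = j} I(\hat{y}_i \neq j)$$

where $m$ is the number of classes and $n_j$ is the number of samples whose true class is $j$.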
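
The confusion-matrix formulae above (accuracy, precision, recall, F1, MCC, Cohen's kappa, Jaccard, Fowlkes-Mallows) can be sketched for the binary case as follows. This is a minimal sketch, not this package's API: the names `confusion_counts` and `binary_metrics` are hypothetical, labels are assumed to be 0/1, and the code assumes every denominator is non-zero (for example, at least one positive prediction exists).

```python
import numpy as np


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels encoded as 0/1."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(y_true & y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    return tp, tn, fp, fn


def binary_metrics(y_true, y_pred):
    """Compute confusion-matrix metrics (illustrative only).

    Assumes all denominators are non-zero; a degenerate input such as
    "no positive predictions" raises ZeroDivisionError.
    """
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    n = tp + tn + fp + fn

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / np.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )

    # Cohen's kappa: observed agreement vs. agreement expected by chance.
    p_o = accuracy
    p_e = ((tp + fp) / n) * ((tp + fn) / n) + ((tn + fp) / n) * ((tn + fn) / n)
    kappa = (p_o - p_e) / (1 - p_e)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": float(mcc),
        "kappa": kappa,
        "jaccard": tp / (tp + fp + fn),
        "fowlkes_mallows": float(np.sqrt(precision * recall)),
    }


if __name__ == "__main__":
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
    for name, value in binary_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.4f}")
```

Deriving the four counts once and expressing every metric in terms of them mirrors the formulae directly, which makes each line easy to verify by hand.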

Clustering Metrics