Handle Imbalanced Datasets

Accompanying code for the paper:

Y. Fathy, M. Jaber and A. Brintrup, "Learning With Imbalanced Data in Smart Manufacturing: A Comparative Analysis," in IEEE Access, vol. 9, pp. 2734-2757, 2021, doi: 10.1109/ACCESS.2020.3047838.

1_DataExploration.ipynb: is to explore Scania Dataset, impute the missing values and apply dimensionality reduction

Case I: Predictoin on original data

2_PredictiveModels_NoPCA.ipynb: performance of classifiers (logistic regression, XGBoost, Random Forest) on original Scania data (without augmenting synthetic faulty data samples). This includes imputing the missing values and no dimensionality reduction.
2_PredictiveModels_PCA.ipynb (reproduce results of Table 2 in the paper): performance of classifiers (logistic regression, XGBoost, Random Forest) on original Scania data (without augmenting synthetic faulty data samples). This includes imputing the missing values and dimensionality reduction on original using PCA (PCs = 11).

Case II: Predictoin with augmenting artificial faulty data samples using SMOTE

3_PredictiveModels_NoPCA-SMOTE.ipynb:: performance of classifiers (logistic regression, XGBoost, Random Forest) with augmenting artificial faulty data samples obtained using SMOTE (with different Imbalance Ratio) to original data (this includes data imputation of original data to fill missing values).
3_PredictiveModels_PCA-SMOTE.ipynb (reproduce results of Table 3 in the paper): performance of classifiers (logistic regression, XGBoost, Random Forest) with augmenting artificial faulty data samples obtained using SMOTE (with different Imbalance Ratio) to original data (this includes data imputation of original data to fill missing values and dimensionality reduction on original using PCA (PCs = 11)).

Case IV: Predictoin with augmenting artificial faulty data samples using GAN/CGAN

4_GAN_CGAN_Scania_PCA.ipynb: to generate data using GAN/CAGN. This includes imputing the missing values and dimensionality reduction on original using PCA (PCs = 11).
5_Optimise_PredictiveModels_PCA-GAN.ipynb(reproduce results of Table 4 in the paper): performance of classifiers (logistic regression, XGBoost, Random Forest) with augmenting artificial faulty data samples obtained using GAN (with different Imbalance Ratio using 4_GAN_CGAN_Scania_PCA.ipynb) to original data (this includes data imputation of original data to fill missing values and dimensionality reduction on original using PCA (PCs = 11)).
5_Optimise_PredictiveModels_PCA-CGAN.ipynb(reproduce results of Table 6 in the paper): performance of classifiers (logistic regression, XGBoost, Random Forest) with augmenting artificial faulty data samples obtained using CGAN (with different Imbalance Ratio using 4_GAN_CGAN_Scania_PCA.ipynb) to original data (this includes data imputation of original data to fill missing values and dimensionality reduction on original using PCA (PCs = 11)).

Case III: Predictoin with augmenting artificial faulty data samples using WGAN/WCGAN

4_WGAN_CWGAN_Scania_PCA.ipynb: to generate data using WGAN/WCAGN . This includes imputing the missing values and dimensionality reduction on original using PCA (PCs = 11).
5_Optimise_PredictiveModels_PCA-WGAN.ipynbreproduce results of Table 5 in the paper): performance of classifiers (logistic regression, XGBoost, Random Forest) with augmenting artificial faulty data samples obtained using WGAN (with different Imbalance Ratio using 4_WGAN_CWGAN_Scania_PCA.ipynb) to original data (this includes data imputation of original data to fill missing values and dimensionality reduction on original using PCA (PCs = 11)).
5_Optimise_PredictiveModels_PCA-CWGAN.ipynb (reproduce results of Table 7 in the paper): performance of classifiers (logistic regression, XGBoost, Random Forest) with augmenting artificial faulty data samples obtained using CWGAN (with different Imbalance Ratio using 4_WGAN_CWGAN_Scania_PCA.ipynb) to original data (this includes data imputation of original data to fill missing values and dimensionality reduction on original using PCA (PCs = 11)).

Note: this repo provides some depth behind the paper and I am not actively maintaining it.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Handle Imbalanced Datasets

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
1_DataExploration.ipynb		1_DataExploration.ipynb
2_PredictiveModels_NoPCA.ipynb		2_PredictiveModels_NoPCA.ipynb
2_PredictiveModels_PCA.ipynb		2_PredictiveModels_PCA.ipynb
3_PredictiveModels_NoPCA-SMOTE.ipynb		3_PredictiveModels_NoPCA-SMOTE.ipynb
3_PredictiveModels_PCA-SMOTE.ipynb		3_PredictiveModels_PCA-SMOTE.ipynb
4_GAN_CGAN_Scania_PCA.ipynb		4_GAN_CGAN_Scania_PCA.ipynb
4_WGAN_CWGAN_Scania_PCA.ipynb		4_WGAN_CWGAN_Scania_PCA.ipynb
5_Optimise_PredictiveModels_PCA-CGAN.ipynb		5_Optimise_PredictiveModels_PCA-CGAN.ipynb
5_Optimise_PredictiveModels_PCA-CWGAN.ipynb		5_Optimise_PredictiveModels_PCA-CWGAN.ipynb
5_Optimise_PredictiveModels_PCA-GAN.ipynb		5_Optimise_PredictiveModels_PCA-GAN.ipynb
5_Optimise_PredictiveModels_PCA-WGAN.ipynb		5_Optimise_PredictiveModels_PCA-WGAN.ipynb
LICENSE		LICENSE
README.md		README.md

License

YasminFathy/HandleImbalancedDatasets

Folders and files

Latest commit

History

Repository files navigation

Handle Imbalanced Datasets

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages