Diabetes Prediction Project

Problem:

About one in seven U.S. adults has diabetes now, according to the Centers for Disease Control and Prevention. But by 2050, that rate could skyrocket to as many as one in three.

Solution:

In this project, I built a classifier to predict whether a patient has diabetes.

I trained several classification models on the dataset and compared their performance.
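The comparison of models described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it uses synthetic data from `make_classification` as a stand-in for the diabetes data set (the sample and feature counts are arbitrary assumptions), default hyperparameters, and plain test-set accuracy as the metric. The scaling pipelines for the distance- and margin-based models are also an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real data (assumption: numeric features, binary target).
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The six model families listed in this README, with default settings.
# kNN, logistic regression and SVM are sensitive to feature scale,
# so they are wrapped in a pipeline that standardizes the features first.
models = {
    "k-Nearest Neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Support Vector Machine": make_pipeline(StandardScaler(), SVC()),
}

# Fit each model on the training split and record accuracy on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

# Print the models from best to worst test accuracy.
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<24} test accuracy = {acc:.3f}")
```

To try this on the real data, `X` and `y` would be replaced by the feature columns and target column of the diabetes data set.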

Data Info:

The diabetes data set comes from the UCI Machine Learning Repository and can be downloaded from here (it is also provided in the repo).

Libraries Used:

  1. NumPy (for linear algebra)
  2. Pandas (for data manipulation)
  3. Matplotlib (for data visualization)
  4. Seaborn (for data visualization)
  5. Scikit-learn (for data modeling)

Contents:

  • Importing the required libraries.
  • Importing and Reading the data.
  • Exploratory Data Analysis (EDA)
  • Data Visualizations
    • Heatmap
    • Pairplot
    • Countplot
  • Data Modeling
    • PART-1 k-Nearest Neighbors Classification Model
      • Modeling the data
      • Evaluating the model performance
    • PART-2 Logistic Regression Model
      • Modeling the data
      • Evaluating the model performance
    • PART-3 Decision Tree Classifier
      • Modeling the data
      • Evaluating the model performance
      • Feature Importance Bar plot
    • PART-4 Random Forest Classifier
      • Modeling the data
      • Evaluating the model performance
      • Feature Importance Bar plot
    • PART-5 Gradient Boosting Classifier
      • Modeling the data
      • Evaluating the model performance
      • Tuning Hyperparameters
      • Feature Importance Bar plot
    • PART-6 Support Vector Machines Classifier
      • Modeling the data
      • Evaluating the model performance
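The "Tuning Hyperparameters" and "Feature Importance Bar plot" steps listed for the Gradient Boosting part could look something like the sketch below. It is an illustration under assumptions rather than the notebook's code: the data is synthetic, the column names `feature_0` ... `feature_7` are hypothetical placeholders, and the parameter grid and the use of `GridSearchCV` are my own choices.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data with hypothetical column names.
feature_names = [f"feature_{i}" for i in range(8)]
X_raw, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, random_state=0
)
X = pd.DataFrame(X_raw, columns=feature_names)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Small illustrative grid; the values here are assumptions, not tuned results.
param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}

# 3-fold cross-validated grid search over the training split.
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# The best estimator is refit on the whole training split by default.
best_model = search.best_estimator_
print("best parameters:", search.best_params_)
print("test accuracy:", round(search.score(X_test, y_test), 3))

# Impurity-based feature importances, sorted for plotting.
importances = pd.Series(
    best_model.feature_importances_, index=feature_names
).sort_values()
print(importances)
```

With Matplotlib installed, `importances.plot.barh()` draws the sorted importances as a horizontal bar plot. The same `feature_importances_` attribute is available on fitted decision tree and random forest classifiers.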