This project uses machine learning to predict whether a person has diabetes from health-related measurements. Early risk assessment for diabetes allows prompt intervention and better patient outcomes, which makes this task important in healthcare.
The dataset used for this project is the Pima Indians Diabetes Database. It contains several health-related variables, such as the number of pregnancies, glucose level, blood pressure and BMI.
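A minimal sketch of separating the features from the label and holding out a test set. It assumes the common Kaggle release of diabetes.csv, in which the label column is named `Outcome`. The function name `split_diabetes_data`, the 80/20 split and the fixed random seed are illustrative choices, not taken from the notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Assumption: the binary label lives in a column named "Outcome"
# (1 = diabetic, 0 = non-diabetic), as in the common Kaggle release of diabetes.csv.
TARGET_COLUMN = "Outcome"


def split_diabetes_data(df: pd.DataFrame, test_size: float = 0.2, random_state: int = 42):
    """Separate features from the label and return a stratified train/test split.

    Returns (X_train, X_test, y_train, y_test), the same order as train_test_split.
    """
    X = df.drop(columns=[TARGET_COLUMN])
    y = df[TARGET_COLUMN]
    # stratify=y keeps the diabetic/non-diabetic ratio roughly equal in both
    # splits, which matters because the two classes are not evenly balanced.
    return train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )
```

In the project this would be called as `split_diabetes_data(pd.read_csv("diabetes.csv"))`.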
Three machine learning models were implemented and evaluated:
- Logistic Regression
- K-Nearest Neighbours (KNN)
- Deep Neural Network using TensorFlow
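The two classical models can be set up with scikit-learn roughly as follows. This is a sketch, not the notebook's code: the feature-scaling step, `n_neighbors=5` and `max_iter=1000` are assumed settings, the helper names `build_models` and `fit_and_predict` are hypothetical, and the TensorFlow network is left out.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_models(n_neighbors=5):
    """Return untrained pipelines for Logistic Regression and KNN.

    Both standardise the features first. KNN uses Euclidean distance by default,
    so without scaling, features with large numeric ranges (e.g. glucose) would
    dominate features with small ranges.
    """
    return {
        "Logistic Regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "KNN": make_pipeline(
            StandardScaler(), KNeighborsClassifier(n_neighbors=n_neighbors)
        ),
    }


def fit_and_predict(models, X_train, y_train, X_test):
    """Fit each model on the training split and return its test-set predictions."""
    # Pipeline.fit returns the fitted pipeline, so predict can be chained.
    return {
        name: model.fit(X_train, y_train).predict(X_test)
        for name, model in models.items()
    }
```

Putting the scaler inside the pipeline means it is fitted on the training split only, so statistics from the test set never leak into training.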
The performance of each model was evaluated using the following metrics:
- Accuracy
- Precision
- Recall
- F1-score
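For a binary diabetic/non-diabetic label, all four metrics follow from the confusion-matrix counts (true/false positives and negatives). A minimal sketch, treating label `1` as the positive (diabetic) class. The function name is hypothetical, and returning `0.0` when a denominator is zero is an assumed convention; scikit-learn exposes the same choice through the `zero_division` parameter of its precision, recall and F1 functions.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Report 0.0 instead of dividing by zero when the model never predicts
    # the positive class (precision) or there are no positive cases (recall).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Recall deserves particular attention in a screening setting: it is the share of diabetic patients the model actually flags, so a low recall means missed cases.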
The repository contains the following files:
- diabetes.csv: the dataset used for training and evaluation.
- prediction.ipynb: Jupyter notebook implementing the different machine learning methods.
- CS3AI18 CW.pdf: PDF file containing the specification for this project.
To run the project:
- Clone the repository to your local machine.
- Install the necessary dependencies (e.g., TensorFlow, scikit-learn and Jupyter).
- Run the cells in prediction.ipynb to train and evaluate each model.
- Analyse the results and compare the performance of the models.
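To compare the models side by side, the per-model scores can be ranked by a single metric. The helper below is hypothetical, and ranking by F1 by default is an assumption, chosen because F1 balances precision and recall when the classes are imbalanced.

```python
def rank_models(results, key="f1"):
    """Print a comparison table sorted by `key` (highest first).

    `results` maps each model name to a dict of metric name -> score.
    Returns the model names in ranked order.
    """
    if not results:
        return []
    ranked = sorted(results.items(), key=lambda item: item[1][key], reverse=True)
    metric_names = list(ranked[0][1])
    # Header row: a model-name column followed by one column per metric.
    print(f"{'Model':<22}" + "".join(f"{name:>11}" for name in metric_names))
    for model_name, scores in ranked:
        print(f"{model_name:<22}" + "".join(f"{scores[m]:>11.3f}" for m in metric_names))
    return [model_name for model_name, _ in ranked]
```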
Possible future work:
- Explore additional feature engineering techniques.
- Experiment with different machine learning algorithms.
- Collect more data to improve model performance.
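One common preprocessing step for this dataset is treating zero readings as missing. In the Kaggle release of diabetes.csv (whose column names are assumed below), a value of 0 for glucose, blood pressure, skin thickness, insulin or BMI is not physiologically plausible and usually marks a missing measurement. The sketch uses a hypothetical helper name and median imputation as an assumed strategy; the medians come from the training split only.

```python
import numpy as np
import pandas as pd

# Assumption: column names follow the Kaggle release of diabetes.csv.
# A zero in these columns is treated as a missing measurement.
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def impute_zero_measurements(train_df: pd.DataFrame, test_df: pd.DataFrame,
                             columns=ZERO_AS_MISSING):
    """Replace zeros with NaN, then fill them with the training-split median."""
    train_df = train_df.copy()
    test_df = test_df.copy()
    for col in columns:
        train_df[col] = train_df[col].replace(0, np.nan)
        test_df[col] = test_df[col].replace(0, np.nan)
        # median() skips NaN; using the training median avoids leaking
        # information from the test split.
        median = train_df[col].median()
        train_df[col] = train_df[col].fillna(median)
        test_df[col] = test_df[col].fillna(median)
    return train_df, test_df
```

Zeros in the pregnancies column are left alone, since zero pregnancies is a valid value.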