Stroke_Prediction model for DSTI Python labs project
The objective of this project was to train a machine learning model to predict whether a patient had a stroke, using a data set of 5110 patients. Each patient is one observation, with the outcome variable stroke (yes/no) and predictors covering demographics (e.g., gender, age), lifestyle (e.g., smoking), and health history (e.g., hypertension, BMI, glucose). The project comprised exploratory data analysis, feature engineering and selection, model training, and model evaluation. A summary project report is also included.
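The four stages named above can be sketched end to end as follows. This is a minimal illustration, not the project's notebook code: the data frame is synthetic, the column names (`age`, `hypertension`, `bmi`, `smoker`) are hypothetical stand-ins, and the use of scikit-learn with a logistic regression is an assumption made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the patient table (hypothetical columns and values).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "age": rng.integers(20, 90, size=n),
    "hypertension": rng.integers(0, 2, size=n),
    "bmi": rng.normal(28, 5, size=n),
    "smoker": rng.choice(["yes", "no"], size=n),
})
# Make stroke more likely with age so the model has a signal to learn.
prob = 1 / (1 + np.exp(-(df["age"] - 70) / 8))
df["stroke"] = (rng.random(n) < prob).astype(int)
# Blank out 5% of BMI values to mimic missing data.
df.loc[df.sample(frac=0.05, random_state=0).index, "bmi"] = np.nan

# 1. Exploratory data analysis: class balance and missing values.
print(df["stroke"].value_counts(normalize=True))
print(df.isna().sum())

# 2. Feature engineering: one-hot encode the categorical column.
X = pd.get_dummies(df.drop(columns="stroke"), drop_first=True, dtype=float)
y = df["stroke"]

# 3. Model training: stratified split, then impute BMI with the
#    training-set median so no test information leaks into training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
bmi_median = X_train["bmi"].median()
X_train = X_train.assign(bmi=X_train["bmi"].fillna(bmi_median))
X_test = X_test.assign(bmi=X_test["bmi"].fillna(bmi_median))

model = LogisticRegression(max_iter=1000, class_weight="balanced")
model.fit(X_train, y_train)

# 4. Model evaluation: recall on the positive (stroke) class.
recall = recall_score(y_test, model.predict(X_test))
print(f"Recall on held-out data: {recall:.2f}")
```

Recall is used here because missing a true stroke case is usually the costlier error when the positive class is rare; other metrics may suit other goals.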
This project was conceived as part of a lab course in Machine Learning with Python through the Data Science Tech Institute. It is based on a popular Kaggle data set that has been used in many machine learning examples: https://www.kaggle.com/datasets/fedesoriano/stroke-prediction-dataset.
See requirements.txt for project dependencies. At a high level, the critical Python libraries for this project, shown with the alias or submodule used in the code, were:
numpy (np)
pandas (pd)
matplotlib.pyplot (plt)
seaborn (sns)
scipy (stats)
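As a small illustration of how three of these libraries fit together, the sketch below builds a contingency table with pandas and tests it with `scipy.stats`. The toy data and the choice of a chi-square test are assumptions for the example, not the analysis from the project; the plotting libraries are left out so the snippet runs without a display.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical toy data, not taken from the project data set.
toy = pd.DataFrame({
    "hypertension": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
    "stroke":       [0, 0, 0, 0, 0, 1, 0, 1, 1, 1],
})

# Cross-tabulate a binary predictor against the outcome.
table = pd.crosstab(toy["hypertension"], toy["stroke"])

# Chi-square test of independence between the two variables.
chi2, p_value, dof, expected = stats.chi2_contingency(table)

print(table)
print("Expected counts under independence:")
print(np.round(expected, 2))
print(f"chi2={chi2:.3f}, p={p_value:.3f}, dof={dof}")
```

With only ten rows the expected counts are too small for the test to be reliable; the point is the call pattern, not the result.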
This project was first implemented in Jupyter Notebook (version 6.4.11) under Anaconda, using Python 3.7.
As of this writing, 934 code examples had applied machine learning models to this data set. The approach here is therefore not novel, but comments and suggestions are welcome in the issues section.