Flu Shot Learning: Predict H1N1 and Seasonal Flu Vaccines

The aim of this project is to predict, from the provided data, the probability that a subject took the H1N1 vaccine and the seasonal flu vaccine. It was built for the DrivenData data science competition: https://www.drivendata.org/competitions/66/flu-shot-learning/.

This README explains the steps I took to achieve the following results in the competition:

AUROC = 0.8442
Top 14% of participants at the time of writing
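
As I understand the competition rules, the score is the ROC AUC computed separately for the two targets (h1n1_vaccine and seasonal_vaccine) and then averaged. The sketch below illustrates that calculation in plain Python. The function names and the toy labels and probabilities are made up for illustration and are not taken from the notebooks.

```python
def roc_auc(y_true, y_score):
    """ROC AUC as the probability that a random positive example is
    scored above a random negative one (ties count as half a win)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auroc(y_true_by_target, y_score_by_target):
    """Average the per-target ROC AUC values."""
    scores = [
        roc_auc(y_true_by_target[name], y_score_by_target[name])
        for name in y_true_by_target
    ]
    return sum(scores) / len(scores)


# Toy data (hypothetical values, four subjects)
y_true = {
    "h1n1_vaccine": [0, 0, 1, 1],
    "seasonal_vaccine": [0, 1, 0, 1],
}
y_prob = {
    "h1n1_vaccine": [0.1, 0.4, 0.35, 0.8],
    "seasonal_vaccine": [0.2, 0.9, 0.3, 0.6],
}

print(mean_auroc(y_true, y_prob))  # 0.875 for this toy data
```

The pairwise loop is quadratic, so it is only meant to show the definition. On the real data, scikit-learn's roc_auc_score is the practical choice.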

Repository Structure

Sub-folders:

input_data - raw data from competition;
interim_data - preprocessed data to be used in modelling;
output_data - model predictions to submit;
models - pickled models to be imported by the notebooks.

Main folder:

All the notebooks (.ipynb) and respective scripts (.py) for the project;
requirements.txt - project dependencies.
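
Saving a model to the models folder and loading it back can be sketched roughly as follows. The helper names, the file name and the stand-in object are hypothetical, and the demo writes to a temporary directory rather than to the repository.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Pickle a fitted model to disk, creating the folder if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(model, fh)


def load_model(path):
    """Load a pickled model from disk."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


# Demo with a stand-in object instead of a real fitted model
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "models" / "example_model.pkl"  # hypothetical file name
    stand_in = {"name": "example", "params": {"C": 1.0}}
    save_model(stand_in, model_path)
    restored = load_model(model_path)

print(restored == stand_in)
```

Pickle files can execute arbitrary code when loaded, so only unpickle files you trust.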

Notebooks interpretation

EDA
PREPROCESSING
MODEL_SELECTION - performs cross-validation of models to select the best one;
TUNING - tunes the selected models' hyperparameters to improve the score;
GENERAL - joins all the steps and performs predictions.
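
To give an idea of what the cross-validation in MODEL_SELECTION looks like, here is a minimal sketch using scikit-learn. The two candidate models and the synthetic data are placeholders chosen for illustration, not necessarily the models or data used in the notebook. It handles a single binary target, so in this project it would be repeated for each vaccine.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the preprocessed features and one binary target
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

# Placeholder candidates (assumption: any scikit-learn classifier fits here)
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Stratified folds keep the class balance similar in every split
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

mean_scores = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    mean_scores[name] = scores.mean()

best_name = max(mean_scores, key=mean_scores.get)
print(mean_scores)
print("best:", best_name)
```

Scoring with "roc_auc" keeps model selection aligned with the competition metric.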

Solution Framework

To solve the problem, I applied the following data science workflow:

Explore the data using EDA - gain insight into the main aspects of the data, such as distributions, trends and predictors.
Clean the data in PREPROCESSING - apply that insight to preprocess the data and get it ready for modelling.
Perform cross-validation of models - select the models I want to use. These models serve as a basis for testing different preprocessing assumptions and will eventually be part of the final model.
Tune some of the models using Optuna.
Bring everything together and make predictions.
Iterate through every step, applying different preprocessing assumptions and model building techniques, to optimize the model for the AUROC metric.

