J-Rodrigues0/drivendata-flu-shot-learning

Flu Shot Learning: Predict H1N1 and Seasonal Flu Vaccines

This project predicts, from the provided data, the probability that a subject takes the H1N1 vaccine and the seasonal flu vaccine. It was built for the data science competition: https://www.drivendata.org/competitions/66/flu-shot-learning/.

In this readme, I will explain the steps I took to achieve my results in the competition:

  • AUROC = 0.8442
  • Top 14% of participants (at the time of writing)
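
For readers unfamiliar with the metric, here is a minimal sketch of how an AUROC score can be computed, using only the Python standard library. It assumes the overall competition score is the mean of the AUROC values for the two targets; the target names `h1n1_vaccine` and `seasonal_vaccine` and the toy labels and probabilities are illustrative assumptions, not data or code from this repository.

```python
def auroc(y_true, y_score):
    """AUROC via the rank-sum (Mann-Whitney U) formulation, with tied scores
    receiving their average rank."""
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of equal scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")

    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_statistic / (n_pos * n_neg)


# Toy labels and predicted probabilities (made up for illustration only).
labels = {
    "h1n1_vaccine": [0, 0, 1, 1],
    "seasonal_vaccine": [0, 1, 0, 1],
}
predictions = {
    "h1n1_vaccine": [0.1, 0.4, 0.35, 0.8],
    "seasonal_vaccine": [0.2, 0.9, 0.3, 0.6],
}

per_target = {name: auroc(labels[name], predictions[name]) for name in labels}
mean_auroc = sum(per_target.values()) / len(per_target)
print(per_target, mean_auroc)
```

A score of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of positives above negatives, which is why the metric rewards well-ordered probabilities rather than hard class predictions.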

Repository Structure

Sub-folders:

  • input_data - raw data from competition;
  • interim_data - preprocessed data to be used in modelling;
  • output_data - model predictions to submit;
  • models - pickled models to be imported by the notebooks.

Main folder:

  • All the notebooks (.ipynb) and respective scripts (.py) for the project;
  • requirements.txt - project dependencies.

Notebook overview

  1. EDA
  2. PREPROCESSING
  3. MODEL_SELECTION - performs cross-validation of models to select the best one;
  4. TUNING - tunes the selected models' hyperparameters to improve the score;
  5. GENERAL - joins all the steps and performs predictions.

Solution Framework

To solve the problem, I applied the following data science workflow:

  1. Explore the data using EDA - gain insight into the main aspects of the data, such as distributions, trends and predictors.
  2. Clean the data in PREPROCESSING - apply that insight to preprocess the data and get it ready for model consumption.
  3. Perform cross-validation of MODELS - select the models I want to use; these models serve as a basis for testing different preprocessing assumptions and eventually become part of the final model.
  4. Tune some of the models using Optuna.
  5. Bring everything together and make predictions.
  6. Iterate through every step, applying different preprocessing assumptions and model-building techniques, and optimizing the model for the AUROC metric.
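
The model selection step above can be sketched roughly as follows. This is only an illustration, assuming scikit-learn is installed: the synthetic dataset stands in for one preprocessed target, and the two candidate models, the helper `compare_models` and the 5-fold setup are hypothetical choices, not necessarily the ones used in the notebooks.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the preprocessed features and one binary target.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=8, random_state=0
)

# Hypothetical candidates; the models actually compared may differ.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Stratified folds keep the class balance similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)


def compare_models(models, X, y, cv):
    """Return the mean cross-validated AUROC for each candidate model."""
    return {
        name: float(cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean())
        for name, model in models.items()
    }


scores = compare_models(candidates, X, y, cv)
best_name = max(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean AUROC = {score:.4f}")
print("best candidate:", best_name)
```

Scoring every candidate with the same folds and the same `roc_auc` scorer keeps the comparison aligned with the competition metric, so a change in preprocessing can be judged by rerunning the same loop.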
