My final project for the Coursera course "How to Win a Data Science Competition", completed in October 2018. The project is a competition hosted on Kaggle:
The project is split into two Jupyter notebooks. final_project_EDA contains the data exploration, while final_project_modelling covers feature engineering, model optimization and ensembling. The notebooks should be self-explanatory. The notebooks use a helper script to read the data files from disk, and a configuration file called settings.ini holds the data file paths.
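As a rough illustration of how a settings.ini file with data file paths can be read, here is a minimal sketch using only the standard library. The function name `load_data_paths`, the `[datasets]` section and the key names are assumptions made for this example; the actual helper script and the layout of settings.ini may differ.

```python
import configparser
from pathlib import Path


def load_data_paths(settings_file="settings.ini", section="datasets"):
    """Return a {name: Path} mapping read from an INI file.

    The section name "datasets" and the keys inside it are assumptions;
    the real settings.ini may be organized differently.
    """
    parser = configparser.ConfigParser()
    # ConfigParser.read() silently skips missing files, so check its result.
    if not parser.read(settings_file):
        raise FileNotFoundError(f"Could not read {settings_file}")
    if not parser.has_section(section):
        raise KeyError(f"No [{section}] section in {settings_file}")
    # Each option in the section is treated as one data file path.
    return {name: Path(value) for name, value in parser.items(section)}
```

With a mapping like this, a notebook could pass each path to a reader such as `pandas.read_csv` without hard-coding file locations.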
The notebooks can be executed in a Python 3.6 environment with the libraries listed in requirements.txt. To reproduce the results, copy the competition datasets into the datasets/ folder and run the notebook cells.
The final submission can be found in submission.csv, and the final models are also serialized as pickle files.
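For reference, a pickled model can be restored along these lines. This is a generic sketch: the helper name `load_model` is hypothetical, and the pickle file names are not listed here, so pass whichever file you want to load. Unpickling generally requires the same library versions that were used to train the model, which is one reason to install from requirements.txt first.

```python
import pickle


def load_model(path):
    """Deserialize one pickled object (for example a trained model) from disk."""
    # Only unpickle files you trust: pickle can execute arbitrary code on load.
    with open(path, "rb") as fh:
        return pickle.load(fh)
```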