When a bank decides which applicants should receive loans, it may wish to predict whether an applicant will default on the loan. Through automated feature engineering, we can identify predictive patterns in the financial data, which helps ensure that clients capable of repayment are not rejected.
In this tutorial, we show how Featuretools can be used to perform feature engineering on a multi-table dataset of financial information for 300 thousand applicants, provided by Home Credit, to train an accurate machine learning model that predicts whether an applicant will repay a loan.
- We automatically generate 1820 features using Deep Feature Synthesis.
- We are able to generate features, check that we are content with those features, and create the feature matrix.
- We are able to generate features in 1 hour, versus 10 hours with manual feature engineering.
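To give a feel for what generating features across tables means, here is a minimal plain-Python sketch of a single aggregation step of the kind Deep Feature Synthesis automates. It does not use the Featuretools API, and the table names, column names, and values are hypothetical toy data rather than the Home Credit schema. The feature names only imitate the style Featuretools uses for aggregation features.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy tables (not the Home Credit schema): one row per
# applicant, and zero or more previous loans per applicant.
applicants = [{"applicant_id": 1}, {"applicant_id": 2}, {"applicant_id": 3}]
previous_loans = [
    {"applicant_id": 1, "amount": 1000.0},
    {"applicant_id": 1, "amount": 3000.0},
    {"applicant_id": 2, "amount": 500.0},
]


def aggregate_child_features(parents, children, key, column, child_name):
    """Summarise a child table's column for each row of the parent table."""
    # Group the child values by the key that links them to a parent row.
    grouped = defaultdict(list)
    for row in children:
        grouped[row[key]].append(row[column])

    feature_matrix = []
    for parent in parents:
        values = grouped.get(parent[key], [])
        features = dict(parent)
        features[f"COUNT({child_name})"] = len(values)
        # MEAN and MAX are undefined for parents with no child rows.
        for name, func in (("MEAN", mean), ("MAX", max)):
            feature_name = f"{name}({child_name}.{column})"
            features[feature_name] = func(values) if values else None
        feature_matrix.append(features)
    return feature_matrix


feature_matrix = aggregate_child_features(
    applicants, previous_loans, "applicant_id", "amount", "previous_loans"
)
for row in feature_matrix:
    print(row)
```

Deep Feature Synthesis applies many such primitives across every relationship in the dataset and stacks them, which is how a large number of features can be produced without writing each aggregation by hand.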
- Clone the repo

    git clone https://github.com/Featuretools/predict-loan-repayment.git

- Install the requirements

    pip install -r requirements.txt

  You will also need to install graphviz for this demo. Please install graphviz according to the instructions in the Featuretools Documentation.

- Download the data

  You can download the data from Kaggle. After downloading, save the CSV to a directory called input in the root of this repository.

- Run the Tutorial notebook, Automated Loan Repayment:

    jupyter notebook
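Before launching the notebook, it can help to confirm that the downloaded data landed in the input directory. The following is a small optional sketch, not part of the repository; the helper name `list_input_csvs` is hypothetical, and it assumes you run it from the repository root.

```python
from pathlib import Path


def list_input_csvs(repo_root="."):
    """Return the sorted names of CSV files found in <repo_root>/input."""
    input_dir = Path(repo_root) / "input"
    if not input_dir.is_dir():
        # The directory has not been created yet.
        return []
    return sorted(path.name for path in input_dir.glob("*.csv"))


if __name__ == "__main__":
    found = list_input_csvs()
    if found:
        print("CSV files in ./input:")
        for name in found:
            print("  " + name)
    else:
        print("No CSV files found in ./input; download the data first.")
```

An empty result means either the input directory is missing or the files were saved somewhere else.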
Featuretools is an open source project created by Feature Labs. To see the other open source projects we're working on visit Feature Labs Open Source. If building impactful data science pipelines is important to you or your business, please get in touch.
Any questions can be directed to [email protected]