Health Insurance Cross-Sell Prediction

Project Overview

The goal of this project is to predict whether an existing health insurance customer will buy vehicle insurance from the same company in the coming year, based on features such as demographics and previous insurance details. This repository tackles the Health Insurance Cross-Sell Prediction problem using the dataset from Kaggle. The dataset can be found here.

The dataset provides a variety of features related to the insured person, such as:

- Age
- Gender
- Region Code
- Policy Sales Channel
- Driving License
- Vehicle Age
- Annual Premium

The aim is to develop a machine learning model that predicts the target variable `Response` (1: will buy insurance, 0: will not buy insurance).

Dataset

The dataset consists of one CSV file:

dataset.csv: The full dataset with features related to health insured persons.

The target variable in the dataset is Response, which indicates whether the customer will purchase insurance.

| Column Name | Description |
| --- | --- |
| `id` | Unique identifier for each customer |
| `Gender` | Gender of the customer |
| `Age` | Age of the customer |
| `Driving_License` | 0: Customer does not have DL, 1: Customer has DL |
| `Region_Code` | Unique code for the region of the customer |
| `Previously_Insured` | 0: Customer does not have insurance, 1: Customer has insurance |
| `Vehicle_Age` | Age of the customer’s vehicle |
| `Vehicle_Damage` | 1: Customer has damaged the vehicle, 0: Customer has not damaged the vehicle |
| `Annual_Premium` | The premium amount for insurance |
| `Policy_Sales_Channel` | Channel through which the policy was sold |
| `Vintage` | Number of days the customer has been associated with the company |
| `Response` | 1: Will buy insurance, 0: Will not buy insurance |
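
As a quick sanity check on the schema above, the following sketch loads a CSV with pandas and fails early if a listed column is missing. The helper name `load_dataset` and the two sample rows are made up for illustration (including the value formats); in practice the source would be the CSV file placed in the dataset directory.

```python
import io

import pandas as pd

# Column names as listed in the table above.
EXPECTED_COLUMNS = [
    "id", "Gender", "Age", "Driving_License", "Region_Code",
    "Previously_Insured", "Vehicle_Age", "Vehicle_Damage",
    "Annual_Premium", "Policy_Sales_Channel", "Vintage", "Response",
]


def load_dataset(source):
    """Read the CSV and raise if any expected column is missing (hypothetical helper)."""
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    return df


# Two made-up rows stand in for the real file so the sketch runs on its own.
sample_csv = io.StringIO(
    "id,Gender,Age,Driving_License,Region_Code,Previously_Insured,"
    "Vehicle_Age,Vehicle_Damage,Annual_Premium,Policy_Sales_Channel,Vintage,Response\n"
    "1,Male,44,1,28.0,0,> 2 Years,Yes,40454.0,26.0,217,1\n"
    "2,Female,29,1,41.0,1,< 1 Year,No,27496.0,152.0,39,0\n"
)
df = load_dataset(sample_csv)
print(df.shape)
print(df.dtypes)
```

Passing a file path instead of the in-memory buffer works the same way, since `pd.read_csv` accepts both.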

Approach

1. Data Preprocessing

- Handling Missing Data: Identify and handle any missing data.
- Feature Engineering: Analyze categorical and numerical features to create additional informative features.
- Normalization/Scaling: Normalize or scale the numerical features to prepare them for model training.
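
One possible way to wire these preprocessing steps together is a scikit-learn `ColumnTransformer`. This is a minimal sketch, not the notebook's actual pipeline: the split into numeric and categorical columns, the median and most-frequent imputation strategies, and the toy rows are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed grouping of columns; adjust after inspecting the real data.
numeric_cols = ["Age", "Annual_Premium", "Vintage"]
categorical_cols = ["Gender", "Vehicle_Age", "Vehicle_Damage"]

# Numeric features: fill gaps with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Categorical features: fill gaps with the most frequent value, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])
preprocessor = ColumnTransformer(
    [
        ("num", numeric_steps, numeric_cols),
        ("cat", categorical_steps, categorical_cols),
    ],
    sparse_threshold=0,  # always return a dense array
)

# Made-up rows with two missing numeric values.
toy = pd.DataFrame({
    "Age": [44, 29, np.nan, 51],
    "Annual_Premium": [40454.0, 27496.0, 31000.0, np.nan],
    "Vintage": [217, 39, 120, 88],
    "Gender": ["Male", "Female", "Female", "Male"],
    "Vehicle_Age": ["> 2 Years", "< 1 Year", "1-2 Year", "1-2 Year"],
    "Vehicle_Damage": ["Yes", "No", "Yes", "No"],
})
features = preprocessor.fit_transform(toy)
print(features.shape)
```

Keeping imputation and scaling inside one transformer means the same fitted statistics can later be applied to new data with `preprocessor.transform`.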

2. Exploratory Data Analysis (EDA)

- Perform descriptive statistics and visualization to understand data distributions, correlations, and patterns.
- Analyze class imbalance in the target variable and address it if necessary.
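
The class-imbalance check can be sketched in a few lines of pandas. The labels below are made up; the real proportions would come from the `Response` column of the loaded dataset.

```python
import pandas as pd

# Made-up labels standing in for the Response column.
response = pd.Series([0, 0, 0, 0, 0, 0, 0, 1, 0, 1], name="Response")

# Share of each class, indexed by class label.
class_share = response.value_counts(normalize=True).sort_index()
print(class_share)

# Ratio of the majority class share to the minority class share.
imbalance_ratio = class_share.max() / class_share.min()
print(f"Imbalance ratio: {imbalance_ratio:.1f}")
```

If the ratio turns out to be large, options such as class weighting or resampling are worth considering; which one fits best depends on the model and the metric being optimized.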

3. Model Development

- Model Selection: We will experiment with multiple models including:
  - Logistic Regression
  - Decision Trees
  - Random Forests
  - Gradient Boosting Machines (XGBoost, LightGBM)
  - Neural Networks
- Hyperparameter Tuning: Optimize model parameters using cross-validation techniques.
- Evaluation Metrics: Accuracy, ROC-AUC, F1-Score, Precision, and Recall will be used to evaluate the model's performance.
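
The model comparison, tuning, and metrics described above could look roughly like the sketch below. It uses only scikit-learn, covers just two of the listed model families, and runs on synthetic data from `make_classification` rather than the real features. The parameter grid and the use of `class_weight="balanced"` are assumptions, not settings taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_validate

# Synthetic, imbalanced stand-in for the preprocessed features and Response.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4,
    weights=[0.85, 0.15], random_state=0,
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000, class_weight="balanced"),
    "random_forest": RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=0
    ),
}
scoring = ["accuracy", "roc_auc", "f1", "precision", "recall"]
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Compare models on the same stratified folds and average each metric.
results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    results[name] = {metric: scores[f"test_{metric}"].mean() for metric in scoring}
    print(name, {metric: round(value, 3) for metric, value in results[name].items()})

# Tune one hyperparameter with cross-validated grid search (illustrative grid).
search = GridSearchCV(
    LogisticRegression(max_iter=1000, class_weight="balanced"),
    param_grid={"C": [0.1, 1.0, 10.0]},
    cv=cv,
    scoring="roc_auc",
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Stratified folds keep the class ratio similar in every split, which matters when the positive class is rare. With an imbalanced target, accuracy alone can look good for a model that rarely predicts the positive class, so ROC-AUC, F1, precision, and recall are reported alongside it.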

4. Model Deployment

- Finalize the best-performing model and save it for potential deployment.
- Explore model interpretability and SHAP values to understand the features driving predictions.
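
A minimal sketch of saving and reloading a fitted model, assuming `joblib` (installed alongside scikit-learn) is used for persistence. For interpretability, the sketch swaps in scikit-learn's permutation importance as a stand-in, because SHAP is not among the listed requirements; it is a different technique from SHAP values. The data, file name, and model are illustrative.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

# Synthetic data and a simple model stand in for the final trained model.
X, y = make_classification(n_samples=300, n_features=5, n_informative=3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Persist the fitted model to disk and load it back (hypothetical file name).
model_path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(model, model_path)
restored = joblib.load(model_path)
same_predictions = np.array_equal(model.predict(X), restored.predict(X))
print("Reloaded model matches:", same_predictions)

# Permutation importance: drop in score when one feature's values are shuffled.
importance = permutation_importance(restored, X, y, n_repeats=5, random_state=0)
ranking = np.argsort(importance.importances_mean)[::-1]
for idx in ranking:
    print(f"feature {idx}: {importance.importances_mean[idx]:.4f}")
```

If preprocessing is part of a scikit-learn `Pipeline`, saving the whole pipeline rather than the bare estimator keeps the transformation and the model in sync at prediction time.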

Requirements

The code is written in Python and requires the following libraries:

- Pandas
- Requests
- NumPy
- Scikit-learn
- Matplotlib
- Seaborn

Installation

Clone the repository:

```bash
git clone https://github.com/HugoTex98/health-insurance-cross-sell-prediction.git
cd health-insurance-cross-sell-prediction
```

Create a virtual environment and activate it:

```bash
python3 -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
```

Install the required dependencies:
```
pip install -r requirements.txt
```
Download the dataset from Kaggle and place it in the `dataset/` directory.

Usage

Run the Notebook:

The main file of this project is a Jupyter Notebook that covers every step: data loading, exploratory data analysis (EDA), model training, and evaluation. To run the notebook:

```bash
jupyter notebook Health_Insurance_Cross_Sell_Prediction.ipynb
```

Follow Along in the Notebook:

Open the notebook in your browser, and run each cell sequentially. The notebook will guide you through:

- Data loading and preprocessing
- Exploratory Data Analysis (EDA)
- Feature engineering
- Model training and evaluation
- Making predictions on new data

Modifying the Notebook:

If you wish to experiment with the model, adjust parameters, or apply different techniques, you can modify the cells in the notebook. Simply rerun the relevant sections after making changes.

Save Results:

Any outputs such as plots, metrics, or predictions will be generated within the notebook. If you'd like to save any specific results (e.g., predictions), follow the instructions in the relevant notebook section.
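
As a rough illustration of saving predictions, the sketch below writes a small table to CSV with pandas. The ids, probabilities, column name `Response_probability`, and the 0.5 threshold are all hypothetical; the notebook's own saving instructions take precedence.

```python
import io

import pandas as pd

# Hypothetical customer ids and predicted probabilities.
predictions = pd.DataFrame({
    "id": [101, 102, 103],
    "Response_probability": [0.12, 0.81, 0.40],
})
# Convert probabilities to 0/1 labels with an assumed threshold of 0.5.
predictions["Response"] = (predictions["Response_probability"] >= 0.5).astype(int)

# An in-memory buffer keeps the sketch self-contained;
# pass a file path such as "predictions.csv" to write to disk instead.
buffer = io.StringIO()
predictions.to_csv(buffer, index=False)
print(buffer.getvalue())
```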

Acknowledgements

Kaggle for providing the dataset.
