This project aims to explain why customers churn at the telco company. A Jupyter notebook report explores the features that affect churn. We will also create a predictive model that reports both the likelihood that a customer will churn and whether or not the customer will churn. We will also make a presentation that walks through how the model works. Finally, we will have Python files containing functions that deliver future data to our model in the same form it was trained to receive.
- Notebook with report of the findings
- `.csv` file with predictions
- `.py` files:
  - acquire.py
  - prepare.py
- Presentation
- Completed README
- Must include `env.py` file in directory.
  - Contact Codeup to request access to the MySQL Server that the data is stored on.
  - `env.py` should include the following variables:
    - `user` - should be your username
    - `password` - your password
    - `host` - the host address for the MySQL Server
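
As an illustration of the `env.py` requirement, a minimal sketch follows. The three variable names come from the list above; the placeholder values, the `get_db_url` helper, and the `mysql+pymysql` driver prefix are assumptions made for this example and are not confirmed by the project code.

```python
# env.py (sketch): placeholder values only; never commit real credentials.
user = "your_username"
password = "your_password"
host = "your.mysql.host"


def get_db_url(database, user=user, password=password, host=host):
    """Build a SQLAlchemy-style connection URL for the given database.

    Hypothetical helper: the driver prefix (mysql+pymysql) is an assumption.
    """
    return f"mysql+pymysql://{user}:{password}@{host}/{database}"
```

With this in place, `get_db_url("telco_churn")` would yield a URL string that libraries such as pandas can use to reach the database.
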
- As long as you have the env file, `get_telco_data()` will do the rest on its own.
- `prep_telco_data()` will split, clean, and encode the data for the preparation stage.
- `prep_all_data()` will clean and encode a dataframe for use in modeling.
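
To make the acquire step concrete, here is a hedged sketch of what a function like `get_telco_data()` might do. It is not the project's actual implementation: the function name `get_telco_data_sketch`, the lookup-table names (`contract_types`, `internet_service_types`, `payment_types`), and the join keys are assumptions inferred from the column names in the data dictionary. Only the `customers` table is named in this README.

```python
import pandas as pd

# Assumed schema: three lookup tables joined to `customers` on their id columns.
TELCO_QUERY = """
SELECT *
FROM customers
JOIN contract_types USING (contract_type_id)
JOIN internet_service_types USING (internet_service_type_id)
JOIN payment_types USING (payment_type_id)
"""


def get_telco_data_sketch(connection):
    """Run the join query and return the result as a pandas DataFrame.

    `connection` can be anything pandas.read_sql accepts, for example a
    SQLAlchemy engine or connection URL for the MySQL server.
    """
    return pd.read_sql(TELCO_QUERY, connection)
```

In practice the connection would be built from the credentials in `env.py`; the query itself is the part that encodes which tables are joined.
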
- The notebook contains the report of what is causing churn and a walkthrough of the pipeline for creating a model to predict whether a customer will churn.
- Acquire
  - acquire.py
    - Get data from `MySQL`:`telco_churn`:`customers`
      - Determine which tables to join on
      - Make function to acquire data into a `pd.DataFrame`
  - Notebook
    - Demonstrate acquire.py
    - Summarize data
    - Plot Distributions
- Prepare
  - prepare.py
    - Split data
    - Handle Missing Values
    - Handle datatype issues
    - Encode strings
    - Scale data
    - Add new feature `tenure_years`
    - Create feature that combines `phone_service` and `multiple_lines`
    - Create feature that combines `dependents` and `partner`
    - Look into merging other variables
      - `streaming_tv` and `streaming_movies`
      - `online_security` and `online_backup`
  - Notebook
    - Explore missing values and document takeaways/action plans for handling them.
    - Document takeaways
    - Explore datatypes
    - Adapt types or data values as needed to have numeric representation of each attribute.
    - Demonstrate prepare.py works by running it
- Exploration
- Answer key questions
- Feature Engineering
- Modeling
- Notebook
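
Some of the Prepare steps in the outline above can be sketched with pandas. This is only an illustration under stated assumptions, not the code in `prepare.py`: the function name `add_features_sketch`, the new column names `phone_lines`, `family`, and `churn_encoded`, and the way the paired columns are combined (counting "yes" answers) are all hypothetical choices. It also assumes `total_charges` may arrive as text and that yes/no columns hold the strings "Yes"/"No" in any capitalization.

```python
import pandas as pd


def _is_yes(series):
    """Return 1 where the value is 'yes' (case-insensitive), else 0."""
    return series.astype(str).str.strip().str.lower().eq("yes").astype(int)


def add_features_sketch(df):
    """Return a copy of df with cleaned types and a few engineered features."""
    df = df.copy()

    # Datatype issue: coerce total_charges to numeric; unparseable text becomes NaN.
    df["total_charges"] = pd.to_numeric(df["total_charges"], errors="coerce")

    # New feature: tenure expressed in years instead of months.
    df["tenure_years"] = df["tenure"] / 12

    # Combined feature (assumed encoding): 0 = no phone, 1 = one line, 2 = multiple lines.
    df["phone_lines"] = _is_yes(df["phone_service"]) + _is_yes(df["multiple_lines"])

    # Combined feature (assumed encoding): count of partner/dependents flags set.
    df["family"] = _is_yes(df["partner"]) + _is_yes(df["dependents"])

    # Encode the target as 0/1.
    df["churn_encoded"] = _is_yes(df["churn"])
    return df
```

Splitting and scaling are left out of the sketch; they would normally follow these steps so that any scaler is fit on the training split only.
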
Columns | Definition |
---|---|
customer_id | customer number |
gender | gender identity |
senior_citizen | 1 = senior citizen, 0 = not senior citizen |
partner | undefined |
dependents | undefined |
tenure | total months as customer |
phone_service | yes, no = phone service |
multiple_lines | yes, no, no phone service |
internet_service_type_id | id of internet service type |
online_security | yes = subscribed, no = not subscribed, or no internet service |
online_backup | yes = subscribed, no = not subscribed, or no internet service |
device_protection | yes = subscribed, no = not subscribed, or no internet service |
tech_support | yes = subscribed, no = not subscribed, or no internet service |
streaming_tv | yes = subscribed, no = not subscribed, or no internet service |
streaming_movies | yes = subscribed, no = not subscribed, or no internet service |
contract_type_id | refers to contract type |
paperless_billing | no = sent by mail, yes = electronic notification |
payment_type_id | id of method of payment |
monthly_charges | current monthly charge |
total_charges | total charges since becoming a customer |
churn | yes = no longer a customer, no = still a customer |
contract_type | Month-to-month, One year, Two year |
internet_service_type | None, DSL, Fiber optic |
payment_type | method of payment |