GitHub - BrooksIan/ChurnBabyChurn: Telco Churn

Data Science in Apache Py-Spark

Customer Churn Project

Ensemble Models

Level: Moderate

Language: Python

Requirements:

HDP + CDSW
Spark 2.3

Author: Ian Brooks

Follow: LinkedIn - Ian Brooks PhD

Original Fork From: CDSW Demo

Churn Baby Churn

This GitHub repo is optimized for Cloudera Data Science Workbench (CDSW), but CDSW is not required. The PySpark code can be used with Apache Spark, and the code examples will run with the included dataset.

In this project, there are five supervised classifier models designed for telco customer churn. The first four classifier models used are Random Forest, Gradient Boosted Tree, Support Vector Machine, and Multilayer Perceptron. The most successful model is a Stacked Ensemble Model.
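To make the stacking idea concrete, here is a minimal, library-free sketch of the data flow. It is not the repo's PySpark code. The three base models are hypothetical hand-written rules standing in for trained classifiers, the feature names are invented for illustration, and the meta-model is a simple lookup table rather than the model the repo trains.

```python
from collections import Counter, defaultdict

# Hypothetical stand-ins for trained base models. Each maps one customer
# record (a dict) to a 0/1 churn prediction. Feature names are assumptions.
def rf_like(row):
    return 1 if row["service_calls"] >= 4 else 0

def gbt_like(row):
    return 1 if row["day_minutes"] > 250 else 0

def svm_like(row):
    return 1 if row["intl_plan"] and row["intl_calls"] < 3 else 0

BASE_MODELS = [rf_like, gbt_like, svm_like]

def base_predictions(row):
    """Level 0: turn one record into the tuple of base-model predictions."""
    return tuple(model(row) for model in BASE_MODELS)

def fit_meta_model(rows, labels):
    """Level 1: learn the majority label for each pattern of base predictions."""
    votes = defaultdict(Counter)
    for row, label in zip(rows, labels):
        votes[base_predictions(row)][label] += 1
    table = {pattern: counts.most_common(1)[0][0]
             for pattern, counts in votes.items()}
    # Patterns never seen in training fall back to the overall majority label.
    default = Counter(labels).most_common(1)[0][0]
    return lambda row: table.get(base_predictions(row), default)

# Tiny made-up training set (not taken from the included dataset).
TRAIN_ROWS = [
    {"service_calls": 5, "day_minutes": 300, "intl_plan": False, "intl_calls": 5},
    {"service_calls": 1, "day_minutes": 100, "intl_plan": False, "intl_calls": 5},
    {"service_calls": 4, "day_minutes": 120, "intl_plan": False, "intl_calls": 4},
    {"service_calls": 0, "day_minutes": 260, "intl_plan": True, "intl_calls": 1},
    {"service_calls": 2, "day_minutes": 90, "intl_plan": False, "intl_calls": 6},
    {"service_calls": 6, "day_minutes": 110, "intl_plan": False, "intl_calls": 3},
]
TRAIN_LABELS = [1, 0, 0, 1, 0, 0]

meta_model = fit_meta_model(TRAIN_ROWS, TRAIN_LABELS)
```

The point of the second level is that it can learn when to overrule a base model: in this toy data, a high `service_calls` value alone triggers `rf_like`, but the meta-model has learned that this pattern did not lead to churn. With trained base models, the meta-model is usually fit on predictions for records the base models did not see during training, so that it does not simply learn their overfitting.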

CDSW Run Instructions

In CDSW, download the project using the Git URL for this repo
Open a new session, and execute the setup.sh file
In Experiments, run the following scripts:
- dsfortelco_pyspark.py - vanilla random forest churn model
- gbt_churn_pyspark.py - gradient boosted tree churn model with normalized variables, hyperparameter tuning, and cross-validation
- mlp_churn_pyspark.py - multilayer perceptron churn model with normalized variables, hyperparameter tuning, and cross-validation
- rf_churn_pyspark.py - random forest churn model with normalized variables, hyperparameter tuning, and cross-validation
- svm_churn_pyspark.py - support vector machine churn model with normalized variables, hyperparameter tuning, and cross-validation
Once all experiments have completed, the stacked ensemble classifier model can be built. Run the following script to build the stacked model.
- stacked_churn_pyspark.py - stacked ensemble model trained on the predictions of the random forest, gradient boosted tree, and support vector machine models
Once the stacked model has been built, it can be deployed using the following script.
- predict_churn_stackedMLP_pyspark.py

Folders and files:

data
gbt
mlp
models
rf
spark
stackedmlp
svm
.DS_Store
CML_predict_churn_rf_pyspark.py
CML_rf_churn_pyspark.py
Churn.jpeg
README.md
Rapids_GBT.py
Rapids_MLP.py
Rapids_RF_Churn.py
Rapids_SVM.py
cdsw-build.sh
dsfortelco_pyspark.py
gbt_churn_pyspark.py
mlp_churn_pyspark.py
pipeline_RF_test.R
predict_churn_all_pyspark.py
predict_churn_gbt_pyspark.py
predict_churn_mlp_pyspark.py
predict_churn_pyspark.py
predict_churn_rf_pyspark.py
predict_churn_stackedMLP_pyspark.py
predict_churn_svm_pyspark.py
rf_churn_pyspark.py
setup.sh
spark-defaults.conf
stacked_churn_pyspark.py
svm_churn_pyspark.py
