This project uses hotel reservation data to build a regression pipeline that predicts hotel profitability through one financial metric: the ‘average daily rate’, the average revenue generated per occupied room. The project is motivated by the impact that COVID had on the hospitality industry. As the industry recovers from economic shutdowns and stringent travel rules, it must look for ways to mitigate future monetary losses. Along with testing six machine learning models, I take a deep dive into black-box model interpretability by examining global and local feature importance for the best-performing model.
Machine Learning Models:
- Lasso
- Ridge
- Elasticnet
- SVR
- Random Forest
- XGBoost
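A comparison across these models can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: it assumes synthetic data from `make_regression` in place of the hotel reservation data, arbitrary untuned hyperparameters, and RMSE as the comparison metric. Only the five scikit-learn models are shown; XGBoost's `XGBRegressor` follows the same `fit`/`predict` interface and could be added to the dictionary in the same way.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Assumption: synthetic data stands in for the hotel reservation features
# and the average daily rate target.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Linear models and SVR are sensitive to feature scale, so they are wrapped
# in a pipeline with a scaler. Hyperparameters are illustrative, not tuned.
models = {
    "Lasso": make_pipeline(StandardScaler(), Lasso(alpha=0.1)),
    "Ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "Elasticnet": make_pipeline(StandardScaler(), ElasticNet(alpha=0.1, l1_ratio=0.5)),
    "SVR": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0)),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    rmse[name] = mean_squared_error(y_test, predictions) ** 0.5

# Print the models from lowest to highest test RMSE.
for name, score in sorted(rmse.items(), key=lambda item: item[1]):
    print(f"{name:>13}: RMSE = {score:.2f}")
```

Keeping the scaler inside each pipeline means it is fitted on the training split only, which avoids leaking test-set statistics into the model.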
Global Feature Importance Methods:
- Permutation Importance
- XGB Gain
- SHAP Values
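As an illustration of the first method in this list, the sketch below computes permutation importance with scikit-learn's `permutation_importance`. It assumes synthetic data and a random forest as a stand-in for the project's best model; the sample sizes and `n_repeats` value are arbitrary choices for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Assumption: synthetic data with 3 informative features out of 6.
X, y = make_regression(
    n_samples=400, n_features=6, n_informative=3, noise=5.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Assumption: a random forest stands in for the best-performing model.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Shuffle one feature at a time on the held-out split and record how much
# the model's score (R^2 for regressors) drops, averaged over the repeats.
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0
)

# Rank features from most to least important.
ranking = np.argsort(result.importances_mean)[::-1]
for index in ranking:
    mean = result.importances_mean[index]
    std = result.importances_std[index]
    print(f"feature {index}: {mean:.3f} +/- {std:.3f}")
```

Computing the importances on held-out data rather than the training split shows which features the model relies on to generalize, not just which ones it memorized.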
Local Feature Importance:
- SHAP Values
This project was created with:
- Python 3.10.5
To run this project, install the following packages and versions on your local machine:
conda install scikit-learn=1.1.1
conda install pandas=1.4.2
conda install numpy=1.22.4
conda install matplotlib=3.5.2
conda install seaborn=0.11.2
conda install folium=0.13.0
conda install kaleido=0.2.1
conda install pickle=0.7.5
conda install xgboost=1.5.1
conda install shap=0.40.0
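After installation, the environment can be checked against the versions listed above. The following is a small sketch using the standard library's `importlib.metadata`; the `PINNED` mapping and the `check_versions` helper are hypothetical names introduced for this example. `pickle` is left out of the mapping because the `pickle` module that Python imports is part of the standard library.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the install commands above (pickle omitted, see note).
PINNED = {
    "scikit-learn": "1.1.1",
    "pandas": "1.4.2",
    "numpy": "1.22.4",
    "matplotlib": "3.5.2",
    "seaborn": "0.11.2",
    "folium": "0.13.0",
    "kaleido": "0.2.1",
    "xgboost": "1.5.1",
    "shap": "0.40.0",
}


def check_versions(pinned):
    """Return {package: (wanted, found, matches)} for each pinned package.

    `found` is None when the package is not installed.
    """
    report = {}
    for name, wanted in pinned.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None
        report[name] = (wanted, found, found == wanted)
    return report


# Print one status line per package.
for name, (wanted, found, matches) in check_versions(PINNED).items():
    status = "ok" if matches else "MISMATCH"
    print(f"{name:>13}: wanted {wanted}, found {found} [{status}]")
```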