ML Regression Project
The goal of this project is to use a linear regression model to predict the ratings the late, renowned film critic Roger Ebert would give films if he were alive today. I obtained my primary and secondary datasets through web scraping. Using numerical and categorical features, along with some feature engineering, I built several linear regression models. After multiple rounds of 5-fold cross-validation, dataset modifications, and further feature engineering, my final linear regression model with lasso regularization has an R-squared value of 0.396 and a mean absolute error of 0.55. In layman's terms, the prediction is off by about half a star on average.
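The workflow described above (numerical and categorical features, lasso regularization, 5-fold cross-validation, R-squared and mean absolute error) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data is synthetic, and the column names (`runtime_min`, `release_year`, `genre`), the `alpha` value, and the target formula are all assumptions made up so the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; the real project uses scraped review data.
rng = np.random.default_rng(0)
n = 200
films = pd.DataFrame({
    "runtime_min": rng.normal(110, 20, n),                  # hypothetical numerical feature
    "release_year": rng.integers(1970, 2013, n),            # hypothetical numerical feature
    "genre": rng.choice(["drama", "comedy", "action"], n),  # hypothetical categorical feature
})
# Made-up star ratings, clipped to a 0-4 range purely for illustration.
stars = np.clip(2.5 + 0.01 * (films["runtime_min"] - 110) + rng.normal(0, 0.7, n), 0, 4)

# Scale numerical columns and one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["runtime_min", "release_year"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["genre"]),
])

# Linear regression with L1 (lasso) regularization; alpha is an assumed value.
model = Pipeline([("prep", preprocess), ("lasso", Lasso(alpha=0.01))])

# 5-fold cross-validation, scoring both R-squared and mean absolute error.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_validate(model, films, stars, cv=cv,
                        scoring=("r2", "neg_mean_absolute_error"))

mean_r2 = scores["test_r2"].mean()
# scikit-learn reports MAE negated so that higher is better; flip the sign back.
mean_mae = -scores["test_neg_mean_absolute_error"].mean()
print(f"R^2: {mean_r2:.3f}, MAE: {mean_mae:.2f} stars")
```

Keeping the preprocessing inside the pipeline means the scaler and encoder are refit on each training fold, so no information from the held-out fold leaks into the model.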
To learn more, see my blog post and presentation slides.
- Data Acquisition
a. Ebert Data
b. Other Data
- EDA & Feature Engineering & Selection
- Regression Modeling & Testing
a. More Feature Engineering
- Attempt on Random Forest Regressor
- BeautifulSoup
- Selenium
- Python (pandas, NumPy, pickle)
- Matplotlib
- seaborn
- scikit-learn
- statsmodels