In this project, California census data is used to build a model of housing prices in the state. The data includes metrics such as population, median income, and median housing price for each block group in California. Block groups are the smallest geographical units for which the US Census Bureau publishes sample data.
The goal is to predict the median housing price of each block group.
The performance measure is RMSE (Root Mean Squared Error), as it is generally the preferred measure for regression tasks.
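
For reference, RMSE is the square root of the mean of the squared differences between predicted and actual values, so larger errors are penalized more heavily. A minimal sketch in plain Python follows; the function name `rmse` and the sample prices are illustrative assumptions, not values from the census data.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Squared Error between two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical median house values and model predictions (made-up numbers).
actual = [200_000, 150_000, 300_000]
predicted = [210_000, 140_000, 330_000]
print(f"RMSE: {rmse(actual, predicted):.2f}")
```

The result is in the same unit as the target (here, a price), which makes it easy to interpret as a typical prediction error.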
- Getting the data
- Quick analysis of data
- Creation of test set
- Visualizing data and observation
- Deriving new features from existing ones
- Data cleaning
- Creating transformation pipelines
- Model selection - trying different models
- Model evaluation using cross-validation
- Hyperparameter tuning using Grid Search
- Evaluation on test set
- Deployment [yet to be done]
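
The steps above, except deployment, can be sketched roughly as follows with scikit-learn. This is an illustrative outline under stated assumptions, not the project's actual code: the data is a small synthetic stand-in generated with NumPy, and the column names, the derived feature `population_per_household`, the candidate models, and the hyperparameter grid are all hypothetical choices.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Getting the data: synthetic stand-in (assumption, not the real census data).
rng = np.random.default_rng(42)
n = 300
housing = pd.DataFrame({
    "population": rng.integers(100, 5000, n).astype(float),
    "households": rng.integers(50, 2000, n).astype(float),
    "median_income": rng.uniform(1.0, 10.0, n),
})
housing["median_house_value"] = (
    40_000 * housing["median_income"] + rng.normal(0, 20_000, n)
)
# Simulate a few missing values so the cleaning step has something to do.
housing.loc[rng.choice(n, 10, replace=False), "households"] = np.nan

# Deriving a new feature from existing ones (hypothetical example).
housing["population_per_household"] = housing["population"] / housing["households"]

# Creation of the test set.
X = housing.drop(columns="median_house_value")
y = housing["median_house_value"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


def make_pipeline(model):
    """Data cleaning (median imputation) and scaling, followed by the model."""
    return Pipeline([
        ("imputer", SimpleImputer(strategy="median")),
        ("scaler", StandardScaler()),
        ("model", model),
    ])


# Model selection and evaluation using 3-fold cross-validation.
candidates = {
    "linear": LinearRegression(),
    "forest": RandomForestRegressor(n_estimators=20, random_state=42),
}
cv_rmse = {}
for name, model in candidates.items():
    scores = cross_val_score(
        make_pipeline(model), X_train, y_train,
        scoring="neg_root_mean_squared_error", cv=3,
    )
    cv_rmse[name] = -scores  # scikit-learn reports negated RMSE
    print(f"{name}: mean CV RMSE = {cv_rmse[name].mean():.0f}")

# Hyperparameter tuning using grid search (grid values are arbitrary).
grid = GridSearchCV(
    make_pipeline(RandomForestRegressor(random_state=42)),
    param_grid={"model__n_estimators": [10, 30], "model__max_features": [1, 2]},
    scoring="neg_root_mean_squared_error",
    cv=3,
)
grid.fit(X_train, y_train)

# Evaluation on the test set, done once at the very end.
predictions = grid.best_estimator_.predict(X_test)
test_rmse = float(np.sqrt(np.mean((y_test - predictions) ** 2)))
print(f"Best params: {grid.best_params_}, test RMSE = {test_rmse:.0f}")
```

Wrapping imputation and scaling in the same `Pipeline` as the model means each cross-validation fold fits the preprocessing only on its own training portion, which avoids leaking information from the held-out fold.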