
pb319/California_House-Price-Prediction


California_House-Price-Prediction

This is my first end-to-end ML project, covering all the required stages and following the guidance of the book "Hands-On Machine Learning".

Table of Contents

Big Picture

  1. Problem Statement

    • Welcome to Machine Learning Housing Corporation!
    • Organizational objective: replace expensive, time-consuming and less effective manual price predictions with Machine Learning.
  2. Framing the problem

    • Data: California Census Data
    • A typical univariate multiple regression task (several input features, one predicted value per district).
    • The training set is labelled, hence "Supervised Learning".
    • The data is small enough to fit in memory, hence "Batch Learning".
  3. Performance Measure

    • Root Mean Square Error (RMSE), based on the ℓ2 norm
    • Mean Absolute Percentage Error (MAPE)
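For reference, both metrics can be written out directly. A minimal NumPy sketch; the function names `rmse` and `mape` are illustrative, and MAPE is returned as a fraction (0.1771 means 17.71%), the same convention as Scikit-Learn's `mean_absolute_percentage_error`.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Square Error: ||y_true - y_pred||_2 / sqrt(n)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error as a fraction; assumes no zero targets."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))

# Made-up house values and predictions
y_true = [200_000, 300_000, 400_000]
y_pred = [210_000, 270_000, 400_000]
print(rmse(y_true, y_pred))
print(mape(y_true, y_pred))
```

Because RMSE squares the residuals, the single $30,000 miss dominates it, while MAPE weighs each error relative to the district's own value.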

No Data Snooping

  1. Get the Data
    • Overview and initial understanding of the data
  2. Test Set
    • First, simple random sampling was used to split the data into train and test sets with Scikit-Learn.
    • Second, stratified sampling was applied, using income categories derived from median_income as strata.
    • Finally, the sampling bias of the two techniques was compared.
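The two splitting strategies and the bias comparison can be sketched with Scikit-Learn's `train_test_split`. This sketch assumes synthetic log-normal incomes in place of the census data and income bins `[0, 1.5, 3, 4.5, 6, inf]` similar to the book's; the project's actual bins and seed may differ.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the housing data (assumption: only median_income matters here)
rng = np.random.default_rng(42)
housing = pd.DataFrame({"median_income": rng.lognormal(mean=1.2, sigma=0.5, size=20_000)})

# Bucket median_income into 5 categories to use as strata
housing["income_cat"] = pd.cut(
    housing["median_income"],
    bins=[0.0, 1.5, 3.0, 4.5, 6.0, np.inf],
    labels=[1, 2, 3, 4, 5],
)

# 1) Simple random sampling
_, random_test = train_test_split(housing, test_size=0.2, random_state=42)

# 2) Stratified sampling on the income categories
_, strat_test = train_test_split(
    housing, test_size=0.2, stratify=housing["income_cat"], random_state=42
)

def proportions(df):
    """Share of each income category in a DataFrame."""
    return df["income_cat"].value_counts(normalize=True).sort_index()

comparison = pd.DataFrame({
    "overall": proportions(housing),
    "random": proportions(random_test),
    "stratified": proportions(strat_test),
})
# Relative error (%) of each test set's category shares vs. the full dataset
comparison["random_err_%"] = 100 * (comparison["random"] / comparison["overall"] - 1)
comparison["strat_err_%"] = 100 * (comparison["stratified"] / comparison["overall"] - 1)
print(comparison.round(4))
```

The stratified column stays within rounding distance of the overall shares, whereas the random split can drift noticeably for the small extreme-income categories.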

Exploratory Data Analysis

  1. Creating Visualizations

Screenshot from 2024-06-06 10-24-13


  • The plot clearly shows clusters in and around San Diego, Los Angeles, San Francisco, etc.
  • In general, ocean_proximity appears to be associated with median_house_value.
  • But there are exceptions in Northern California, so some feature engineering will be needed here as well.
  • Features such as proximity to these clusters could also be explored.
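One way to try the "proximity to clusters" idea is to cluster the district coordinates with K-Means and use each district's distance to the nearest cluster center as a new feature. A minimal sketch, assuming synthetic coordinates and `n_clusters=10` (both illustrative choices, not taken from the project).

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for district coordinates (assumption; real data has these columns)
rng = np.random.default_rng(0)
districts = pd.DataFrame({
    "longitude": rng.uniform(-124.3, -114.3, size=2_000),
    "latitude": rng.uniform(32.5, 42.0, size=2_000),
})

coords = districts[["latitude", "longitude"]].to_numpy()
kmeans = KMeans(n_clusters=10, n_init=10, random_state=42).fit(coords)

# transform() returns the distance from each point to every cluster center
dist_to_centers = kmeans.transform(coords)  # shape (n_districts, 10)
districts["dist_to_nearest_cluster"] = dist_to_centers.min(axis=1)
districts["cluster_id"] = kmeans.labels_

print(districts.head())
```

Distances here are in degrees. On the real data, passing `sample_weight=housing["median_house_value"]` to `fit` would pull the centers toward expensive areas, which is closer to what the map suggests.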

  2. Correlation Matrix and Scatter Plot

Screenshot from 2024-06-06 19-53-02


  • In general, the scatter plot shows a strong positive trend.
  • The horizontal line at $500,000 reflects the price cap.
  • A concern is a few less obvious horizontal lines around $450,000, $350,000, $280,000, $230,000, and lower.
  • The corresponding districts may be removed so the model does not learn to reproduce these quirks.
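The correlation check and the removal of suspicious districts can be sketched with pandas. The data is synthetic (assumed roughly linear in median_income, clipped to [15,000, 500,000] to mimic the cap), and `suspicious_levels` is illustrative; in practice the levels would be read off the scatter plot.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 5_000
income = rng.uniform(1.5, 15.0, size=n)
# Synthetic target: roughly linear in income, clipped so the top sits at $500,000
value = np.clip(40_000 * income + rng.normal(0, 30_000, size=n), 15_000, 500_000)
housing = pd.DataFrame({
    "median_income": income,
    "total_rooms": rng.uniform(500, 5_000, size=n),  # unrelated noise column
    "median_house_value": value,
})

# Correlation of every attribute with the target, strongest first
corr = housing.corr()["median_house_value"].sort_values(ascending=False)
print(corr)

# Drop districts sitting exactly on suspicious horizontal lines (illustrative levels)
suspicious_levels = [500_000.0]
mask = housing["median_house_value"].isin(suspicious_levels)
print(f"Removing {mask.sum()} of {len(housing)} districts")
housing_clean = housing[~mask]
```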

Data Preparation

  1. Data Cleaning

    • Missing values in total_bedrooms (1.01% of rows) were filled using Scikit-Learn's SimpleImputer class.
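A minimal sketch of the imputation step. The README does not state the strategy, so `strategy="median"` is an assumption (it is the choice the book makes for numerical attributes), and the five-row DataFrame is made up.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical sample with missing total_bedrooms values
housing_num = pd.DataFrame({
    "total_rooms": [880.0, 7099.0, 1467.0, 1274.0, 1627.0],
    "total_bedrooms": [129.0, np.nan, 190.0, np.nan, 280.0],
})

# Fraction of missing values per column
print(housing_num.isna().mean())

imputer = SimpleImputer(strategy="median")
imputed = pd.DataFrame(
    imputer.fit_transform(housing_num),
    columns=housing_num.columns,
    index=housing_num.index,
)
print(imputer.statistics_)  # learned medians, reused when transforming the test set
```

Fitting the imputer on the training set only, then calling `transform` on the test set, keeps test-set statistics out of the model.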

Screenshot from 2024-06-09 11-19-36


  2. Handling Text Attributes

    • One-hot encoding (OneHotEncoder) is used to handle the ocean_proximity column.
  3. Feature Scaling and Transformation Pipeline

    • A single transformation pipeline handles both numerical and categorical attributes.
    • It uses Scikit-Learn's StandardScaler and ColumnTransformer classes.
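A minimal sketch of such a pipeline. Median imputation for the numerical columns and `handle_unknown="ignore"` for the encoder are assumptions not stated in the README; the column names follow the dataset, and the three-row frame is made up.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

num_attribs = ["median_income", "total_rooms", "total_bedrooms"]
cat_attribs = ["ocean_proximity"]

num_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),  # fill missing values
    ("scaler", StandardScaler()),                   # zero mean, unit variance
])

full_pipeline = ColumnTransformer([
    ("num", num_pipeline, num_attribs),
    ("cat", OneHotEncoder(handle_unknown="ignore"), cat_attribs),
])

housing = pd.DataFrame({
    "median_income": [8.3, 2.1, 3.9],
    "total_rooms": [880.0, 1467.0, 1274.0],
    "total_bedrooms": [129.0, np.nan, 235.0],
    "ocean_proximity": ["NEAR BAY", "INLAND", "<1H OCEAN"],
})

housing_prepared = full_pipeline.fit_transform(housing)
print(housing_prepared.shape)  # 3 scaled numeric columns + 3 one-hot columns
```

Wrapping everything in one `ColumnTransformer` means the exact same fitted transformations can later be applied to the test set with a single `transform` call.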

Model Selection and Training

  1. Model Selection

    • Linear Regression, Decision Tree Regressor, and Random Forest Regressor models have been fitted on the training set.
  2. Model Evaluation

    • K-fold cross-validation gives RMSE scores of 68,973.97, 69,919.68, and 50,631.51, respectively.
    • The Random Forest Regressor looks the most promising.
    • Note: the RMSE on the training set is still much lower than on the validation sets, which means the model is still overfitting the training set.
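The model comparison can be sketched with `cross_val_score`. Scikit-Learn treats higher scores as better, so RMSE is requested as `neg_root_mean_squared_error` and negated back. The data is synthetic (`make_regression`) and `cv=10` is an assumption, so the numbers will not match the README's; the last lines show the train-vs-validation check used to spot overfitting.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the prepared training set (assumption)
X, y = make_regression(n_samples=1_000, n_features=8, noise=20.0, random_state=42)

models = {
    "LinearRegression": LinearRegression(),
    "DecisionTreeRegressor": DecisionTreeRegressor(random_state=42),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=50, random_state=42),
}

cv_rmse = {}
for name, model in models.items():
    # One negative RMSE per fold; flip the sign to get RMSE
    scores = -cross_val_score(model, X, y, scoring="neg_root_mean_squared_error", cv=10)
    cv_rmse[name] = scores.mean()
    print(f"{name}: mean RMSE {scores.mean():.2f} (std {scores.std():.2f})")

# Overfitting check: a training RMSE far below the CV RMSE suggests overfitting
forest = models["RandomForestRegressor"].fit(X, y)
train_rmse = np.sqrt(mean_squared_error(y, forest.predict(X)))
print(f"RandomForest train RMSE {train_rmse:.2f} vs CV RMSE {cv_rmse['RandomForestRegressor']:.2f}")
```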

Model Fine Tuning

  1. Grid Search Cross Validation

    • GridSearchCV was used to fine-tune the hyperparameters.
    • Best estimator: RandomForestRegressor(max_features=6, n_estimators=30)
    • The cross-validated RMSE improved slightly, from 50,631.51 to 50,586.27.
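A sketch of the grid search. The README reports only the winning combination (`max_features=6, n_estimators=30`); the grid below is an assumption modeled on the book's example, and with synthetic data the selected parameters may differ.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the prepared training set (assumption)
X, y = make_regression(n_samples=1_000, n_features=8, noise=20.0, random_state=42)

# Hypothetical grid: 3 x 4 combinations, plus 2 x 3 without bootstrapping
param_grid = [
    {"n_estimators": [3, 10, 30], "max_features": [2, 4, 6, 8]},
    {"bootstrap": [False], "n_estimators": [3, 10], "max_features": [2, 3, 4]},
]

grid_search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    cv=5,
    scoring="neg_root_mean_squared_error",
    return_train_score=True,
)
grid_search.fit(X, y)

print(grid_search.best_params_)
print("best CV RMSE:", -grid_search.best_score_)
best_forest = grid_search.best_estimator_  # refit on the full training set by default
```

With `return_train_score=True`, `grid_search.cv_results_` also holds training scores, so the overfitting gap can be compared across combinations.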

Screenshot from 2024-06-16 12-25-50


Evaluation on the Test Set

  • RMSE of 46,811.29, with a 95% confidence interval of [44,834.00, 48,708.39] (5% level of significance).
  • Mean Absolute Percentage Error (MAPE): 0.1771 (about 17.7%).
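One common way to get such an interval (the approach shown in the book) is a t-interval on the mean of the per-district squared errors, square-rooted to give bounds on the RMSE. A sketch with synthetic test targets and predictions (assumption); `confidence = 0.95` corresponds to the 5% significance level.

```python
import numpy as np
from scipy import stats
from sklearn.metrics import mean_absolute_percentage_error

rng = np.random.default_rng(7)
# Synthetic test targets and predictions (assumption, stands in for the real test set)
y_test = rng.uniform(50_000, 500_000, size=4_000)
final_predictions = y_test * (1 + rng.normal(0, 0.2, size=y_test.size))

squared_errors = (final_predictions - y_test) ** 2
test_rmse = np.sqrt(squared_errors.mean())

confidence = 0.95
# t-interval for the mean squared error, then sqrt to get RMSE bounds
mse_low, mse_high = stats.t.interval(
    confidence,
    len(squared_errors) - 1,
    loc=squared_errors.mean(),
    scale=stats.sem(squared_errors),
)
rmse_low, rmse_high = np.sqrt(mse_low), np.sqrt(mse_high)

mape = mean_absolute_percentage_error(y_test, final_predictions)  # fraction, 0.18 = 18%
print(f"RMSE {test_rmse:.2f}, 95% CI [{rmse_low:.2f}, {rmse_high:.2f}], MAPE {mape:.4f}")
```

The interval is symmetric for the MSE but not for the RMSE, since the square root compresses the upper bound more than the lower one.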

🛡️ Demonstration Video


Thank You So Much...🙏🙏
