In this project, I play the role of a Data Analyst for a Real Estate Investment Trust. The Trust would like to start investing in residential real estate. The task at hand is to determine the market price of a house given a set of features. The project predicts housing prices using attributes or features such as square footage, number of bedrooms, number of floors, and so on. A Jupyter notebook has been provided in this repository.
This project uses Python for both the analysis and the visualization, supported by a range of Python libraries. The core tools are:
- Python 3.8 (visualization + analysis)
- Jupyter Notebook (IDE)
This dataset contains house sale prices for King County, which includes Seattle. It includes houses sold between May 2014 and May 2015. It was taken from a Kaggle upload (https://www.kaggle.com/harlfoxem/housesalesprediction).
Here is the description of the data:
- id: A notation for a house
- date: Date house was sold
- price: Sale price of the house (the prediction target)
- bedrooms: Number of bedrooms
- bathrooms: Number of bathrooms
- sqft_living: Square footage of the home
- sqft_lot: Square footage of the lot
- floors: Total floors (levels) in house
- waterfront: Whether the house has a view of a waterfront
- view: Has been viewed
- condition: How good the condition is overall
- grade: Overall grade given to the housing unit, based on the King County grading system
- sqft_above: Square footage of house apart from basement
- sqft_basement: Square footage of the basement
- yr_built: Year the house was built
- yr_renovated: Year when house was renovated
- zipcode: Zip code
- lat: Latitude coordinate
- long: Longitude coordinate
- sqft_living15: Living room area in 2015 (implies some renovations). This might or might not have affected the lot size area
- sqft_lot15: LotSize area in 2015 (implies some renovations)
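As a rough illustration of how a table with these columns might be loaded and split into features and a target, here is a minimal sketch. It assumes pandas (the libraries used in the notebook are not listed here), uses two made-up rows instead of the real Kaggle file, and assumes the `date` column is stored as text such as `20140502T000000`.

```python
import io

import pandas as pd

# Two made-up rows in the column layout described above (not real records).
SAMPLE_CSV = """id,date,price,bedrooms,bathrooms,sqft_living,sqft_lot,floors,waterfront,view,condition,grade,sqft_above,sqft_basement,yr_built,yr_renovated,zipcode,lat,long,sqft_living15,sqft_lot15
1,20140502T000000,300000,3,1.0,1200,5000,1.0,0,0,3,7,1200,0,1960,0,98001,47.50,-122.20,1300,5000
2,20150115T000000,650000,4,2.5,2600,7200,2.0,0,0,4,8,2000,600,1995,0,98052,47.68,-122.12,2500,7000
"""

# With the real dataset, the StringIO object would be replaced by the CSV path.
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Assumed date format: YYYYMMDD followed by a "T000000" time suffix.
df["date"] = pd.to_datetime(df["date"], format="%Y%m%dT%H%M%S")

# "price" is the prediction target; "id" and "date" are dropped here as
# identifiers rather than predictive features (a choice made for this sketch).
features = df.drop(columns=["id", "date", "price"])
target = df["price"]

print(features.dtypes)
```

Dropping `id` and `date` is only one possible choice; the sale date could also be turned into a feature such as the month of sale.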
The analysis follows these steps:
- Imports of Libraries and Packages
- Import of Dataset
- Data Wrangling/Preprocessing
- Exploratory Data Analysis
- Feature Selection
- Model Development
- Creation of Data Pipeline
- Model Evaluation and Refinement
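The model development, pipeline, and evaluation steps above can be sketched roughly as follows. This is not the notebook's code: scikit-learn, the choice of scaling plus polynomial features plus ridge regression, and the synthetic data are all assumptions made for illustration. Only the column names `sqft_living`, `bedrooms`, `floors`, and `price` come from the dataset description.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for the housing table (hypothetical values, fixed seed).
rng = np.random.default_rng(0)
n = 200
houses = pd.DataFrame({
    "sqft_living": rng.uniform(500, 4000, n),
    "bedrooms": rng.integers(1, 6, n),
    "floors": rng.choice([1.0, 1.5, 2.0], n),
})
# Made-up price formula with noise, used only so the example has a target.
houses["price"] = (
    150 * houses["sqft_living"]
    + 10_000 * houses["bedrooms"]
    + 20_000 * houses["floors"]
    + rng.normal(0, 20_000, n)
)

X = houses[["sqft_living", "bedrooms", "floors"]]
y = houses["price"]

# Hold out 20% of the rows to evaluate the fitted model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Pipeline: scale the features, add degree-2 terms, then fit ridge regression.
model = Pipeline([
    ("scale", StandardScaler()),
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),
    ("ridge", Ridge(alpha=1.0)),
])
model.fit(X_train, y_train)

# R^2 on the held-out rows is one common way to evaluate a regression model.
r2 = model.score(X_test, y_test)
print(f"R^2 on held-out data: {r2:.3f}")
```

Wrapping the preprocessing and the estimator in a single `Pipeline` means the scaler and polynomial expansion are fitted on the training rows only, which avoids leaking information from the test rows into the model.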
All the steps in the analysis have been explained in the Jupyter Notebook for this project. Some examples of visualizations used are as follows:
This is an intermediate-level project which involves some advanced concepts of Machine Learning and Predictive Modeling in Python using an IDE.
All rights to the published dataset remain with its issuing authority (Kaggle).
The project may be used only as a learning resource; no part of it may be copied for any other use whatsoever.