Nigerian House Price Prediction

This repository contains the code and resources for an end-to-end Housing Price Prediction MLOps project. The project utilizes a variety of tools and technologies to ensure efficient and robust development, deployment, and monitoring of a machine learning model for predicting housing prices.

Introduction

This project aims to predict Nigerian housing prices using a machine learning model. It incorporates various DevOps and MLOps practices to ensure streamlined development, deployment, and monitoring of the model.

Technologies Used

Visual Studio Code: A powerful and versatile code editor with built-in debugging, version control, and an extensive extension ecosystem.
Jupyter Notebook: An interactive, web-based environment for data analysis and scientific computing that supports code, visualizations, and narrative text.
PostgreSQL: A powerful open-source relational database management system known for its extensibility, reliability, and advanced features.
Python: A widely-used high-level programming language known for its simplicity and readability, commonly used for data manipulation and machine learning.
Pandas: A Python library for data manipulation and analysis, providing data structures and functions to efficiently work with structured data.
Matplotlib: A comprehensive data visualization library in Python, used to create static, interactive, and animated visualizations.
scikit-learn: A machine learning library for Python that provides simple and efficient tools for data mining and data analysis.
Flask: A lightweight web application framework in Python, suitable for building web applications and APIs.
MLflow: An open-source platform for managing the end-to-end machine learning lifecycle, including experiment tracking, reproducibility, and deployment.
Docker: A platform for developing, shipping, and running applications in containers, ensuring consistency across various environments.
Anaconda: A distribution of Python and R programming languages for data science and machine learning, providing a variety of packages and tools.
Linux: An open-source operating system kernel widely used for server environments, development, and hosting.
Amazon Web Services (AWS): A cloud computing platform offering a wide range of services for computing power, storage, and other functionalities.
Grafana: A monitoring and visualization tool used to track metrics, create dashboards, and gain insights from data.
Git: A distributed version control system used for tracking changes in code and collaborating with others.
Linter for ensuring code quality and adherence to coding standards.
Code formatter for maintaining consistent code style.
isort: Tool for sorting and formatting Python imports.
pre-commit: Framework for managing and maintaining pre-commit hooks (configured in .pre-commit-config.yaml).

Project Structure

The project has been structured with the following folders and files:

.github: contains the CI/CD files (GitHub Actions)
config: contains Grafana config files
dashboards: contains the monitoring dashboards in JSON format
data: dataset and test sample for testing the model
model: full pipeline from preprocessing to prediction and monitoring using MLflow, Prefect, Grafana, Adminer, and docker-compose
notebooks: EDA and Modeling performed at the beginning of the project to establish a baseline
tests: unit tests
pyproject.toml: linting and formatting configuration
requirements.txt: project requirements

Getting Started

Clone the repository: git clone https://github.com/yourusername/your-repo.git
Set up your environment and install dependencies: pip install -r requirements.txt
Follow instructions in relevant sections below to run preprocessing, training, and deployment.

Problem Description

The goal of this project is to develop a machine learning model that can accurately predict housing prices in Nigeria for various types of houses across all 36 states. The model aims to take into account features such as the type of house, location, bedroom size, bathroom size, parking space size, and other relevant factors to make accurate predictions. This predictive model can be a valuable tool for real estate professionals, homeowners, and potential buyers to estimate property values.

Dataset Description

The dataset used for this project is a Nigeria House Price Dataset that covers all 36 states of the country. It is located in the data/ directory. It contains the following columns:

ID: A unique identifier for each property.
Type of House: The type of the house, such as apartment, duplex, bungalow, etc.
Location: The location of the property within a specific state.
Bedroom Size: The number of bedrooms in the house.
Bathroom Size: The number of bathrooms in the house.
Parking Space Size: The size of the parking space available.
Price: The target variable, representing the price of the property.
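
As a quick sanity check before training, the columns listed above can be validated with the standard library alone. This is a minimal sketch: the exact header spellings, the made-up sample rows, and the helper name validate_rows are assumptions for illustration and are not taken from the project's data files.

```python
import csv
import io

# Column names as described above; the exact header spelling in the real
# CSV is an assumption.
EXPECTED_COLUMNS = [
    "ID",
    "Type of House",
    "Location",
    "Bedroom Size",
    "Bathroom Size",
    "Parking Space Size",
    "Price",
]
NUMERIC_COLUMNS = ["Bedroom Size", "Bathroom Size", "Parking Space Size", "Price"]


def validate_rows(handle):
    """Read CSV rows from a file-like object and check the expected schema."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        for col in NUMERIC_COLUMNS:
            row[col] = float(row[col])  # raises ValueError on non-numeric data
        rows.append(row)
    return rows


# Made-up sample rows, used only to keep the example self-contained.
SAMPLE_CSV = """ID,Type of House,Location,Bedroom Size,Bathroom Size,Parking Space Size,Price
1,Duplex,Lagos,4,5,2,85000000
2,Bungalow,Oyo,3,2,1,20000000
"""

rows = validate_rows(io.StringIO(SAMPLE_CSV))
print(len(rows), rows[0]["Type of House"], rows[0]["Price"])
```

To run the same check against a real file, pass an open file handle from the data/ directory instead of the in-memory sample.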

Modeling

The model training process is defined in the model_train.py file. It loads the preprocessed data, splits it into training and validation sets, trains a machine learning model, and registers the model in the model registry. The trained model is also saved in the models/ directory.
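
The steps described above (load, split, train, save) can be sketched with the standard library. This is an illustration only: the median-per-house-type baseline stands in for whatever model model_train.py actually trains, and the made-up rows, the function names, and the baseline.pkl file name are all hypothetical.

```python
import pickle
import random
import statistics
import tempfile
from pathlib import Path


def split_rows(rows, val_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and split into (train, validation)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def train_baseline(train_rows):
    """Fit a median-price-per-house-type baseline (a stand-in model)."""
    by_type = {}
    for row in train_rows:
        by_type.setdefault(row["type"], []).append(row["price"])
    return {
        "per_type": {t: statistics.median(p) for t, p in by_type.items()},
        "fallback": statistics.median([r["price"] for r in train_rows]),
    }


def predict(model, row):
    """Return the per-type median, or the global median for unseen types."""
    return model["per_type"].get(row["type"], model["fallback"])


def mean_absolute_error(model, rows):
    """Average absolute difference between predicted and actual price."""
    return statistics.mean([abs(predict(model, r) - r["price"]) for r in rows])


# Made-up rows standing in for the preprocessed dataset.
rows = [
    {"type": "Duplex", "price": 80.0},
    {"type": "Duplex", "price": 90.0},
    {"type": "Duplex", "price": 85.0},
    {"type": "Bungalow", "price": 20.0},
    {"type": "Bungalow", "price": 25.0},
    {"type": "Bungalow", "price": 22.0},
    {"type": "Apartment", "price": 40.0},
    {"type": "Apartment", "price": 45.0},
]

train, val = split_rows(rows)
model = train_baseline(train)
print("validation MAE:", mean_absolute_error(model, val))

# Persist and reload the model; a temporary directory is used here,
# whereas the project saves under models/.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "baseline.pkl"
    path.write_bytes(pickle.dumps(model))
    restored = pickle.loads(path.read_bytes())
```

A fixed seed keeps the split reproducible between runs, which makes validation scores comparable across experiments.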

MLOps Pipeline

Data preprocessing and feature engineering.
Model training and evaluation.
Model versioning using MLflow.
Continuous Integration (CI) using GitHub Actions for code quality checks and tests.
Continuous Deployment (CD) using GitHub Actions to deploy the model in a containerized environment.
Workflow orchestration using Prefect to schedule and manage the entire pipeline.

Workflow Orchestration

Workflow orchestration automates the steps of the machine learning pipeline using Prefect Cloud, so that data preprocessing, model training, and deployment run as one coordinated workflow. Prefect schedules, coordinates, and monitors these tasks.

Visit Prefect Cloud to set up your Prefect Cloud account.

Prefect Deployment

Build and apply a deployment for the main_run flow defined in main.py:

```
prefect deployment build main.py:main_run \
  -n "main_pipeline" \
  -o "main_pipeline" \
  --apply
```

Monitoring and Visualization

Grafana is used to monitor various metrics and insights related to the model's performance, data quality, and more. It provides real-time visualization of key performance indicators and helps in identifying anomalies and trends.
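
One way to produce numbers for such dashboards is to summarise each scoring batch as a flat record of metrics. The sketch below assumes this arrangement; the source does not describe how metrics reach Grafana, and the function name batch_metrics, the metric names, and the sample batch are hypothetical.

```python
from datetime import datetime, timezone
from statistics import mean


def batch_metrics(records, predictions, feature_names):
    """Summarise one scoring batch as a flat dict of metrics."""
    total_cells = len(records) * len(feature_names)
    missing = sum(
        1 for rec in records for name in feature_names if rec.get(name) is None
    )
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "n_rows": len(records),
        "share_missing_values": missing / total_cells if total_cells else 0.0,
        "mean_prediction": mean(predictions) if predictions else None,
    }


# Made-up batch, for illustration only.
records = [
    {"bedrooms": 4, "bathrooms": 5, "parking": 2},
    {"bedrooms": 3, "bathrooms": None, "parking": 1},
]
predictions = [85.0, 20.0]

metrics = batch_metrics(records, predictions, ["bedrooms", "bathrooms", "parking"])
print(metrics)
```

Each record could then be stored in a table that a dashboard queries over time, so a rising share of missing values or a shift in the mean prediction shows up as a trend.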

Contributing

Contributions are welcome! If you would like to contribute to the project, please follow the standard GitHub workflow: fork the repository, create a feature branch, make your changes, and submit a pull request.

License

This project is licensed under the MIT License.

