
This repository houses an end-to-end Housing Price Prediction project, showcasing the integration of modern DevOps and MLOps practices. Leveraging a diverse set of technologies, the project encompasses data preprocessing, machine learning modeling, workflow orchestration, and real-time monitoring.


GbotemiB/MLOps_zoomcamp


Nigerian House Price Prediction

This repository contains the code and resources for an end-to-end Housing Price Prediction MLOps project. The project utilizes a variety of tools and technologies to ensure efficient and robust development, deployment, and monitoring of a machine learning model for predicting housing prices.



Introduction

This project aims to predict Nigerian housing prices using a machine learning model. It incorporates various DevOps and MLOps practices to ensure streamlined development, deployment, and monitoring of the model.

Technologies Used

  • Visual Studio Code: A powerful and versatile code editor with built-in debugging, version control, and an extensive extension ecosystem.

  • Jupyter Notebook: An interactive, web-based environment for data analysis and scientific computing that supports code, visualizations, and narrative text.

  • PostgreSQL: A powerful open-source relational database management system known for its extensibility, reliability, and advanced features.

  • Python: A widely used high-level programming language known for its simplicity and readability, commonly used for data manipulation and machine learning.

  • Pandas: A Python library for data manipulation and analysis, providing data structures and functions for working efficiently with structured data.

  • Matplotlib: A comprehensive Python data visualization library for creating static, interactive, and animated visualizations.

  • scikit-learn: A machine learning library for Python that provides simple and efficient tools for data mining and data analysis.

  • Flask: A lightweight Python web application framework, suitable for building web applications and APIs.

  • MLflow: An open-source platform for managing the end-to-end machine learning lifecycle, including experiment tracking, reproducibility, and deployment.

  • Docker: A platform for developing, shipping, and running applications in containers, ensuring consistency across environments.

  • Anaconda: A distribution of the Python and R programming languages for data science and machine learning, bundling a wide range of packages and tools.

  • Linux: An open-source operating system kernel widely used for servers, development, and hosting.

  • Amazon Web Services (AWS): A cloud computing platform offering a wide range of services for compute, storage, and other functionality.

  • Grafana: A monitoring and visualization tool used to track metrics, build dashboards, and gain insights from data.

  • Git: A distributed version control system used for tracking changes in code and collaborating with others.

  • Pylint: A linter for ensuring code quality and adherence to coding standards.

  • Black: A code formatter for maintaining a consistent code style.

  • isort: A tool for sorting and formatting Python imports.

  • Pre-commit: A framework for managing and maintaining pre-commit hooks.

Project Structure

The project has been structured with the following folders and files:

  • .github: CI/CD workflow files (GitHub Actions)
  • config: Grafana configuration files
  • dashboards: monitoring dashboards in JSON format
  • data: the dataset and a test sample for testing the model
  • model: the full pipeline, from preprocessing to prediction and monitoring, using MLflow, Prefect, Grafana, Adminer, and docker-compose
  • notebooks: exploratory data analysis (EDA) and modeling carried out at the start of the project to establish a baseline
  • tests: unit tests
  • pyproject.toml: linting and formatting configuration
  • requirements.txt: project requirements

Getting Started

  1. Clone the repository: git clone https://github.com/GbotemiB/MLOps_zoomcamp.git
  2. Set up a Python environment and install the dependencies: pip install -r requirements.txt
  3. Follow the instructions in the relevant sections below to run preprocessing, training, and deployment.

Problem Description

The goal of this project is to develop a machine learning model that accurately predicts housing prices for various types of houses across all 36 states of Nigeria. The model takes into account features such as the type of house, location, number of bedrooms, number of bathrooms, parking space size, and other relevant factors. Such a model can be a valuable tool for real estate professionals, homeowners, and potential buyers who need to estimate property values.

Dataset Description

The dataset used for this project is a Nigeria House Price Dataset that covers all 36 states of the country. It is located in the data/ directory. It contains the following columns:

  • ID: A unique identifier for each property.
  • Type of House: The type of the house, such as apartment, duplex, bungalow, etc.
  • Location: The location of the property within a specific state.
  • Bedroom Size: The number of bedrooms in the house.
  • Bathroom Size: The number of bathrooms in the house.
  • Parking Space Size: The size of the parking space available.
  • Price: The target variable, representing the price of the property.
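The column list above maps naturally onto a small loading helper. This is a minimal sketch, assuming the CSV headers match the names in this README exactly; the actual file name under data/ is not given here, and `FEATURE_COLUMNS`, `TARGET_COLUMN`, and `load_dataset` are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from pathlib import Path

import pandas as pd

# Hypothetical column names copied from the dataset description above;
# adjust them if the CSV in data/ spells its headers differently.
FEATURE_COLUMNS = [
    "Type of House",
    "Location",
    "Bedroom Size",
    "Bathroom Size",
    "Parking Space Size",
]
TARGET_COLUMN = "Price"


def load_dataset(path: str | Path) -> tuple[pd.DataFrame, pd.Series]:
    """Read the CSV, check for the expected columns, and split features from the target."""
    df = pd.read_csv(path)
    missing = set(FEATURE_COLUMNS + [TARGET_COLUMN]) - set(df.columns)
    if missing:
        raise ValueError(f"Dataset is missing columns: {sorted(missing)}")
    # Rows without a price cannot be used for supervised training.
    df = df.dropna(subset=[TARGET_COLUMN])
    # ID only identifies a property, so it is deliberately left out of the features.
    return df[FEATURE_COLUMNS], df[TARGET_COLUMN]
```

Failing early on a missing column gives a clearer error than a KeyError deep inside training. Usage would be `X, y = load_dataset("data/<dataset-file>.csv")`, with the placeholder replaced by the real file name.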

Modeling

The model training process is defined in the model_train.py file. It loads the preprocessed data, splits it into training and validation sets, trains a machine learning model, and saves the model to the model registry. The trained model is saved in the models/ directory.
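The contents of model_train.py are not reproduced in this README. As a minimal sketch, assuming scikit-learn and the column names from the dataset description, the split, train, and evaluate steps could look like this. The random forest, the 80/20 split, and the `train_model` name are illustrative choices, not necessarily what model_train.py uses.

```python
from __future__ import annotations

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Assumed column names, following the dataset description.
CATEGORICAL = ["Type of House", "Location"]
NUMERIC = ["Bedroom Size", "Bathroom Size", "Parking Space Size"]


def train_model(X: pd.DataFrame, y: pd.Series, seed: int = 42) -> tuple[Pipeline, float]:
    """Split the data, fit a preprocessing + regression pipeline, and return it with validation RMSE."""
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    preprocess = ColumnTransformer(
        [
            # Categories unseen during training become all-zero columns instead of raising.
            ("categorical", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
            ("numeric", "passthrough", NUMERIC),
        ]
    )
    pipeline = Pipeline(
        [
            ("preprocess", preprocess),
            ("regressor", RandomForestRegressor(n_estimators=200, random_state=seed)),
        ]
    )
    pipeline.fit(X_train, y_train)
    # RMSE as the square root of MSE, which works across scikit-learn versions.
    rmse = float(np.sqrt(mean_squared_error(y_val, pipeline.predict(X_val))))
    return pipeline, rmse
```

Keeping the encoder inside the `Pipeline` means the single fitted object is what gets registered or saved under models/, so prediction applies exactly the same preprocessing as training.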

MLOps Pipeline

  1. Data preprocessing and feature engineering.
  2. Model training and evaluation.
  3. Model versioning using MLflow.
  4. Continuous Integration (CI) using GitHub Actions for code quality checks and tests.
  5. Continuous Deployment (CD) using GitHub Actions to deploy the model in a containerized environment.
  6. Workflow orchestration using Prefect to schedule and manage the entire pipeline.

Workflow Orchestration

The workflow orchestration phase automates the steps of the machine learning pipeline with Prefect Cloud, so that data preprocessing, model training, and deployment run in sequence without manual intervention. Prefect is used to schedule, coordinate, and monitor these tasks.

Visit Prefect Cloud to set up a Prefect Cloud account.

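Prefect Deployment

The following command builds a deployment named main_pipeline for the main_run flow defined in main.py, writes the deployment specification (-o), and registers it with Prefect (--apply):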

```
prefect deployment build main.py:main_run \
  -n "main_pipeline" \
  -o "main_pipeline" \
  --apply
```


Monitoring and Visualization

Grafana is used to monitor metrics on the model's performance and the quality of the incoming data. Its dashboards, stored in JSON format in the dashboards/ folder, visualize key performance indicators in real time and help identify anomalies and trends.
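This README does not list which metrics the dashboards plot. A minimal sketch, assuming the common pattern of writing one row of metrics per prediction batch into a SQL table that a Grafana panel queries by timestamp: it uses Python's built-in sqlite3 purely as a self-contained stand-in for PostgreSQL (which Grafana supports as a built-in data source), and the table name `prediction_metrics` and the chosen metrics are hypothetical.

```python
from __future__ import annotations

import sqlite3
import statistics
from datetime import datetime, timezone


def compute_batch_metrics(predictions: list[float], features: list[dict]) -> dict:
    """Summarize one batch of predictions into a few numbers a dashboard can plot."""
    n_values = sum(len(row) for row in features)
    n_missing = sum(1 for row in features for value in row.values() if value is None)
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "n_predictions": len(predictions),
        "mean_prediction": statistics.fmean(predictions) if predictions else None,
        # Share of missing feature values: a simple data-quality signal.
        "missing_share": n_missing / n_values if n_values else 0.0,
    }


def store_metrics(conn: sqlite3.Connection, metrics: dict) -> None:
    """Append one metrics row; a dashboard panel can then query this table over time."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS prediction_metrics (
               timestamp TEXT,
               n_predictions INTEGER,
               mean_prediction REAL,
               missing_share REAL)"""
    )
    conn.execute(
        "INSERT INTO prediction_metrics VALUES "
        "(:timestamp, :n_predictions, :mean_prediction, :missing_share)",
        metrics,
    )
    conn.commit()
```

With PostgreSQL the table and insert keep the same shape, but the driver's placeholder syntax differs from sqlite3's `:name` style.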


Contributing

Contributions are welcome! If you would like to contribute to the project, please follow the standard GitHub workflow: fork the repository, create a feature branch, make your changes, and submit a pull request.


License

This project is licensed under the MIT License.
