Big Mart Sales Prediction Pipeline

This repository contains an automated pipeline that cleans and versions retail sales data and trains a machine learning model to predict sales. It uses Apache Airflow for orchestration, DVC (Data Version Control) for data versioning, and MLflow for model tracking and versioning.

Overview

This project predicts item sales in retail stores. It combines data cleaning, data versioning, and model training in a single, fully orchestrated pipeline.

The pipeline performs the following tasks:

Data Cleaning: Processes and cleans the data to prepare it for training.
Data Versioning: Uses DVC to save cleaned data versions on Google Drive.
Model Training and Versioning: Trains the model on the cleaned data and uses MLflow to track model versions and metrics.
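This README does not describe what the cleaning step actually does. The following is only an illustration of the kind of work such a step might perform. It assumes the column names of the public Big Mart sales dataset (`Item_Weight`, `Item_Fat_Content`), and `clean_rows` and `FAT_CONTENT_MAP` are hypothetical names. The repository's own ETL scripts may work differently.

```python
from statistics import mean
from typing import Any, Dict, List

# Hypothetical mapping: the public Big Mart data spells fat content
# inconsistently, so collapse the variants onto two canonical labels.
FAT_CONTENT_MAP = {"LF": "Low Fat", "low fat": "Low Fat", "reg": "Regular"}


def clean_rows(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return cleaned copies of the rows; the input list is left unchanged."""
    # Mean of the known item weights, used to fill the missing ones.
    known = [r["Item_Weight"] for r in rows if r.get("Item_Weight") is not None]
    fill_weight = mean(known) if known else None

    cleaned = []
    for row in rows:
        new_row = dict(row)  # shallow copy so the caller's row is not mutated
        if new_row.get("Item_Weight") is None:
            new_row["Item_Weight"] = fill_weight
        fat = new_row.get("Item_Fat_Content")
        # Unknown or already-canonical labels pass through unchanged.
        new_row["Item_Fat_Content"] = FAT_CONTENT_MAP.get(fat, fat)
        cleaned.append(new_row)
    return cleaned


if __name__ == "__main__":
    raw = [
        {"Item_Weight": 9.3, "Item_Fat_Content": "LF"},
        {"Item_Weight": None, "Item_Fat_Content": "reg"},
        {"Item_Weight": 17.5, "Item_Fat_Content": "Regular"},
    ]
    for row in clean_rows(raw):
        print(row)
```

Mean imputation is used here only as an example. The project's scripts may handle missing values differently.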

Pipeline Structure

The pipeline is managed by an Apache Airflow Directed Acyclic Graph (DAG) with three main tasks:

ETL: Runs the data cleaning scripts.
DVC Versioning: Versions the cleaned dataset using DVC, storing it on Google Drive.
Model Training: Loads the cleaned dataset, trains the model, and logs the resulting model version in MLflow.
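In an Airflow DAG file, tasks like these are usually chained so that each one starts only after the previous one succeeds, for example `etl >> dvc_versioning >> train_model` (these task ids are hypothetical). The sketch below shows the ordering such a DAG encodes without depending on Airflow. It uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+). It models only the dependency structure and is not the repository's DAG definition.

```python
from graphlib import TopologicalSorter

# Hypothetical task ids for the three pipeline stages. Each key maps to the
# set of tasks that must finish before it can start.
PIPELINE_DEPENDENCIES = {
    "etl": set(),
    "dvc_versioning": {"etl"},
    "train_model": {"dvc_versioning"},
}


def execution_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return one valid run order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(execution_order(PIPELINE_DEPENDENCIES))
    # → ['etl', 'dvc_versioning', 'train_model']
```

If a dependency created a cycle (for example, training feeding back into ETL), `static_order()` would raise `graphlib.CycleError`. The "acyclic" requirement of a DAG rules out exactly this situation.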

Versioning

DVC: All cleaned data versions are stored and managed with DVC on Google Drive. This enables reproducibility and allows historical data versions to be retrieved as needed.
MLflow: Models are tracked in MLflow, with metrics and parameters logged for each version. Access the MLflow tracking server to review model performance over time.
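The DVC side of this scheme depends on content addressing. DVC identifies each version of a tracked file by an MD5 hash of its contents and records that hash in a small `.dvc` metafile committed to Git. The data itself is pushed to the remote, which is Google Drive here. A minimal sketch of the idea follows, using only the standard library. `content_hash` and `cleaned.csv` are hypothetical names, and this is not DVC's implementation or API.

```python
import hashlib
from pathlib import Path


def content_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, read in chunks to bound memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "cleaned.csv"  # hypothetical cleaned dataset
        data.write_text("Item_Weight,Item_Fat_Content\n9.3,Low Fat\n")
        v1 = content_hash(data)
        data.write_text("Item_Weight,Item_Fat_Content\n9.3,Regular\n")
        v2 = content_hash(data)
        # Different contents give different hashes, i.e. distinct versions.
        print(v1 != v2)  # True
```

Because the hash depends only on the bytes in the file, checking out an older `.dvc` metafile and pulling from the remote restores exactly the data version that hash refers to.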

Airflow Graph

Folders and files

.dvc
assets
dags
.dvcignore
.gitignore
README.md
dvc_commands.sh

mostafa-fallaha/big-mart
