data-engineering-helpers/mds-in-a-box

Modern Data Stack (MDS) in a box

Overview

This project gathers reference material and gives practical hints on how to reproduce an all-in-one modern data stack (MDS) locally.

Use cases include training, onboarding newcomers, and testing or benchmarking new components (e.g., Delta Lake vs Iceberg vs Hudi, or LakeFS).

Even though the members of the GitHub organization may be employed by some companies, they speak in a personal capacity and do not represent these companies.

References

Articles

The SwirlAI data engineering project

Frameworks

Minio

LakeFS

PostgreSQL

DuckDB

End-to-end projects

Building an End-to-end MLOps Project with Databricks

This blog post details a capstone project using Databricks for MLOps. It covers the end-to-end process of deploying a machine learning model, from data preprocessing and feature engineering to model monitoring and continuous integration/continuous deployment (CI/CD). Key learnings include:

  • Databricks for MLOps: Using Databricks for data preprocessing, feature engineering, model training, and deployment.
  • Feature Store: Leveraging Databricks Feature Store for consistent feature computation.
  • MLflow Tracking: Tracking experiments, logging parameters and metrics, and ensuring reproducibility.
  • Model Serving: Exploring different model serving architectures for efficient deployment.
  • A/B Testing: Implementing A/B testing for model comparison and performance-based routing.
  • Databricks Asset Bundles: Managing projects with Infrastructure-as-Code (IaC) principles.
  • Monitoring and Drift Detection: Setting up model monitoring, tracking metrics, and detecting drift.
  • CI/CD: Implementing CI/CD workflows for continuous model validation and deployment.
  • Scalability: Scaling models for production and real-time serving.
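
To make the drift-detection item above more concrete, here is a minimal sketch of one common drift statistic, the Population Stability Index (PSI), in plain Python. The summary does not say which method the blog post uses, so this is an assumption chosen for illustration and not the post's implementation. The function name, the equal-width binning, and the `eps` floor are all hypothetical choices.

```python
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Compare two samples of one numeric feature with the PSI statistic.

    `expected` is the reference sample (e.g., training data) and `actual`
    is the sample to check (e.g., recent serving data). Bin edges are
    equal-width and derived from the reference sample only.
    """
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("reference sample has no spread; cannot build bins")
    width = (hi - lo) / bins

    def fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Values outside the reference range fall into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor each fraction at eps so the logarithm below stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


if __name__ == "__main__":
    reference = [i / 100 for i in range(100)]        # roughly uniform on [0, 1)
    shifted = [0.5 + i / 200 for i in range(100)]    # mass moved to the upper half
    print(population_stability_index(reference, reference))
    print(population_stability_index(reference, shifted))
```

Identical samples give a PSI of zero, and the value grows as the two distributions diverge. Any alerting threshold is a convention to tune per feature; none is prescribed by the material referenced here.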

Bike sharing

Car price predictor
