From low-value inputs to high-value outputs: the data value chain describes the full data lifecycle, from collection through analysis to usage, and it is not only about data transformation. The open-source, dbt-based DP Framework aims to support the whole process in the spirit of data democratization, while staying portable across many infrastructure choices and clouds.
Key characteristics of DP Framework:
- A single, unified integration layer, so teams stop "reinventing the wheel"
- Readiness for diverse environments: components can be selected freely and used interchangeably
- Ability to work in any environment and with any data storage
- Standardization, simplification and unification across projects (through templating)
- Hides complexity from Analytics Engineers by grouping most interactions with a data platform into one user interface
data-pipelines-CLI: Project on GitHub (documentation)
data-pipelines-CLI:
- Building and managing data pipelines
- Interaction with the whole data environment
- Abstraction layer hiding complexity from the end user
- Handling deployments and publications, automation support
Project Template Factory:
- Defining standardized templates for your organization’s data pipelines
- Maintaining separate configuration for different environments
- Creating projects out of templates with a handy cookie cutter
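Keeping separate per-environment configuration is often done by layering a shared base config with small environment-specific overrides. The following Python sketch illustrates that general pattern, assuming the configs are already loaded into dictionaries. `merge_config`, `BASE`, `PROD` and their keys are hypothetical and do not describe how data-pipelines-CLI or the Project Template Factory actually store or merge configuration.

```python
from copy import deepcopy
from typing import Any, Dict


def merge_config(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge `override` into a copy of `base`.

    Nested dictionaries are merged key by key; any other value in
    `override` replaces the value from `base`. Neither input is mutated.
    """
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_config(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


# Hypothetical example: shared defaults plus a production override.
BASE = {
    "target": "dev",
    "warehouse": {"project": "analytics-dev", "threads": 4},
    "notifications": {"enabled": False},
}
PROD = {
    "target": "prod",
    "warehouse": {"project": "analytics-prod"},
    "notifications": {"enabled": True},
}

if __name__ == "__main__":
    print(merge_config(BASE, PROD))
```

With this layering, the production file only lists what differs (here the target, warehouse project and notifications), while shared values such as `threads` are inherited from the base.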
dbt-airflow-factory: Project on GitHub (documentation)
- Parses dbt manifest files and builds orchestrator jobs (Apache Airflow, GCP Workflows, Databricks Workflows)
- Highly customizable, pluggable runtime
- Builds the DAG on the fly, without materializing it
- Supports task grouping, hiding ephemeral models, etc.
- Sends DAG failure notifications to a Slack or Microsoft Teams channel
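The general idea of turning a dbt manifest into orchestrator tasks can be sketched in a few lines of Python. The sketch reads the `nodes`, `resource_type`, `config.materialized` and `depends_on.nodes` fields of dbt's `manifest.json`, collapses ephemeral models into their consumers (ephemeral models are compiled into downstream queries rather than run on their own), and returns a dependency-respecting execution order. It is not dbt-airflow-factory's implementation or API: the function names and the tiny manifest are hypothetical, and tests, seeds, sources and task grouping are deliberately left out.

```python
import json
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Any, Dict, List, Set


def load_manifest(path: str) -> Dict[str, Any]:
    """Load a dbt manifest.json file from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def model_dependencies(manifest: Dict[str, Any]) -> Dict[str, Set[str]]:
    """Map each non-ephemeral model to the non-ephemeral models it depends on."""
    models = {
        uid: node
        for uid, node in manifest["nodes"].items()
        if node.get("resource_type") == "model"
    }

    def is_ephemeral(uid: str) -> bool:
        return models[uid].get("config", {}).get("materialized") == "ephemeral"

    def upstream(uid: str) -> Set[str]:
        # Keep only model-to-model edges; sources, seeds, tests are skipped here.
        result: Set[str] = set()
        for parent in models[uid].get("depends_on", {}).get("nodes", []):
            if parent not in models:
                continue
            if is_ephemeral(parent):
                result |= upstream(parent)  # inherit the ephemeral model's parents
            else:
                result.add(parent)
        return result

    return {uid: upstream(uid) for uid in models if not is_ephemeral(uid)}


def task_order(manifest: Dict[str, Any]) -> List[str]:
    """Return model ids ordered so that every model follows its dependencies."""
    return list(TopologicalSorter(model_dependencies(manifest)).static_order())


# Hypothetical minimal manifest with one ephemeral model in the middle.
MANIFEST = {
    "nodes": {
        "model.shop.stg_orders": {
            "resource_type": "model",
            "config": {"materialized": "view"},
            "depends_on": {"nodes": ["source.shop.raw.orders"]},
        },
        "model.shop.orders_enriched": {
            "resource_type": "model",
            "config": {"materialized": "ephemeral"},
            "depends_on": {"nodes": ["model.shop.stg_orders"]},
        },
        "model.shop.daily_revenue": {
            "resource_type": "model",
            "config": {"materialized": "table"},
            "depends_on": {"nodes": ["model.shop.orders_enriched"]},
        },
        "test.shop.not_null_daily_revenue_day": {
            "resource_type": "test",
            "config": {"materialized": "test"},
            "depends_on": {"nodes": ["model.shop.daily_revenue"]},
        },
    }
}

if __name__ == "__main__":
    print(task_order(MANIFEST))
```

For a real project, the `manifest.json` that dbt writes to its `target/` directory could be passed through `load_manifest`; an orchestrator integration would then create one task per returned id and wire the edges from `model_dependencies`. Because the graph is derived from the manifest at load time, nothing has to be generated or committed ahead of time.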
At GetInData, we have delivered a number of workshops on deploying dbt pipelines to production with DP Framework, following engineering best practices:
- Data Mass 2023
- BigDataTechWarsaw 2023
- Data Mass 2022
- BigDataTechWarsaw 2022: https://github.com/getindata/gid-mdp-workshop
Short demo of our Modern Data Platform with DP Framework:
First Steps With DP Framework: GitHub
List of our publications on data platform architectures leveraging DP Framework:
- Modern Data Platform - the what's, why's and how's? Demystifying the buzzword
- Announcing the GetInData Modern Data Platform - a self-service solution for Analytics Engineers
- GetInData Modern Data Platform - features & tools
- How we built a Modern Data Platform in 4 months for Volt.io, a FinTech scale-up.
Conference presentations about DP Framework:
- Providing end-to-end data value chain with open-source dbt-based DP Framework (GoDataFest, Amsterdam, 2023)
- Data Platform - a modern one. A new stack that promotes self-service with well-known best DataOps practices (Big Data Tech Warsaw, Warsaw, 2023)
- Data Platform - what does it take to be called a modern one? A new stack with well-known best practices (Data Science Summit, Warsaw, 2022)
All components of DP Framework are open-source, and pull requests are welcome. Please check the detailed contribution instructions in each project's repository.
Contact us and sign up for a DP Framework demo!