This repository documents the deployment of a production-ready lakehouse on Kubernetes using Stackable. The deployment is Helm-based, and all corresponding files are located in the helm-deployment folder. A step-by-step guide is located in the guide folder.
You need a Kubernetes cluster with the following resources:
- 10 nodes, each with 4 cores/8 threads, 20 GB RAM and 30 GB HDD, e.g. Standard_D4_v2 in Azure
- multiple persistent volumes with a total capacity of 1 TB
For a guide on how to set this up in Azure or in the evoila lab, see here.
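As a rough pre-flight check, the node requirements above can be compared against what the cluster itself reports. The following Python sketch assumes you have saved the output of `kubectl get nodes -o json` to a file; it reads each node's `status.capacity` and flags clusters with fewer than 10 nodes and nodes with fewer than 8 logical CPUs or less than 20 GB of memory. The helper names (`check_nodes`, `check_nodes_file`, `parse_memory`, `parse_cpu`) are hypothetical, reading "GB" as 10^9 bytes is an assumption, and disk space and persistent volumes are not checked.

```python
import json

# Thresholds taken from the node requirements above.
# Assumption: "GB" is read as decimal gigabytes (10**9 bytes).
MIN_NODES = 10
MIN_CPUS = 8  # 4 cores / 8 threads are reported by Kubernetes as 8 logical CPUs
MIN_MEMORY_BYTES = 20 * 10**9

# Multipliers for Kubernetes quantity suffixes (binary and decimal).
# Two-letter suffixes come first so that "Mi" is matched before "M".
_SUFFIXES = [
    ("Ki", 2**10), ("Mi", 2**20), ("Gi", 2**30), ("Ti", 2**40),
    ("k", 10**3), ("M", 10**6), ("G", 10**9), ("T", 10**12),
]


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '28673496Ki' or '16Gi' to bytes."""
    for suffix, factor in _SUFFIXES:
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # plain number of bytes


def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as '8' or '7500m' to a number of CPUs."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def check_nodes(nodes_json: dict) -> list[str]:
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    items = nodes_json.get("items", [])
    if len(items) < MIN_NODES:
        problems.append(f"found {len(items)} node(s), need at least {MIN_NODES}")
    for node in items:
        name = node["metadata"]["name"]
        capacity = node["status"]["capacity"]
        cpus = parse_cpu(capacity["cpu"])
        memory = parse_memory(capacity["memory"])
        if cpus < MIN_CPUS:
            problems.append(f"{name}: {cpus:g} CPUs < {MIN_CPUS}")
        if memory < MIN_MEMORY_BYTES:
            problems.append(f"{name}: {memory / 10**9:.1f} GB RAM < 20 GB")
    return problems


def check_nodes_file(path: str) -> list[str]:
    """Check a file containing the output of `kubectl get nodes -o json`."""
    with open(path) as f:
        return check_nodes(json.load(f))
```

For example, run `kubectl get nodes -o json > nodes.json` and then call `check_nodes_file("nodes.json")`; an empty list means the checked requirements are met. The sketch compares capacity rather than allocatable resources because the requirements above describe hardware sizes; Kubernetes reserves part of each node for system components, so the amount usable by workloads is somewhat lower.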
Helm and kubectl are required; to install them, follow these official instructions for kubectl and these official instructions for Helm. You also need stackablectl; to install it, follow this official guide.
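A quick way to confirm that the three command-line tools are installed is to look them up on `PATH`. This minimal Python sketch uses the standard-library `shutil.which`; the function name `missing_tools` is hypothetical, and it only checks that the executables can be found, not which versions are installed.

```python
import shutil
from typing import Callable, Iterable, Optional

# Command-line tools listed as prerequisites above.
REQUIRED_TOOLS = ("kubectl", "helm", "stackablectl")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the tools that cannot be found on PATH (empty list = all present)."""
    return [tool for tool in tools if which(tool) is None]
```

For example, `print(missing_tools() or "kubectl, helm and stackablectl are installed")` prints either the missing tool names or a confirmation. The `which` parameter is only there so the lookup can be replaced, for instance in tests.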
The installation of the necessary operators and all further steps are explained in the guide folder mentioned above.
We try to document as much as possible so that even users with minimal prior knowledge can deploy this lakehouse. However, we cannot explain every technology used here in detail.
The following technologies are used; refer to their official documentation for details:
- Kubernetes
- Stackable
- MinIO
- Hive Metastore
- Trino
- Superset
- Spark
- Kafka
- NiFi
- OPA
- TLS/SSL