This project contains applications required to load Snowplow data into Open Table Formats.
Lake Loader 0.5.0 supports Delta, Iceberg and Hudi as output formats.
Check out the example config files for how to configure your lake loader.
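To give a feel for the shape of that configuration, here is a hypothetical minimal sketch. The field names below are illustrative assumptions only; treat the example config files as the authoritative reference.

```hocon
# Hypothetical minimal config sketch. Field names are assumptions;
# consult the bundled example config files for the real schema.
{
  "output": {
    "good": {
      # Choose one of the supported open table formats
      "type": "Delta"   # or "Iceberg" / "Hudi"
      # Where the loader creates and maintains the lake table
      "location": "s3a://my-bucket/events"
    }
  }
}
```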
The Azure lake loader reads the stream of enriched events from Event Hubs and writes them to Azure Data Lake Storage Gen2. This enables you to query your data lake via Databricks or Microsoft Fabric.
Basic usage:

```bash
docker run \
  -v /path/to/config.hocon:/var/config.hocon \
  -v /path/to/iglu.json:/var/iglu.json \
  snowplow/lake-loader-azure:0.5.0 \
  --config /var/config.hocon \
  --iglu-config /var/iglu.json
```
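As a rough sketch of how the Azure loader's config might look, the snippet below assumes an Event Hubs input (via its Kafka-compatible endpoint) and an ADLS Gen2 output path. None of these field names come from the documentation above, so verify them against the Azure example config file.

```hocon
# Illustrative Azure sketch; all field names are assumptions.
{
  "input": {
    # Event Hubs stream of enriched events, via the Kafka-compatible endpoint
    "topicName": "snowplow-enriched"
    "bootstrapServers": "NAMESPACE.servicebus.windows.net:9093"
  }
  "output": {
    "good": {
      "type": "Delta"
      # ADLS Gen2 container path for the lake table
      "location": "abfs://CONTAINER@ACCOUNT.dfs.core.windows.net/events"
    }
  }
}
```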
The GCP lake loader reads the stream of enriched events from Pub/Sub and writes them to Google Cloud Storage (GCS). This enables you to query your events in Databricks.
Basic usage:

```bash
docker run \
  -v /path/to/config.hocon:/var/config.hocon \
  -v /path/to/iglu.json:/var/iglu.json \
  snowplow/lake-loader-gcp:0.5.0 \
  --config /var/config.hocon \
  --iglu-config /var/iglu.json
```
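A comparable hypothetical sketch for GCP, assuming a Pub/Sub subscription as input and a GCS path as output (again, field names are assumptions, not the documented schema):

```hocon
# Illustrative GCP sketch; all field names are assumptions.
{
  "input": {
    # Pub/Sub subscription carrying the enriched events
    "subscription": "projects/MY-PROJECT/subscriptions/snowplow-enriched"
  }
  "output": {
    "good": {
      "type": "Delta"
      # GCS path for the lake table
      "location": "gs://my-bucket/events"
    }
  }
}
```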
The AWS lake loader reads the stream of enriched events from Kinesis and writes them to S3. This enables you to query your events in Databricks or Amazon Athena.
Basic usage:

```bash
docker run \
  -v /path/to/config.hocon:/var/config.hocon \
  -v /path/to/iglu.json:/var/iglu.json \
  snowplow/lake-loader-aws:0.5.0 \
  --config /var/config.hocon \
  --iglu-config /var/iglu.json
```
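And a matching hypothetical sketch for AWS, assuming a Kinesis stream as input and an S3 path as output (field names are assumptions; check the AWS example config):

```hocon
# Illustrative AWS sketch; all field names are assumptions.
{
  "input": {
    # Kinesis stream carrying the enriched events
    "streamName": "snowplow-enriched"
  }
  "output": {
    "good": {
      "type": "Delta"
      # S3 path for the lake table ("s3a" scheme for the Hadoop connector)
      "location": "s3a://my-bucket/events"
    }
  }
}
```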
| Technical Docs | Setup Guide | Roadmap & Contributing |
| --- | --- | --- |
| Technical Docs | Setup Guide | Roadmap |
Copyright (c) 2014-present Snowplow Analytics Ltd. All rights reserved.
Licensed under the Snowplow Limited Use License Agreement. (If you are uncertain how it applies to your use case, check our answers to [frequently asked questions][faq].)