
snowplow-incubator/snowplow-lake-loader


Snowplow Lake Loader


Introduction

This project contains the applications required to load Snowplow data into open table formats.

Lake Loader 0.5.0 supports Delta, Iceberg and Hudi as output formats.

Check out the example config files for how to configure your lake loader.
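As a rough sketch of the shape of such a file (the field names below are illustrative only — the example config files in this repository are authoritative), a HOCON config pairs an input stream of enriched events with an output table in one of the supported formats:

```hocon
{
  # Illustrative sketch only -- consult the example config files for the
  # real, complete set of options and their exact names.
  "input": {
    # Source of enriched events: Kinesis on AWS, Pub/Sub on GCP,
    # Event Hubs on Azure.
    "streamName": "enriched"
  }
  "output": {
    "good": {
      # One of the supported open table formats: Delta, Iceberg, or Hudi.
      "type": "Delta"
      "location": "s3://my-bucket/events"
    }
  }
}
```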

Azure

The Azure lake loader reads the stream of enriched events from Event Hubs and writes them to Azure Data Lake Storage Gen2. This enables you to query your data lake via Databricks or Microsoft Fabric.

Basic usage:

```bash
docker run \
  -v /path/to/config.hocon:/var/config.hocon \
  -v /path/to/iglu.json:/var/iglu.json \
  snowplow/lake-loader-azure:0.5.0 \
  --config /var/config.hocon \
  --iglu-config /var/iglu.json
```
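The mounted `iglu.json` is an Iglu resolver configuration, which tells the loader where to fetch the schemas used to validate and shred your events. A minimal resolver pointing at Iglu Central might look like this (a sketch — add your own schema repositories as needed):

```json
{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "Iglu Central",
        "priority": 0,
        "vendorPrefixes": ["com.snowplowanalytics"],
        "connection": {
          "http": {
            "uri": "http://iglucentral.com"
          }
        }
      }
    ]
  }
}
```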

GCP

The GCP lake loader reads the stream of enriched events from Pub/Sub and writes them to GCS. This enables you to query your events in Databricks.

```bash
docker run \
  -v /path/to/config.hocon:/var/config.hocon \
  -v /path/to/iglu.json:/var/iglu.json \
  snowplow/lake-loader-gcp:0.5.0 \
  --config /var/config.hocon \
  --iglu-config /var/iglu.json
```

AWS

The AWS lake loader reads the stream of enriched events from Kinesis and writes them to S3. This enables you to query your events in Databricks or Amazon Athena.

```bash
docker run \
  -v /path/to/config.hocon:/var/config.hocon \
  -v /path/to/iglu.json:/var/iglu.json \
  snowplow/lake-loader-aws:0.5.0 \
  --config /var/config.hocon \
  --iglu-config /var/iglu.json
```

Find out more

Technical Docs Setup Guide Roadmap & Contributing

Copyright and License

Copyright (c) 2014-present Snowplow Analytics Ltd. All rights reserved.

Licensed under the Snowplow Limited Use License Agreement. (If you are uncertain how it applies to your use case, check our answers to frequently asked questions.)