Welcome to the "API to Warehouse Stack" repository! This repository offers a simple template to help you get data from various APIs and put it into your data warehouse for further analysis using Airbyte. You can use several supported APIs as your data sources for this process. Here are a few examples of the APIs you can set up to extract data using Airbyte.
Here are some data warehouses that users can choose as a destination to load the data extracted from APIs.
- Snowflake
- BigQuery
- Amazon Redshift
- Many more
In this process, we'll use the Github API to get data and Snowflake as the data warehouse to store the data.
- API to Warehouse Stack
- Table of Contents
- Infrastructure Layout
- Setting an environment for your project
- There are two ways to setup the connectors of airbyte.
- Next Steps
Before you embark on this integration, ensure you have the following set up and ready:
-
Python 3.10 or later: If not installed, download and install it from Python's official website.
-
Docker and Docker Compose (Docker Desktop): Install Docker following the official documentation for your specific OS.
-
Airbyte OSS version: Deploy the open-source version of Airbyte. Follow the installation instructions from the Airbyte Documentation.
-
Terraform: Terraform will help you provision and manage the Airbyte resources. If you haven't installed it, follow the official Terraform installation guide.
Get the project up and running on your local machine by following these steps:
-
Clone the repository (Clone only this quickstart):
git clone --filter=blob:none --sparse https://github.com/airbytehq/quickstarts.git
cd quickstarts
git sparse-checkout add api_to_warehouse
-
Navigate to the directory:
cd api_to_warehouse
-
Set Up a Virtual Environment:
- For Mac:
python3 -m venv venv source venv/bin/activate
- For Windows:
python -m venv venv .\venv\Scripts\activate
- For Mac:
-
Install Dependencies:
pip install -e ".[dev]"
To establish the connection and import data from the Github API into the Snowflake warehouse, kindly proceed by utilizing the Airbyte user interface. The following steps should be adhered to:
-
Run the Airbyte OSS version by following the documentation.
-
Setup the Github API as source by following these steps.
-
Setup the Snowflake as destination by following these steps
-
Please proceed to configure the synchronization time and select the specific tables you wish to load into Snowflake from GitHub. You can make your selection from the list of available streams.
-
Enjoy 😄, your data loaded into Snowflake data warehouse from Github API.
Airbyte enables you to make connections between different platforms by creating connectors for sources and destinations. In this project, we're using Terraform to automate the setup of these connectors and their connections. Here's how you can do it:
-
Navigate to the Airbyte Configuration Directory:
Change to the relevant directory containing the Terraform configuration for Airbyte:
cd infra/airbyte
-
Modify Configuration Files:
Within the
infra/airbyte
directory, you'll find three crucial Terraform files:provider.tf
: Defines the Airbyte provider.main.tf
: Contains the main configuration for creating Airbyte resources.variables.tf
: Holds various variables, including credentials.
Adjust the configurations in these files to suit your project's needs. Specifically, provide credentials for your Postgres connections. You can utilize the
variables.tf
file to manage these credentials. -
Initialize Terraform:
This step prepares Terraform to create the resources defined in your configuration files.
terraform init
-
Review the Plan:
Before applying any changes, review the plan to understand what Terraform will do.
terraform plan
-
Apply Configuration:
After reviewing and confirming the plan, apply the Terraform configurations to create the necessary Airbyte resources.
terraform apply
-
Verify in Airbyte UI:
After Terraform finishes its tasks, go to the Airbyte user interface. You will find your source and destination connectors already set up, along with the connection between them, all ready to use.
After you extract and load data from an API into a data warehouse, you can analyze the data. For example, we used Snowflake data warehouse, which supports analytical tools like Tableau, Talend and Sigma.