Orchestration
Cloud Composer is a fully managed workflow orchestration service on Google Cloud Platform, built on the open source Apache Airflow project. Operations are orchestrated, scheduled, and run on Composer through Directed Acyclic Graphs (DAGs). A DAG is a collection of tasks, organized by their dependencies, that you want to schedule and run. Each task in a DAG is defined by an operator.
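To make the DAG idea concrete without depending on Airflow itself, here is a minimal sketch using Python's standard `graphlib` module (Python 3.9+). The task names are hypothetical and are not taken from the production DAG; the point is only that a task runs after everything it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical task names. Each key maps to the set of tasks it depends on.
dependencies = {
    "pipeline_a": set(),
    "pipeline_b": set(),
    "export": {"pipeline_a", "pipeline_b"},
}


def execution_order(deps):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
```

In this sketch the two pipelines have no dependency on each other, so a scheduler is free to run them in parallel, while `export` always comes last.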
After the ingest pipelines complete, a message is published to a Pub/Sub topic. A Cloud Function listening on this topic triggers the DAG, which runs the calculation pipelines in parallel. Once all the pipelines for a particular state have finished, a state-specific HTTP request triggers the export of that state's files from BigQuery to Google Cloud Storage (GCS). This ensures that even if the pipelines for one state fail, the data export still runs for the other states.
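The per-state gating described above can be sketched in plain Python. This is an illustration under stated assumptions, not the production DAG: the state codes and pipeline names are made up, `export_state` stands in for the state-specific HTTP request, and the sketch assumes a state's export is skipped when any of its pipelines fails. Unlike Airflow, it also waits for every pipeline to finish before exporting any state, which keeps the example short.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def run_all(pipelines, run_pipeline, export_state, max_workers=4):
    """Run pipelines in parallel, then export each state whose pipelines all succeeded.

    pipelines: list of (state_code, pipeline_name) tuples.
    run_pipeline: callable(state_code, pipeline_name) that raises on failure.
    export_state: callable(state_code), a stand-in for the export HTTP request.
    Returns the sorted list of exported state codes.
    """
    results = defaultdict(list)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run_pipeline, s, n): s for s, n in pipelines}
        for future, state in futures.items():
            # future.exception() blocks until that pipeline has finished.
            results[state].append(future.exception() is None)

    exported = []
    for state in sorted(results):
        if all(results[state]):
            export_state(state)
            exported.append(state)
    return exported


def fake_pipeline(state, name):
    """Hypothetical pipeline that fails for one state only."""
    if (state, name) == ("US_YY", "pipeline_b"):
        raise RuntimeError("simulated failure")


export_calls = []
exported = run_all(
    [("US_XX", "pipeline_a"), ("US_XX", "pipeline_b"),
     ("US_YY", "pipeline_a"), ("US_YY", "pipeline_b")],
    fake_pipeline,
    export_calls.append,
)
```

Here the simulated failure in `US_YY` blocks only that state's export; `US_XX` is still exported.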
The DAG running in production and staging is called calculation_pipeline_dag, and Airflow UIs are available for administration in both environments.
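As a rough illustration of the Cloud Function step, the sketch below decodes a Pub/Sub event and builds a request that would start a run of calculation_pipeline_dag. Several things here are assumptions rather than the project's actual implementation: the endpoint path follows the Airflow 2 stable REST API, the base URL is a placeholder, the `conf` contents are invented, and authentication and the HTTP call itself are left out.

```python
import base64
import json

DAG_ID = "calculation_pipeline_dag"


def build_dag_run_request(event, airflow_base_url):
    """Turn a Pub/Sub event into the URL and JSON body of a DAG-run request.

    event: dict as delivered to a Pub/Sub-triggered Cloud Function, where
        the message payload is base64-encoded under the "data" key.
    airflow_base_url: hypothetical address of the Airflow web server.
    """
    raw = event.get("data")
    message = base64.b64decode(raw).decode("utf-8") if raw else ""
    url = f"{airflow_base_url.rstrip('/')}/api/v1/dags/{DAG_ID}/dagRuns"
    # Passing the message through as run configuration is an assumption.
    body = json.dumps({"conf": {"message": message}})
    return url, body


# Example with a made-up message and a placeholder URL.
sample_event = {"data": base64.b64encode(b"ingest complete").decode("ascii")}
url, body = build_dag_run_request(sample_event, "https://airflow.example.invalid/")
```

A real function would send `body` to `url` with credentials appropriate to the Composer environment.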