Merge Delete from BQ and Insert from GCS to BQ into a Single Atomic Operation for achieving Idempotency #421

Merged 16 commits on Jul 12, 2024
Changes from 8 commits
Commits
7508571
Test commit
harsha-stellar-data Jul 8, 2024
c524e46
Addressed the PR Pre-Commit Comments from GH Actions
harsha-stellar-data Jul 8, 2024
aa97c03
Addressed the pre-commit review comments
harsha-stellar-data Jul 8, 2024
fa95305
Merge branch 'master' into Hubble-398-Combine-Del-Ins-Tasks
harsha-stellar-data Jul 8, 2024
3efd5c7
Modified the code to account for the pre-commit changes
harsha-stellar-data Jul 8, 2024
aa1d66a
Merge branch 'master' into Hubble-398-Combine-Del-Ins-Tasks
harsha-stellar-data Jul 8, 2024
3f31dca
Added Utilities.py for reusable templates to serve both history_table…
harsha-stellar-data Jul 10, 2024
5bfd407
Modified the reusable script name and updated the documentation image…
harsha-stellar-data Jul 10, 2024
fb019a0
Updated the code to use Kwargs, Added Comments
harsha-stellar-data Jul 10, 2024
faeee88
Updated formatting to match the linting standards as well as moved ta…
harsha-stellar-data Jul 10, 2024
aadd04a
Removed task_id param and fetching it directly from task_vars
harsha-stellar-data Jul 10, 2024
526f4be
Merge branch 'master' into Hubble-398-Combine-Del-Ins-Tasks
harsha-stellar-data Jul 11, 2024
4f49289
updated table names parameter to get values from table_ids airflow va…
harsha-stellar-data Jul 11, 2024
956c052
- Modified the DAG scripts to account for the naming convention diffe…
harsha-stellar-data Jul 11, 2024
f30f9ce
Updated the Documentation images
harsha-stellar-data Jul 11, 2024
82922cb
Merge branch 'master' into Hubble-398-Combine-Del-Ins-Tasks
harsha-stellar-data Jul 12, 2024
6 changes: 6 additions & 0 deletions README.md
@@ -31,6 +31,7 @@ This repository contains the Airflow DAGs for the [Stellar ETL](https://github.c
- [build_time_task](#build_time_task)
- [build_export_task](#build_export_task)
- [build_gcs_to_bq_task](#build_gcs_to_bq_task)
- [build_del_ins_from_gcs_to_bq_task](#build_del_ins_from_gcs_to_bq_task)
- [build_apply_gcs_changes_to_bq_task](#build_apply_gcs_changes_to_bq_task)
- [build_batch_stats](#build_batch_stats)
- [bq_insert_job_task](#bq_insert_job_task)
@@ -542,6 +543,7 @@ This section contains information about the Airflow setup. It includes our DAG d
- [build_export_task](#build_export_task)
- [build_gcs_to_bq_task](#build_gcs_to_bq_task)
- [build_apply_gcs_changes_to_bq_task](#build_apply_gcs_changes_to_bq_task)
- [build_del_ins_from_gcs_to_bq_task](#build_del_ins_from_gcs_to_bq_task)
- [build_batch_stats](#build_batch_stats)
- [bq_insert_job_task](#bq_insert_job_task)
- [cross_dependency_task](#cross_dependency_task)
@@ -668,6 +670,10 @@ This section contains information about the Airflow setup. It includes our DAG d

[This file](https://github.com/stellar/stellar-etl-airflow/blob/master/dags/stellar_etl_airflow/build_gcs_to_bq_task.py) contains methods for creating tasks that append information from a Google Cloud Storage file to a BigQuery table. These tasks will create a new table if one does not exist. These tasks are used for history archive data structures, as Stellar wants to keep a complete record of the ledger's entire history.

### **build_del_ins_from_gcs_to_bq_task**

[This file](https://github.com/stellar/stellar-etl-airflow/blob/master/dags/stellar_etl_airflow/build_del_ins_from_gcs_to_bq_task.py) contains methods for creating tasks that delete data from a specified BigQuery table for the batch interval and then import data from GCS into the corresponding BigQuery table. These tasks will create a new table if one does not exist. These tasks are used for history and state data structures, as Stellar wants to keep a complete record of the ledger's entire history.
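
The idempotency this task aims for can be illustrated with a minimal sketch. It is not the code from this PR: it uses Python's built-in `sqlite3` as a stand-in for BigQuery, and the function name `delete_and_insert`, the table `history_ledgers`, and the column `closed_at` are hypothetical names chosen for illustration. The point is only that deleting a batch interval and re-inserting it inside one transaction leaves the table in the same state no matter how many times the batch is rerun.

```python
import sqlite3


def delete_and_insert(conn, table, batch_start, batch_end, rows):
    """Replace all rows of `table` with closed_at in [batch_start, batch_end).

    Hypothetical helper: sqlite3 stands in for BigQuery here.
    """
    # `with conn` commits if both statements succeed and rolls back if
    # either raises, so the delete never lands without the insert.
    with conn:
        conn.execute(
            f"DELETE FROM {table} WHERE closed_at >= ? AND closed_at < ?",
            (batch_start, batch_end),
        )
        conn.executemany(
            f"INSERT INTO {table} (id, closed_at) VALUES (?, ?)", rows
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history_ledgers (id INTEGER, closed_at TEXT)")
batch = [(1, "2024-07-08T00:05:00"), (2, "2024-07-08T00:15:00")]

# Running the same batch twice simulates an Airflow task retry or rerun.
for _ in range(2):
    delete_and_insert(
        conn, "history_ledgers", "2024-07-08T00:00:00", "2024-07-08T00:30:00", batch
    )

row_count = conn.execute("SELECT COUNT(*) FROM history_ledgers").fetchone()[0]
print(row_count)
```

With separate delete and insert tasks, a failure between the two would leave the interval empty; combining them means a rerun either replaces the interval completely or changes nothing.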

### **build_apply_gcs_changes_to_bq_task**

[This file](https://github.com/stellar/stellar-etl-airflow/blob/master/dags/stellar_etl_airflow/build_apply_gcs_changes_to_bq_task.py) contains methods for creating apply tasks. Apply tasks are used to merge a file from Google Cloud Storage into a BigQuery table. Apply tasks differ from the append tasks in that they apply changes: they update, delete, and insert rows. These tasks are used for accounts, offers, and trustlines, as the BigQuery table represents the point-in-time state of these data structures. This means that, for example, a merge task could alter the account balance field in the table if a user performed a transaction, delete a row in the table if a user deleted their account, or add a new row if a new account was created.
2 changes: 2 additions & 0 deletions airflow_variables_dev.json
@@ -334,6 +334,7 @@
"asset_stats": 720,
"build_batch_stats": 840,
"build_bq_insert_job": 1080,
"build_del_ins_from_gcs_to_bq_task": 2000,
"build_delete_data_task": 1020,
"build_export_task": 840,
"build_gcs_to_bq_task": 960,
@@ -367,6 +368,7 @@
"build_bq_insert_job": 180,
"build_copy_table": 180,
"build_dbt_task": 960,
"build_del_ins_from_gcs_to_bq_task": 400,
"build_delete_data_task": 180,
"build_export_task": 420,
"build_gcs_to_bq_task": 300,
2 changes: 2 additions & 0 deletions airflow_variables_prod.json
@@ -332,6 +332,7 @@
"asset_stats": 420,
"build_batch_stats": 600,
"build_bq_insert_job": 840,
"build_del_ins_from_gcs_to_bq_task": 2000,
"build_delete_data_task": 780,
"build_export_task": 600,
"build_gcs_to_bq_task": 660,
@@ -365,6 +366,7 @@
"build_bq_insert_job": 180,
"build_copy_table": 180,
"build_dbt_task": 1800,
"build_del_ins_from_gcs_to_bq_task": 400,
"build_delete_data_task": 180,
"build_export_task": 300,
"build_gcs_to_bq_task": 300,