Zero-Downtime option for CTAS recreation #95

Open. Wants to merge 2 commits into base: master
32 changes: 32 additions & 0 deletions README.md
@@ -73,6 +73,38 @@ _Additional information_

### Usage notes

**Zero-Downtime Tables**
Starting from adapter version `1.0.3`, `dbt run` can update a `table` model without downtime.
Instead of dropping and recreating the table, it creates a new CTAS table named `ctas_{{model.name}}_{{timestamp}}` and then creates or updates a view `{{model.name}}` that selects from this CTAS table.
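
As a rough sketch of the swap described above, the Python below builds the two statements involved: a CTAS into a timestamped table, then a view that points at it. The function name, the simplified SQL text, and the sample arguments are assumptions for illustration; the adapter's own macros generate the real statements.

```python
def zero_downtime_statements(schema, model_name, timestamp_ms, select_sql):
    """Return the two SQL statements for a zero-downtime refresh (illustrative only)."""
    # New physical table: ctas_<model name>_<timestamp>
    ctas_name = f"ctas_{model_name}_{timestamp_ms}"
    create_table = f"CREATE TABLE {schema}.{ctas_name} AS {select_sql}"
    # Readers keep querying the view; repointing it is what avoids downtime.
    create_view = (
        f"CREATE OR REPLACE VIEW {schema}.{model_name} AS "
        f"SELECT * FROM {schema}.{ctas_name}"
    )
    return [create_table, create_view]


# Hypothetical schema, model, and query, used only to show the output shape.
statements = zero_downtime_statements("analytics", "orders", 1640995200500, "SELECT 1 AS id")
for statement in statements:
    print(statement)
```

Because the old CTAS table stays in place until the view is repointed, queries against `{{model.name}}` keep working during the rebuild. The trade-off is that old CTAS tables accumulate and need cleanup.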

To use it:
1. Enable zero-downtime for tables
a) Add the var `table_zero_downtime` to `dbt_project.yml`:
```
vars:
  table_zero_downtime: true
```
b) Alternatively, add the tag `table_zero_downtime` to a specific model that uses the `table` materialization:
```
{{config(tags=['table_zero_downtime'])}}
```

2. Clean up stale objects such as old CTAS tables
a) A complementary set of macros cleans up objects that no longer exist in Git.
Check out these links:
- https://github.com/SOVALINUX/dbt-utils/blob/main/macros/sql/delete_stale_objects.sql
- https://github.com/SOVALINUX/athena-utils/blob/main/macros/dbt_utils/sql/delete_stale_objects.sql
In an `on-run-end` hook, they can be called like this:
```
on-run-end: "{% do athena_utils.delete_stale_ctas_run_end([target.schema, generate_schema_name('some_extra_schema', '')], False, '') %}"
```

b) For development, our projects wrap the `on-run-end` hook so that it does not trigger when a developer runs a single model on their own machine.
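
The linked macros are not reproduced here. As a rough illustration of what "stale" can mean under the `ctas_{{model.name}}_{{timestamp}}` naming scheme, the Python sketch below assumes that the newest table per model is the live one and that every older one is a cleanup candidate. The function name and the keep-newest rule are assumptions for illustration, not the behavior of `delete_stale_ctas_run_end`.

```python
from collections import defaultdict


def find_stale_ctas(table_names):
    """Return ctas_* tables that are older than the newest one for the same model."""
    by_model = defaultdict(list)
    for name in table_names:
        if not name.startswith("ctas_"):
            continue  # not produced by the zero-downtime materialization
        # Split "ctas_<model>_<timestamp>" at the last underscore.
        model, _, suffix = name[len("ctas_"):].rpartition("_")
        if model and suffix.isdigit():
            by_model[model].append((int(suffix), name))

    stale = []
    for versions in by_model.values():
        versions.sort()
        # Everything except the highest timestamp is a cleanup candidate.
        stale.extend(name for _, name in versions[:-1])
    return sorted(stale)


# Hypothetical table names in one schema.
tables = ["orders", "ctas_orders_100", "ctas_orders_200", "ctas_order_items_50", "other"]
print(find_stale_ctas(tables))
```

Splitting at the last underscore keeps model names that themselves contain underscores, such as `order_items`, intact.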

### Notes on Docker
If you are going to add this connector to a Docker image, use the Dockerfile from dbt v1.1 or higher:
https://github.com/dbt-labs/dbt-core/blob/1.1.latest/docker/Dockerfile

### Models

#### Table Configuration
32 changes: 23 additions & 9 deletions dbt/include/athena/macros/materializations/models/table/table.sql
@@ -1,22 +1,36 @@
  {% materialization table, adapter='athena' -%}
    {%- set identifier = model['alias'] -%}

+   {{ run_hooks(pre_hooks) }}
+
    {%- set old_relation = adapter.get_relation(database=database, schema=schema, identifier=identifier) -%}
+   {% if not var('table_zero_downtime', false) and 'table_zero_downtime' not in config.get('tags') %}
+     {%- if old_relation is not none -%}
+       {{ adapter.drop_relation(old_relation) }}
+     {%- endif -%}
+   {%- endif -%}
    {%- set target_relation = api.Relation.create(identifier=identifier,
                                                  schema=schema,
                                                  database=database,
                                                  type='table') -%}

-   {{ run_hooks(pre_hooks) }}
-
-   {%- if old_relation is not none -%}
-     {{ adapter.drop_relation(old_relation) }}
-   {%- endif -%}
-
    -- build model
-   {% call statement('main') -%}
-     {{ create_table_as(False, target_relation, sql) }}
-   {% endcall -%}
+   {% if var('table_zero_downtime', false) or 'table_zero_downtime' in config.get('tags') %}
+     {%- set current_ts = (modules.datetime.datetime.utcnow() - modules.datetime.datetime.utcfromtimestamp(0)).total_seconds() * 1000 -%}
+     {%- set ctas_id_str = "ctas_{0}_{1}".format(identifier, current_ts) -%}
+     {%- set ctas_id = ctas_id_str[0:ctas_id_str.index('.')] -%}
+     {%- set ctas_relation = '"{0}"."{1}"."{2}"'.format(database, schema, ctas_id) -%}
+     {% call statement('main') -%}
+       {{ create_table_as(False, ctas_relation, sql) }}
+     {% endcall -%}
+     {% call statement('main') -%}
+       {{ create_view_as(target_relation, "SELECT * FROM " ~ ctas_relation) }}
+     {% endcall -%}
+   {%- else -%}
+     {% call statement('main') -%}
+       {{ create_table_as(False, target_relation, sql) }}
+     {% endcall -%}
+   {%- endif -%}

    -- set table properties
    {{ set_table_classification(target_relation, 'parquet') }}
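
For readers less familiar with Jinja, the two pieces of logic this change adds to the materialization can be approximated in plain Python: the check that decides whether zero-downtime mode applies, and the construction of the timestamped CTAS identifier. This is a sketch, not the macro itself. The function names are hypothetical, and it truncates the millisecond value with `int()` where the macro slices the formatted string at the first `.`.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def is_zero_downtime(project_vars, model_tags):
    """Mirror the `var('table_zero_downtime', false) or tag in config.get('tags')` check."""
    return bool(project_vars.get("table_zero_downtime", False)) or "table_zero_downtime" in model_tags


def ctas_identifier(identifier, now=None):
    """Build `ctas_<identifier>_<epoch milliseconds>` with the fractional part dropped."""
    now = now or datetime.now(timezone.utc)
    millis = int((now - EPOCH).total_seconds() * 1000)
    return f"ctas_{identifier}_{millis}"


# Hypothetical model alias; the timestamp depends on when this runs.
print(is_zero_downtime({"table_zero_downtime": True}, []))
print(ctas_identifier("orders"))
```

Either the project-level var or the per-model tag is enough to switch a model to the view-over-CTAS path; with neither set, the existing drop-and-recreate path is used.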
2 changes: 2 additions & 0 deletions test/integration/athena.dbtspec
@@ -4,6 +4,8 @@ target:
    schema: "{{ env_var('DBT_TEST_ATHENA_SCHEMA', 'dbt_integration_tests') }}"
    region_name: "{{ env_var('DBT_TEST_ATHENA_REGION', 'eu-west-1') }}"
    s3_staging_dir: "{{ env_var('DBT_TEST_ATHENA_S3_STAGING_DIR') }}"
+   aws_profile_name: "{{ env_var('DBT_TEST_AWS_PROFILE') }}"
+   work_group: "{{ env_var('DBT_TEST_ATHENA_WORKGROUP', 'primary') }}"
  sequences:
    test_dbt_empty: empty
    test_dbt_base: base