Commit

Merge pull request #71 from Snowflake-Labs/create-constraints-for-failed-skipped-tests

Create constraints for failed skipped tests
sfc-gh-dflippo authored Jul 25, 2024
2 parents 3669830 + c36ed01 commit 94cefff
Showing 13 changed files with 531 additions and 326 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -3,3 +3,4 @@ target/
dbt_packages/
logs/
.DS_Store
.gitconfig
84 changes: 41 additions & 43 deletions README.md
@@ -47,7 +47,7 @@ vars:
```yml
packages:
  - package: Snowflake-Labs/dbt_constraints
    version: [">=0.6.0", "<0.7.0"]
    version: [">=1.0.0", "<1.1.0"]
# <see https://github.com/Snowflake-Labs/dbt_constraints/releases/latest> for the latest version tag.
# You can also pull the latest changes from Github with the following:
# - git: "https://github.com/Snowflake-Labs/dbt_constraints.git"
@@ -119,66 +119,64 @@ packages:
<ADAPTER_NAME>__have_ownership_priv(table_relation, verify_permissions, lookup_cache=none)
```

## dbt_constraints Limitations
## RELY and NORELY Properties

Generally, if you don't meet a requirement, tests are still executed but the constraint is skipped rather than producing an error.
Version 1.0.0 introduces the ability to create constraints with the `RELY` and `NORELY` properties on Snowflake. Constraints for executed tests with zero failures are created with the `RELY` property. Tests with any failures generate `NORELY` constraints, and existing constraints are altered to `RELY` or `NORELY` based on subsequent executions of the test. When the `always_create_constraint` feature is enabled, it is also possible to create `NORELY` constraints using `dbt run` and then have those constraints become `RELY` constraints using `dbt test`.

* All models involved in a constraint must be materialized as table, incremental, snapshot, or seed.
## Determining the Constraints to Generate

* If source constraints are enabled, the source must be a table. You must also have the `OWNERSHIP` table privilege to add a constraint. For foreign keys you also need the `REFERENCES` privilege on the parent table with the primary or unique key. The package will identify when you lack these privileges on Snowflake and PostgreSQL. Oracle does not provide an easy way to look up your effective privileges so it has an exception handler and will display Oracle's error messages.
Version 1.0.0 introduces a more advanced set of criteria for selecting tests to turn into constraints.

* The test must be one of the following: `primary_key`, `unique_key`, `unique_combination_of_columns`, `unique`, `foreign_key`, `relationships`, or `not_null`
* The test executed with zero failures (RELY), or the database supports NORELY constraints.
* The test executed (RELY/NORELY), the primary/unique key constraint is needed for a foreign key, or the `always_create_constraint` parameter is turned on.
* If you run `build`, `run`, or `test` on only part of a project with the `--select` parameter, either the test or its model must have been selected, or the test must be a primary/unique key needed by a selected foreign key. A primary/unique key that is created for a foreign key without its own test having executed is created as a NORELY constraint.
* No model involved in a constraint may use a view or ephemeral materialization. Version 1.0.0 now allows custom materializations.
* If source constraints are enabled, the source must be a table. You must also have the `OWNERSHIP` table privilege to add a constraint. For foreign keys you also need the `REFERENCES` privilege on the parent table with the primary or unique key. The package will identify when you lack these privileges on Snowflake and PostgreSQL. Oracle does not provide an easy way to look up your effective privileges so it has an exception handler and will display Oracle's error messages.
* All columns on constraints must be individual column names, not expressions. You can reference columns on a model that come from an expression.

* Constraints are not created for failed tests. See how to get around this using severity and `config: always_create_constraint: true` in the next section.

* `primary_key`, `unique_key`, and `foreign_key` tests are considered first and duplicate constraints are skipped. One exception is that you will get an error if you add two different `primary_key` tests to the same model.

* Foreign keys require that the parent table have a primary key or unique key on the referenced columns. Unique keys generated from standard `unique` tests are sufficient.

* The order of columns on a foreign key test must match between the FK columns and PK columns
* Referential constraints must apply to every row in a table, so constraints created from tests with a `config: where:`, `config: warn_if:`, or `config: fail_calc:` property are set to `NORELY`.

Additional notes:
* The `foreign_key` test ignores any row with a null column, even if only one of two columns in a compound key is null. If you also want to ensure FK columns are not null, add standard `not_null` tests to your model, which will add not null constraints to the table.
* You may need to manually drop a primary key constraint from a table if you change the columns in the constraint. This is not necessary for table materializations or if you do a full-refresh of an incremental model.
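
As a hedged illustration of the notes above, the following schema.yml fragment pairs a `dbt_constraints.foreign_key` test with a standard `not_null` test, and adds a `unique` test with a `config: where:` filter whose constraint would be created as `NORELY` on Snowflake. All model and column names (`fact_orders`, `dim_customers`, `o_custkey`, `c_custkey`, `o_orderkey`, `o_orderdate`) are placeholders, and the `pk_table_name` / `pk_column_name` argument names are assumed from the package's foreign key test; verify them against your installed version.

```yml
version: 2
models:
  - name: fact_orders            # placeholder model name
    columns:
      - name: o_custkey
        tests:
          # The foreign key test ignores rows where o_custkey is null,
          # so not_null is added to also require a value
          - not_null
          - dbt_constraints.foreign_key:
              pk_table_name: ref('dim_customers')   # placeholder parent model
              pk_column_name: c_custkey             # placeholder parent column
      - name: o_orderkey
        tests:
          # The where filter means the test does not cover every row,
          # so the resulting constraint would be NORELY rather than RELY
          - unique:
              config:
                where: "o_orderdate >= '2024-01-01'"   # placeholder filter
```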

## Advanced: `always_create_constraint` Property

* Referential constraints must apply to all the rows in a table so any tests with a `config: where:` property will be skipped when creating constraints. See how to disable this rule using `config: always_create_constraint: true` in the next section.
There is an advanced option for Snowflake users to force a constraint to be generated even when the test was not executed. When this setting is in effect, constraints on Snowflake will have the `NORELY` property until the associated test is executed with zero failures. Snowflake does not support `NORELY` for not null constraints, so those constraints will still be skipped. You activate this feature in your dbt_project.yml under the `models:` or `tests:` sections. You can set it to true for your entire project or only for specific folders. You can also set it in a specific model's header.

## Advanced: `config: always_create_constraint: true` property
__[Caveat Emptor](https://en.wikipedia.org/wiki/Caveat_emptor):__

There is an advanced option to force a constraint to be generated when there is a `config: where:` property or if the constraint has a threshold. The `config: always_create_constraint: true` property will override those exclusions. When this setting is in effect, you can create constraints even when you have excluded some records or have a number of failures below a threshold. If your test has a status of 'failed', it will still be skipped. Please see [dbt's documentation on how to set a threshold for failures](https://docs.getdbt.com/reference/resource-configs/severity).
* You will get an error if you try to force constraints to be generated that are enforced by your database. On Snowflake that is only a `not_null` constraint, but on databases like Oracle, all the generated constraints are enforced. This is why, at present, only the Snowflake macros implement this feature.
* This feature can still cause unexpected query results on Snowflake due to [join elimination](https://docs.snowflake.com/en/user-guide/join-elimination). Although executing tests on Snowflake will correctly set the `RELY` or `NORELY` property based on whether the tests pass or fail, activating this feature and **skipping the execution of tests** will not cause a `RELY` constraint to become a `NORELY` constraint. A `RELY` constraint only becomes a `NORELY` constraint **if a test is executed** and has failures. If you create a `RELY` constraint by running `dbt build` and subsequently only execute `dbt run` without eventually following up with `dbt test`, you could have constraints that still have the `RELY` property but now have referential integrity issues. Snowflake users are encouraged to frequently or always execute their tests so that the `RELY` property is kept up to date.

__Caveat Emptor:__
These are examples from a dbt_project.yml using the feature in models or tests:

* You will get an error if you try to force constraints to be generated that are enforced by your database. On Snowflake that is only a not_null constraint but on databases like Oracle, all the generated constraints are enforced.
* This feature could cause unexpected query results on Snowflake due to [join elimination](https://docs.snowflake.com/en/user-guide/join-elimination).
```yml
models:
  your_project_name:
    +always_create_constraint: true
tests:
  your_project_name:
    +always_create_constraint: true
```
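
The folder-level scoping mentioned above could look like the following sketch. The folder name `marts` is hypothetical; substitute a directory that exists under your project's `models/` path.

```yml
models:
  your_project_name:
    marts:   # hypothetical folder: models/marts
      # Applies only to models under this folder, not the whole project
      +always_create_constraint: true
```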

This is an example using the feature:
This is an example from a model schema.yml using the feature. Setting the property in the `config:` section of a test does not work, so set it in the model's `config:` section instead.

```yml
  - name: dim_duplicate_orders
    description: "Test that we do not try to create PK/UK on failed tests"
    columns:
      - name: o_orderkey
        description: "The primary key for this table"
      - name: o_orderkey_seq
        description: "duplicate seq column to test UK"
    tests:
      # This constraint should be skipped because it has failures
      - dbt_constraints.primary_key:
          column_name: o_orderkey
          config:
            severity: warn
      # This constraint should be still generated because always_create_constraint=true
      - dbt_constraints.unique_key:
          column_name: o_orderkey
          config:
            warn_if: ">= 5000"
            error_if: ">= 10000"
            always_create_constraint: true
      # This constraint should be still generated because always_create_constraint=true
      - dbt_constraints.unique_key:
          column_name: o_orderkey_seq
          config:
            severity: warn
            always_create_constraint: true
version: 2
models:
  - name: your_model_name
    config:
      always_create_constraint: true
```

This is an example of activating the feature in the header of a model:
```jinja
{{ config(always_create_constraint = true) }}
```

## Primary Maintainers
2 changes: 1 addition & 1 deletion dbt_project.yml
@@ -1,6 +1,6 @@

name: 'dbt_constraints'
version: '0.6.3'
version: '1.0.0'
config-version: 2

# These macros depend on the results and graph objects in dbt >=0.19.0
7 changes: 7 additions & 0 deletions integration_tests/dbt_project.yml
@@ -50,3 +50,10 @@ seeds:
+quote_columns: false
+post-hook: "{{ clone_table('source_') }}"
#+full_refresh: false

tests:
+dbt_constraints_integration_tests:
#+always_create_constraint: true
# These configuration settings disable running tests or just constraints by path
# +enabled: false
#+dbt_constraints_enabled: false
2 changes: 2 additions & 0 deletions integration_tests/models/dim_part.sql
@@ -2,6 +2,8 @@
All Parts
Additional unique keys generated by sequence and hash
*/
{{ config(always_create_constraint = true) }}

SELECT
P.*,
DENSE_RANK() over (order by p_partkey) as p_partkey_seq
10 changes: 6 additions & 4 deletions integration_tests/models/schema.yml
@@ -152,19 +152,17 @@ models:
column_name: o_orderkey
config:
severity: warn
# This constraint can be generated if you uncomment always_create_constraint=true

- dbt_constraints.unique_key:
column_name: o_orderkey
config:
warn_if: ">= 5000"
error_if: ">= 10000"
# always_create_constraint: true
# This constraint can be generated if you uncomment always_create_constraint=true

- dbt_constraints.unique_key:
column_name: o_orderkey_seq
config:
severity: warn
# always_create_constraint: true

- name: fact_order_line_missing_orders
description: "Test that we do not create FK on failed tests"
@@ -202,6 +200,8 @@ models:

- name: dim_orders_null_keys
description: "All Orders"
config:
always_create_constraint: true
columns:
- name: o_custkey
tests:
@@ -215,10 +215,12 @@ models:
column_name: o_orderkey
config:
severity: warn

# test that we still create this valid unique key
- dbt_constraints.unique_key:
column_name: o_orderkey_seq


- name: dim_part_supplier
description: "Multi column UK"
columns:
2 changes: 1 addition & 1 deletion integration_tests/models/sources.yml
@@ -129,7 +129,7 @@ sources:
# How to validate a compound foreign key
- relationships:
column_name: "coalesce(cast(l_partkey as varchar(100)), '') || '~' || coalesce(cast(l_suppkey as varchar(100)), '')"
to: source('tpc_h', 'partsupp')
to: source('tpc_h', 'source_partsupp')
field: "coalesce(cast(ps_partkey as varchar(100)), '') || '~' || coalesce(cast(ps_suppkey as varchar(100)), '')"

# multi-column FK