
Incremental Models - unique_key possibly has different behavior than stated #4355

Closed
fathom-parth opened this issue Oct 27, 2023 · 9 comments
Labels
  • content: Improvements or additions to content
  • improvement: Use this when an area of the docs needs improvement as it's currently unclear

Comments

@fathom-parth

Contributions

  • I have read the contribution docs, and understand what's expected of me.

Link to the page on docs.getdbt.com requiring updates

https://docs.getdbt.com/docs/build/incremental-models#defining-a-unique-key-optional

What part(s) of the page would you like to see updated?

Context

I'd like to use incremental models to replace all records with a specific timestamp (an ingest timestamp) with the new rows coming in.

When reading over the dbt incremental docs, it seems the merge strategy would be the best for this; however, this strategy requires a unique_key.

The Problem

In this case, there's no unique_key I want to define per record; I'd rather have the incremental strategy remove every row that matches a specific column value (in my case, an ingest timestamp).

I asked this question on Slack here:
https://getdbt.slack.com/archives/CBSQTAPLG/p1698417729305499

Some people mentioned that they do this by defining a unique_key that's not actually unique per record, and that this works fine for them.
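
Schematically, that suggestion amounts to a config along these lines (a hedged sketch; the model, column, and filter names are illustrative, not taken from the Slack thread):

-- my_model.sql: sketch of a merge with a deliberately non-unique "unique" key
-- ('ingest_ts' and 'my_source_model' are illustrative names)
{{
    config(
        materialized='incremental',
        incremental_strategy='merge',
        unique_key='ingest_ts'
    )
}}

SELECT
    ingest_ts,
    col_1,
    col_2
FROM {{ ref('my_source_model') }}
{% if is_incremental() %}
WHERE ingest_ts >= (SELECT MAX(ingest_ts) FROM {{ this }})
{% endif %}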

One of the users on Slack also mentioned:

The docs are misleading.. and the naming of the config is also confusing. I’m sure if you search slack for unique_key you’ll see many people mention what Daron and I suggested.

However, the documentation seems to conflict with the above suggestion by stating:

The optional unique_key parameter specifies a field (or combination of fields) that define the grain of your model. That is, the field(s) identify a single unique row. You can define unique_key in a configuration block at the top of your model, and it can be a single column name or a list of column names.

and

Please note that if there's a unique_key with more than one row in either the existing target table or the new incremental rows, the incremental model may fail depending on your database and incremental strategy.

This conflict is confusing to me, and I'd like to know whether the docs need to be updated or whether the behavior those users rely on is undefined.

Additional information

For reference, I'm using the bigquery adapter.

@fathom-parth added the content and improvement labels on Oct 27, 2023
@fathom-parth
Author

@mirnawong1 Is this the right place for me to open this issue? Is there a different repo that would get this more traction? I'm trying to see if my organization can use this feature or if this is undefined behavior.

@mirnawong1
Contributor

Hey @fathom-parth, this is definitely the right place. I'm going to consult with our developer experience team to see how best to address this for you. Thank you for flagging this, and apologies for the delay here!

@dbeatty10
Contributor

Take a look at the insert_overwrite strategy

@fathom-parth If you are using dbt-bigquery, what you are describing sounds like a good use-case for the insert_overwrite strategy.
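
A minimal sketch of that strategy on dbt-bigquery might look like the following (the partition field and model names are illustrative, not taken from this issue):

-- my_model.sql: illustrative insert_overwrite sketch for dbt-bigquery
{{
    config(
        materialized='incremental',
        incremental_strategy='insert_overwrite',
        partition_by={
            'field': 'ingest_ts',
            'data_type': 'datetime',
            'granularity': 'day'
        }
    )
}}

SELECT
    ingest_ts,
    col_1,
    col_2
FROM {{ ref('my_source_model') }}

With this strategy, dbt replaces the whole target partitions that appear in the new data rather than matching individual rows on a key.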

You can read commentary about the insert_overwrite strategy from other practitioners here:

Reports from other users on Slack

We'd need to learn more about their specific details before making updates to our documentation.

I don't know one way or the other how defining a unique_key that's not actually unique would behave without knowing which dbt adapter they are using and seeing an example of their source data + model configs + model definitions.

Our docs for dbt-snowflake say that:

Snowflake's merge statement fails with a "nondeterministic merge" error if the unique_key specified in your model config is not actually unique. If you encounter this error, you can instruct dbt to use a two-step incremental approach by setting the incremental_strategy config for your model to delete+insert.

So possibly they are using the delete+insert strategy with a platform that supports it. Or possibly the platform they are using has a merge statement that allows for non-unique keys.

@fathom-parth
Author

fathom-parth commented Dec 13, 2023

@dbeatty10
Thanks for all the links!

I did look into the insert_overwrite strategy, but that wouldn't work for us since it overwrites entire partitions.

We currently partition on the ingest timestamp column (a datetime); however, the partitioning is set to a granularity of one month in accordance with BQ's best practices, so we don't run into the maximum number of allowed partitions and we stay within BQ's guidelines for optimal partition size.

Overwriting an entire month of data would be expensive and far from ideal. Ideally we'd only overwrite rows with the same exact timestamp.
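
For concreteness, month-granularity partitioning of that kind corresponds roughly to a config like this (the column and model names are illustrative):

-- monthly partitioning on the ingest timestamp; 'ingest_ts' is an illustrative name
{{
    config(
        materialized='incremental',
        partition_by={
            'field': 'ingest_ts',
            'data_type': 'datetime',
            'granularity': 'month'
        }
    )
}}

SELECT ingest_ts, col_1, col_2
FROM {{ ref('my_source_model') }}

With insert_overwrite, the unit of replacement would be one of these month partitions, not the individual timestamp.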

Lmk if I’ve grossly misunderstood insert overwrite or if I’m missing something obvious!!

Edited at 2023-12-13 11:13am ET for clarity (wrote the above half-asleep)

@daronjp
Contributor

daronjp commented Dec 13, 2023

@dbeatty10 responding to your question from Slack:

Using the Snowflake adapter

-- my_model.sql
{{
    config(
        ...
        materialized='incremental',
        incremental_strategy='delete+insert',
        unique_key='data_date'
        ...
    )
}}

SELECT
    data_date,
    col_1,
    col_2
FROM {{ ref('my_intermediate_model') }}
WHERE
    ...
    {% if is_incremental() %}
        AND data_date >= CURRENT_DATE - 10
    {% endif %}

@fathom-parth
Author

Interesting, I guess the BQ-specific documentation doesn't mention the unique_key needing to be unique:
https://docs.getdbt.com/reference/resource-configs/bigquery-configs#the-merge-strategy

@fathom-parth
Author

Ah, though looking through how the merge statement gets generated by the BQ adapter (it seems to use the default merge statement from dbt-core) and how BQ handles merge statements, I'm pretty sure it'll fail if the predicate matches multiple rows.
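
For reference, the statement produced by dbt-core's default merge macro has roughly this shape (the DBT_INTERNAL_* aliases come from that macro; the table and column names here are illustrative):

-- rough shape of the generated merge (illustrative names)
merge into `my_project.my_dataset.my_model` as DBT_INTERNAL_DEST
using (
    select * from `my_project.my_dataset.my_model__dbt_tmp`
) as DBT_INTERNAL_SOURCE
on DBT_INTERNAL_SOURCE.ingest_ts = DBT_INTERNAL_DEST.ingest_ts
when matched then update set
    ingest_ts = DBT_INTERNAL_SOURCE.ingest_ts,
    col_1 = DBT_INTERNAL_SOURCE.col_1
when not matched then insert (ingest_ts, col_1)
    values (DBT_INTERNAL_SOURCE.ingest_ts, DBT_INTERNAL_SOURCE.col_1)

If more than one source row matches the same target row on that predicate, BigQuery rejects the merge with an error along the lines of "UPDATE/MERGE must match at most one source row for each target row".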

@dbeatty10 changed the title from "Incremental Models - merge strategy with unique_key possibly has different behavior than stated" to "Incremental Models - unique_key possibly has different behavior than stated" on Dec 13, 2023
@dbeatty10
Contributor

There are two different use cases in which unique_key is utilized today for incremental materializations:

  1. “merge on an actually unique key”
  2. “replace partitions”

For (1), the upstream source would benefit from adding unique and not_null checks since it must be unique to behave correctly.

For (2), there should not be any uniqueness check, because we'd expect many rows to have the same value for unique_key. This case is a flagrant misnomer, of course 😱
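
In config terms, the two cases look roughly like this (a sketch; the model names, column names, and strategy choices are illustrative):

-- use case (1): merge on an actually unique key; 'id' must identify a single row
-- model_one.sql
{{ config(materialized='incremental', incremental_strategy='merge', unique_key='id') }}
SELECT id, payload FROM {{ ref('stg_events') }}

-- use case (2): "replace partitions"; every row sharing a batch_date is replaced together,
-- e.g. via delete+insert on adapters that support it (such as dbt-snowflake, per the docs quoted above)
-- model_two.sql
{{ config(materialized='incremental', incremental_strategy='delete+insert', unique_key='batch_date') }}
SELECT batch_date, payload FROM {{ ref('stg_events') }}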

It would be nice to formally distinguish between each of those use-cases, and dbt-labs/dbt-core#9490 might be a good way to accomplish that.

@matthewshaver
Contributor

It appears this may have been resolved via other methods. If any updates to the docs are required, please feel free to re-open.
