-
Notifications
You must be signed in to change notification settings - Fork 960
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clone incremental models as the first step of your CI job #4359
Comments
@graciegoheen I think this information would make a great Best Practices guide. @joellabes would you be able to review once the docs team gets it ready? Like you suggested, I also think we should also talk about how |
Just a comment that I read this excellent post and fully agree with all points made. |
Yes lmk when you're ready for me! excited to get this up into the world |
## What are you changing in this pull request and why? Closes issue #4359 ## Checklist - [x] Review the [Content style guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md) so my content adheres to these guidelines. - [x] For [docs versioning](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#about-versioning), review how to [version a whole page](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#adding-a-new-version) and [version a block of content](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#versioning-blocks-of-content). - [x] Add a checklist item for anything that needs to happen before this PR is merged, such as "needs technical review" or "change base branch." Adding new pages (delete if not applicable): - [x] Add page to `website/sidebars.js` - [x] Provide a unique filename for the new page
@matthewshaver is this closed? |
hey team, looks like this was addressed in #4542. closing this issue but let me know if this is wrong! thank you! |
Contributions
Link to the page on docs.getdbt.com requiring updates
Probably this one makes the most sense https://docs.getdbt.com/docs/deploy/ci-jobs
Could also be a blog post!
Or part of some best practice guide?
Main content
Imagine that you've created a Slim CI job in dbt Cloud.
Your CI job:
dbt build --select state:modified+
to run and test all of the models you've modified and their downstream dependenciesNow imagine your dbt project looks like this:
When you open a PR that modifies
dim_wizards
, your CI job will kickoff and build only the modified models and their downstream dependencies (in this case:dim_wizards
andfct_orders
) into a temporary schema that's unique to your PR.This build mimics the behavior of what will happen once the PR is merged into the main branch (so you have confidence that you're not introducing breaking changes), without requiring a build of your entire dbt project.
But what happens when one of the modified models (or one of their downstream dependencies) is an incremental model?
Because your CI job is building modified models into a PR-specific schema, on the first execution of
dbt build --select state:modified+
the modified incremental model will be built in its entirety because it does not yet exist in the PR-specific schema aka is_incremental will be false. You're running infull-refresh
mode.This can be suboptimal because:
full-refresh
of the incremental model passes successfully in your CI job but an incremental build of that same table in prod would fail when the PR is merged into main (think schema drift where on_schema_change config is set tofail
)We can alleviate the above problems by zero copy cloning the relevant, pre-exisitng incremental models into our PR-specific schema as the first step of the CI job using the
dbt clone
command. This way, the incremental models already exist in the PR-specific schema when you first execute the commanddbt build --select state:modified+
so theis_incremental
flag will betrue
.Now, we'll have 2 commands for our dbt Cloud CI check to execute:
dbt clone --select state:modified+,config.materialized:incremental,state:old
dbt build --select state:modified+
Because of our first clone step, the incremental models selected in our
dbt build
in the second step will run in incremental mode.Your CI jobs will run faster, and you're more accurately mimicking the behavior of "what will happen once the PR has been merged into main".
Disclaimers:
dbt clone
is only available with dbt version 1.6+dbt clone
will just create pointer views)Additional information
Relevant slack thread: https://dbt-labs.slack.com/archives/C05FWBP9X1U/p1692830261651829
From my "Better CI for better data quality coalesce talk:
Expansion on "think schema drift where on_schema_change config is set to
fail
" from above:Let’s imagine you have an incremental model my_incremental_model with the following config:
Now, let’s say I open up a PR that adds a new column to
my_incremental_model
- in this case:full-refresh
will succeedIf you have a daily production job that just executes a
dbt build
(without a--full-refresh
flag), once the PR is merged into main and the job kicks off, you will get a failure. So the question is - what do you want to happen in CI?dbt build --full-refresh --select my_incremental_model
in production in order to avoid a failure in prod? This will block your CI check from passing.full-refresh
for this model in prod you will be in a successful state? This may lead to you being surprised that your production job is suddenly failing when you merge this PR into main because you didn’t realize you would need to execute adbt build --full-refresh --select my_incremental_model
in production.Probably not a perfect solution here, it’s all just tradeoffs! Personally, I'd rather have the failing CI job and have to manually override the blocking branch protection rule so that I'm not surprised and can proactively run the appropriate command in production once I merge the PR in.
Expansion on "why
state:old
":For brand new incremental models we actually want those to run in
full-refresh
mode in CI, because they will run infull-refresh
mode in production when the PR is merged intomain
because they also don't exist yet in the production environment... they're brand new!If you don't specify this, you won't get an error just a “No relation found in state manifest for…” - so it technically works with our without specifying
state:old
. But addingstate:old
is more explicit and means it won't even try to clone the brand new incremental models.The text was updated successfully, but these errors were encountered: