
Moving the guide content to the new format #4361

Merged · 67 commits · Nov 11, 2023
c39e9cb
adding guides to cool new section
runleonarun Oct 28, 2023
7cd0cf9
fixing best-practice links
runleonarun Oct 28, 2023
b576ae0
Merge branch 'quickstarts-q32023-update' into quickstart-content-updates
runleonarun Oct 30, 2023
98a5c3b
moving more guides
runleonarun Oct 31, 2023
ab51ac6
Merge branch 'quickstarts-q32023-update' into quickstart-content-updates
runleonarun Oct 31, 2023
c710e04
Merge branch 'quickstarts-q32023-update' into quickstart-content-updates
runleonarun Nov 1, 2023
9acee7b
Merge branch 'quickstarts-q32023-update' into quickstart-content-updates
runleonarun Nov 1, 2023
71502f1
moving a few more guides
runleonarun Nov 2, 2023
1b664b1
fixing links
runleonarun Nov 2, 2023
1b75021
Merge branch 'quickstarts-q32023-update' into quickstart-content-updates
runleonarun Nov 2, 2023
c8e1cb1
moving more guides
runleonarun Nov 4, 2023
1b148d5
adding feedback from @nghi-ly
runleonarun Nov 7, 2023
bfd288f
Merge branch 'quickstarts-q32023-update' into quickstart-content-updates
runleonarun Nov 7, 2023
0547922
commenting out time_to_complete
runleonarun Nov 7, 2023
5578d03
adding some consistency
runleonarun Nov 7, 2023
1fa986c
moving webhooks guide content
runleonarun Nov 7, 2023
e032c5b
moving more guide content
runleonarun Nov 8, 2023
7c98a98
moving snowpark
runleonarun Nov 8, 2023
7851eb4
Merge branch 'quickstarts-q32023-update' into quickstart-content-updates
runleonarun Nov 8, 2023
a16454e
move more guides
runleonarun Nov 8, 2023
cef4f21
moving and deleting adapter stuff
runleonarun Nov 8, 2023
6be2182
adding tags
runleonarun Nov 8, 2023
4808d3e
fixing details
runleonarun Nov 9, 2023
a6c7359
fixing links
runleonarun Nov 9, 2023
df294e4
fixing merge conflict
runleonarun Nov 9, 2023
59b5061
fixing links
runleonarun Nov 9, 2023
26338c0
fixing more links
runleonarun Nov 9, 2023
bab0aa6
fix a link o'rama
runleonarun Nov 9, 2023
550d333
fix a link o'rama
runleonarun Nov 9, 2023
78f22b5
fix a typo
runleonarun Nov 9, 2023
07f9a24
adding the forwarders
runleonarun Nov 9, 2023
ea3f927
Update website/docs/docs/dbt-versions/release-notes/07-June-2023/prod…
runleonarun Nov 10, 2023
a68ca2e
Update website/docs/guides/dbt-models-on-databricks.md
runleonarun Nov 10, 2023
51ac90f
Update website/docs/guides/debug-schema-names.md
runleonarun Nov 10, 2023
5358095
Update website/docs/guides/debug-schema-names.md
runleonarun Nov 10, 2023
2f36990
Update website/docs/guides/debug-schema-names.md
runleonarun Nov 10, 2023
a70225d
Update website/docs/guides/serverless-datadog.md
runleonarun Nov 10, 2023
0129c68
Update website/docs/guides/debug-schema-names.md
runleonarun Nov 10, 2023
40aabfa
Apply suggestions from code review
runleonarun Nov 10, 2023
875eb0c
Update website/docs/best-practices/best-practice-workflows.md
nghi-ly Nov 10, 2023
164a80c
Update website/blog/2023-04-24-framework-refactor-alteryx-dbt.md
nghi-ly Nov 10, 2023
96c344f
Update website/blog/2023-04-24-framework-refactor-alteryx-dbt.md
nghi-ly Nov 10, 2023
ebbf184
Update website/docs/best-practices/how-we-structure/5-semantic-layer-…
nghi-ly Nov 10, 2023
c0e8253
Update website/docs/best-practices/how-we-structure/5-semantic-layer-…
nghi-ly Nov 10, 2023
9f2b122
Update website/docs/best-practices/how-we-style/6-how-we-style-conclu…
nghi-ly Nov 10, 2023
09a8593
Update website/docs/docs/cloud/dbt-cloud-ide/dbt-cloud-tips.md
nghi-ly Nov 10, 2023
17bc23a
Update website/docs/docs/dbt-versions/release-notes/24-Nov-2022/dbt-d…
nghi-ly Nov 10, 2023
830da91
Update website/docs/docs/deploy/deploy-environments.md
nghi-ly Nov 10, 2023
3d3da06
Update website/docs/docs/deploy/deploy-environments.md
nghi-ly Nov 10, 2023
4d0ba52
Update website/docs/docs/environments-in-dbt.md
nghi-ly Nov 10, 2023
29e0de8
Update website/docs/faqs/Project/multiple-resource-yml-files.md
nghi-ly Nov 10, 2023
48ad5e2
Update website/docs/faqs/Project/resource-yml-name.md
nghi-ly Nov 10, 2023
6d4959f
Update website/docs/faqs/Project/structure-a-project.md
nghi-ly Nov 10, 2023
3b4e603
Update website/docs/guides/debug-schema-names.md
nghi-ly Nov 10, 2023
c77fe1e
Update website/docs/guides/debug-schema-names.md
nghi-ly Nov 10, 2023
5013b67
Update website/docs/sql-reference/aggregate-functions/sql-array-agg.md
nghi-ly Nov 10, 2023
b49f8c4
Update website/docs/sql-reference/aggregate-functions/sql-avg.md
nghi-ly Nov 10, 2023
ace443f
Update website/docs/sql-reference/aggregate-functions/sql-round.md
nghi-ly Nov 10, 2023
f163894
Update website/docs/sql-reference/clauses/sql-limit.md
nghi-ly Nov 10, 2023
4036a14
Update website/docs/sql-reference/clauses/sql-order-by.md
nghi-ly Nov 10, 2023
ec705a2
Update website/docs/sql-reference/joins/sql-self-join.md
nghi-ly Nov 10, 2023
2a07b7b
Update website/docs/sql-reference/joins/sql-left-join.md
nghi-ly Nov 10, 2023
7c065e1
Update website/docs/sql-reference/joins/sql-inner-join.md
nghi-ly Nov 10, 2023
d866035
Update website/docs/guides/custom-cicd-pipelines.md
nghi-ly Nov 10, 2023
0fe005a
Update website/docs/docs/cloud/billing.md
nghi-ly Nov 10, 2023
7de9d72
Update website/docs/docs/dbt-versions/release-notes/07-June-2023/prod…
nghi-ly Nov 10, 2023
6d14731
Fix old redirects, remove duplicate steps
nghi-ly Nov 11, 2023
6 changes: 3 additions & 3 deletions website/blog/2021-02-05-dbt-project-checklist.md
@@ -139,7 +139,7 @@ This post is the checklist I created to guide our internal work, and I’m shari
* [Sources](/docs/build/sources/)
* [Refs](/reference/dbt-jinja-functions/ref/)
* [tags](/reference/resource-configs/tags/)
- * [Jinja docs](/guides/advanced/using-jinja)
+ * [Jinja docs](/guides/using-jinja)

## ✅ Testing & Continuous Integration
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
@@ -156,7 +156,7 @@ This post is the checklist I created to guide our internal work, and I’m shari

**Useful links**

- * [Version control](/guides/legacy/best-practices#version-control-your-dbt-project)
+ * [Version control](/best-practices/best-practice-workflows#version-control-your-dbt-project)
* [dbt Labs' PR Template](/blog/analytics-pull-request-template)

## ✅ Documentation
@@ -252,7 +252,7 @@ Thanks to Christine Berger for her DAG diagrams!

**Useful links**

- * [How we structure our dbt Project](/guides/best-practices/how-we-structure/1-guide-overview)
+ * [How we structure our dbt Project](/best-practices/how-we-structure/1-guide-overview)
* [Coalesce DAG Audit Talk](https://www.youtube.com/watch?v=5W6VrnHVkCA&t=2s)
* [Modular Data Modeling Technique](https://getdbt.com/analytics-engineering/modular-data-modeling-technique/)
* [Understanding Threads](/docs/running-a-dbt-project/using-threads)
@@ -159,4 +159,4 @@ All of the above configurations “work”. And as detailed, they each solve for
2. Figure out what may be a pain point in the future and try to plan for it from the beginning.
3. Don’t over-complicate things until you have the right reason. As I said in my Coalesce talk: **don’t drag your skeletons from one closet to another** 💀!

- **Note:** Our attempt in writing guides like this and [How we structure our dbt projects](/guides/best-practices/how-we-structure/1-guide-overview) aren’t to try to convince you that our way is right; it is to hopefully save you the hundreds of hours it has taken us to form those opinions!
+ **Note:** Our attempt in writing guides like this and [How we structure our dbt projects](/best-practices/how-we-structure/1-guide-overview) aren’t to try to convince you that our way is right; it is to hopefully save you the hundreds of hours it has taken us to form those opinions!
2 changes: 1 addition & 1 deletion website/blog/2021-11-23-how-to-upgrade-dbt-versions.md
@@ -156,7 +156,7 @@ Once your compilation issues are resolved, it's time to run your job for real, t

After that, make sure that your CI environment in dbt Cloud or your orchestrator is on the right dbt version, then open a PR.

- If you're using [Slim CI](https://docs.getdbt.com/docs/guides/best-practices#run-only-modified-models-to-test-changes-slim-ci), keep in mind that artifacts aren't necessarily compatible from one version to another, so you won't be able to use it until the job you defer to has completed a run with the upgraded dbt version. This doesn't impact our example because support for Slim CI didn't come out until 0.18.0.
+ If you're using [Slim CI](https://docs.getdbt.com/docs/best-practices#run-only-modified-models-to-test-changes-slim-ci), keep in mind that artifacts aren't necessarily compatible from one version to another, so you won't be able to use it until the job you defer to has completed a run with the upgraded dbt version. This doesn't impact our example because support for Slim CI didn't come out until 0.18.0.

## Step 7. Merge and communicate

@@ -26,7 +26,7 @@ So let’s all commit to sharing our hard won knowledge with each other—and in

The purpose of this blog is to double down on our long running commitment to contributing to the knowledge loop.

- From early posts like ‘[The Startup Founders Guide to Analytics’](https://thinkgrowth.org/the-startup-founders-guide-to-analytics-1d2176f20ac1) to foundational guides like [‘How We Structure Our dbt Projects](/guides/best-practices/how-we-structure/1-guide-overview)’, we’ve had a long standing goal of working with the community to create practical, hands-on tutorials and guides which distill the knowledge we’ve been able to collectively gather.
+ From early posts like ‘[The Startup Founders Guide to Analytics’](https://thinkgrowth.org/the-startup-founders-guide-to-analytics-1d2176f20ac1) to foundational guides like [‘How We Structure Our dbt Projects](/best-practices/how-we-structure/1-guide-overview)’, we’ve had a long standing goal of working with the community to create practical, hands-on tutorials and guides which distill the knowledge we’ve been able to collectively gather.

dbt as a product is based around the philosophy that even the most complicated problems can be broken down into modular, reusable components, then mixed and matched to create something novel.

4 changes: 2 additions & 2 deletions website/blog/2021-11-29-dbt-airflow-spiritual-alignment.md
@@ -91,7 +91,7 @@ The common skills needed for implementing any flavor of dbt (Core or Cloud) are:

* SQL: ‘nuff said
* YAML: required to generate config files for [writing tests on data models](/docs/build/tests)
- * [Jinja](/guides/advanced/using-jinja): allows you to write DRY code (using [macros](/docs/build/jinja-macros), for loops, if statements, etc)
+ * [Jinja](/guides/using-jinja): allows you to write DRY code (using [macros](/docs/build/jinja-macros), for loops, if statements, etc)

YAML + Jinja can be learned pretty quickly, but SQL is the non-negotiable you’ll need to get started.

@@ -176,7 +176,7 @@ Instead you can now use the following command:
`dbt build --select result:error+ --defer --state <previous_state_artifacts>` … and that’s it!


- You can see more examples [here](https://docs.getdbt.com/docs/guides/best-practices#run-only-modified-models-to-test-changes-slim-ci).
+ You can see more examples [here](https://docs.getdbt.com/docs/best-practices#run-only-modified-models-to-test-changes-slim-ci).


This means that whether you’re actively developing or you simply want to rerun a scheduled job (because of, say, permission errors or timeouts in your database), you now have a unified approach to doing both.
@@ -69,7 +69,7 @@ In addition to learning the basic pieces of dbt, we're familiarizing ourselves w

If we decide not to do this, we end up missing out on what the dbt workflow has to offer. If you want to learn more about why we think analytics engineering with dbt is the way to go, I encourage you to read the [dbt Viewpoint](/community/resources/viewpoint#analytics-is-collaborative)!

- In order to learn the basics, we’re going to [port over the SQL file](/guides/migration/tools/refactoring-legacy-sql) that powers our existing "patient_claim_summary" report that we use in our KPI dashboard in parallel to our old transformation process. We’re not ripping out the old plumbing just yet. In doing so, we're going to try dbt on for size and get used to interfacing with a dbt project.
+ In order to learn the basics, we’re going to [port over the SQL file](/guides/refactoring-legacy-sql) that powers our existing "patient_claim_summary" report that we use in our KPI dashboard in parallel to our old transformation process. We’re not ripping out the old plumbing just yet. In doing so, we're going to try dbt on for size and get used to interfacing with a dbt project.

**Project Appearance**

@@ -157,7 +157,7 @@ These 3 parts go from least granular (general) to most granular (specific) so yo

### Coming up...

- In this part of the series, we talked about why the model name is the center of understanding for the purpose and content within a model. In the upcoming ["How We Structure Our dbt Projects"](https://docs.getdbt.com/guides/best-practices/how-we-structure/1-guide-overview) guide, you can explore how to use this naming pattern with more specific examples in different parts of your dbt DAG that cover regular use cases:
+ In this part of the series, we talked about why the model name is the center of understanding for the purpose and content within a model. In the upcoming ["How We Structure Our dbt Projects"](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview) guide, you can explore how to use this naming pattern with more specific examples in different parts of your dbt DAG that cover regular use cases:

- How would you name a model that is filtered on some columns
- Do we recommend naming snapshots in a specific way
2 changes: 1 addition & 1 deletion website/blog/2022-06-30-lower-sql-function.md
@@ -75,7 +75,7 @@ After running this query, the `customers` table will look a little something lik
Now, all characters in the `first_name` and `last_name` columns are lowercase.

> **Where do you lower?**
- > Changing all string columns to lowercase to create uniformity across data sources typically happens in our dbt project’s [staging models](https://docs.getdbt.com/guides/best-practices/how-we-structure/2-staging). There are a few reasons for that: data cleanup and standardization, such as aliasing, casting, and lowercasing, should ideally happen in staging models to create downstream uniformity. It’s also more performant in downstream models that join on string values to join on strings that are of all the same casing versus having to join and perform lowercasing at the same time.
+ > Changing all string columns to lowercase to create uniformity across data sources typically happens in our dbt project’s [staging models](https://docs.getdbt.com/best-practices/how-we-structure/2-staging). There are a few reasons for that: data cleanup and standardization, such as aliasing, casting, and lowercasing, should ideally happen in staging models to create downstream uniformity. It’s also more performant in downstream models that join on string values to join on strings that are of all the same casing versus having to join and perform lowercasing at the same time.
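
To make the pattern concrete, here is a minimal sketch of that kind of staging-model cleanup (the model, source, and column names are hypothetical):

```sql
-- Hypothetical staging model, e.g. models/staging/stg_customers.sql
-- Standardize casing once here, so downstream joins compare consistent values
select
    id as customer_id,
    lower(first_name) as first_name,
    lower(last_name) as last_name
from {{ source('crm', 'customers') }}
```

Downstream models can then join on these columns directly, without wrapping them in `lower()` at join time.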

## Why we love it

2 changes: 1 addition & 1 deletion website/blog/2022-07-19-migrating-from-stored-procs.md
@@ -54,7 +54,7 @@ With dbt, we work towards creating simpler, more transparent data pipelines like

![Diagram of what data flows look like with dbt. It's easier to trace lineage in this setup.](/img/blog/2022-07-19-migrating-from-stored-procs/dbt-diagram.png)

- Tight [version control integration](https://docs.getdbt.com/docs/guides/best-practices#version-control-your-dbt-project) is an added benefit of working with dbt. By leveraging the power of git-based tools, dbt enables you to integrate and test changes to transformation pipelines much faster than you can with other approaches. We often see teams who work in stored procedures making changes to their code without any notion of tracking those changes over time. While that’s more of an issue with the team’s chosen workflow than a problem with stored procedures per se, it does reflect how legacy tooling makes analytics work harder than necessary.
+ Tight [version control integration](https://docs.getdbt.com/docs/best-practices#version-control-your-dbt-project) is an added benefit of working with dbt. By leveraging the power of git-based tools, dbt enables you to integrate and test changes to transformation pipelines much faster than you can with other approaches. We often see teams who work in stored procedures making changes to their code without any notion of tracking those changes over time. While that’s more of an issue with the team’s chosen workflow than a problem with stored procedures per se, it does reflect how legacy tooling makes analytics work harder than necessary.

## Methodologies for migrating from stored procedures to dbt

2 changes: 1 addition & 1 deletion website/blog/2022-07-26-pre-commit-dbt.md
@@ -12,7 +12,7 @@ is_featured: true

*Editor's note — since the creation of this post, the package pre-commit-dbt's ownership has moved to another team and it has been renamed to [dbt-checkpoint](https://github.com/dbt-checkpoint/dbt-checkpoint). A redirect has been set up, meaning that the code example below will still work. It is also possible to replace `repo: https://github.com/offbi/pre-commit-dbt` with `repo: https://github.com/dbt-checkpoint/dbt-checkpoint` in your `.pre-commit-config.yaml` file.*

- At dbt Labs, we have [best practices](https://docs.getdbt.com/docs/guides/best-practices) we like to follow for the development of dbt projects. One of them, for example, is that all models should have at least `unique` and `not_null` tests on their primary key. But how can we enforce rules like this?
+ At dbt Labs, we have [best practices](https://docs.getdbt.com/docs/best-practices) we like to follow for the development of dbt projects. One of them, for example, is that all models should have at least `unique` and `not_null` tests on their primary key. But how can we enforce rules like this?

That question becomes difficult to answer in large dbt projects. Developers might not follow the same conventions. They might not be aware of past decisions, and reviewing pull requests in git can become more complex. When dbt projects have hundreds of models, it's hard to know which models do not have any tests defined and aren't enforcing your conventions.
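
As a sketch, a minimal `.pre-commit-config.yaml` enforcing that kind of rule might look like the following (the `rev` tag and hook arguments are illustrative; check the dbt-checkpoint README for the current hook list):

```yaml
repos:
  - repo: https://github.com/dbt-checkpoint/dbt-checkpoint
    rev: v1.0.0  # illustrative; pin to a real release tag
    hooks:
      - id: check-model-has-tests
        args: ["--test-cnt", "2", "--"]  # require at least two tests per model
      - id: check-model-has-description
```

With a config like this in place, `pre-commit run --all-files` flags any model missing the required tests before the change ever reaches review.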

@@ -286,7 +286,7 @@ Developing an analytic code base is an ever-evolving process. What worked well w

4. **Test on representative data**

- Testing on a [subset of data](https://docs.getdbt.com/guides/legacy/best-practices#limit-the-data-processed-when-in-development) is a great general practice. It allows you to iterate quickly, and doesn’t waste resources. However, there are times when you need to test on a larger dataset for problems like disk spillage to come to the fore. Testing on large data is hard and expensive, so make sure you have a good idea of the solution before you commit to this step.
+ Testing on a [subset of data](https://docs.getdbt.com/best-practices/best-practice-workflows#limit-the-data-processed-when-in-development) is a great general practice. It allows you to iterate quickly, and doesn’t waste resources. However, there are times when you need to test on a larger dataset for problems like disk spillage to come to the fore. Testing on large data is hard and expensive, so make sure you have a good idea of the solution before you commit to this step.

5. **Repeat**

2 changes: 1 addition & 1 deletion website/blog/2022-08-22-narrative-modeling.md
@@ -177,7 +177,7 @@ To that final point, if presented with the DAG from the narrative modeling appro

### Users can tie business concepts to source data

- - While the schema structure above is focused on business entities, there are still ample use cases for [staging and intermediate tables](https://docs.getdbt.com/guides/best-practices/how-we-structure/1-guide-overview).
+ - While the schema structure above is focused on business entities, there are still ample use cases for [staging and intermediate tables](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview).
- After cleaning up source data with staging tables, use the same “what happened” approach to more technical events, creating a three-node dependency from `stg_snowplow_events` to `int_page_click_captured` to `user_refreshed_cart` and thus answering the question “where do we get online user behavior information?” in a quick visit to the DAG in dbt docs.

# Should your team use it?
2 changes: 1 addition & 1 deletion website/blog/2022-09-08-konmari-your-query-migration.md
@@ -108,7 +108,7 @@ Here are a few things to look for:

## Steps 4 & 5: Tidy by category and follow the right order—upstream to downstream

- We are ready to unpack our kitchen. Use your design as a guideline for [modularization](/guides/best-practices/how-we-structure/1-guide-overview).
+ We are ready to unpack our kitchen. Use your design as a guideline for [modularization](/best-practices/how-we-structure/1-guide-overview).

- Build your staging tables first, and then your intermediate tables in your pre-planned buckets.
- Important, reusable joins that are performed in the final query should be moved upstream into their own modular models, as well as any joins that are repeated in your query.
4 changes: 2 additions & 2 deletions website/blog/2022-11-22-move-spreadsheets-to-your-dwh.md
@@ -102,7 +102,7 @@ Instead of syncing all cells in a sheet, you create a [named range](https://five

<Lightbox src="/img/blog/2022-11-22-move-spreadsheets-to-your-dwh/google-sheets-uploader.png" title="Creating a named range in Google Sheets to sync via the Fivetran Google Sheets Connector" />

- Beware of inconsistent data types though—if someone types text into a column that was originally numeric, Fivetran will automatically convert the column to a string type which might cause issues in your downstream transformations. [The recommended workaround](https://fivetran.com/docs/files/google-sheets#typetransformationsandmapping) is to explicitly cast your types in [staging models](https://docs.getdbt.com/guides/best-practices/how-we-structure/2-staging) to ensure that any undesirable records are converted to null.
+ Beware of inconsistent data types though—if someone types text into a column that was originally numeric, Fivetran will automatically convert the column to a string type which might cause issues in your downstream transformations. [The recommended workaround](https://fivetran.com/docs/files/google-sheets#typetransformationsandmapping) is to explicitly cast your types in [staging models](https://docs.getdbt.com/best-practices/how-we-structure/2-staging) to ensure that any undesirable records are converted to null.
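
As a sketch of that defensive casting in a staging model (the source, column, and named-range names are hypothetical; `try_cast` is Snowflake syntax, and BigQuery's equivalent is `safe_cast`):

```sql
-- Hypothetical staging model for a Fivetran-synced Google Sheet
select
    try_cast(amount as number) as amount,        -- stray text becomes null
    try_cast(updated_at as date) as updated_at
from {{ source('google_sheets', 'budget_named_range') }}
```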

#### Good fit for:

@@ -192,4 +192,4 @@ Databricks also supports [pulling in data, such as spreadsheets, from external c

Beyond the options we’ve already covered, there’s an entire world of other tools that can load data from your spreadsheets into your data warehouse. This is a living document, so if your preferred method isn't listed then please [open a PR](https://github.com/dbt-labs/docs.getdbt.com) and I'll check it out.

- The most important things to consider are your files’ origins and formats—if you need your colleagues to upload files on a regular basis then try to provide them with a more user-friendly process; but if you just need two computers to talk to each other, or it’s a one-off file that will hardly ever change, then a more technical integration is totally appropriate.
+ The most important things to consider are your files’ origins and formats—if you need your colleagues to upload files on a regular basis then try to provide them with a more user-friendly process; but if you just need two computers to talk to each other, or it’s a one-off file that will hardly ever change, then a more technical integration is totally appropriate.