Merge branch 'current' into revert-6466-revert-6455-add-microbatch-flag
mirnawong1 authored Nov 22, 2024
2 parents e78e968 + 23e89a4 commit 69c9d18
Showing 88 changed files with 1,663 additions and 982 deletions.
49 changes: 0 additions & 49 deletions .github/ISSUE_TEMPLATE/internal-orch-team.yml

This file was deleted.

2 changes: 1 addition & 1 deletion website/blog/2024-10-04-hybrid-mesh.md
@@ -59,7 +59,7 @@ This allows dbt Cloud to know about the contents and metadata of your project, w
- Note: If you have [environment variables](/docs/build/environment-variables) in your project, dbt Cloud environment variables must be prefixed with `DBT_` (including `DBT_ENV_CUSTOM_ENV_` or `DBT_ENV_SECRET`); see the sketch after this list. Follow the instructions in [this guide](https://docs.getdbt.com/guides/core-to-cloud-1?step=8#environment-variables) to convert them for dbt Cloud.
- Each upstream Core project has to have a production [environment](/docs/dbt-cloud-environments) in dbt Cloud. Configure credentials and environment variables in dbt Cloud so that it resolves relation names to the same places where your dbt Core workflows deploy those models.
- Set up a [merge job](/docs/deploy/merge-jobs) in a production environment to run `dbt parse`. This will enable connecting downstream projects in dbt Mesh by producing the necessary [artifacts](/reference/artifacts/dbt-artifacts) for cross-project referencing.
-- Note: Set up a regular job to run `dbt build` instead of using a merge job for `dbt parse`, and centralize your dbt orchestration by moving production runs to dbt Cloud. Check out [this guide](/guides/core-to-cloud-1?step=9) for more details on converting your production runs to dbt Cloud.
+- Optional: Set up a regular job to run `dbt build` instead of using a merge job for `dbt parse`, and centralize your dbt orchestration by moving production runs to dbt Cloud. Check out [this guide](/guides/core-to-cloud-1?step=9) for more details on converting your production runs to dbt Cloud.
- Optional: Set up a regular job (for example, daily) to run `source freshness` and `docs generate`. This will hydrate dbt Cloud with additional metadata and enable features in [dbt Explorer](/docs/collaborate/explore-projects) that will benefit both teams, including [Column-level lineage](/docs/collaborate/column-level-lineage).
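
For example, a minimal sketch of referencing a converted variable from a model (the variable name and default are hypothetical):

```sql
-- models/example.sql (sketch; variable name and default are hypothetical)
-- A dbt Core variable named TARGET_SCHEMA must be renamed to DBT_TARGET_SCHEMA
-- (or DBT_ENV_CUSTOM_ENV_TARGET_SCHEMA) before dbt Cloud will pick it up.
select
    '{{ env_var("DBT_TARGET_SCHEMA", "analytics") }}' as target_schema
```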

### Step 3: Create and connect your downstream projects to your Core project using dbt Mesh
2 changes: 1 addition & 1 deletion website/docs/best-practices/how-we-structure/2-staging.md
@@ -223,4 +223,4 @@ This is a welcome change for many of us who have become used to applying the sam

:::info Development flow versus DAG order.
This guide follows the order of the DAG, so we can get a holistic picture of how these three primary layers build on each other towards fueling impactful data products. It’s important to note though that developing models does not typically move linearly through the DAG. Most commonly, we should start by mocking out a design in a spreadsheet so we know we’re aligned with our stakeholders on output goals. Then, we’ll want to write the SQL to generate that output, and identify what tables are involved. Once we have our logic and dependencies, we’ll make sure we’ve staged all the necessary atomic pieces into the project, then bring them together based on the logic we wrote to generate our mart. Finally, with a functioning model flowing in dbt, we can start refactoring and optimizing that mart. By splitting the logic up and moving parts back upstream into intermediate models, we ensure all of our models are clean and readable, the story of our DAG is clear, and we have more surface area to apply thorough testing.
-:::info
+:::
4 changes: 3 additions & 1 deletion website/docs/docs/build/data-tests.md
@@ -66,7 +66,9 @@ having total_amount < 0

</File>

-The name of this test is the name of the file: `assert_total_payment_amount_is_positive`.
+The name of this test is the name of the file: `assert_total_payment_amount_is_positive`.
+
+Note: don't include a semicolon (;) at the end of the SQL statement in your singular test files, as it can cause the test to fail.
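
For instance, a minimal sketch of such a singular test (the `payments` model and its columns are assumptions based on the snippet above):

```sql
-- tests/assert_total_payment_amount_is_positive.sql
-- Fails if any order has a negative total payment amount
select
    order_id,
    sum(amount) as total_amount
from {{ ref('payments') }}
group by order_id
having total_amount < 0  -- no trailing semicolon
```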

To add a description to a singular test in your project, add a `.yml` file to your `tests` directory, for example, `tests/schema.yml` with the following content:

9 changes: 6 additions & 3 deletions website/docs/docs/build/environment-variables.md
@@ -32,7 +32,7 @@ There are four levels of environment variables:

To set environment variables at the project and environment level, click **Deploy** in the top left, then select **Environments**. Click **Environment variables** to add and update your environment variables.

-<Lightbox src="/img/docs/dbt-cloud/using-dbt-cloud/Environment Variables/navigate-to-env-vars.gif" title="Environment variables tab"/>
+<Lightbox src="/img/docs/dbt-cloud/using-dbt-cloud/Environment Variables/navigate-to-env-vars.png" title="Environment variables tab"/>



@@ -62,7 +62,10 @@ Every job runs in a specific deployment environment, and by default, a job will
**Overriding environment variables at the personal level**


-You can also set a personal value override for an environment variable when you develop in the dbt-integrated developer environment (IDE). By default, dbt Cloud uses environment variable values set in the project's development environment. To see and override these values, click the gear icon in the top right. Under "Your Profile," click **Credentials** and select your project. Click **Edit** and make any changes in "Environment Variables."
+You can also set a personal value override for an environment variable when you develop in the dbt-integrated developer environment (IDE). By default, dbt Cloud uses environment variable values set in the project's development environment. To see and override these values, from dbt Cloud:
+- Click your account name in the left side menu and select **Account settings**.
+- Under the **Your profile** section, click **Credentials** and then select your project.
+- Scroll to the **Environment variables** section and click **Edit** to make the necessary changes.

<Lightbox src="/img/docs/dbt-cloud/using-dbt-cloud/Environment Variables/personal-override.gif" title="Navigating to environment variables personal override settings"/>

@@ -115,7 +118,7 @@ The following environment variables are set automatically:

- `DBT_ENV` &mdash; This key is reserved for the dbt Cloud application and will always resolve to 'prod'. For deployment runs only.
- `DBT_CLOUD_ENVIRONMENT_NAME` &mdash; The name of the dbt Cloud environment in which `dbt` is running.
-- `DBT_CLOUD_ENVIRONMENT_TYPE` &mdash; The type of dbt Cloud environment in which `dbt` is running. The valid values are `development` or `deployment`.
+- `DBT_CLOUD_ENVIRONMENT_TYPE` &mdash; The type of dbt Cloud environment in which `dbt` is running. The valid values are `dev`, `staging`, or `prod`. It can be unset, so use a default like `{{ env_var('DBT_CLOUD_ENVIRONMENT_TYPE', '') }}`.
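
A hedged sketch of reading these automatic variables in a model, passing defaults because they can be unset (the model name is hypothetical):

```sql
-- models/environment_info.sql (hypothetical name)
-- Defaults guard against unset values, e.g. outside deployment runs
select
    '{{ env_var("DBT_ENV", "") }}' as dbt_env,
    '{{ env_var("DBT_CLOUD_ENVIRONMENT_NAME", "") }}' as environment_name,
    '{{ env_var("DBT_CLOUD_ENVIRONMENT_TYPE", "") }}' as environment_type
```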

#### Run details

6 changes: 3 additions & 3 deletions website/docs/docs/build/incremental-microbatch.md
@@ -22,7 +22,7 @@ Refer to [Supported incremental strategies by adapter](/docs/build/incremental-s

Incremental models in dbt are a [materialization](/docs/build/materializations) designed to efficiently update your data warehouse tables by only transforming and loading _new or changed data_ since the last run. Instead of reprocessing an entire dataset every time, incremental models process a smaller number of rows, and then append, update, or replace those rows in the existing table. This can significantly reduce the time and resources required for your data transformations.

-Microbatch incremental models make it possible to process transformations on very large time-series datasets with efficiency and resiliency. When dbt runs a microbatch model — whether for the first time, during incremental runs, or in specified backfills — it will split the processing into multiple queries (or "batches"), based on the `event_time` and `batch_size` you configure.
+Microbatch incremental models make it possible to process transformations on very large time-series datasets with efficiency and resiliency. When dbt runs a microbatch model — whether for the first time, during incremental runs, or in specified backfills — it will split the processing into multiple queries (or "batches"), based on the [`event_time`](/reference/resource-configs/event-time) and `batch_size` you configure.

Each "batch" corresponds to a single bounded time period (by default, a single day of data). Where other incremental strategies operate only on "old" and "new" data, microbatch models treat every batch as an atomic unit that can be built or replaced on its own. Each batch is independent and <Term id="idempotent" />. This is a powerful abstraction that makes it possible for dbt to run batches separately — in the future, concurrently — and to retry them independently.

@@ -50,7 +50,7 @@ We run the `sessions` model on October 1, 2024, and then again on October 2. It

<TabItem value="Model definition">

-The `event_time` for the `sessions` model is set to `session_start`, which marks the beginning of a user’s session on the website. This setting allows dbt to combine multiple page views (each tracked by their own `page_view_start` timestamps) into a single session. This way, `session_start` differentiates the timing of individual page views from the broader timeframe of the entire user session.
+The [`event_time`](/reference/resource-configs/event-time) for the `sessions` model is set to `session_start`, which marks the beginning of a user’s session on the website. This setting allows dbt to combine multiple page views (each tracked by their own `page_view_start` timestamps) into a single session. This way, `session_start` differentiates the timing of individual page views from the broader timeframe of the entire user session.

<File name="models/sessions.sql">

@@ -164,7 +164,7 @@ Several configurations are relevant to microbatch models, and some are required:

| Config | Type | Description | Default |
|----------|------|---------------|---------|
-| `event_time` | Column (required) | The column indicating "at what time did the row occur." Required for your microbatch model and any direct parents that should be filtered. | N/A |
+| [`event_time`](/reference/resource-configs/event-time) | Column (required) | The column indicating "at what time did the row occur." Required for your microbatch model and any direct parents that should be filtered. | N/A |
| `begin` | Date (required) | The "beginning of time" for the microbatch model. This is the starting point for any initial or full-refresh builds. For example, a daily-grain microbatch model run on `2024-10-01` with `begin = '2023-10-01'` will process 366 batches (it's a leap year!) plus the batch for "today." | N/A |
| `batch_size` | String (required) | The granularity of your batches. Supported values are `hour`, `day`, `month`, and `year`. | N/A |
| `lookback` | Integer (optional) | Process X batches prior to the latest bookmark to capture late-arriving records. | `1` |
6 changes: 6 additions & 0 deletions website/docs/docs/build/incremental-strategy.md
@@ -241,6 +241,12 @@ select * from {{ ref("some_model") }}

### Custom strategies

+:::note limited support
+
+Custom strategies are not currently supported on the BigQuery and Spark adapters.
+
+:::

From dbt version 1.2 onwards, users have an easier alternative to [creating an entirely new materialization](/guides/create-new-materializations). They can define and use their own "custom" incremental strategies by:

1. Defining a macro named `get_incremental_STRATEGY_sql`. Note that `STRATEGY` is a placeholder and you should replace it with the name of your custom incremental strategy.
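
As an illustration of step 1, a sketch of a hypothetical custom strategy named `insert_only` (assuming the macro receives a single `arg_dict` of standard arguments, as the built-in strategies do):

```sql
-- macros/get_incremental_insert_only_sql.sql (hypothetical strategy)
{% macro get_incremental_insert_only_sql(arg_dict) %}
    -- Append everything from the staged relation; never update or delete
    insert into {{ arg_dict["target_relation"] }}
    select * from {{ arg_dict["temp_relation"] }}
{% endmacro %}
```
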
2 changes: 1 addition & 1 deletion website/docs/docs/build/materializations.md
@@ -111,7 +111,7 @@ When using the `table` materialization, your model is rebuilt as a <Term id="tab

### Materialized View

-The `materialized view` materialization allows the creation and maintenance of materialized views in the target database.
+The `materialized_view` materialization allows the creation and maintenance of materialized views in the target database.
Materialized views are a combination of a view and a table, and serve use cases similar to incremental models.
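
A minimal sketch of opting a model into this materialization (model and upstream names are assumptions):

```sql
-- models/orders_per_customer.sql (hypothetical)
{{ config(materialized='materialized_view') }}

select
    customer_id,
    count(*) as order_count
from {{ ref('stg_orders') }}
group by customer_id
```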

* **Pros:**