Merge branch 'current' into revert-6466-revert-6455-add-microbatch-flag
mirnawong1 authored Nov 22, 2024
2 parents e78e968 + 23e89a4 commit 69c9d18
Showing 88 changed files with 1,663 additions and 982 deletions.
49 changes: 0 additions & 49 deletions .github/ISSUE_TEMPLATE/internal-orch-team.yml

This file was deleted.

2 changes: 1 addition & 1 deletion website/blog/2024-10-04-hybrid-mesh.md
@@ -59,7 +59,7 @@ This allows dbt Cloud to know about the contents and metadata of your project, w
- Note: If you have [environment variables](/docs/build/environment-variables) in your project, dbt Cloud environment variables must be prefixed with `DBT_` (including `DBT_ENV_CUSTOM_ENV_` or `DBT_ENV_SECRET`); see the sketch after this list. Follow the instructions in [this guide](https://docs.getdbt.com/guides/core-to-cloud-1?step=8#environment-variables) to convert them for dbt Cloud.
- Each upstream Core project has to have a production [environment](/docs/dbt-cloud-environments) in dbt Cloud. Configure credentials and environment variables in dbt Cloud so that it resolves relation names to the same places where your dbt Core workflows deploy those models.
- Set up a [merge job](/docs/deploy/merge-jobs) in a production environment to run `dbt parse`. This will enable connecting downstream projects in dbt Mesh by producing the necessary [artifacts](/reference/artifacts/dbt-artifacts) for cross-project referencing.
-- Note: Set up a regular job to run `dbt build` instead of using a merge job for `dbt parse`, and centralize your dbt orchestration by moving production runs to dbt Cloud. Check out [this guide](/guides/core-to-cloud-1?step=9) for more details on converting your production runs to dbt Cloud.
+- Optional: Set up a regular job to run `dbt build` instead of using a merge job for `dbt parse`, and centralize your dbt orchestration by moving production runs to dbt Cloud. Check out [this guide](/guides/core-to-cloud-1?step=9) for more details on converting your production runs to dbt Cloud.
- Optional: Set up a regular job (for example, daily) to run `source freshness` and `docs generate`. This will hydrate dbt Cloud with additional metadata and enable features in [dbt Explorer](/docs/collaborate/explore-projects) that will benefit both teams, including [Column-level lineage](/docs/collaborate/column-level-lineage).
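
For example, a minimal sketch of referencing a converted variable from a model (the variable name and default are hypothetical):

```sql
-- models/example.sql (sketch; variable name and default are hypothetical)
-- A dbt Core variable named TARGET_SCHEMA must be renamed to DBT_TARGET_SCHEMA
-- (or DBT_ENV_CUSTOM_ENV_TARGET_SCHEMA) before dbt Cloud will pick it up.
select
    '{{ env_var("DBT_TARGET_SCHEMA", "analytics") }}' as target_schema
```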

### Step 3: Create and connect your downstream projects to your Core project using dbt Mesh
2 changes: 1 addition & 1 deletion website/docs/best-practices/how-we-structure/2-staging.md
@@ -223,4 +223,4 @@ This is a welcome change for many of us who have become used to applying the sam

:::info Development flow versus DAG order.
This guide follows the order of the DAG, so we can get a holistic picture of how these three primary layers build on each other towards fueling impactful data products. It’s important to note though that developing models does not typically move linearly through the DAG. Most commonly, we should start by mocking out a design in a spreadsheet so we know we’re aligned with our stakeholders on output goals. Then, we’ll want to write the SQL to generate that output, and identify what tables are involved. Once we have our logic and dependencies, we’ll make sure we’ve staged all the necessary atomic pieces into the project, then bring them together based on the logic we wrote to generate our mart. Finally, with a functioning model flowing in dbt, we can start refactoring and optimizing that mart. By splitting the logic up and moving parts back upstream into intermediate models, we ensure all of our models are clean and readable, the story of our DAG is clear, and we have more surface area to apply thorough testing.
-:::info
+:::
4 changes: 3 additions & 1 deletion website/docs/docs/build/data-tests.md
@@ -66,7 +66,9 @@ having total_amount < 0

</File>

-The name of this test is the name of the file: `assert_total_payment_amount_is_positive`.
+The name of this test is the name of the file: `assert_total_payment_amount_is_positive`.
+
+Note: don't include a semicolon (;) at the end of the SQL statement in your singular test files, as it can cause the test to fail.
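
For instance, a minimal sketch of such a singular test (the `payments` model and its columns are assumptions based on the snippet above):

```sql
-- tests/assert_total_payment_amount_is_positive.sql
-- Fails if any order has a negative total payment amount
select
    order_id,
    sum(amount) as total_amount
from {{ ref('payments') }}
group by order_id
having total_amount < 0  -- no trailing semicolon
```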

To add a description to a singular test in your project, add a `.yml` file to your `tests` directory, for example, `tests/schema.yml` with the following content:

9 changes: 6 additions & 3 deletions website/docs/docs/build/environment-variables.md
@@ -32,7 +32,7 @@ There are four levels of environment variables:

To set environment variables at the project and environment level, click **Deploy** in the top left, then select **Environments**. Click **Environment variables** to add and update your environment variables.

-<Lightbox src="/img/docs/dbt-cloud/using-dbt-cloud/Environment Variables/navigate-to-env-vars.gif" title="Environment variables tab"/>
+<Lightbox src="/img/docs/dbt-cloud/using-dbt-cloud/Environment Variables/navigate-to-env-vars.png" title="Environment variables tab"/>



@@ -62,7 +62,10 @@ Every job runs in a specific deployment environment, and by default, a job will
**Overriding environment variables at the personal level**


-You can also set a personal value override for an environment variable when you develop in the dbt-integrated developer environment (IDE). By default, dbt Cloud uses environment variable values set in the project's development environment. To see and override these values, click the gear icon in the top right. Under "Your Profile," click **Credentials** and select your project. Click **Edit** and make any changes in "Environment Variables."
+You can also set a personal value override for an environment variable when you develop in the dbt-integrated developer environment (IDE). By default, dbt Cloud uses environment variable values set in the project's development environment. To see and override these values, from dbt Cloud:
+- Click your account name in the left side menu and select **Account settings**.
+- Under the **Your profile** section, click **Credentials** and then select your project.
+- Scroll to the **Environment variables** section and click **Edit** to make the necessary changes.

<Lightbox src="/img/docs/dbt-cloud/using-dbt-cloud/Environment Variables/personal-override.gif" title="Navigating to environment variables personal override settings"/>

@@ -115,7 +118,7 @@ The following environment variables are set automatically:

- `DBT_ENV` &mdash; This key is reserved for the dbt Cloud application and will always resolve to 'prod'. For deployment runs only.
- `DBT_CLOUD_ENVIRONMENT_NAME` &mdash; The name of the dbt Cloud environment in which `dbt` is running.
-- `DBT_CLOUD_ENVIRONMENT_TYPE` &mdash; The type of dbt Cloud environment in which `dbt` is running. The valid values are `development` or `deployment`.
+- `DBT_CLOUD_ENVIRONMENT_TYPE` &mdash; The type of dbt Cloud environment in which `dbt` is running. The valid values are `dev`, `staging`, or `prod`. It can be unset, so use a default like `{{ env_var('DBT_CLOUD_ENVIRONMENT_TYPE', '') }}`.
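
A hedged sketch of reading these automatic variables in a model, passing defaults because they can be unset (the model name is hypothetical):

```sql
-- models/environment_info.sql (hypothetical name)
-- Defaults guard against unset values, e.g. outside deployment runs
select
    '{{ env_var("DBT_ENV", "") }}' as dbt_env,
    '{{ env_var("DBT_CLOUD_ENVIRONMENT_NAME", "") }}' as environment_name,
    '{{ env_var("DBT_CLOUD_ENVIRONMENT_TYPE", "") }}' as environment_type
```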

#### Run details

6 changes: 3 additions & 3 deletions website/docs/docs/build/incremental-microbatch.md
@@ -22,7 +22,7 @@ Refer to [Supported incremental strategies by adapter](/docs/build/incremental-s

Incremental models in dbt are a [materialization](/docs/build/materializations) designed to efficiently update your data warehouse tables by only transforming and loading _new or changed data_ since the last run. Instead of reprocessing an entire dataset every time, incremental models process a smaller number of rows, and then append, update, or replace those rows in the existing table. This can significantly reduce the time and resources required for your data transformations.

-Microbatch incremental models make it possible to process transformations on very large time-series datasets with efficiency and resiliency. When dbt runs a microbatch model — whether for the first time, during incremental runs, or in specified backfills — it will split the processing into multiple queries (or "batches"), based on the `event_time` and `batch_size` you configure.
+Microbatch incremental models make it possible to process transformations on very large time-series datasets with efficiency and resiliency. When dbt runs a microbatch model — whether for the first time, during incremental runs, or in specified backfills — it will split the processing into multiple queries (or "batches"), based on the [`event_time`](/reference/resource-configs/event-time) and `batch_size` you configure.

Each "batch" corresponds to a single bounded time period (by default, a single day of data). Where other incremental strategies operate only on "old" and "new" data, microbatch models treat every batch as an atomic unit that can be built or replaced on its own. Each batch is independent and <Term id="idempotent" />. This is a powerful abstraction that makes it possible for dbt to run batches separately — in the future, concurrently — and to retry them independently.

@@ -50,7 +50,7 @@ We run the `sessions` model on October 1, 2024, and then again on October 2. It

<TabItem value="Model definition">

-The `event_time` for the `sessions` model is set to `session_start`, which marks the beginning of a user’s session on the website. This setting allows dbt to combine multiple page views (each tracked by their own `page_view_start` timestamps) into a single session. This way, `session_start` differentiates the timing of individual page views from the broader timeframe of the entire user session.
+The [`event_time`](/reference/resource-configs/event-time) for the `sessions` model is set to `session_start`, which marks the beginning of a user’s session on the website. This setting allows dbt to combine multiple page views (each tracked by their own `page_view_start` timestamps) into a single session. This way, `session_start` differentiates the timing of individual page views from the broader timeframe of the entire user session.

<File name="models/sessions.sql">

@@ -164,7 +164,7 @@ Several configurations are relevant to microbatch models, and some are required:

| Config | Type | Description | Default |
|----------|------|---------------|---------|
-| `event_time` | Column (required) | The column indicating "at what time did the row occur." Required for your microbatch model and any direct parents that should be filtered. | N/A |
+| [`event_time`](/reference/resource-configs/event-time) | Column (required) | The column indicating "at what time did the row occur." Required for your microbatch model and any direct parents that should be filtered. | N/A |
| `begin` | Date (required) | The "beginning of time" for the microbatch model. This is the starting point for any initial or full-refresh builds. For example, a daily-grain microbatch model run on `2024-10-01` with `begin = '2023-10-01'` will process 366 batches (it's a leap year!) plus the batch for "today." | N/A |
| `batch_size` | String (required) | The granularity of your batches. Supported values are `hour`, `day`, `month`, and `year`. | N/A |
| `lookback` | Integer (optional) | Process X batches prior to the latest bookmark to capture late-arriving records. | `1` |
6 changes: 6 additions & 0 deletions website/docs/docs/build/incremental-strategy.md
@@ -241,6 +241,12 @@ select * from {{ ref("some_model") }}

### Custom strategies

+:::note limited support
+
+Custom strategies are not currently supported on the BigQuery and Spark adapters.
+
+:::

From dbt version 1.2 onwards, users have an easier alternative to [creating an entirely new materialization](/guides/create-new-materializations). They can define and use their own "custom" incremental strategies by:

1. Defining a macro named `get_incremental_STRATEGY_sql`. Note that `STRATEGY` is a placeholder and you should replace it with the name of your custom incremental strategy.
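
As an illustration of step 1, a sketch of a hypothetical custom strategy named `insert_only` (assuming the macro receives a single `arg_dict` of standard arguments, as the built-in strategies do):

```sql
-- macros/get_incremental_insert_only_sql.sql (hypothetical strategy)
{% macro get_incremental_insert_only_sql(arg_dict) %}
    -- Append everything from the staged relation; never update or delete
    insert into {{ arg_dict["target_relation"] }}
    select * from {{ arg_dict["temp_relation"] }}
{% endmacro %}
```
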
2 changes: 1 addition & 1 deletion website/docs/docs/build/materializations.md
@@ -111,7 +111,7 @@ When using the `table` materialization, your model is rebuilt as a <Term id="tab

### Materialized View

-The `materialized view` materialization allows the creation and maintenance of materialized views in the target database.
+The `materialized_view` materialization allows the creation and maintenance of materialized views in the target database.
Materialized views are a combination of a view and a table, and serve use cases similar to incremental models.
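
A minimal sketch of opting a model into this materialization (model and upstream names are assumptions):

```sql
-- models/orders_per_customer.sql (hypothetical)
{{ config(materialized='materialized_view') }}

select
    customer_id,
    count(*) as order_count
from {{ ref('stg_orders') }}
group by customer_id
```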

* **Pros:**