add event_time page (#6383)
mirnawong1 authored Nov 19, 2024
2 parents a282714 + 0e16ca6 commit f420e6c
Showing 10 changed files with 572 additions and 9 deletions.
6 changes: 3 additions & 3 deletions website/docs/docs/build/incremental-microbatch.md
@@ -20,7 +20,7 @@ Refer to [Supported incremental strategies by adapter](/docs/build/incremental-s

Incremental models in dbt are a [materialization](/docs/build/materializations) designed to efficiently update your data warehouse tables by only transforming and loading _new or changed data_ since the last run. Instead of reprocessing an entire dataset every time, incremental models process a smaller number of rows, and then append, update, or replace those rows in the existing table. This can significantly reduce the time and resources required for your data transformations.
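
As a hedged illustration of that idea, a classic (non-microbatch) incremental model filters to only the rows that are new since the last run. The model, column, and upstream names here are assumptions for the sketch, not part of this commit:

```sql
-- models/events_incremental.sql (illustrative sketch only)
{{ config(
    materialized='incremental',
    unique_key='event_id'
) }}

select
    event_id,
    user_id,
    updated_at
from {{ ref('stg_events') }}

{% if is_incremental() %}
-- on incremental runs, only process rows newer than what's already in this table
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```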

Microbatch incremental models make it possible to process transformations on very large time-series datasets with efficiency and resiliency. When dbt runs a microbatch model — whether for the first time, during incremental runs, or in specified backfills — it will split the processing into multiple queries (or "batches"), based on the [`event_time`](/reference/resource-configs/event-time) and `batch_size` you configure.

Each "batch" corresponds to a single bounded time period (by default, a single day of data). Where other incremental strategies operate only on "old" and "new" data, microbatch models treat every batch as an atomic unit that can be built or replaced on its own. Each batch is independent and <Term id="idempotent" />. This is a powerful abstraction that makes it possible for dbt to run batches separately — in the future, concurrently — and to retry them independently.

@@ -48,7 +48,7 @@ We run the `sessions` model on October 1, 2024, and then again on October 2. It

<TabItem value="Model definition">

The [`event_time`](/reference/resource-configs/event-time) for the `sessions` model is set to `session_start`, which marks the beginning of a user’s session on the website. This setting allows dbt to combine multiple page views (each tracked by their own `page_view_start` timestamps) into a single session. This way, `session_start` differentiates the timing of individual page views from the broader timeframe of the entire user session.
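
As an illustrative sketch only (not the file shown in this diff), a sessions model configured this way might look roughly like the following; the `begin` value is an assumption for the example:

```sql
-- models/sessions.sql (illustrative sketch)
{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='session_start',  -- the column dbt uses to batch and filter this model
    begin='2024-01-01',          -- assumed "beginning of time" for the sketch
    batch_size='day'
) }}

-- the upstream page_views model would set event_time='page_view_start' in its own config,
-- so dbt can filter it to the matching batch window automatically
select
    session_id,
    min(page_view_start) as session_start,
    count(*) as page_view_count
from {{ ref('page_views') }}
group by session_id
```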

<File name="models/sessions.sql">

@@ -162,7 +162,7 @@ Several configurations are relevant to microbatch models, and some are required:

| Config | Type | Description | Default |
|----------|------|---------------|---------|
| [`event_time`](/reference/resource-configs/event-time) | Column (required) | The column indicating "at what time did the row occur." Required for your microbatch model and any direct parents that should be filtered. | N/A |
| `begin` | Date (required) | The "beginning of time" for the microbatch model. This is the starting point for any initial or full-refresh builds. For example, a daily-grain microbatch model run on `2024-10-01` with `begin = '2023-10-01'` will process 366 batches (it's a leap year!) plus the batch for "today." | N/A |
| `batch_size` | String (required) | The granularity of your batches. Supported values are `hour`, `day`, `month`, and `year` | N/A |
| `lookback` | Integer (optional) | Process X batches prior to the latest bookmark to capture late-arriving records. | `1` |
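
Putting these configs together, a hedged sketch of a daily-grain microbatch model declaring all of them in one place (the column and model names are placeholders, not from this commit):

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='occurred_at',  -- "at what time did the row occur"
    begin='2023-10-01',        -- starting point for initial or full-refresh builds
    batch_size='day',          -- one batch per day
    lookback=3                 -- also reprocess the 3 batches before the latest bookmark
) }}

select * from {{ ref('stg_orders') }}
```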
2 changes: 2 additions & 0 deletions website/docs/docs/dbt-versions/release-notes.md
@@ -19,6 +19,7 @@ Release notes are grouped by month for both multi-tenant and virtual private clo
\* The official release date for this new format of release notes is May 15th, 2024. Historical release notes for prior dates may not reflect all available features released earlier this year or their tenancy availability.

## November 2024
- **New**: Use the `event_time` configuration to specify "at what time did the row occur." This configuration is required for [Incremental microbatch](/docs/build/incremental-microbatch) and can be added to ensure you're comparing overlapping times in [Advanced CI's compare changes](/docs/deploy/advanced-ci). Available in dbt Cloud Versionless and dbt Core v1.9 and higher. Refer to [event_time](/reference/resource-configs/event-time) for more information.
- **Fix**: This update improves the [dbt Semantic Layer Tableau integration](/docs/cloud-integrations/semantic-layer/tableau), making query parsing more reliable. Some key fixes include:
- Error messages for unsupported joins between saved queries and ALL tables.
- Improved handling of queries when multiple tables are selected in a data source.
@@ -27,6 +28,7 @@ Release notes are grouped by month for both multi-tenant and virtual private clo
- **Enhancement**: The dbt Semantic Layer supports creating new credentials for users who don't have permission to create service tokens. In the **Credentials & service tokens** side panel, the **+Add Service Token** option is unavailable for those users. Instead, the side panel displays a message indicating that the user doesn't have permission to create a service token and should contact their administrator. Refer to [Set up dbt Semantic Layer](/docs/use-dbt-semantic-layer/setup-sl) for more details.

## October 2024

<Expandable alt_header="Coalesce 2024 announcements">

Documentation for new features and functionality announced at Coalesce 2024:
10 changes: 10 additions & 0 deletions website/docs/docs/deploy/advanced-ci.md
@@ -36,6 +36,16 @@ dbt reports the comparison differences in:

<Lightbox src="/img/docs/dbt-cloud/example-ci-compare-changes-tab.png" width="85%" title="Example of the Compare tab" />

### Optimizing comparisons

When an [`event_time`](/reference/resource-configs/event-time) column is specified on your model, compare changes can optimize comparisons by using only the overlapping timeframe (that is, the timeframe that exists in both the CI and production environments), helping you avoid incorrect row-count changes and return results faster.

This is useful in scenarios like:
- **Subset of data in CI** &mdash; When CI builds only a [subset of data](/best-practices/best-practice-workflows#limit-the-data-processed-when-in-development) (like the most recent 7 days), compare changes would interpret the excluded data as "deleted rows." Configuring `event_time` allows you to avoid this issue by limiting comparisons to the overlapping timeframe, preventing false alerts about data deletions that are just filtered out in CI.
- **Fresher data in CI than in production** &mdash; When your CI job includes fresher data than production (because it has run more recently), compare changes would flag the additional rows as "new" data, even though they’re just fresher data in CI. With `event_time` configured, the comparison only includes the shared timeframe and correctly reflects actual changes in the data.

<Lightbox src="/img/docs/deploy/apples_to_apples.png" width="90%" title="event_time ensures the same time-slice of data is accurately compared between your CI and production environments." />

## About the cached data

After [comparing changes](#compare-changes), dbt Cloud stores a cache of no more than 100 records for each modified model for preview purposes. By caching this data, you can view the examples of changed data without rerunning the comparison against the data warehouse every time (optimizing for lower compute costs). To display the changes, dbt Cloud uses a cached version of a sample of the data records. These data records are queried from the database using the connection configuration (such as user, role, service account, and so on) that's set in the CI job's environment.
78 changes: 76 additions & 2 deletions website/docs/reference/model-configs.md
@@ -104,6 +104,8 @@ models:

<File name='dbt_project.yml'>

<VersionBlock lastVersion="1.8">

```yaml
models:
[<resource-path>](/reference/resource-configs/resource-path):
@@ -121,7 +123,29 @@
[+](/reference/resource-configs/plus-prefix)[contract](/reference/resource-configs/contract): {<dictionary>}

```
</VersionBlock>

<VersionBlock firstVersion="1.9">

```yaml
models:
[<resource-path>](/reference/resource-configs/resource-path):
[+](/reference/resource-configs/plus-prefix)[enabled](/reference/resource-configs/enabled): true | false
[+](/reference/resource-configs/plus-prefix)[tags](/reference/resource-configs/tags): <string> | [<string>]
[+](/reference/resource-configs/plus-prefix)[pre-hook](/reference/resource-configs/pre-hook-post-hook): <sql-statement> | [<sql-statement>]
[+](/reference/resource-configs/plus-prefix)[post-hook](/reference/resource-configs/pre-hook-post-hook): <sql-statement> | [<sql-statement>]
[+](/reference/resource-configs/plus-prefix)[database](/reference/resource-configs/database): <string>
[+](/reference/resource-configs/plus-prefix)[schema](/reference/resource-properties/schema): <string>
[+](/reference/resource-configs/plus-prefix)[alias](/reference/resource-configs/alias): <string>
[+](/reference/resource-configs/plus-prefix)[persist_docs](/reference/resource-configs/persist_docs): <dict>
[+](/reference/resource-configs/plus-prefix)[full_refresh](/reference/resource-configs/full_refresh): <boolean>
[+](/reference/resource-configs/plus-prefix)[meta](/reference/resource-configs/meta): {<dictionary>}
[+](/reference/resource-configs/plus-prefix)[grants](/reference/resource-configs/grants): {<dictionary>}
[+](/reference/resource-configs/plus-prefix)[contract](/reference/resource-configs/contract): {<dictionary>}
[+](/reference/resource-configs/plus-prefix)[event_time](/reference/resource-configs/event-time): my_time_field

```
</VersionBlock>
</File>

</TabItem>
@@ -131,6 +155,8 @@

<File name='models/properties.yml'>

<VersionBlock lastVersion="1.8">

```yaml
version: 2

@@ -150,17 +176,63 @@
[grants](/reference/resource-configs/grants): {<dictionary>}
[contract](/reference/resource-configs/contract): {<dictionary>}
```
</VersionBlock>

<VersionBlock firstVersion="1.9">

```yaml
version: 2

models:
- name: [<model-name>]
config:
[enabled](/reference/resource-configs/enabled): true | false
[tags](/reference/resource-configs/tags): <string> | [<string>]
[pre_hook](/reference/resource-configs/pre-hook-post-hook): <sql-statement> | [<sql-statement>]
[post_hook](/reference/resource-configs/pre-hook-post-hook): <sql-statement> | [<sql-statement>]
[database](/reference/resource-configs/database): <string>
[schema](/reference/resource-properties/schema): <string>
[alias](/reference/resource-configs/alias): <string>
[persist_docs](/reference/resource-configs/persist_docs): <dict>
[full_refresh](/reference/resource-configs/full_refresh): <boolean>
[meta](/reference/resource-configs/meta): {<dictionary>}
[grants](/reference/resource-configs/grants): {<dictionary>}
[contract](/reference/resource-configs/contract): {<dictionary>}
[event_time](/reference/resource-configs/event-time): my_time_field
```
</VersionBlock>
</File>
</TabItem>
<TabItem value="config">
<File name='models/<model_name>.sql'>
<VersionBlock lastVersion="1.8">
```jinja

{{ config(
[enabled](/reference/resource-configs/enabled)=true | false,
[tags](/reference/resource-configs/tags)="<string>" | ["<string>"],
[pre_hook](/reference/resource-configs/pre-hook-post-hook)="<sql-statement>" | ["<sql-statement>"],
[post_hook](/reference/resource-configs/pre-hook-post-hook)="<sql-statement>" | ["<sql-statement>"],
[database](/reference/resource-configs/database)="<string>",
[schema](/reference/resource-properties/schema)="<string>",
[alias](/reference/resource-configs/alias)="<string>",
[persist_docs](/reference/resource-configs/persist_docs)={<dict>},
[meta](/reference/resource-configs/meta)={<dict>},
[grants](/reference/resource-configs/grants)={<dict>},
[contract](/reference/resource-configs/contract)={<dictionary>}
) }}

```
</VersionBlock>

<VersionBlock firstVersion="1.9">

```jinja
{{ config(
@@ -175,9 +247,11 @@
[meta](/reference/resource-configs/meta)={<dict>},
[grants](/reference/resource-configs/grants)={<dict>},
[contract](/reference/resource-configs/contract)={<dictionary>},
[event_time](/reference/resource-configs/event-time)='my_time_field'
) }}
```
</VersionBlock>

</File>
