add event_time page #6383

Merged
merged 71 commits into from
Nov 19, 2024
Changes from 22 commits
Commits
46231b0
add event_time page
mirnawong1 Oct 30, 2024
33a66a8
Merge branch 'current' into add-event-time
mirnawong1 Oct 30, 2024
6ebf5eb
update source/snapshots
mirnawong1 Oct 30, 2024
501d948
add to model
mirnawong1 Oct 30, 2024
4f2c6dc
add img and rn
mirnawong1 Oct 30, 2024
57ee608
fix link
mirnawong1 Oct 30, 2024
451fc46
Merge branch 'add-event-time' into update-sources-snapshots
mirnawong1 Oct 30, 2024
3354c9d
Update website/docs/reference/resource-configs/event-time.md
mirnawong1 Oct 30, 2024
603c21c
fix link again
mirnawong1 Oct 30, 2024
0c68f62
Merge branch 'add-event-time' into update-sources-snapshots
mirnawong1 Oct 30, 2024
488460c
Update event-time.md
mirnawong1 Oct 30, 2024
2fb62c5
Update release-notes.md
mirnawong1 Oct 30, 2024
69ba339
Update event-time.md
mirnawong1 Oct 30, 2024
1ebbbdb
Update advanced-ci.md
mirnawong1 Oct 30, 2024
2b713ee
Update advanced-ci.md
mirnawong1 Oct 30, 2024
c789601
Update advanced-ci.md
mirnawong1 Oct 30, 2024
5708119
Update website/docs/docs/deploy/advanced-ci.md
mirnawong1 Nov 4, 2024
903c5d1
Merge branch 'current' into add-event-time
mirnawong1 Nov 4, 2024
2dd873a
Update website/docs/docs/deploy/advanced-ci.md
mirnawong1 Nov 4, 2024
b7a07be
Update website/docs/reference/resource-configs/event-time.md
mirnawong1 Nov 4, 2024
12cdffa
Merge branch 'add-event-time' into update-sources-snapshots
mirnawong1 Nov 4, 2024
016c555
Update event-time.md
mirnawong1 Nov 4, 2024
9c49664
Merge branch 'add-event-time' into update-sources-snapshots
mirnawong1 Nov 4, 2024
2910914
Merge branch 'current' into add-event-time
mirnawong1 Nov 4, 2024
5ba059e
Merge branch 'add-event-time' into update-sources-snapshots
mirnawong1 Nov 4, 2024
79128fe
Merge branch 'current' into add-event-time
mirnawong1 Nov 4, 2024
cc34575
Merge branch 'add-event-time' into update-sources-snapshots
mirnawong1 Nov 5, 2024
551821d
update img
mirnawong1 Nov 6, 2024
735ae38
fix img size
mirnawong1 Nov 6, 2024
ac7616b
Merge branch 'current' into add-event-time
mirnawong1 Nov 6, 2024
bdc037e
Merge branch 'add-event-time' into update-sources-snapshots
mirnawong1 Nov 6, 2024
0363051
Update website/docs/reference/resource-configs/event-time.md
mirnawong1 Nov 6, 2024
d693c9b
Update website/docs/reference/resource-configs/event-time.md
mirnawong1 Nov 6, 2024
809f2a7
Update website/docs/reference/resource-configs/event-time.md
mirnawong1 Nov 6, 2024
81e2318
Update website/docs/reference/resource-configs/event-time.md
mirnawong1 Nov 6, 2024
14632b3
Merge branch 'current' into add-event-time
mirnawong1 Nov 6, 2024
aad3987
Merge branch 'add-event-time' into update-sources-snapshots
mirnawong1 Nov 6, 2024
e92c9db
Update website/docs/reference/source-configs.md
mirnawong1 Nov 6, 2024
2b98454
Merge branch 'current' into add-event-time
mirnawong1 Nov 11, 2024
3ad1bb6
Merge branch 'add-event-time' into update-sources-snapshots
mirnawong1 Nov 11, 2024
3da521f
Update website/docs/docs/deploy/advanced-ci.md
mirnawong1 Nov 11, 2024
edd1123
add scenarios
mirnawong1 Nov 11, 2024
52c0db9
add scenarios
mirnawong1 Nov 11, 2024
f461ffa
fold in grace's feedback
mirnawong1 Nov 11, 2024
a4f3b23
Merge branch 'current' into add-event-time
mirnawong1 Nov 11, 2024
a1c8166
Merge branch 'add-event-time' of github.com:dbt-labs/docs.getdbt.com …
mirnawong1 Nov 11, 2024
f1969f4
remove redundant
mirnawong1 Nov 11, 2024
c170a3b
Merge branch 'current' into add-event-time
mirnawong1 Nov 14, 2024
d6a309b
Merge branch 'add-event-time' into update-sources-snapshots
mirnawong1 Nov 14, 2024
3a8dee5
Update website/docs/reference/resource-configs/event-time.md
mirnawong1 Nov 14, 2024
5851c2b
Merge branch 'add-event-time' into update-sources-snapshots
mirnawong1 Nov 14, 2024
0656327
Update release-notes.md
mirnawong1 Nov 14, 2024
57679b2
Merge branch 'current' into add-event-time
mirnawong1 Nov 14, 2024
556249a
Update website/docs/docs/dbt-versions/release-notes.md
mirnawong1 Nov 15, 2024
0bd8584
Update website/docs/docs/deploy/advanced-ci.md
mirnawong1 Nov 15, 2024
bd233ad
Update website/docs/docs/deploy/advanced-ci.md
mirnawong1 Nov 15, 2024
b9e4be0
Update website/docs/reference/resource-configs/event-time.md
mirnawong1 Nov 15, 2024
ff3416a
Update website/docs/docs/deploy/advanced-ci.md
mirnawong1 Nov 15, 2024
613f1ef
Update website/docs/docs/deploy/advanced-ci.md
mirnawong1 Nov 15, 2024
85f181d
Update website/docs/reference/resource-configs/event-time.md
mirnawong1 Nov 15, 2024
4644684
Merge branch 'current' into add-event-time
mirnawong1 Nov 18, 2024
8cf073b
Merge branch 'add-event-time' into update-sources-snapshots
mirnawong1 Nov 18, 2024
46763d8
Merge branch 'current' into add-event-time
mirnawong1 Nov 18, 2024
d2bf5af
Merge branch 'add-event-time' into update-sources-snapshots
mirnawong1 Nov 18, 2024
6015dee
Update website/docs/docs/deploy/advanced-ci.md
mirnawong1 Nov 18, 2024
4250c9d
Update website/docs/docs/deploy/advanced-ci.md
mirnawong1 Nov 18, 2024
76b12e9
update header adn link
mirnawong1 Nov 19, 2024
de8f752
Merge branch 'current' into add-event-time
mirnawong1 Nov 19, 2024
4b28bbc
Merge branch 'add-event-time' into update-sources-snapshots
mirnawong1 Nov 19, 2024
337248b
add event _time to sources/snapshots/models/seeds (#6384)
mirnawong1 Nov 19, 2024
0e16ca6
Update incremental-microbatch.md
mirnawong1 Nov 19, 2024
4 changes: 2 additions & 2 deletions website/docs/docs/build/incremental-microbatch.md
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@ Refer to [Supported incremental strategies by adapter](/docs/build/incremental-s

Incremental models in dbt are a [materialization](/docs/build/materializations) designed to efficiently update your data warehouse tables by only transforming and loading _new or changed data_ since the last run. Instead of reprocessing an entire dataset every time, incremental models process a smaller number of rows, and then append, update, or replace those rows in the existing table. This can significantly reduce the time and resources required for your data transformations.

Microbatch incremental models make it possible to process transformations on very large time-series datasets with efficiency and resiliency. When dbt runs a microbatch model — whether for the first time, during incremental runs, or in specified backfills — it will split the processing into multiple queries (or "batches"), based on the `event_time` and `batch_size` you configure.
Microbatch incremental models make it possible to process transformations on very large time-series datasets with efficiency and resiliency. When dbt runs a microbatch model — whether for the first time, during incremental runs, or in specified backfills — it will split the processing into multiple queries (or "batches"), based on the [`event_time`](/reference/resource-configs/event-time) and `batch_size` you configure.

Each "batch" corresponds to a single bounded time period (by default, a single day of data). Where other incremental strategies operate only on "old" and "new" data, microbatch models treat every batch as an atomic unit that can be built or replaced on its own. Each batch is independent and <Term id="idempotent" />. This is a powerful abstraction that makes it possible for dbt to run batches separately — in the future, concurrently — and to retry them independently.

Expand Down Expand Up @@ -162,7 +162,7 @@ Several configurations are relevant to microbatch models, and some are required:

| Config | Type | Description | Default |
|----------|------|---------------|---------|
| `event_time` | Column (required) | The column indicating "at what time did the row occur." Required for your microbatch model and any direct parents that should be filtered. | N/A |
| [`event_time`](/reference/resource-configs/event-time) | Column (required) | The column indicating "at what time did the row occur." Required for your microbatch model and any direct parents that should be filtered. | N/A |
| `begin` | Date (required) | The "beginning of time" for the microbatch model. This is the starting point for any initial or full-refresh builds. For example, a daily-grain microbatch model run on `2024-10-01` with `begin = '2023-10-01'` will process 366 batches (it's a leap year!) plus the batch for "today." | N/A |
| `batch_size` | String (required) | The granularity of your batches. Supported values are `hour`, `day`, `month`, and `year` | N/A |
| `lookback` | Integer (optional) | Process X batches prior to the latest bookmark to capture late-arriving records. | `1` |
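
The batch arithmetic in the `begin` row can be checked with a short Python sketch. This is only an illustration of how daily batches between `begin` and the run date might be enumerated; it is not dbt's implementation, and the helper name `daily_batch_starts` is hypothetical.

```python
from datetime import date, timedelta


def daily_batch_starts(begin: date, run_date: date) -> list[date]:
    """Return the start date of every daily batch from `begin` through `run_date`, inclusive."""
    n_days = (run_date - begin).days
    return [begin + timedelta(days=i) for i in range(n_days + 1)]


# Values taken from the `begin` example in the table above.
batches = daily_batch_starts(date(2023, 10, 1), date(2024, 10, 1))
completed = batches[:-1]  # every full day before the run date
today = batches[-1]       # the batch for "today"

print(len(completed), today)  # 366 2024-10-01
```

The span from 2023-10-01 to 2024-10-01 includes February 29, 2024, which is why there are 366 completed daily batches rather than 365.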
3 changes: 3 additions & 0 deletions website/docs/docs/dbt-versions/release-notes.md
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,9 @@
- Better error messaging for queries that can't be parsed correctly.

## October 2024

- **New**: Use the `event_time` configuration to specify when an event occurred. This configuration is required for [Incremental microbatch](/docs/build/incremental-microbatch) and can be added to ensure you're comparing overlapping times in [Advanced CI's compare changes](/docs/deploy/advanced-ci). Available in dbt Cloud Versionless and dbt Core v1.9 and higher. Refer to [event_time](/reference/resource-configs/event-time) for more information.

Check warning on line 30 in website/docs/docs/dbt-versions/release-notes.md

View workflow job for this annotation

GitHub Actions / vale

[vale] website/docs/docs/dbt-versions/release-notes.md#L30

[custom.Typos] Oops there's a typo -- did you really mean 'event_time'?
Raw output
{"message": "[custom.Typos] Oops there's a typo -- did you really mean 'event_time'? ", "location": {"path": "website/docs/docs/dbt-versions/release-notes.md", "range": {"start": {"line": 30, "column": 21}}}, "severity": "WARNING"}

Check warning on line 30 in website/docs/docs/dbt-versions/release-notes.md

View workflow job for this annotation

GitHub Actions / vale

[vale] website/docs/docs/dbt-versions/release-notes.md#L30

[custom.Typos] Oops there's a typo -- did you really mean 'v1.9'?
Raw output
{"message": "[custom.Typos] Oops there's a typo -- did you really mean 'v1.9'? ", "location": {"path": "website/docs/docs/dbt-versions/release-notes.md", "range": {"start": {"line": 30, "column": 350}}}, "severity": "WARNING"}

<Expandable alt_header="Coalesce 2024 announcements">

Documentation for new features and functionality announced at Coalesce 2024:
8 changes: 8 additions & 0 deletions website/docs/docs/deploy/advanced-ci.md
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,14 @@ dbt reports the comparison differences in:

<Lightbox src="/img/docs/dbt-cloud/example-ci-compare-changes-tab.png" width="85%" title="Example of the Compare tab" />

### Speeding up comparisons
It's common for CI jobs to only [build a subset of data](/best-practices/best-practice-workflows#limit-the-data-processed-when-in-development), for example only the last 7 days of data. When an [`event_time`](/reference/resource-configs/event-time) column is specified on your model, compare changes can:

- Compare data in CI against production for only the overlapping times, avoiding false positives and returning results faster.
Collaborator


I think both of these bullets have the same benefit of "using only the overlapping timeframe, which avoids incorrect row-count changes and returns results faster"

I would distinguish the 2 scenarios as:

  • scenarios where your CI job only builds a subset of data
  • scenarios where your CI job contains fresher data than production

Rather than nesting the second scenario within the first - lmk if that makes sense!

Contributor Author

@mirnawong1 mirnawong1 Nov 11, 2024


changed it to this;

It's common for CI jobs to only build a subset of data (for example only the last 7 days of data).

When an event_time column is specified on your model, compare changes can optimize comparisons by using only the overlapping timeframe (meaning the timeframe exists in both the CI and production environment), helping you avoid incorrect row-count changes to return results faster.

This is useful in scenarios like:

  • Subset of data in CI — When CI builds only a subset of data (like the most recent 7 days), compare changes might interpret the excluded data as "deleted rows." Configuring event_time allows you to avoid this issue by limiting comparisons to the overlapping timeframe, preventing false alerts about data deletions that are just filtered out in CI.
  • Fresher data in CI than in production — When your CI job includes fresher data than production, compare changes might flag the additional rows as "new" data, even though they’re just fresher data in CI. With event_time configured, the comparison only includes the shared timeframe and correctly reflects actual changes in the data.

- Handle scenarios where CI contains fresher data than production by using only the overlapping timeframe, which avoids incorrect row-count changes.
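
As a rough illustration of "the overlapping timeframe", the Python sketch below intersects the `event_time` range covered by a CI build with the range covered by production. It is a conceptual sketch only: the `Window` type, the `overlapping_window` helper, and the sample dates are assumptions made for this example, not how compare changes is implemented.

```python
from datetime import datetime
from typing import Optional, Tuple

# A window is the (earliest, latest) event_time present in an environment.
Window = Tuple[datetime, datetime]


def overlapping_window(ci: Window, prod: Window) -> Optional[Window]:
    """Return the time range covered by both environments, or None if they don't overlap."""
    start = max(ci[0], prod[0])
    end = min(ci[1], prod[1])
    return (start, end) if start <= end else None


# Hypothetical case: CI built only the last 7 days and is one day fresher than production.
ci_window = (datetime(2024, 11, 4), datetime(2024, 11, 11))
prod_window = (datetime(2023, 1, 1), datetime(2024, 11, 10))

window = overlapping_window(ci_window, prod_window)
print(window)
```

Here the comparison would be limited to November 4 through November 10, which covers both scenarios: production rows older than the CI subset and CI rows fresher than production fall outside the window.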

<Lightbox src="/img/docs/deploy/apples_to_apples.png" width="90%" title="event_time ensures the same time-slice of data is accurately compared between your CI and production environments." />

## About the cached data

After [comparing changes](#compare-changes), dbt Cloud stores a cache of no more than 100 records for each modified model for preview purposes. By caching this data, you can view the examples of changed data without rerunning the comparison against the data warehouse every time (optimizing for lower compute costs). To display the changes, dbt Cloud uses a cached version of a sample of the data records. These data records are queried from the database using the connection configuration (such as user, role, service account, and so on) that's set in the CI job's environment.
258 changes: 258 additions & 0 deletions website/docs/reference/resource-configs/event-time.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,258 @@
---
title: "event_time"
id: "event-time"
sidebar_label: "event_time"
resource_types: [models, seeds, source]
description: "dbt uses event_time to understand when an event occurred. When defined, event_time enables microbatch incremental models and more refined comparison of datasets during Advanced CI."
datatype: string
---

Available in dbt Cloud Versionless and dbt Core v1.9 and higher.

Check warning on line 10 in website/docs/reference/resource-configs/event-time.md

View workflow job for this annotation

GitHub Actions / vale

[vale] website/docs/reference/resource-configs/event-time.md#L10

[custom.Typos] Oops there's a typo -- did you really mean 'v1.9'?
Raw output
{"message": "[custom.Typos] Oops there's a typo -- did you really mean 'v1.9'? ", "location": {"path": "website/docs/reference/resource-configs/event-time.md", "range": {"start": {"line": 10, "column": 49}}}, "severity": "WARNING"}

<Tabs>
<TabItem value="model" label="Models">

<File name='dbt_project.yml'>

```yml
models:
  [resource-path:](/reference/resource-configs/resource-path)
    +event_time: my_time_field
```
</File>


<File name='models/properties.yml'>

```yml
models:
  - name: model_name
    [config](/reference/resource-properties/config):
      event_time: my_time_field
```
</File>

<File name="models/modelname.sql">

```sql
{{ config(
    event_time='my_time_field'
) }}
```

</File>

</TabItem>

<TabItem value="seeds" label="Seeds">

<File name='dbt_project.yml'>

```yml
seeds:
  [resource-path:](/reference/resource-configs/resource-path)
    +event_time: my_time_field
```
</File>

<File name='seeds/properties.yml'>

```yml
seeds:
  - name: seed_name
    [config](/reference/resource-properties/config):
      event_time: my_time_field
```

</File>
</TabItem>

<TabItem value="snapshot" label="Snapshots">

<File name='dbt_project.yml'>

```yml
snapshots:
  [resource-path:](/reference/resource-configs/resource-path)
    +event_time: my_time_field
```
</File>

<VersionBlock firstVersion="1.9">
<File name='snapshots/properties.yml'>

```yml
snapshots:
  - name: snapshot_name
    [config](/reference/resource-properties/config):
      event_time: my_time_field
```
</File>
</VersionBlock>

<VersionBlock lastVersion="1.8">

<File name="models/modelname.sql">

```sql

{{ config(
    event_time='my_time_field'
) }}
```

</File>


import SnapshotYaml from '/snippets/_snapshot-yaml-spec.md';

<SnapshotYaml/>
</VersionBlock>



</TabItem>

<TabItem value="sources" label="Sources">

<File name='dbt_project.yml'>

```yml
sources:
  [resource-path:](/reference/resource-configs/resource-path)
    +event_time: my_time_field
```
</File>

<File name='models/properties.yml'>

```yml
sources:
  - name: source_name
    [config](/reference/resource-properties/config):
      event_time: my_time_field
```

</File>
</TabItem>
</Tabs>

## Definition

Set `event_time` to the name of the field that represents when the event actually occurred, as opposed to a date such as when the data was loaded. You can configure `event_time` for a [model](/docs/build/models), [seed](/docs/build/seeds), or [source](/docs/build/sources) in your `dbt_project.yml` file, property YAML file, or config block.

Here are some examples of good and bad `event_time` columns:
✅ Good:

- `account_created_at` &mdash; This represents the specific time when an account was created, making it a fixed event in time.
- `session_began_at` &mdash; This captures the exact timestamp when a user session started, which won’t change and directly ties to the event.

❌ Bad:

- `_fivetran_synced` &mdash; This isn't the time that the event happened; it's the time that the event was ingested.
- `last_updated_at` &mdash; This isn't a good choice because its value keeps changing over time.
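
To see why a changing column makes a poor `event_time`, consider the following Python sketch. It assumes daily time slices and uses a hypothetical `daily_batch` helper and made-up sample rows; it only illustrates the idea and is not taken from dbt.

```python
from datetime import datetime


def daily_batch(ts: datetime) -> str:
    """Map a timestamp to the daily time slice (YYYY-MM-DD) it would fall into."""
    return ts.date().isoformat()


# The same account row, before and after a later update (hypothetical data).
row_v1 = {
    "account_id": 1,
    "account_created_at": datetime(2024, 10, 1, 9, 30),
    "last_updated_at": datetime(2024, 10, 1, 9, 30),
}
row_v2 = {**row_v1, "last_updated_at": datetime(2024, 10, 5, 14, 0)}

# A fixed event timestamp always lands in the same slice.
stable = {daily_batch(r["account_created_at"]) for r in (row_v1, row_v2)}
# A mutable timestamp moves the row to a different slice after each update.
drifting = {daily_batch(r["last_updated_at"]) for r in (row_v1, row_v2)}

print(stable)    # {'2024-10-01'}
print(drifting)  # two different slices for the same row
```

With `account_created_at`, the row stays in the same daily slice no matter how often it is updated. With `last_updated_at`, the same row appears in a different slice after each update, so time-sliced processing and comparisons no longer line up.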

`event_time` is required for [Incremental microbatch](/docs/build/incremental-microbatch). It can also be used with [Advanced CI's compare changes](/docs/deploy/advanced-ci#speeding-up-comparisons) in CI/CD workflows, where it ensures the same time-slice of data is correctly compared between your CI and production environments.

When you configure `event_time`, it enables compare changes to:

- Compare data in CI versus production for overlapping times only, reducing false discrepancies.
- Handle scenarios where CI has "fresher" data than production by using only the overlapping timeframe, allowing you to avoid incorrect row-count changes.
- Account for subset data builds in CI without flagging filtered-out rows as "deleted" when compared with production.
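
The effect described in these bullets can be sketched in Python with a toy row-level diff. The sample rows, the `diff` and `within` helpers, and the fixed comparison window are all assumptions made for illustration; this is not how compare changes is implemented.

```python
from datetime import date

# Hypothetical rows keyed by primary key, with their event_time as the value.
prod_rows = {1: date(2024, 10, 1), 2: date(2024, 11, 5), 3: date(2024, 11, 8)}
ci_rows = {2: date(2024, 11, 5), 3: date(2024, 11, 8), 4: date(2024, 11, 11)}


def diff(prod: dict, ci: dict) -> tuple:
    """Return (keys only in production, keys only in CI)."""
    removed = sorted(prod.keys() - ci.keys())
    added = sorted(ci.keys() - prod.keys())
    return removed, added


def within(rows: dict, start: date, end: date) -> dict:
    """Keep only rows whose event_time falls inside the window, inclusive."""
    return {k: v for k, v in rows.items() if start <= v <= end}


# Without event_time: the old production row looks "deleted" and the fresh CI row looks "new".
naive = diff(prod_rows, ci_rows)

# With event_time: compare only the timeframe present in both environments (assumed window).
start, end = date(2024, 11, 4), date(2024, 11, 10)
scoped = diff(within(prod_rows, start, end), within(ci_rows, start, end))

print(naive)   # ([1], [4])
print(scoped)  # ([], [])
```

Row 1 was filtered out of the CI subset and row 4 exists only because CI is fresher, so neither is a real change. Restricting both sides to the shared window removes both false signals.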

## Examples

<Tabs>

<TabItem value="model" label="Models">

Here's an example in the `dbt_project.yml` file:

<File name='dbt_project.yml'>

```yml
models:
  my_project:
    user_sessions:
      +event_time: session_start_time
```
</File>

Example in a properties YAML file:

<File name='models/properties.yml'>

```yml
models:
  - name: user_sessions
    config:
      event_time: session_start_time
```

</File>

Example in a SQL model config block:

<File name="models/user_sessions.sql">

```sql
{{ config(
    event_time='session_start_time'
) }}
```

</File>

This configuration sets `session_start_time` as the `event_time` for the `user_sessions` model, so dbt uses this timestamp for time-slice comparisons in compare changes and for incremental microbatching.
</TabItem>

<TabItem value="seeds" label="Seeds">

Here's an example in the `dbt_project.yml` file:

<File name='dbt_project.yml'>

```yml
seeds:
  my_project:
    my_seed:
      +event_time: record_timestamp
```

</File>

Example in a seed properties YAML:

<File name='seeds/properties.yml'>

```yml
seeds:
  - name: my_seed
    config:
      event_time: record_timestamp
```
</File>

This configuration sets `record_timestamp` as the `event_time` for `my_seed`, so dbt consistently uses `record_timestamp` for compare changes and incremental microbatching.

</TabItem>
<TabItem value="sources" label="Sources">

Here's an example in a source properties YAML file:

<File name='models/properties.yml'>

```yml
sources:
  - name: source_name
    tables:
      - name: table_name
        config:
          event_time: event_timestamp
```
</File>

This setup sets `event_timestamp` as the `event_time` for the specified source table.

</TabItem>
</Tabs>
1 change: 1 addition & 0 deletions website/sidebars.js
Original file line number Diff line number Diff line change
Expand Up @@ -926,6 +926,7 @@ const sidebarSettings = {
"reference/resource-configs/alias",
"reference/resource-configs/database",
"reference/resource-configs/enabled",
"reference/resource-configs/event-time",
"reference/resource-configs/full_refresh",
"reference/resource-configs/contract",
"reference/resource-configs/grants",