0.2.0 (#18)
## Features
- major overhaul of `dataset.sql`: each appended activity is now joined and
aggregated in its own CTE, and these CTEs are then combined in a final CTE
- clarified project variable usage
- large reorganization of the folders and yml files
tnightengale authored Mar 16, 2023
1 parent b8841a0 commit b7501ad
Showing 43 changed files with 369 additions and 235 deletions.
83 changes: 40 additions & 43 deletions README.md
@@ -12,11 +12,10 @@ modelling framework, based on the
- [Install](#install)
- [Usage](#usage)
- [Create a Dataset](#create-a-dataset)
- [Configure Columns](#configure-columns)
- [Required Columns](#required-columns)
- [Mapping Column Names](#mapping-column-names)
- [Included Dataset Columns](#included-dataset-columns)
- [Configure Appended Activity Column Names](#configure-appended-activity-column-names)
- [Required Columns](#required-columns)
- [Vars](#vars)
- [Column Mappings (optional)](#column-mappings-optional)
- [Included Columns (optional)](#included-columns-optional)
- [Macros](#macros)
- [Dataset (source)](#dataset-source)
- [Activity (source)](#activity-source)
@@ -59,17 +58,19 @@ Include in `packages.yml`:

```yaml
packages:
- git: "https://github.com/tnightengale/dbt-activity-schema"
revision: 0.1.0
- package: tnightengale/dbt_activity_schema
version: 0.2.0
```
For the latest release, see
https://github.com/tnightengale/dbt-activity-schema/releases.
## Usage
### Create a Dataset
Use the [dataset macro](#dataset-source) with the appropriate arguments to
derive a Dataset by self-joining the Activity Stream model in your project. The
Use the [dataset macro](#dataset-source) to self-join an Activity Stream using
[relationships](#relationships).
The
[dataset macro](#dataset-source) will compile based on the provided [activity
macros](#activity-source) and the [relationship macros](#relationships). It
can then be nested in a CTE in a dbt-Core model. Eg:
@@ -80,7 +81,7 @@ with

dataset_cte as (
{{ dbt_activity_schema.dataset(
activity_stream_ref = ref("example__activity_stream"),
activity_stream = ref("example__activity_stream"),

primary_activity = dbt_activity_schema.activity(
dbt_activity_schema.all_ever(), "bought something"),
@@ -98,17 +99,12 @@ select * from dataset_cte

```
> Note: This package does not contain macros to create the Activity Stream
> model. It derives Dataset models on top of an existing Activity Stream model.
> model. It generates the SQL to self-join an existing Activity Stream model.
### Configure Columns
### Required Columns
This package conforms to the [Activity Schema V2
Specification](https://github.com/ActivitySchema/ActivitySchema/blob/main/2.0.md#entity-table)
and, by default, it expects the columns in that spec to exist in the Activity
Stream model.

#### Required Columns
In order for critical joins in the [dataset macro](#dataset-source) to work as
expected, the following columns must exist:
and requires the following columns to function:
- **`activity`**: A string or ID that identifies the action or fact
attributable to the `customer`.
- **`customer`**: The UUID of the entity or customer. Must be used across
@@ -120,18 +116,24 @@ expected, the following columns must exist:
- **`activity_occurrence`**: The running count of the activity per customer.
Create using a rank window function, partitioned by activity and customer.
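
As a rough illustration of the `activity_occurrence` column described above, the sketch below computes a running count with a rank window function partitioned by activity and customer. It is not taken from the package: the table name `stream`, the sample rows, and the choice to order by `ts` are assumptions made for the example, and it uses Python's bundled SQLite (3.25 or later is needed for window functions).

```python
import sqlite3

# Hypothetical sample rows: (activity_id, customer, activity, ts).
rows = [
    ("a1", "c1", "visit page", "2023-01-01"),
    ("a2", "c1", "visit page", "2023-01-03"),
    ("a3", "c1", "bought something", "2023-01-02"),
    ("a4", "c2", "visit page", "2023-01-05"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table stream (activity_id text, customer text, activity text, ts text)"
)
conn.executemany("insert into stream values (?, ?, ?, ?)", rows)

# Rank each row within its (activity, customer) partition.
# Ordering by ts is an assumption; the README only names the partition keys.
query = """
select
    activity_id,
    rank() over (
        partition by activity, customer
        order by ts
    ) as activity_occurrence
from stream
order by activity_id
"""

# Map each activity_id to its occurrence number.
occurrences = dict(conn.execute(query).fetchall())
```

With this sample data, the second `visit page` for customer `c1` gets occurrence 2, while every other row is the first of its kind for its customer.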

#### Mapping Column Names
If the required columns exist conceptually under different names, they can be
aliased using the nested `activity_schema_v2_column_mappings` project var. Eg:
## Vars
This package can be configured with the following project variables. All project
vars can be scoped globally or to the `dbt_activity_schema` package.

### Column Mappings (optional)
The `column_mappings` project variable can be used to alias columns in Activity
Stream. If the [required columns](#required-columns) exist conceptually under
different names, they can be mapped to their names in the [V2
Specification](https://github.com/ActivitySchema/ActivitySchema/blob/main/2.0.md#entity-table).
Eg:

```yml
# dbt_project.yml

...

vars:
dbt_activity_schema:
activity_schema_v2_column_mappings:
column_mappings:
# Activity Stream with required column names that
# differ from the V2 spec, mapped from their spec name.
customer: entity_uuid
@@ -140,19 +142,18 @@ vars:
...
```
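
To make the direction of the mapping concrete, here is a minimal sketch of how a spec column name might be resolved to the name used in an Activity Stream. The package does this in Jinja macros; the Python function `resolve_column` is hypothetical and only models the lookup, using the mappings that appear in this commit's integration tests.

```python
# Spec column name -> column name in the Activity Stream,
# mirroring the shape of the `column_mappings` project var.
column_mappings = {
    "customer": "entity_uuid",
    "anonymous_customer_id": "anonymous_entity_uuid",
}


def resolve_column(spec_name, mappings):
    """Return the mapped column name, falling back to the spec name.

    Hypothetical helper for illustration; not part of dbt_activity_schema.
    """
    return mappings.get(spec_name, spec_name)


mapped = resolve_column("customer", column_mappings)  # mapped column
unmapped = resolve_column("ts", column_mappings)      # no mapping, spec name kept
```

Columns without an entry keep their V2 spec name, which is why an empty mapping (`column_mappings: {}`) leaves every column unchanged.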

#### Included Dataset Columns
The set of columns that are included in the compiled SQL of the [dataset
macro](#dataset-source) can be configured using the nested
`default_dataset_columns` project var. Eg:
### Included Columns (optional)
The `included_columns` project variable can be set to indicate the default
columns to be included in each [activity](#activity-source) passed to
[dataset](#dataset-source). Eg:
```yml
# dbt_project.yml

...

vars:
dbt_activity_schema:
# List columns from the Activity Schema to include in the Dataset
default_dataset_columns:
included_columns:
- activity_id
- entity_uuid
- activity_occurred_at
@@ -161,27 +162,23 @@ vars:
...
```

These defaults can be overridden using the `override_columns` argument in the
[activity macro](#activity-source).
If it is not set, all the columns from the [V2
Specification](https://github.com/ActivitySchema/ActivitySchema/blob/main/2.0.md#entity-table)
will be included, based on the [columns macro](./macros/utils/columns.sql).

#### Configure Appended Activity Column Names
The naming convention of the columns, in the activities passed to the
`appended_activities` argument can be configured by overriding the
[generate_appended_column_alias](./macros/utils/generate_appended_column_alias.sql)
macro. See the dbt docs on [overriding package
macros](https://docs.getdbt.com/reference/dbt-jinja-functions/dispatch#overriding-package-macros)
for more details.
These defaults can be overridden on a per-activity basis by passing a list of column names to the `included_columns` argument in the
[activity macro](#activity-source).
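
The precedence described here can be sketched as a small function: a per-activity list wins, then the project var, then the full set of V2 spec columns. This is a model of the documented behaviour, not the package's Jinja implementation; `columns_for_activity` is a hypothetical name, and the default list is copied from the `included_columns` var in this commit's `dbt_project.yml`.

```python
# Default columns, as listed under `included_columns` in dbt_project.yml.
V2_SPEC_COLUMNS = [
    "activity_id",
    "ts",
    "customer",
    "anonymous_customer_id",
    "activity",
    "activity_occurrence",
    "activity_repeated_at",
    "feature_json",
    "revenue_impact",
    "link",
]


def columns_for_activity(activity_columns=None, project_columns=None):
    """Pick the columns for one activity (hypothetical helper).

    Precedence: per-activity argument, then project var, then all spec columns.
    """
    if activity_columns is not None:
        return list(activity_columns)
    if project_columns is not None:
        return list(project_columns)
    return list(V2_SPEC_COLUMNS)


# Project var set, with one activity overriding it.
project_var = ["activity_id", "entity_uuid", "ts", "revenue_impact"]
overridden = columns_for_activity(["feature_json", "ts"], project_var)
from_project = columns_for_activity(None, project_var)
from_spec = columns_for_activity()
```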

## Macros

### Dataset ([source](macros/dataset.sql))
Generate the SQL for self-joining the Activity Stream.
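
For orientation, the sketch below shows the general shape of such a self-join for a "first after" relationship: the appended activity is joined and aggregated in its own CTE, which is then combined with the primary activity in a final CTE. The SQL is hand-written for illustration and is not the output of the macro; the table name `stream`, the CTE names, the output column alias, and the sample rows are all assumptions. It runs against Python's bundled SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table stream (activity_id text, customer text, activity text, ts text)"
)
# Hypothetical sample rows.
conn.executemany(
    "insert into stream values (?, ?, ?, ?)",
    [
        ("a1", "c1", "signed up", "2023-01-01"),
        ("a2", "c1", "visit page", "2023-01-02"),
        ("a3", "c1", "visit page", "2023-01-03"),
        ("a4", "c2", "signed up", "2023-01-04"),
    ],
)

query = """
with primary_activity as (
    -- Every occurrence of the primary activity.
    select activity_id, customer, ts
    from stream
    where activity = 'signed up'
),
first_visit_after as (
    -- One CTE per appended activity: join, then aggregate.
    select p.activity_id, min(a.ts) as first_after_visit_page_ts
    from primary_activity as p
    left join stream as a
        on a.customer = p.customer
        and a.activity = 'visit page'
        and a.ts > p.ts
    group by p.activity_id
),
final as (
    -- Combine the appended CTE with the primary activity.
    select p.activity_id, p.customer, p.ts, f.first_after_visit_page_ts
    from primary_activity as p
    left join first_visit_after as f
        on f.activity_id = p.activity_id
)
select * from final order by activity_id
"""

result = conn.execute(query).fetchall()
```

Customer `c1` gets the timestamp of the first page visit after signing up, and customer `c2`, who never visited a page, gets a null in the appended column.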

**args:**
- **`activity_stream_ref (required)`** :
[ref](https://docs.getdbt.com/reference/dbt-jinja-functions/ref)
- **`activity_stream (required)`** :
[ref](https://docs.getdbt.com/reference/dbt-jinja-functions/ref) | str

The dbt `ref()` that points to the activity stream model.
The dbt `ref()` or a CTE name that contains the [required columns](#required-columns).

- **`primary_activity (required)`** : [activity](#activity-source)

@@ -209,9 +206,9 @@ dataset.
The string identifier of the activity in the Activity Stream. Should match the
value in the `activity` column.

- **`override_columns (optional)`** : List [ str ]
- **`included_columns (optional)`** : List [ str ]

List of columns to include for the activity. Setting this Overrides the
List of columns to include for the activity. Setting this overrides the
defaults configured by the `included_columns` project var.

- **`additional_join_condition (optional)`** : str
22 changes: 18 additions & 4 deletions dbt_project.yml
@@ -1,9 +1,9 @@

# Project name.
name: 'dbt_activity_schema'
version: '0.1.1'
version: '0.2.0'
config-version: 2
require-dbt-version: [">=1.3.0"]
require-dbt-version: [">=1.3.0", "<2.0.0"]

# The "profile" dbt uses for this project.
profile: 'dbt_activity_schema'
@@ -18,5 +18,19 @@ snapshot-paths: ["snapshots"]

target-path: "target"
clean-targets:
- "target"
- "dbt_modules"
- "target"
- "dbt_modules"

vars:
included_columns:
- activity_id
- ts
- customer
- anonymous_customer_id
- activity
- activity_occurrence
- activity_repeated_at
- feature_json
- revenue_impact
- link
column_mappings: {}
4 changes: 2 additions & 2 deletions integration_tests/dbt_project.yml
@@ -21,12 +21,12 @@ models:

vars:
dbt_activity_schema:
default_dataset_columns:
included_columns:
- activity_id
- entity_uuid
- ts
- revenue_impact
activity_schema_v2_column_mappings:
column_mappings:
customer: entity_uuid
anonymous_customer_id: anonymous_entity_uuid

15 changes: 13 additions & 2 deletions integration_tests/models/first_after/dataset__first_after_3.sql
@@ -3,13 +3,24 @@
ref("input__first_after"),
dbt_activity_schema.activity(
dbt_activity_schema.all_ever(),
"signed up"
"signed up",
[
"activity_id",
"entity_uuid",
"ts",
"revenue_impact",
"feature_json"
]
),
[
dbt_activity_schema.activity(
dbt_activity_schema.first_after(),
"visit page",
["feature_json", "activity_occurrence", "ts"],
[
"feature_json",
"activity_occurrence",
"ts"
],
additional_join_condition="
json_extract({primary}.feature_json, 'type')
= json_extract({appended}.feature_json, 'type')
18 changes: 18 additions & 0 deletions integration_tests/models/first_after/first_after.yml
@@ -0,0 +1,18 @@
version: 2

models:

- name: dataset__first_after_1
tests:
- dbt_utils.equality:
compare_model: ref("output__first_after_1")

- name: dataset__first_after_2
tests:
- dbt_utils.equality:
compare_model: ref("output__first_after_2")

- name: dataset__first_after_3
tests:
- dbt_utils.equality:
compare_model: ref("output__first_after_3")
8 changes: 8 additions & 0 deletions integration_tests/models/first_before/first_before.yml
@@ -0,0 +1,8 @@
version: 2

models:

- name: dataset__first_before_1
tests:
- dbt_utils.equality:
compare_model: ref("output__first_before_1")
8 changes: 8 additions & 0 deletions integration_tests/models/first_ever/first_ever.yml
@@ -0,0 +1,8 @@
version: 2

models:

- name: dataset__first_ever_1
tests:
- dbt_utils.equality:
compare_model: ref("output__first_ever_1")
@@ -3,17 +3,36 @@
ref("input__first_in_between"),
dbt_activity_schema.activity(
dbt_activity_schema.all_ever(),
"signed up"
"signed up",
[
"activity_id",
"entity_uuid",
"ts",
"revenue_impact",
"feature_json"
]
),
[
dbt_activity_schema.activity(
dbt_activity_schema.first_in_between(),
"visit page",
["feature_json", "activity_occurrence", "ts"],
[
"feature_json",
"activity_occurrence",
"ts"
],
additional_join_condition="
json_extract({primary}.feature_json, 'type')
= json_extract({appended}.feature_json, 'type')
"
),
dbt_activity_schema.activity(
dbt_activity_schema.first_in_between(),
"bought something",
[
"activity_id",
"ts"
]
)
]
)
8 changes: 8 additions & 0 deletions integration_tests/models/last_after/last_after.yml
@@ -0,0 +1,8 @@
version: 2

models:

- name: dataset__last_after_1
tests:
- dbt_utils.equality:
compare_model: ref("output__last_after_1")
8 changes: 8 additions & 0 deletions integration_tests/models/last_before/last_before.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
version: 2

models:

- name: dataset__last_before_1
tests:
- dbt_utils.equality:
compare_model: ref("output__last_before_1")
8 changes: 8 additions & 0 deletions integration_tests/models/last_ever/last_ever.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
version: 2

models:

- name: dataset__last_ever_1
tests:
- dbt_utils.equality:
compare_model: ref("output__last_ever_1")
43 changes: 0 additions & 43 deletions integration_tests/models/models.yml

This file was deleted.
