Feature/add union data #123

Open · wants to merge 16 commits into `main`
43 changes: 11 additions & 32 deletions .github/PULL_REQUEST_TEMPLATE/maintainer_pull_request_template.md
@@ -4,48 +4,27 @@
**This PR will result in the following new package version:**
<!--- Please add details around your decision for breaking vs non-breaking version upgrade. If this is a breaking change, were backwards-compatible options explored? -->

**Please detail what change(s) this PR introduces and any additional information that should be known during the review of this PR:**
**Please provide the finalized CHANGELOG entry which details the relevant changes included in this PR:**
<!--- Copy/paste the CHANGELOG for this version below. -->

## PR Checklist
### Basic Validation
Please acknowledge that you have successfully performed the following commands locally:
- [ ] dbt compile
- [ ] dbt run --full-refresh
- [ ] dbt run
- [ ] dbt test
- [ ] dbt run --vars (if applicable)
- [ ] dbt run --full-refresh && dbt test
- [ ] dbt run (if incremental models are present) && dbt test

Before marking this PR as "ready for review" the following have been applied:
- [ ] The appropriate issue has been linked and tagged
- [ ] You are assigned to the corresponding issue and this PR
- [ ] The appropriate issue has been linked, tagged, and properly assigned
- [ ] All necessary documentation and version upgrades have been applied
<!--- Be sure to update the package version in the dbt_project.yml, integration_tests/dbt_project.yml, and README if necessary. -->
- [ ] docs were regenerated (unless this PR does not include any code or yml updates)
- [ ] BuildKite integration tests are passing
- [ ] Detailed validation steps have been provided below

### Detailed Validation
Please acknowledge that the following validation checks have been performed prior to marking this PR as "ready for review":
- [ ] You have validated these changes and assure this PR will address the respective Issue/Feature.
- [ ] You are reasonably confident these changes will not impact any other components of this package or any dependent packages.
- [ ] You have provided details below around the validation steps performed to gain confidence in these changes.
Please share any and all of your validation steps:
<!--- Provide the steps you took to validate your changes below. -->

### Standard Updates
Please acknowledge that your PR contains the following standard updates:
- Package versioning has been appropriately indexed in the following locations:
- [ ] indexed within dbt_project.yml
- [ ] indexed within integration_tests/dbt_project.yml
- [ ] CHANGELOG has individual entries for each respective change in this PR
<!--- If there is a parallel upstream change, remember to reference the corresponding CHANGELOG as an individual entry. -->
- [ ] README updates have been applied (if applicable)
<!--- Remember to check the following README locations for common updates. -->
<!--- Suggested install range (needed for breaking changes) -->
<!--- Dependency matrix is appropriately updated (if applicable) -->
<!--- New variable documentation (if applicable) -->
- [ ] DECISIONLOG updates have been applied (if applicable)
- [ ] Appropriate yml documentation has been added (if applicable)

### dbt Docs
Please acknowledge that after the above were all completed the below were applied to your branch:
- [ ] docs were regenerated (unless this PR does not include any code or yml updates)

### If you had to summarize this PR in an emoji, which would it be?
<!--- For a complete list of markdown compatible emojis check out this git repo (https://gist.github.com/rxaviers/7360908) -->
:dancer:
13 changes: 13 additions & 0 deletions .github/workflows/auto-release.yml
@@ -0,0 +1,13 @@
name: 'auto release'
on:
pull_request:
types:
- closed
branches:
- main

jobs:
call-workflow-passing-data:
if: github.event.pull_request.merged
uses: fivetran/dbt_package_automations/.github/workflows/auto-release.yml@main
secrets: inherit
10 changes: 10 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,13 @@
# dbt_hubspot_source v0.15.0
[PR #123](https://github.com/fivetran/dbt_hubspot_source/pull/123) includes the following updates:

## 🎉 Feature Update 🎉
- This release supports running the package on multiple Hubspot sources at once! See the [README](https://github.com/fivetran/dbt_hubspot_source?tab=readme-ov-file#step-3-define-database-and-schema-variables) for details on how to leverage this feature.

## 🛠️ Under the Hood 🛠️
- Included auto-releaser GitHub Actions workflow to automate future releases.
- Updated the maintainer PR template to resemble the most up to date format.

# dbt_hubspot_source v0.14.0
[PR #122](https://github.com/fivetran/dbt_hubspot_source/pull/122) includes the following updates:

55 changes: 50 additions & 5 deletions README.md
@@ -44,17 +44,62 @@ Include the following hubspot_source package version in your `packages.yml` file
```yaml
packages:
- package: fivetran/hubspot_source
version: [">=0.14.0", "<0.15.0"]
version: [">=0.15.0", "<0.16.0"]
```

## Step 3: Define database and schema variables
### Option 1: Single connector 💃
By default, this package runs using your destination and the `hubspot` schema. If this is not where your HubSpot data is (for example, if your HubSpot schema is named `hubspot_fivetran`), add the following configuration to your root `dbt_project.yml` file:

```yml
vars:
hubspot_database: your_destination_name
hubspot_schema: your_schema_name
```
> **Note**: If you are running the package on one source connector, each model will have a `source_relation` column that is just an empty string.

### Option 2: Union multiple connectors 👯
If you have multiple Hubspot connectors in Fivetran and would like to use this package on all of them simultaneously, we have provided functionality to do so. The package will union all of the data together and pass the unioned table into the transformations, and the `source_relation` column of each model will tell you which source a record came from. To use this functionality, set either the `hubspot_union_schemas` OR the `hubspot_union_databases` variable (you cannot set both, though a more flexible approach is in the works...) in your root `dbt_project.yml` file:

```yml
# dbt_project.yml

vars:
hubspot_union_schemas: ['hubspot_usa','hubspot_canada'] # use this if the data is in different schemas/datasets of the same database/project
hubspot_union_databases: ['hubspot_usa','hubspot_canada'] # use this if the data is in different databases/projects but uses the same schema name
```
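Under the hood, the union produces a shape like the following for each source table (a simplified sketch, not the package's exact generated SQL; the database and schema names are illustrative):

```sql
select company.*, 'your_database.hubspot_usa' as _dbt_source_relation
from your_database.hubspot_usa.company as company

union all

select company.*, 'your_database.hubspot_canada' as _dbt_source_relation
from your_database.hubspot_canada.company as company
```

The staging models then surface `_dbt_source_relation` as the more nicely named `source_relation` column.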

#### Recommended: Incorporate unioned sources into DAG
By default, this package defines one single-connector source, called `hubspot`, which will be disabled if you are unioning multiple connectors. This means that your DAG will not include your Hubspot sources, though the package will run successfully.

To properly incorporate all of your Hubspot connectors into your project's DAG:
1. Define each of your sources in a `.yml` file in your project. Utilize the following template for the `source`-level configurations, and, **most importantly**, copy and paste the table and column-level definitions from the package's `src_hubspot.yml` [file](https://github.com/fivetran/dbt_hubspot_source/blob/main/models/src_hubspot.yml#L9-L1313).

```yml
# a .yml file in your root project
sources:
- name: <name> # ex: hubspot_usa
schema: <schema_name> # one of var('hubspot_union_schemas') if unioning schemas, otherwise just 'hubspot'
database: <database_name> # one of var('hubspot_union_databases') if unioning databases, otherwise whatever DB your hubspot schemas all live in
loader: Fivetran
loaded_at_field: _fivetran_synced
tables: # copy and paste from models/src_hubspot.yml
```

> **Note**: If there are source tables you do not have (see [Step 4](https://github.com/fivetran/dbt_hubspot_source?tab=readme-ov-file#step-4-disable-models-for-non-existent-sources)), you may still include them here, as long as you have set the right variables to `False`. Otherwise, you may remove them from your source definitions.

2. Set the `has_defined_sources` variable (scoped to the `hubspot_source` package) to `True`, like such:
```yml
# dbt_project.yml
vars:
hubspot_source:
has_defined_sources: true
```
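Putting steps 1 and 2 together, a minimal two-schema setup might look like the following (the `hubspot_usa`/`hubspot_canada` names are illustrative, and the `tables` entries must be copied from the package's `src_hubspot.yml`):

```yml
# a .yml file in your root project
version: 2
sources:
  - name: hubspot_usa
    schema: hubspot_usa
    database: your_destination_name
    loader: Fivetran
    loaded_at_field: _fivetran_synced
    tables: [] # copy and paste from models/src_hubspot.yml
  - name: hubspot_canada
    schema: hubspot_canada
    database: your_destination_name
    loader: Fivetran
    loaded_at_field: _fivetran_synced
    tables: [] # copy and paste from models/src_hubspot.yml
```

```yml
# dbt_project.yml
vars:
  hubspot_union_schemas: ['hubspot_usa', 'hubspot_canada']
  hubspot_source:
    has_defined_sources: true
```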

## Step 4: Disable models for non-existent sources

> _This step is unnecessary (but still available for use) if you are unioning multiple connectors together in the previous step. That is, the `union_data` macro we use will create completely empty staging models for sources that are not found in any of your Hubspot schemas/databases. However, you can still leverage the below variables if you would like to avoid this behavior._

When setting up your Hubspot connection in Fivetran, it is possible that not every table this package expects will be synced. This can occur because you either don't use that functionality in Hubspot or have actively decided not to sync some tables. Therefore, we have added enable/disable configs in the `src.yml` to let you disable sources that are not present; downstream models are automatically disabled as well. To disable the relevant functionality in the package, add the relevant variables to your root `dbt_project.yml`. By default, all variables are assumed to be `true` (with the exception of `hubspot_service_enabled`, `hubspot_ticket_deal_enabled`, and `hubspot_contact_merge_audit_enabled`). You only need to add variables for the tables that differ from the default:

```yml
@@ -111,10 +156,8 @@
vars:
hubspot_ticket_deal_enabled: true
```

### Dbt-core Version Requirement for disabling freshness tests
If you are not using a source table that involves freshness tests, please be aware that the feature to disable freshness was only introduced in dbt-core 1.1.0. Therefore ensure the dbt version you're using is v1.1.0 or greater for this config to work.

## (Optional) Step 5: Additional configurations
<details open><summary>Expand/collapse configurations</summary>

### Adding passthrough columns
This package includes all source columns defined in the macros folder. Models by default only bring in a few fields for the `company`, `contact`, `deal`, and `ticket` tables. You can add more columns using our pass-through column variables. These variables allow for the pass-through fields to be aliased (`alias`) and casted (`transform_sql`) if desired, but not required. Datatype casting is configured via a sql snippet within the `transform_sql` key. You may add the desired sql while omitting the `as field_name` at the end and your custom pass-though fields will be casted accordingly. Use the below format for declaring the respective pass-through variables within your root `dbt_project.yml`.
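As a sketch of this format (the property names and cast below are illustrative, not from your schema — substitute fields that exist in your destination):

```yml
# dbt_project.yml
vars:
  hubspot__company_pass_through_columns:
    - name: "property_industry"
      alias: "industry"
    - name: "property_annualrevenue"
      alias: "annual_revenue"
      transform_sql: "cast(annual_revenue as int)" # note: no trailing `as field_name`
```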
@@ -206,7 +249,7 @@
models:
+schema: my_new_schema_name # leave blank for just the target_schema
```

### Change the source table references
### Change the source table references (only if using a single connector)
If an individual source table has a different name than the package expects, add the table name as it appears in your destination to the respective variable:
> IMPORTANT: See this project's [`dbt_project.yml`](https://github.com/fivetran/dbt_hubspot_source/blob/main/dbt_project.yml) variable declarations to see the expected names.

@@ -215,6 +258,8 @@
vars:
hubspot_<default_source_table_name>_identifier: your_table_name
```
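For example (the table name here is purely illustrative), if your `company` table landed in your destination as `company_v2`, the variable would follow the `hubspot_<default_source_table_name>_identifier` pattern described above:

```yml
# dbt_project.yml
vars:
  hubspot_company_identifier: "company_v2"
```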

</details>

## (Optional) Step 6: Orchestrate your models with Fivetran Transformations for dbt Core™

Fivetran offers the ability for you to orchestrate your dbt project through [Fivetran Transformations for dbt Core™](https://fivetran.com/docs/transformations/dbt). Learn how to set up your project for orchestration through Fivetran in our [Transformations for dbt Core setup guides](https://fivetran.com/docs/transformations/dbt#setupguide).
2 changes: 1 addition & 1 deletion dbt_project.yml
@@ -1,5 +1,5 @@
name: 'hubspot_source'
version: '0.14.0'
version: '0.15.0'
config-version: 2
require-dbt-version: [">=1.3.0", "<2.0.0"]
models:
10 changes: 5 additions & 5 deletions integration_tests/ci/sample.profiles.yml
@@ -16,13 +16,13 @@ integration_tests:
pass: "{{ env_var('CI_REDSHIFT_DBT_PASS') }}"
dbname: "{{ env_var('CI_REDSHIFT_DBT_DBNAME') }}"
port: 5439
schema: hubspot_source_integration_tests_999
schema: hubspot_source_integration_tests_001
threads: 8
bigquery:
type: bigquery
method: service-account-json
project: 'dbt-package-testing'
schema: hubspot_source_integration_tests_999
schema: hubspot_source_integration_tests_001
threads: 8
keyfile_json: "{{ env_var('GCLOUD_SERVICE_KEY') | as_native }}"
snowflake:
@@ -33,7 +33,7 @@
role: "{{ env_var('CI_SNOWFLAKE_DBT_ROLE') }}"
database: "{{ env_var('CI_SNOWFLAKE_DBT_DATABASE') }}"
warehouse: "{{ env_var('CI_SNOWFLAKE_DBT_WAREHOUSE') }}"
schema: hubspot_source_integration_tests_999
schema: hubspot_source_integration_tests_001
threads: 8
postgres:
type: postgres
@@ -42,13 +42,13 @@
pass: "{{ env_var('CI_POSTGRES_DBT_PASS') }}"
dbname: "{{ env_var('CI_POSTGRES_DBT_DBNAME') }}"
port: 5432
schema: hubspot_source_integration_tests_999
schema: hubspot_source_integration_tests_001
threads: 8
databricks:
catalog: null
host: "{{ env_var('CI_DATABRICKS_DBT_HOST') }}"
http_path: "{{ env_var('CI_DATABRICKS_DBT_HTTP_PATH') }}"
schema: hubspot_source_integration_tests_999
schema: hubspot_source_integration_tests_001
threads: 8
token: "{{ env_var('CI_DATABRICKS_DBT_TOKEN') }}"
type: databricks
4 changes: 2 additions & 2 deletions integration_tests/dbt_project.yml
@@ -1,12 +1,12 @@
name: 'hubspot_source_integration_tests'
version: '0.14.0'
version: '0.15.0'
profile: 'integration_tests'
config-version: 2
models:
hubspot_source:
+schema:
vars:
hubspot_schema: hubspot_source_integration_tests_999
hubspot_schema: hubspot_source_integration_tests_001
hubspot_source:
hubspot_service_enabled: true
# hubspot_sales_enabled: true # enable when generating docs
5 changes: 4 additions & 1 deletion macros/add_property_labels.sql
@@ -22,16 +22,19 @@ select {{ cte_name }}.*
left join -- create subset of property and property_options for property in question
(select
property_option.property_option_value,
property_option.property_option_label
property_option.property_option_label,
property_option.source_relation
from {{ ref('stg_hubspot__property_option') }} as property_option
join {{ ref('stg_hubspot__property') }} as property
on property_option.property_id = property._fivetran_id
and property_option.source_relation = property.source_relation
where property.property_name = '{{ col.name.replace('property_', '') }}'
and property.hubspot_object = '{{ source_name }}'
) as {{ col.name }}_option

on cast({{ cte_name }}.{{ col_alias }} as {{ dbt.type_string() }})
= cast({{ col.name }}_option.property_option_value as {{ dbt.type_string() }})
and {{ cte_name }}.source_relation = {{ col.name }}_option.source_relation

{% endif -%}
{%- endfor %}
2 changes: 1 addition & 1 deletion macros/all_passthrough_column_check.sql
@@ -2,7 +2,7 @@

{% set available_passthrough_columns = fivetran_utils.remove_prefix_from_columns(
columns=adapter.get_columns_in_relation(ref(relation)),
prefix='property_', exclude=get_macro_columns(get_columns))
prefix='property_', exclude=(get_macro_columns(get_columns) + ['_dbt_source_relation']))
%}

{{ return(available_passthrough_columns|length) }}
4 changes: 4 additions & 0 deletions models/docs.md
@@ -2,6 +2,10 @@
Timestamp of when Fivetran synced a record.
{% enddocs %}

{% docs source_relation %}
The schema or database this record came from if you are unioning multiple connectors together in this package. If you are running the package on a single connector, this will be its schema name.
{% enddocs %}

{% docs _fivetran_deleted %}
Boolean indicating whether a record has been deleted in Hubspot and/or inferred deleted in Hubspot by Fivetran; _fivetran_deleted and is_deleted fields are equivalent.
{% enddocs %}
2 changes: 2 additions & 0 deletions models/src_hubspot.yml
@@ -6,6 +6,8 @@ sources:
database: "{% if target.type != 'spark'%}{{ var('hubspot_database', target.database) }}{% endif %}"
loader: Fivetran
loaded_at_field: _fivetran_synced
config:
enabled: "{{ var('hubspot_union_schemas', []) == [] and var('hubspot_union_databases', []) == [] }}"
tables:
- name: calendar_event
identifier: "{{ var('hubspot_calendar_event_identifier', 'calendar_event')}}"
21 changes: 19 additions & 2 deletions models/stg_hubspot__company.sql
@@ -14,6 +14,14 @@ with base as (
staging_columns=get_company_columns()
)
}}

{{
fivetran_utils.source_relation(
union_schema_variable='hubspot_union_schemas',
union_database_variable='hubspot_union_databases'
)
}}

from base

), fields as (
@@ -27,12 +35,20 @@ with base as (
staging_columns=get_company_columns()
)
}}

{{
fivetran_utils.source_relation(
union_schema_variable='hubspot_union_schemas',
union_database_variable='hubspot_union_databases'
)
}}

{% if all_passthrough_column_check('stg_hubspot__company_tmp',get_company_columns()) > 0 %}
-- just pass everything through if extra columns are present, but ensure required columns are present.
,{{
fivetran_utils.remove_prefix_from_columns(
columns=adapter.get_columns_in_relation(ref('stg_hubspot__company_tmp')),
prefix='property_', exclude=get_macro_columns(get_company_columns()))
prefix='property_', exclude=(get_macro_columns(get_company_columns()) + ['_dbt_source_relation']))
> **Contributor:** Can you explain the need for this addition?
>
> **Contributor:** This comment applies to the other models with the similar code update.
>
> **Contributor (Author):** yeah essentially we don't want to include `_dbt_source_relation` (which is created by `union_data`/`union_relation`) in the `remove_prefix_from_columns` macro call.
>
> without adding it to the exclude list, users passing through all columns would end up with both a `source_relation` and `_dbt_source_relation` column, which is redundant and a lil confusing. thus, this change makes sure that these users just have the more-nicely-named `source_relation` field.

}}
{% endif %}
from base
@@ -52,7 +68,8 @@ with base as (
city,
state,
country,
company_annual_revenue
company_annual_revenue,
source_relation

--The below macro adds the fields defined within your hubspot__company_pass_through_columns variable into the staging model
{{ fivetran_utils.fill_pass_through_columns('hubspot__company_pass_through_columns') }}
10 changes: 9 additions & 1 deletion models/stg_hubspot__company.yml
@@ -19,9 +19,16 @@
description: '{{ doc("history_name") }}'
- name: new_value
description: '{{ doc("history_value") }}'
- name: source_relation
description: '{{ doc("source_relation") }}'

- name: stg_hubspot__company
description: Each record represents a company in Hubspot.
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- company_id
- source_relation
columns:
- name: _fivetran_synced
description: '{{ doc("_fivetran_synced") }}'
@@ -30,7 +37,6 @@
- name: company_id
description: The ID of the company.
tests:
- unique
- not_null
- name: company_name
description: The name of the company.
@@ -52,3 +58,5 @@
description: The country where the company is located.
- name: company_annual_revenue
description: The actual or estimated annual revenue of the company.
- name: source_relation
description: '{{ doc("source_relation") }}'
11 changes: 10 additions & 1 deletion models/stg_hubspot__company_property_history.sql
@@ -14,6 +14,14 @@ with base as (
staging_columns=get_company_property_history_columns()
)
}}

{{
fivetran_utils.source_relation(
union_schema_variable='hubspot_union_schemas',
union_database_variable='hubspot_union_databases'
)
}}

from base

), fields as (
@@ -25,7 +33,8 @@ with base as (
source as change_source,
source_id as change_source_id,
cast(change_timestamp as {{ dbt.type_timestamp() }}) as change_timestamp, -- source field name = timestamp ; alias declared in macros/get_company_property_history_columns.sql
value as new_value
value as new_value,
source_relation
from macro

)