introduce the hubspot engagement table to adjust the joins in int_hub… #13

Open
wants to merge 19 commits into
base: main
15 changes: 15 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,18 @@
# dbt_unified_rag v0.1.0-a4

## Breaking Changes
- Added the HubSpot `engagement` source table to the package and made the following updates:
  - Added the `stg_rag_hubspot__engagement` model to the HubSpot staging models and updated the relevant documentation.
  - Updated the `int_rag_hubspot__deal_document` joins so that `stg_rag_hubspot__engagement` is joined before `stg_rag_hubspot__engagement_contact` and `stg_rag_hubspot__engagement_company`, bringing in all necessary engagement records.
  - Updated `int_rag_hubspot__deal_document` to retrieve `engagement_type` from the HubSpot `engagement` table instead of the `engagement_email` and `engagement_note` tables. References to those two tables were removed from this model because they are no longer used.
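
The effect of sourcing `engagement_type` from the `engagement` table can be sketched with an in-memory SQLite database. This is a simplified illustration, not the package's model: the table contents are made up, `source_relation` is left out for brevity, and plain `coalesce` stands in for the `unified_rag.coalesce_cast` macro.

```python
import sqlite3

# Hypothetical miniature of the HubSpot tables (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table deals (deal_id integer, deal_name text);
    create table engagement_deals (deal_id integer, engagement_id integer);
    create table engagements (engagement_id integer, engagement_type text);
    create table engagement_emails (engagement_id integer, engagement_type text);
    create table engagement_notes (engagement_id integer, engagement_type text);

    insert into deals values (1, 'Example deal');
    insert into engagement_deals values (1, 100), (1, 200);
    insert into engagements values (100, 'NOTE'), (200, 'CALL');
    insert into engagement_notes values (100, 'NOTE');
""")

# Old approach: the type is only known for engagements that appear
# in the email or note tables.
old_sql = """
    select engagement_deals.engagement_id,
           coalesce(engagement_emails.engagement_type,
                    engagement_notes.engagement_type,
                    'UNKNOWN') as engagement_type
    from deals
    left join engagement_deals
        on deals.deal_id = engagement_deals.deal_id
    left join engagement_emails
        on engagement_deals.engagement_id = engagement_emails.engagement_id
    left join engagement_notes
        on engagement_deals.engagement_id = engagement_notes.engagement_id
    order by engagement_deals.engagement_id
"""

# New approach: the type comes from the engagement table itself.
new_sql = """
    select engagement_deals.engagement_id,
           coalesce(engagements.engagement_type, 'UNKNOWN') as engagement_type
    from deals
    left join engagement_deals
        on deals.deal_id = engagement_deals.deal_id
    left join engagements
        on engagement_deals.engagement_id = engagements.engagement_id
    order by engagement_deals.engagement_id
"""

old_rows = conn.execute(old_sql).fetchall()
new_rows = conn.execute(new_sql).fetchall()
print(old_rows)
print(new_rows)
```

In this toy data set the CALL engagement (id 200) resolves to `UNKNOWN` under the old joins and to `CALL` under the new ones, because its type is recorded only in the `engagements` table.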

## Bug Fix (`--full-refresh` required when upgrading)
- Updated the `unique_id` in `rag__unified_document` to include `chunk_index`. Previously, the unique key combined only `document_id`, `platform`, and `source_relation`, so it could produce duplicate keys when a document was split into multiple chunks.
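
A minimal sketch of why the old key collides. The `surrogate_key` helper below is hypothetical and only approximates an md5-over-concatenated-fields key; it is not the `dbt_utils.generate_surrogate_key` implementation, and the rows are made up.

```python
import hashlib


def surrogate_key(row, fields):
    """Hash the chosen fields into one key (hypothetical helper)."""
    parts = ["" if row.get(f) is None else str(row[f]) for f in fields]
    return hashlib.md5("-".join(parts).encode("utf-8")).hexdigest()


# Two chunks of the same document (illustrative values).
chunks = [
    {"document_id": "10011", "platform": "jira", "source_relation": "", "chunk_index": 0},
    {"document_id": "10011", "platform": "jira", "source_relation": "", "chunk_index": 1},
]

old_fields = ["document_id", "platform", "source_relation"]
new_fields = ["document_id", "platform", "chunk_index", "source_relation"]

old_keys = {surrogate_key(c, old_fields) for c in chunks}
new_keys = {surrogate_key(c, new_fields) for c in chunks}

print(len(old_keys), len(new_keys))
```

With the old field list both chunks hash to the same key; adding `chunk_index` yields one distinct key per chunk.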

## Under the Hood
- Updated the *hubspot_x* seed data and *get_hubspot_x_columns* macros with the new `category` field where relevant.
@fivetran-avinash (Contributor), Nov 28, 2024

Suggested change
- Updated the *hubspot_x* seed data and *get_hubspot_x_columns* macros with the new `category` field where relevant.
- Updated the `hubspot_*` seed data and `get_hubspot_*_columns` macros with the new `category` field where relevant.

Small suggestion update.

- Added missing field descriptions to the HubSpot documentation.

# dbt_unified_rag v0.1.0-a3
[PR #9](https://github.com/fivetran/dbt_unified_rag/pull/9) includes the following updates:

4 changes: 2 additions & 2 deletions README.md
@@ -1,6 +1,6 @@
# Unified RAG dbt Package ([Docs](https://fivetran.github.io/dbt_unified_rag/))

<p align="center">
<p align="left">
fivetran-joemarkiewicz marked this conversation as resolved.
<a alt="License"
href="https://github.com/fivetran/dbt_unified_rag/blob/main/LICENSE">
<img src="https://img.shields.io/badge/License-Apache%202.0-blue.svg" /></a>
@@ -46,7 +46,7 @@ Include the following package_display_name package version in your `packages.yml
```yml
packages:
- package: fivetran/unified_rag
version: 0.1.0-a3
version: 0.1.0-a4
```

### Step 3: Define database and schema variables
1 change: 1 addition & 0 deletions dbt_project.yml
@@ -19,6 +19,7 @@ vars:
jira_priority: "{{ source('rag_jira', 'priority') }}"

# Hubspot Sources
hubspot_engagement: "{{ source('rag_hubspot', 'engagement') }}"
hubspot_engagement_note: "{{ source('rag_hubspot', 'engagement_note') }}"
hubspot_engagement_email: "{{ source('rag_hubspot', 'engagement_email') }}"
hubspot_engagement_company: "{{ source('rag_hubspot', 'engagement_company') }}"
2 changes: 1 addition & 1 deletion docs/catalog.json

Large diffs are not rendered by default.

274 changes: 63 additions & 211 deletions docs/index.html

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/manifest.json

Large diffs are not rendered by default.

10 changes: 5 additions & 5 deletions integration_tests/ci/sample.profiles.yml
@@ -16,13 +16,13 @@ integration_tests:
pass: "{{ env_var('CI_REDSHIFT_DBT_PASS') }}"
dbname: "{{ env_var('CI_REDSHIFT_DBT_DBNAME') }}"
port: 5439
schema: rag_integration_tests_1
schema: rag_integration_tests_3
threads: 8
bigquery:
type: bigquery
method: service-account-json
project: 'dbt-package-testing'
schema: rag_integration_tests_1
schema: rag_integration_tests_3
threads: 8
keyfile_json: "{{ env_var('GCLOUD_SERVICE_KEY') | as_native }}"
snowflake:
@@ -33,7 +33,7 @@ integration_tests:
role: "{{ env_var('CI_SNOWFLAKE_DBT_ROLE') }}"
database: "{{ env_var('CI_SNOWFLAKE_DBT_DATABASE') }}"
warehouse: "{{ env_var('CI_SNOWFLAKE_DBT_WAREHOUSE') }}"
schema: rag_integration_tests_1
schema: rag_integration_tests_3
threads: 8
postgres:
type: postgres
@@ -42,13 +42,13 @@ integration_tests:
pass: "{{ env_var('CI_POSTGRES_DBT_PASS') }}"
dbname: "{{ env_var('CI_POSTGRES_DBT_DBNAME') }}"
port: 5432
schema: rag_integration_tests_1
schema: rag_integration_tests_3
threads: 8
databricks:
catalog: "{{ env_var('CI_DATABRICKS_DBT_CATALOG') }}"
host: "{{ env_var('CI_DATABRICKS_DBT_HOST') }}"
http_path: "{{ env_var('CI_DATABRICKS_DBT_HTTP_PATH') }}"
schema: rag_integration_tests_1
schema: rag_integration_tests_3
threads: 2
token: "{{ env_var('CI_DATABRICKS_DBT_TOKEN') }}"
type: databricks
12 changes: 8 additions & 4 deletions integration_tests/dbt_project.yml
@@ -6,14 +6,15 @@ version: '0.1.0'
profile: "integration_tests"

vars:
rag_hubspot_schema: "rag_integration_tests_1"
rag_zendesk_schema: "rag_integration_tests_1"
rag_jira_schema: "rag_integration_tests_1"
rag_hubspot_schema: "rag_integration_tests_3"
rag_zendesk_schema: "rag_integration_tests_3"
rag_jira_schema: "rag_integration_tests_3"

rag__using_jira: True
rag__using_zendesk: True
rag__using_hubspot: True

rag_hubspot_engagement_identifier: "hubspot_engagement"
rag_hubspot_engagement_note_identifier: "hubspot_engagement_note"
rag_hubspot_engagement_email_identifier: "hubspot_engagement_email"
rag_hubspot_engagement_company_identifier: "hubspot_engagement_company"
@@ -32,7 +33,7 @@ vars:
rag_zendesk_ticket_comment_identifier: "zendesk_ticket_comment"
rag_zendesk_user_identifier: "zendesk_user"

document_max_tokens: 2000
document_max_tokens: 50

seeds:
rag_integration_tests:
@@ -98,6 +99,9 @@ seeds:
_fivetran_synced: timestamp
property_closedate: timestamp
property_createdate: timestamp
hubspot_engagement:
+column_types:
id: "{{ 'int64' if target.type == 'bigquery' else 'bigint' }}"
hubspot_engagement_company:
+column_types:
engagement_id: "{{ 'int64' if target.type == 'bigquery' else 'bigint' }}"
11 changes: 11 additions & 0 deletions integration_tests/seeds/hubspot_engagement.csv
@@ -0,0 +1,11 @@
id,_fivetran_synced,active,activity_type,created_at,last_updated,owner_id,portal_id,timestamp,type
19732910159,2023-11-02 13:50:33.715000 UTC,,,,,,4703379,,CALL
19732728857,2023-11-02 13:50:34.157000 UTC,,,,,,4703379,,NOTE
32034885640,2023-11-02 13:50:34.159000 UTC,,,,,,4703379,,NOTE
32034887079,2023-11-02 13:50:34.160000 UTC,,,,,,4703379,,NOTE
32034932747,2023-11-02 13:50:34.161000 UTC,,,,,,4703379,,NOTE
32034933592,2023-11-02 13:50:34.161000 UTC,,,,,,4703379,,NOTE
32083319945,2023-11-02 13:50:34.162000 UTC,,,,,,4703379,,NOTE
27683507047,2023-11-02 13:50:34.157000 UTC,,,,,,4703379,,NOTE
27683507372,2023-11-02 13:50:34.158000 UTC,,,,,,4703379,,NOTE
27683512957,2023-11-02 13:50:34.158000 UTC,,,,,,4703379,,NOTE
5 changes: 2 additions & 3 deletions integration_tests/seeds/hubspot_engagement_company.csv
@@ -1,3 +1,2 @@
company_id,engagement_id,engagement_type,type_id,_fivetran_synced
9991774791,31479928586,EMAIL,194,2023-06-08 23:22:50.829000
2319384765,19732728857,NOTE,195,2023-06-08 23:22:50.829000
company_id,engagement_id,engagement_type,type_id,_fivetran_synced,category
9991774791,31479928586,TASK,192,2023-06-08 23:22:50.829000,HUBSPOT_DEFINED
6 changes: 3 additions & 3 deletions integration_tests/seeds/hubspot_engagement_contact.csv
@@ -1,3 +1,3 @@
contact_id,engagement_id,engagement_type,type_id,_fivetran_synced
501,19732910159,NOTE,195,2023-06-08 23:22:49.869000
76340251,31479928586,EMAIL,194,2023-06-08 23:22:49.869000
contact_id,engagement_id,engagement_type,type_id,_fivetran_synced,category
501,19732910159,NOTE,195,2023-06-08 23:22:49.869000,HUBSPOT_DEFINED
76340251,31479928586,EMAIL,194,2023-06-08 23:22:49.869000,HUBSPOT_DEFINED
6 changes: 3 additions & 3 deletions integration_tests/seeds/hubspot_engagement_deal.csv
@@ -1,3 +1,3 @@
deal_id,engagement_id,engagement_type,type_id,_fivetran_synced
10828857779,31479928586,EMAIL,216,2023-06-08 23:22:50.768000
10828857779,19732728857,NOTE,216,2023-06-08 23:22:50.768000
deal_id,engagement_id,engagement_type,type_id,_fivetran_synced,category
10828857779,31479928586,EMAIL,216,2023-06-08 23:22:50.768000,HUBSPOT_DEFINED
10828857779,19732728857,NOTE,216,2023-06-08 23:22:50.768000,HUBSPOT_DEFINED
2 changes: 1 addition & 1 deletion integration_tests/seeds/jira_comment.csv
@@ -1,4 +1,4 @@
id,_fivetran_synced,author_id,body,created,is_public,issue_id,update_author_id,updated
1,2020-11-12 12:20:53.148,1a,Hello,2020-11-10 19:19:41.224,true,10011,1a,2020-11-10 19:19:41.224
1,2020-11-12 12:20:53.148,1a,The quick brown fox jumps over the lazy dog. This sentence uses every letter in the English alphabet. It is often used as a typing practice sentence. Repetition of this sentence will ensure a consistent length. The quick brown fox jumps over the lazy dog. This sentence uses every letter in the English alphabet. It is often used as a typing practice sentence. Repetition of this sentence will ensure a consistent length. The quick brown fox jumps over the lazy dog. This sentence uses every letter in the English alphabet. It is often used as a typing practice sentence. Repetition of this sentence will ensure a consistent length. The quick brown fox jumps over the lazy dog. This sentence uses every letter in the English alphabet. It is often used as a typing practice sentence. Repetition of this sentence will ensure a consistent length. The quick brown fox jumps over the lazy dog. This sentence uses every letter in the English alphabet. It is often used as a typing practice sentence.,2020-11-10 19:19:41.224,true,10011,1a,2020-11-10 19:19:41.224
2,2020-11-10 19:21:48.619,1a,To Do to In Progress 6 days 22 hours 26 minutes ago In Progress to Done 3 days 16 hours 34 minutes ago,2020-11-07 02:45:38.717,true,10011,1a,2020-11-07 02:45:38.717
3,2020-11-10 19:21:48.618,1a,Joined Sample Sprint 2 7 days 9 hours 10 minutes ago,2020-11-07 02:45:38.717,true,10011,1a,2020-11-07 02:45:38.717
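
The PR lowers `document_max_tokens` from 2000 to 50 in the integration tests and lengthens the first Jira comment. The PR does not state why; one plausible reading is that the two changes together force a document to split into several chunks, which exercises the new `chunk_index` component of `unique_id`. The sketch below uses a naive whitespace tokenizer purely for illustration; `chunk_by_tokens` is a hypothetical helper, not the package's chunking logic.

```python
def chunk_by_tokens(text, max_tokens):
    """Split text into chunks of at most max_tokens whitespace-separated words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]


# A long comment built by repetition, similar in spirit to the seed row (180 words).
comment = "The quick brown fox jumps over the lazy dog. " * 20

small_limit_chunks = chunk_by_tokens(comment, max_tokens=50)
large_limit_chunks = chunk_by_tokens(comment, max_tokens=2000)

# Each chunk gets an index, mirroring the role of chunk_index.
for chunk_index, chunk in enumerate(small_limit_chunks):
    print(chunk_index, len(chunk.split()))
```

Under these assumptions the 50-token limit produces four chunks, while the 2000-token limit keeps the comment in a single chunk.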
16 changes: 16 additions & 0 deletions macros/staging/hubspot/get_hubspot_engagement_columns.sql
@@ -0,0 +1,16 @@
{% macro get_hubspot_engagement_columns() %}

{% set columns = [
{"name": "_fivetran_synced", "datatype": dbt.type_timestamp()},
{"name": "active", "datatype": "boolean", "alias": "is_active"},
{"name": "created_at", "datatype": dbt.type_timestamp(), "alias": "created_timestamp"},
{"name": "id", "datatype": dbt.type_int()},
{"name": "owner_id", "datatype": dbt.type_int()},
{"name": "portal_id", "datatype": dbt.type_int()},
{"name": "timestamp", "datatype": dbt.type_timestamp(), "alias": "occurred_timestamp"},
{"name": "type", "datatype": dbt.type_string(), "alias": "engagement_type"}
] %}

{{ return(columns) }}

{% endmacro %}
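
A rough sketch of how a column list like the one above might be turned into a select list. The staging model that consumes this macro is not shown in the diff, so the rendering rule here is an assumption: columns present in the source are selected and aliased, missing ones are filled with typed nulls. `render_select_list` is a hypothetical helper, and the datatypes are written as plain strings rather than `dbt.type_*()` calls.

```python
def render_select_list(columns, source_columns):
    """Render a SQL select list from column specs (hypothetical consumer)."""
    lines = []
    for col in columns:
        # Fall back to the column's own name when no alias is given.
        alias = col.get("alias", col["name"])
        if col["name"] in source_columns:
            lines.append(f'{col["name"]} as {alias}')
        else:
            # Column missing from the source: emit a typed null instead.
            lines.append(f'cast(null as {col["datatype"]}) as {alias}')
    return ",\n".join(lines)


columns = [
    {"name": "id", "datatype": "bigint"},
    {"name": "type", "datatype": "text", "alias": "engagement_type"},
    {"name": "active", "datatype": "boolean", "alias": "is_active"},
]

select_list = render_select_list(columns, source_columns={"id", "type"})
print(select_list)
```

The `alias` entries explain how the source column `type` can surface downstream as `engagement_type`, the name used in the `int_rag_hubspot__deal_document` joins.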
@@ -3,7 +3,8 @@
{% set columns = [
{"name": "_fivetran_synced", "datatype": dbt.type_timestamp()},
{"name": "company_id", "datatype": dbt.type_int()},
{"name": "engagement_id", "datatype": dbt.type_int()}
{"name": "engagement_id", "datatype": dbt.type_int()},
{"name": "category", "datatype": dbt.type_string()}
] %}

{{ return(columns) }}
@@ -3,7 +3,8 @@
{% set columns = [
{"name": "_fivetran_synced", "datatype": dbt.type_timestamp()},
{"name": "contact_id", "datatype": dbt.type_int()},
{"name": "engagement_id", "datatype": dbt.type_int()}
{"name": "engagement_id", "datatype": dbt.type_int()},
{"name": "category", "datatype": dbt.type_string()}
] %}

{{ return(columns) }}
@@ -3,7 +3,8 @@
{% set columns = [
{"name": "_fivetran_synced", "datatype": dbt.type_timestamp()},
{"name": "deal_id", "datatype": dbt.type_int()},
{"name": "engagement_id", "datatype": dbt.type_int()}
{"name": "engagement_id", "datatype": dbt.type_int()},
{"name": "category", "datatype": dbt.type_string()}
] %}

{{ return(columns) }}
38 changes: 14 additions & 24 deletions models/intermediate/hubspot/int_rag_hubspot__deal_document.sql
PR author (Contributor):

Now that the `engagement` table exists and exposes the `engagement_type` field, do we still need the coalesce between `engagement_emails.engagement_type` and `engagement_notes.engagement_type` in the line below in order to grab `engagement_type`?

{{ unified_rag.coalesce_cast(["engagement_emails.engagement_type", "engagement_notes.engagement_type", "'UNKNOWN'"], dbt.type_string()) }} as engagement_type,

PR author (Contributor):

Update: I have since removed the previous `engagement_emails.engagement_type` and `engagement_notes.engagement_type` references and swapped in `engagements.engagement_type`.

@@ -18,6 +18,11 @@ companies as (
from {{ ref('stg_rag_hubspot__company') }}
),

engagements as (
select *
from {{ ref('stg_rag_hubspot__engagement') }}
),

engagement_companies as (

select *
@@ -30,18 +35,6 @@ engagement_contacts as (
from {{ ref('stg_rag_hubspot__engagement_contact') }}
),

engagement_emails as (

select *
from {{ ref('stg_rag_hubspot__engagement_email') }}
),

engagement_notes as (

select *
from {{ ref('stg_rag_hubspot__engagement_note') }}
),

engagement_deals as (

select *
@@ -53,7 +46,7 @@ engagement_detail_prep as (
select
deals.deal_id,
deals.deal_name,
{{ unified_rag.coalesce_cast(["engagement_emails.engagement_type", "engagement_notes.engagement_type", "'UNKNOWN'"], dbt.type_string()) }} as engagement_type,
{{ unified_rag.coalesce_cast(["engagements.engagement_type", "'UNKNOWN'"], dbt.type_string()) }} as engagement_type,
{{ dbt.concat(["'https://app.hubspot.com/contacts'", "deals.portal_id", "'/record/0-3/'", "deals.deal_id"]) }} as url_reference,
deals.source_relation,
{{ unified_rag.coalesce_cast(["contacts.contact_name", "'UNKNOWN'"], dbt.type_string()) }} as contact_name,
@@ -64,24 +57,21 @@ engagement_detail_prep as (
left join engagement_deals
on deals.deal_id = engagement_deals.deal_id
and deals.source_relation = engagement_deals.source_relation
left join engagements
on engagement_deals.engagement_id = engagements.engagement_id
and engagement_deals.source_relation = engagements.source_relation
left join engagement_contacts
on engagement_deals.engagement_id = engagement_contacts.engagement_id
and engagement_deals.source_relation = engagement_contacts.source_relation
on engagements.engagement_id = engagement_contacts.engagement_id
and engagements.source_relation = engagement_contacts.source_relation
left join engagement_companies
on engagements.engagement_id = engagement_companies.engagement_id
and engagements.source_relation = engagement_companies.source_relation
left join contacts
on engagement_contacts.contact_id = contacts.contact_id
and engagement_contacts.source_relation = contacts.source_relation
left join engagement_companies
on engagement_deals.engagement_id = engagement_companies.engagement_id
and engagement_deals.source_relation = engagement_companies.source_relation
left join companies
on engagement_companies.company_id = companies.company_id
and engagement_companies.source_relation = companies.source_relation
left join engagement_emails
on engagement_deals.engagement_id = engagement_emails.engagement_id
and engagement_deals.source_relation = engagement_emails.source_relation
left join engagement_notes
on engagement_deals.engagement_id = engagement_notes.engagement_id
and engagement_deals.source_relation = engagement_notes.source_relation
fivetran-joemarkiewicz marked this conversation as resolved.
),

engagement_details as (
2 changes: 1 addition & 1 deletion models/rag__unified_document.sql
@@ -16,7 +16,7 @@
{% for platform in enabled_variables %}
{% if var(platform) == true -%}
{%- set platform_name = platform | replace('rag__using_', '') -%}
{%- set unique_key_fields = ['document_id', 'platform', 'source_relation'] -%}
{%- set unique_key_fields = ['document_id', 'platform', 'chunk_index', 'source_relation'] -%}
{% set select_statement = (
"select \n" ~
" " ~ dbt_utils.generate_surrogate_key(unique_key_fields) ~ "as unique_id, \n" ~
35 changes: 35 additions & 0 deletions models/staging/hubspot_staging/src_rag_hubspot.yml
@@ -6,6 +6,41 @@ sources:
loaded_at_field: _fivetran_synced

tables:
- name: engagement
identifier: "{{ var('rag_hubspot_engagement_identifier', 'engagement')}}"
description: Each record represents an engagement
config:
enabled: "{{ var('rag_hubspot_sales_enabled', true) and var('rag_hubspot_engagement_enabled', true) }}"
columns:
- name: _fivetran_synced
description: '{{ doc("_fivetran_synced") }}'
- name: active
description: >
Whether the engagement is currently being shown in the UI.

PLEASE NOTE: This field will not be populated for connectors utilizing the HubSpot v3 API version. This field will be deprecated in a future release.
- name: created_at
description: >
A timestamp representing when the engagement was created.

PLEASE NOTE: This field will not be populated for connectors utilizing the HubSpot v3 API version. This field will be deprecated in a future release.
- name: id
description: The ID of the engagement.
- name: owner_id
description: >
The ID of the engagement's owner.

PLEASE NOTE: This field will not be populated for connectors utilizing the HubSpot v3 API version. This field will be deprecated in a future release.
- name: portal_id
description: '{{ doc("portal_id") }}'
- name: timestamp
description: >
A timestamp representing the time that the engagement should appear in the timeline.

PLEASE NOTE: This field will not be populated for connectors utilizing the HubSpot v3 API version. This field will be deprecated in a future release.
- name: type
description: One of NOTE, EMAIL, TASK, MEETING, or CALL, the type of the engagement.

- name: engagement_note
identifier: "{{ var('rag_hubspot_engagement_note_identifier', 'engagement_note')}}"
description: Each record represents a NOTE engagement event.