introduce the hubspot engagement table to adjust the joins in int_hub… #13

Open
wants to merge 19 commits into
base: main

Conversation

fivetran-reneeli
Contributor

@fivetran-reneeli fivetran-reneeli commented Nov 20, 2024

PR Overview

This PR will address the following Issue/Feature: #11 #10 #12

This PR will result in the following new package version:

v0.1.0-a4

Please provide the finalized CHANGELOG entry which details the relevant changes included in this PR:

Breaking Changes

  • Added the hubspot engagement table to the package and made the following updates:
    • Added stg_rag_hubspot__engagement model as part of the hubspot staging models.
    • Updated int_rag_hubspot__deal_document to change how the hubspot_engagement_* models are joined: the hubspot__engagement table now serves as the intermediary joining table for the engagement_contact and engagement_company tables.
    • Updated int_rag_hubspot__deal_document to retrieve engagement_type from the hubspot engagement table instead of the engagement_emails and engagement_notes tables. References to those two tables have been removed from this model because they are no longer used.
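
To illustrate the join pattern described in the bullets above, here is a minimal sketch in plain SQL, run through Python's built-in sqlite3 module. It is not the package's actual model code: the table layouts, the column names (engagement_id, deal_id, contact_id, company_id), the choice to start from engagement_deal, and the sample rows are all assumptions made for illustration.

```python
import sqlite3

# In-memory database with hypothetical, simplified versions of the tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table engagement (engagement_id integer, engagement_type text);
    create table engagement_deal (engagement_id integer, deal_id integer);
    create table engagement_contact (engagement_id integer, contact_id integer);
    create table engagement_company (engagement_id integer, company_id integer);

    insert into engagement values (1, 'EMAIL'), (2, 'NOTE');
    insert into engagement_deal values (1, 100), (2, 100);
    insert into engagement_contact values (1, 10);
    insert into engagement_company values (2, 20);
""")

# The engagement table sits between the deal and the contact/company tables,
# and engagement_type is read from it directly rather than being coalesced
# from the email and note tables.
rows = conn.execute("""
    select
        engagement_deal.deal_id,
        engagement.engagement_id,
        coalesce(engagement.engagement_type, 'UNKNOWN') as engagement_type,
        engagement_contact.contact_id,
        engagement_company.company_id
    from engagement_deal
    join engagement
        on engagement.engagement_id = engagement_deal.engagement_id
    left join engagement_contact
        on engagement_contact.engagement_id = engagement.engagement_id
    left join engagement_company
        on engagement_company.engagement_id = engagement.engagement_id
    order by engagement.engagement_id
""").fetchall()

for row in rows:
    print(row)
```

Each engagement keeps its type whether or not it has a matching email or note record, which is the motivation for sourcing engagement_type from the engagement table.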

Under the Hood

  • Updated the unique key in rag__unified_document to include chunk_index. Previously, the unique key was a combination of only document_id, platform, and source_relation, which was potentially inaccurate if there were multiple chunks associated with a document.
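
The effect of adding chunk_index can be sketched with a small example. This assumes the unique key is a hash over the listed columns; the unique_key helper and the sample rows are hypothetical and are not taken from the package.

```python
import hashlib


def unique_key(row, columns):
    """Hash the chosen columns into one key (a stand-in for a surrogate key)."""
    raw = "|".join(str(row[column]) for column in columns)
    return hashlib.md5(raw.encode()).hexdigest()


# Two chunks of the same document: identical except for chunk_index.
rows = [
    {"document_id": "deal-1", "platform": "hubspot", "source_relation": "", "chunk_index": 0},
    {"document_id": "deal-1", "platform": "hubspot", "source_relation": "", "chunk_index": 1},
]

old_columns = ["document_id", "platform", "source_relation"]
new_columns = old_columns + ["chunk_index"]

old_keys = {unique_key(row, old_columns) for row in rows}
new_keys = {unique_key(row, new_columns) for row in rows}

# The old key collapses both chunks into one value; the new key keeps them apart.
print(len(old_keys), len(new_keys))  # 1 2
```

With the old column set, both chunks share one key, so a uniqueness test would fail or an incremental merge could overwrite one chunk with the other.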

PR Checklist

Basic Validation

Please acknowledge that you have successfully performed the following commands locally:

  • dbt run --full-refresh && dbt test
  • [na] dbt run (if incremental models are present) && dbt test

Before marking this PR as "ready for review" the following have been applied:

  • The appropriate issue has been linked, tagged, and properly assigned
  • All necessary documentation and version upgrades have been applied
  • docs were regenerated (unless this PR does not include any code or yml updates)
  • BuildKite integration tests are passing
  • Detailed validation steps have been provided below

Detailed Validation

Please share any and all of your validation steps:

If you had to summarize this PR in an emoji, which would it be?

💃

Contributor Author

Actually, I realized that now that engagement exists and the engagement_type field is available in that table, do we still need the coalesce between "engagement_emails.engagement_type" and "engagement_notes.engagement_type" here

{{ unified_rag.coalesce_cast(["engagement_emails.engagement_type", "engagement_notes.engagement_type", "'UNKNOWN'"], dbt.type_string()) }} as engagement_type,

in order to grab engagement_type?

Contributor Author

Update: I have since replaced the previous "engagement_emails.engagement_type" and "engagement_notes.engagement_type" references with engagement.engagement_type.

Collaborator

@fivetran-joemarkiewicz fivetran-joemarkiewicz left a comment

@fivetran-reneeli thanks for working through these changes. I have a few comments to be addressed before approval. Let me know if you have any questions. Thanks!

CHANGELOG.md Outdated
Comment on lines 9 to 10
## Under the Hood
- Updated the unique key in `rag__unified_document` to include `chunk_index`. Previously, the unique key was a combination of only `document_id`, `platform`, and `source_relation`, which was potentially inaccurate if there were multiple chunks associated with a document.
Collaborator

I would recommend we classify this as a bug fix instead of an under-the-hood change. This will affect customer data, so it warrants more than an under-the-hood classification.

Contributor Author

Good point, updated

README.md Show resolved Hide resolved
dbt_project.yml Outdated Show resolved Hide resolved
Collaborator

@fivetran-joemarkiewicz fivetran-joemarkiewicz left a comment

@fivetran-reneeli approved with just two small requests before moving to release review.

CHANGELOG.md Outdated Show resolved Hide resolved
dbt_project.yml Outdated Show resolved Hide resolved
integration_tests/dbt_project.yml Outdated Show resolved Hide resolved
@@ -0,0 +1,2 @@
id,type,_fivetran_synced,portal_id
Contributor

Is there a reason only a subset of seed columns are being run and tested here? Can we add all the relevant columns brought into the staging model?

Contributor Author

That's a great point. I initially mocked this up from the internal schema, using the new version of the tables I was testing with. I've added in the rest of the columns!

The ID of the engagement's owner.

PLEASE NOTE - This field will only be populated for pre HubSpot v3 API versions. This field is only included to allow for backwards compatibility between HubSpot API versions. This field will be deprecated in the near future.
- name: portal_id
Contributor

Don't forget to document source_relation!

Contributor Author

thanks! updated

CHANGELOG.md Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
Contributor

@fivetran-avinash fivetran-avinash left a comment

@fivetran-reneeli Thanks for the PR! A few comments and a question about the seed file before approving.

Co-authored-by: Avinash Kunnath <[email protected]>
fivetran-reneeli and others added 4 commits November 27, 2024 18:50
Co-authored-by: Avinash Kunnath <[email protected]>
Co-authored-by: Avinash Kunnath <[email protected]>
@fivetran-reneeli
Contributor Author

Thanks @fivetran-avinash! Addressed comments. Also, I noticed that I hadn't updated the hubspot seed data with the new category field. Even though it doesn't play a part downstream, I figured I'd update the seed data and get-columns macro anyway. Additionally, I noticed some missing field descriptions in the rest of the hubspot staging models, so I added those in as well.

- Updated the `unique_id` in `rag__unified_document` to include `chunk_index`. Previously, the unique key was a combination of only `document_id`, `platform`, and `source_relation`, which was potentially inaccurate if there were multiple chunks associated with a document.

## Under the Hood
- Updated the *hubspot_x* seed data and *get_hubspot_x_columns* macros with the new `category` field where relevant.
Contributor

@fivetran-avinash fivetran-avinash Nov 28, 2024

Suggested change
- Updated the *hubspot_x* seed data and *get_hubspot_x_columns* macros with the new `category` field where relevant.
- Updated the `hubspot_*` seed data and `get_hubspot_*_columns` macros with the new `category` field where relevant.

Small suggestion update.

Contributor

@fivetran-avinash fivetran-avinash left a comment

Thanks @fivetran-reneeli for doing this quick review!

Good call out on adding category to the relevant seed files. However, I noticed category is not being brought into the staging models, and we need to account for the joins in int_rag_hubspot__deal_document, as was discussed in [#10] (I presume it's part of this PR since it's linked to the issue). Sorry for not catching this in the first go-around.

@fivetran-joemarkiewicz Let's discuss Monday if we want to try and fold these changes back in this sprint or just release as is.

Labels
None yet
Projects
None yet
3 participants