feat(data-warehouse): integrating data warehouse with trends insight #20320

Merged
merged 91 commits into from
Feb 29, 2024

Conversation

EDsCODE
Member

@EDsCODE EDsCODE commented Feb 13, 2024

Problem

  • data warehouse data can't be used outside of SQL right now

Changes

  • attempt to connect data to insight trends
  • adds a data warehouse trends query builder
  • breakdowns and property filters work
  • unique aggregations and actor queries not supported yet

Design Decision

  • @Gilbert09 and I discussed this and decided that, for getting this functionality into trends, splitting at the trends query builder level is the clearest approach because:

    • Query builders are responsible for converting the filter object/JSON into an AST. The AST is passed to the HogQL parser/printer, which produces the query output.
    • The current query builders assemble the AST with the "events" table everywhere, and the events table has a specific schema that isn't generalized. For example, properties is one large object that gets parsed, whereas in data warehouse tables properties are often flattened into separate columns.
    • Drawbacks:
      • this logic will need to be split in many places (everywhere "events" is explicitly parsed)
  • @timgl suggests finding a place to accept data warehouse tables where we wouldn't need to repeat or split out as much logic, such as within the HogQL parser itself

    • I don't think the parser is the appropriate place to handle this, because we would be passing in "events" and relying on unintuitive logic to infer when it should actually output a data warehouse table, which would make the parser abstraction pretty muddy.
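
As a rough illustration of the split described above (a sketch, not the code in this PR): the names `Field`, `SelectQuery`, `build_events_trend`, `build_warehouse_trend` and `to_sql` are all hypothetical, and the toy printer only stands in for the real HogQL printer. The point is that the events builder reaches into a nested `properties` object, while the warehouse builder addresses flat columns and needs to be told which column holds the timestamp.

```python
from dataclasses import dataclass


@dataclass
class Field:
    """A column reference expressed as a path, e.g. ["properties", "plan"]."""
    chain: list[str]


@dataclass
class SelectQuery:
    """A deliberately tiny stand-in for a real AST select node (hypothetical)."""
    select: list[Field]
    select_from: str


def build_events_trend(breakdown: str) -> SelectQuery:
    # Events keep custom fields inside one big "properties" object,
    # so a breakdown is addressed as properties.<key>.
    return SelectQuery(
        select=[Field(["timestamp"]), Field(["properties", breakdown])],
        select_from="events",
    )


def build_warehouse_trend(table: str, timestamp_field: str, breakdown: str) -> SelectQuery:
    # Warehouse tables are assumed to be flat here, so the breakdown is a
    # plain column and the timestamp column name is supplied by the caller.
    return SelectQuery(
        select=[Field([timestamp_field]), Field([breakdown])],
        select_from=table,
    )


def to_sql(query: SelectQuery) -> str:
    # Toy printer: joins each chain with dots. A real printer would also
    # escape identifiers and resolve property access.
    columns = ", ".join(".".join(f.chain) for f in query.select)
    return f"SELECT {columns} FROM {query.select_from}"
```

Both builders return the same node type, so everything downstream of the builder can stay shared; only the part that knows about the table schema is duplicated, which is the drawback noted above.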

@EDsCODE EDsCODE requested a review from Gilbert09 February 15, 2024 04:06
Member

@Gilbert09 Gilbert09 left a comment

Looking good, I'm guessing DataWarehouseTrendsQueryBuilder was mostly copied over with a few changes so far?

Will the types of aggregations we support be reduced? For example, does "Weekly Active Users" on a data warehouse series make sense? Hopefully this lets us reduce some of the complexity around aggregations and breakdowns.

timings=timings,
)

def create_parquet_file(self):

We'll likely need to add some higher-level helper funcs somewhere in our testing framework for doing this kinda stuff. I imagine I'm gonna need the same when doing the table linking too

@EDsCODE
Member Author

EDsCODE commented Feb 15, 2024

Yep, trying to share as many code paths as possible. I need to figure out if it's realistic to also have an "actor" id mapping which would allow for all the unique user math aggregations

@posthog-bot
Contributor

📸 UI snapshots have been updated

1 snapshot changes in total. 0 added, 1 modified, 0 deleted:

  • chromium: 0 added, 1 modified, 0 deleted (diff for shard 2)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@EDsCODE EDsCODE requested a review from Gilbert09 February 23, 2024 05:09
Member

@Gilbert09 Gilbert09 left a comment

Looking gooood, 🥳

@EDsCODE EDsCODE requested a review from mariusandra February 29, 2024 17:59
Collaborator

@mariusandra mariusandra left a comment

Had a look through the code. A lot of it is a much-needed change. However, I think we can have an easier time and get rid of some of the duplicated trends code if we instead make a virtual events table (events_from_stripe) at query time and then just query against that. This table would be set up with HogQL's custom fields, which would do a bit of proxying. Mapping person_id, distinct_id and timestamp would already enable matching all person properties. The properties field could be a special object that gives access to all other fields on the table, now with a custom data picker in the frontend. Passing the field mapping down to the HogQL layer should make all existing insights work nicely with minimal modifications and no special runners. At least in theory 😅 Slack thread
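
To make the field-mapping idea a bit more concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `VirtualEventsTable`, its `resolve` method, the `stripe_charges` table and its column names are hypothetical and are not HogQL's actual custom-field API. It only shows the proxying: events-style field names are translated to warehouse columns, and `properties.<key>` falls through to a flat column.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualEventsTable:
    """Hypothetical proxy that makes a warehouse table look like `events`."""
    source_table: str
    # events-style field name -> column name in the warehouse table
    field_mapping: dict[str, str]
    # columns exposed through the `properties` pseudo-object
    property_columns: set[str] = field(default_factory=set)

    def resolve(self, chain: list[str]) -> str:
        """Translate an events-style field chain into a warehouse column."""
        if len(chain) == 2 and chain[0] == "properties":
            # properties.<key> proxies straight to a flat column.
            if chain[1] not in self.property_columns:
                raise KeyError(f"unknown property: {chain[1]}")
            return f"{self.source_table}.{chain[1]}"
        if len(chain) == 1 and chain[0] in self.field_mapping:
            return f"{self.source_table}.{self.field_mapping[chain[0]]}"
        raise KeyError(f"unmapped field: {'.'.join(chain)}")


# Assumed example mapping; the real column names depend on the source table.
events_from_stripe = VirtualEventsTable(
    source_table="stripe_charges",
    field_mapping={
        "timestamp": "created_at",
        "distinct_id": "customer_email",
        "person_id": "customer_id",
    },
    property_columns={"amount", "currency"},
)
```

With a mapping like this, an existing query that refers to `timestamp` or `properties.amount` could in principle be rewritten against the warehouse table without a separate query builder, which is the appeal of the suggestion.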

@EDsCODE EDsCODE merged commit a1c21f9 into master Feb 29, 2024
89 checks passed
@EDsCODE EDsCODE deleted the dw-test-insight-integration branch February 29, 2024 21:36
4 participants