feat(data-warehouse): integrating data warehouse with trends insight #20320
Conversation
Looking good, I'm guessing DataWarehouseTrendsQueryBuilder was mostly copied over with a few changes so far?
Will the types of aggregations we support be reduced? e.g. does "Weekly Active Users" on a data warehouse series make sense? Hopefully from this, we can reduce some of the complexity around aggregations and breakdowns.
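To make the question concrete, here is a hypothetical sketch (not code from this PR; all names are illustrative) of how the offered math aggregations might be narrowed for a data warehouse series: person-based aggregations like weekly/monthly active users need an actor mapping that a raw warehouse table doesn't have.

```python
# Hypothetical sketch, not actual PostHog code. Illustrates narrowing the
# set of math aggregations offered for a data warehouse series: unique-user
# aggregations (DAU/WAU/MAU) need a person/actor mapping that a raw
# warehouse table does not provide out of the box.

ALL_MATH_TYPES = {"total", "sum", "avg", "min", "max", "dau", "weekly_active", "monthly_active"}
PERSON_BASED_MATH = {"dau", "weekly_active", "monthly_active"}


def supported_math_for_series(is_data_warehouse: bool, has_actor_mapping: bool = False) -> set:
    """Return the math types a series could offer under these assumptions."""
    if is_data_warehouse and not has_actor_mapping:
        # Column-level aggregations still make sense; person-based ones don't.
        return ALL_MATH_TYPES - PERSON_BASED_MATH
    return ALL_MATH_TYPES
```

Under this sketch, "Weekly Active Users" would simply not be offered for a warehouse series until an actor mapping exists.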
posthog/hogql_queries/insights/trends/data_warehouse_trends_query_builder.py
posthog/hogql_queries/insights/trends/test/test_data_warehouse_query_builder.py
timings=timings,
)

def create_parquet_file(self):
We'll likely need to add some higher-level helper funcs somewhere in our testing framework for doing this kinda stuff. I imagine I'm gonna need the same when doing the table linking too.
Yep, trying to share as many code paths as possible. I need to figure out if it's realistic to also have an "actor" id mapping, which would allow for all the unique user math aggregations.
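A rough sketch of what such an "actor" id mapping could look like (hypothetical helper names, not part of this PR): a lookup from a warehouse table's user-identifier column to person ids, which is what unique-user math aggregations would need to deduplicate warehouse rows.

```python
# Hypothetical sketch, not PostHog code: map a warehouse table's user
# identifier column onto person ids so unique-user math (e.g. DAU) can
# deduplicate warehouse rows the same way it deduplicates events.

def map_rows_to_actors(rows, id_column, distinct_id_to_person):
    """Annotate each warehouse row with a person_id, or None when unmapped."""
    out = []
    for row in rows:
        annotated = dict(row)
        annotated["person_id"] = distinct_id_to_person.get(row.get(id_column))
        out.append(annotated)
    return out


def unique_actors(rows):
    """Count distinct mapped actors: the core of a unique-user aggregation."""
    return len({r["person_id"] for r in rows if r["person_id"] is not None})
```

For example, two rows sharing a mapped customer id would count as one actor, while unmapped rows would be excluded from person-based math.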
…hog into dw-test-insight-integration
📸 UI snapshots have been updated: 1 snapshot change in total. 0 added, 1 modified, 0 deleted. Triggered by this commit.
Looking gooood, 🥳
Had a look through the code. A lot of it is a much needed change. However I think we can have an easier time and get rid of some of the duplicated trends code if we instead make a virtual events table (`events_from_stripe`) during query time, and then just query against that. This table would be set up with HogQL's custom fields that'll do a bit of proxying. Mapping `person_id`, `distinct_id` and `timestamp` would already enable matching all person properties. The `properties` field could be a special object that gives access to all other fields on the table, now with a custom data picker in the frontend. Passing the field mapping down to the HogQL layer should make all existing insights work nicely with minimal modifications and no special runners. At least in theory 😅 Slack thread
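Very roughly, the virtual-table idea above boils down to a field mapping that proxies warehouse columns onto the canonical event fields insights already expect. A minimal sketch, assuming a plain SQL rendering (the function, table, and column names are all illustrative, not the actual HogQL implementation):

```python
# Hypothetical sketch, not the actual HogQL implementation: build the SELECT
# for a virtual events table (e.g. "events_from_stripe") by aliasing
# warehouse columns onto the event fields that existing insights expect.

def virtual_events_select(table, field_mapping):
    """field_mapping maps canonical event field -> warehouse column."""
    # Per the comment above, these three fields are enough to unlock
    # person-property matching.
    required = ("person_id", "distinct_id", "timestamp")
    missing = [f for f in required if f not in field_mapping]
    if missing:
        raise ValueError(f"unmapped required fields: {missing}")
    cols = ", ".join(f"{src} AS {dst}" for dst, src in field_mapping.items())
    return f"SELECT {cols} FROM {table}"


# Illustrative usage: a Stripe-like table where the customer id stands in
# for both person_id and distinct_id.
sql = virtual_events_select(
    "stripe_charges",
    {"person_id": "customer_id", "distinct_id": "customer_id", "timestamp": "created_at"},
)
```

The appeal of this shape is exactly what the comment argues: the query builders never see a warehouse table, only something that quacks like `events`, so no special runners are needed.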
Problem
Changes
Design Decision
@Gilbert09 and I discussed this and figured that for the purposes of getting this functionality in trends, splitting at the trends query builder level would be the clearest because:
@timgl suggests we find a place to fit data warehouse table acceptance where we wouldn't need to repeat or split out logic further, such as within the HogQL parser itself.