feat(insights): launch funnels as a Clickhouse UDF behind a feature flag #23587
Merged
📸 UI snapshots have been updated: 1 snapshot change in total (0 added, 1 modified, 0 deleted). Triggered by this commit.
timgl pushed a commit that referenced this pull request on Sep 10, 2024: …lag (#23587) Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
Problem
Funnels don't work with more than about 12 steps: query time grows exponentially with each added step.
Changes
This is an experiment in rewriting funnels to run as a UDF. This PR launches it behind a feature flag.
What's a UDF? It's a user-defined function. You define a function and implement it in any language you choose. ClickHouse starts your program and keeps it sitting idle, preloaded and hot, waiting for input on stdin. When you call the function from ClickHouse, it pipes the data to the program's stdin and reads the results from its stdout.
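The stdin/stdout protocol can be sketched in a few lines of Python. This is a minimal, hypothetical example, not the PR's `aggregate_funnel.py`: it assumes the UDF is registered as an `executable_pool` function with the `TabSeparated` format, takes two integers, and returns their sum. The `add_two` and `serve` names are illustrative only.

```python
import sys
from typing import Callable, TextIO


def add_two(a: int, b: int) -> int:
    # Hypothetical stand-in; a real UDF would do the funnel work here.
    return a + b


def serve(
    stdin: TextIO = sys.stdin,
    stdout: TextIO = sys.stdout,
    fn: Callable[[int, int], int] = add_two,
) -> None:
    # Assumed TabSeparated protocol: one input row per line on stdin,
    # exactly one output row per input row on stdout, in the same order.
    for line in stdin:
        a, b = (int(x) for x in line.rstrip("\n").split("\t"))
        stdout.write(f"{fn(a, b)}\n")
        # Flush each row so ClickHouse isn't left waiting on a buffered result.
        stdout.flush()
```

When run by ClickHouse, the script would end with `if __name__ == "__main__": serve()`. Because the process stays alive between calls, the interpreter startup cost is paid once rather than per query.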
Why a UDF? It makes the funnels code much simpler and easier to reason about: the core of the functionality is in a small Python function.
What are the downsides of this approach? The biggest one is that UDFs are slower than native ClickHouse, so you shouldn't use a UDF for anything ClickHouse already does well natively. Much of this speed gap can be reduced by optimizing the UDF (for example, by writing it in C or C++).
How it works
We use ClickHouse to turn events into mostly the same structure we have now: a set of matching steps and exclusions.
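As a rough illustration of that intermediate structure, here is a Python sketch. In the PR this matching happens in ClickHouse SQL; the `MatchedEvent` type, the `match_events` helper, and matching steps purely by event name are hypothetical simplifications (real funnel steps can also filter on properties). Note that one event can match more than one step when the same event appears twice in a funnel.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class MatchedEvent:
    # Hypothetical layout, not the PR's actual column format.
    timestamp: float                                 # seconds since epoch
    steps: List[int] = field(default_factory=list)   # 0-based steps this event matches
    excluded: bool = False                           # True if it matches an exclusion


def match_events(
    events: Sequence[Tuple[float, str]],
    steps: Sequence[str],
    exclusions: Sequence[str],
) -> List[MatchedEvent]:
    """Keep only events relevant to the funnel, in time order."""
    matched: List[MatchedEvent] = []
    for ts, name in sorted(events):
        step_ids = [i for i, step in enumerate(steps) if step == name]
        is_excluded = name in exclusions
        # Events that match neither a step nor an exclusion are dropped.
        if step_ids or is_excluded:
            matched.append(MatchedEvent(ts, step_ids, is_excluded))
    return matched
```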
The core of the code is in `aggregate_funnel.py`. For each `aggregation_target`, it iterates through all the matching events in time order, keeps track of funnel progress, and returns timings and results. At the end, we use ClickHouse to do a couple of aggregations on breakdowns and to calculate averages.
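A simplified Python sketch of that per-target walk follows. It is not the PR's `aggregate_funnel.py`: the `Row` layout, the `walk_funnel` name, measuring the conversion window from the first step, and the rule that an exclusion discards every unfinished path are all assumptions made for illustration. It handles ordered funnels only.

```python
from typing import List, Optional, Sequence, Tuple

# (timestamp_seconds, matched 0-based step indices, is_exclusion): hypothetical layout
Row = Tuple[float, Sequence[int], bool]


def walk_funnel(rows: Sequence[Row], num_steps: int, window: float) -> Tuple[int, List[float]]:
    """Return (furthest 0-based step reached, timestamp of each reached step).

    Returns (-1, []) if the target never entered the funnel.
    """
    # reached[k] holds the step timestamps of the best path that has completed step k.
    reached: List[Optional[Tuple[float, ...]]] = [None] * num_steps
    for ts, steps, excluded in sorted(rows, key=lambda r: r[0]):
        if excluded:
            # Assumed rule: an exclusion discards every path that hasn't finished.
            for k in range(num_steps - 1):
                reached[k] = None
            continue
        # Walk matched steps from last to first so one event can't advance a path twice.
        for k in sorted(steps, reverse=True):
            if k == 0:
                reached[0] = (ts,)
                continue
            prev = reached[k - 1]
            if prev is None or ts - prev[0] > window:
                continue  # nothing to extend, or it would fall outside the window
            current = reached[k]
            # Prefer the path that started latest; it leaves the most room in the window.
            if current is None or prev[0] > current[0]:
                reached[k] = prev + (ts,)
    for k in range(num_steps - 1, -1, -1):
        path = reached[k]
        if path is not None:
            return k, list(path)
    return -1, []
```

For example, `walk_funnel([(0, [0], False), (10, [1], False), (500, [2], False)], 3, 100)` stops at step 1 because the last event falls outside the window. This sketch makes one pass over a target's events, so its cost grows roughly with events × steps.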
At least locally, it runs much faster than the existing queries.
Product Questions (about strict and unordered funnels)
While working on this, I dug a bit into usage of strict and unordered funnels. Both have shockingly low usage among saved funnel insights: 0.3% and 1.2% respectively.
```sql
select coalesce(di.filters->>'funnel_order_type', 'ordered') as fot, count(*)
from posthog_dashboarditem as di
where di.filters->>'insight' = 'FUNNELS'
group by fot;

select coalesce(di.filters->>'funnel_order_type', 'ordered') as fot,
       coalesce(di.filters->>'funnel_viz_type', 'steps') as viz,
       count(*)
from posthog_dashboarditem as di
where di.filters->>'insight' = 'FUNNELS'
group by fot, viz
order by count(*) desc;
```
Strict mode is meant to disallow any events between funnel steps. In practice this doesn't make much sense: if you start tracking a new event, it could break all your funnel data. Strict mode launched in July 2021, so it's not exactly new, and the low usage isn't a great sign.
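To make the "new event breaks the funnel" point concrete, here is a small Python sketch. The `strict_progress` function and its semantics (steps must be consecutive events with nothing in between) are a hypothetical simplification, not PostHog's strict-funnel implementation.

```python
from typing import Sequence


def strict_progress(events: Sequence[str], steps: Sequence[str]) -> int:
    """Number of funnel steps completed under a simplified strict rule.

    Hypothetical semantics: the steps must appear as consecutive events,
    with no other event in between. Returns the best run over all starts.
    """
    best = 0
    for start in range(len(events)):
        n = 0
        while n < len(steps) and start + n < len(events) and events[start + n] == steps[n]:
            n += 1
        best = max(best, n)
    return best
```

With `steps = ["pageview", "signup", "purchase"]`, the event stream `["pageview", "signup", "purchase"]` completes all 3 steps, while `["pageview", "scroll", "signup", "purchase"]` only completes 1: the same user behavior stops converting purely because `scroll` started being tracked.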
Unordered mode launched around the same time. It may have a plausible use case, but I think it needs some product focus and changes if we're going to keep supporting it. One improvement that would make it more useful is allowing sets of events to be unordered, instead of having unordered be a global toggle for the whole funnel. Unordered sets would let people track out-of-order user behavior gated on milestones like "checkout", rather than getting a somewhat aimless unordered funnel that only tells you how many steps users completed.
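One way to read the "unordered sets" proposal is sketched below in Python. This is a hypothetical illustration of the idea, not an existing feature: the funnel is a sequence of groups, events inside a group may happen in any order, and the groups themselves must be completed in sequence. The `grouped_progress` name and event names are made up.

```python
from typing import AbstractSet, Sequence, Set


def grouped_progress(events: Sequence[str], groups: Sequence[AbstractSet[str]]) -> int:
    """Count how many groups a user completes, in order (hypothetical feature)."""
    completed = 0
    seen: Set[str] = set()
    for name in events:
        if completed == len(groups):
            break
        # Only events belonging to the current group count toward progress.
        if name in groups[completed]:
            seen.add(name)
            if seen >= groups[completed]:
                completed += 1
                seen = set()
    return completed
```

With `groups = [{"view_item"}, {"add_shipping", "add_payment"}, {"checkout"}]`, adding shipping and payment details in either order still reaches `checkout` (3 groups), while an `add_payment` before `view_item` doesn't count toward the second group.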
We could consider deprecating these. A next step would be checking whether any strict or unordered funnels get frequent traffic. Thoughts?
Further Questions
How does this perform at scale? Locally it runs much faster than the old code, but how does it scale to the largest funnels we have?
Follow-up
Assuming it works at scale, the follow-up items are:
- Does this work well for both Cloud and self-hosted?
- It might take some work to figure out how to deploy this for Cloud.
How did you test this code?
Unit tests, plus local testing on dev comparing the old funnel implementation with the new one.