feat(insights): launch funnels as a Clickhouse UDF behind a feature flag #23587

Merged: 201 commits, Sep 5, 2024

@aspicer aspicer commented Jul 9, 2024

Problem

Funnels do not work with more than about 12 steps, because query time grows exponentially with each added step.

Changes

This is an experiment to rewrite funnels to run in a UDF. This PR launches that feature behind a feature flag.

What's a UDF? It's a user-defined function. You can define a function and implement it in any language you choose. ClickHouse launches your function and keeps it idle, preloaded and hot, waiting for input on stdin. When you call the function from ClickHouse, it pipes the data to the function and listens for its output on stdout.
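As a rough illustration of the stdin/stdout loop described above, here is a minimal sketch. It assumes the function is configured with a line-oriented JSON format (one JSON object per row, as in ClickHouse's JSONEachRow) and that the output column is named `result`. The names `serve` and `count_steps` and the `steps` field are hypothetical and not taken from this PR.

```python
import io
import json
import sys
from typing import Any, Callable, Dict, TextIO


def serve(
    handler: Callable[[Dict[str, Any]], Any],
    stdin: TextIO = sys.stdin,
    stdout: TextIO = sys.stdout,
) -> None:
    """Read one JSON row per line, write one JSON result per line."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        # Assumption: the caller expects a single column named "result".
        stdout.write(json.dumps({"result": handler(row)}) + "\n")
        # Flush after every row so the caller is not left waiting on a buffer.
        stdout.flush()


def count_steps(row: Dict[str, Any]) -> int:
    """Hypothetical handler: count the entries in a 'steps' array."""
    return len(row["steps"])


# Demo with in-memory streams; a real UDF script would call serve(count_steps).
demo_in = io.StringIO('{"steps": [0, 1, 2]}\n{"steps": []}\n')
demo_out = io.StringIO()
serve(count_steps, demo_in, demo_out)
```

Because the process stays alive between calls, any startup cost (imports, setup) is paid once rather than per query.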

Why a UDF? This makes the funnels code a lot simpler and easier to reason about. The core of the functionality is in a small Python function.

What are the downsides of this approach? The biggest one is that UDFs are slower than native ClickHouse. You shouldn't use a UDF for anything ClickHouse does well natively. A lot of this speed difference can be mitigated by optimizing your UDF (writing it in C / C++, for example).

How does it work

We use ClickHouse to turn events into mostly the same structure we have now - a set of matching steps and exclusions.

The core of the code is in aggregate_funnel.py. For each aggregation_target, it iterates through all the matching events in time order, keeps track of funnel progress, and returns timings and results.
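The per-target loop can be sketched roughly as follows. This is not the code in aggregate_funnel.py; it is a simplified illustration of an ordered funnel with a conversion window, ignoring exclusions, breakdowns, and strict/unordered modes. The function name `funnel_progress`, the event shape `(timestamp, matching_steps)`, and the `window` parameter are all assumptions made for the example.

```python
from typing import List, Optional, Sequence, Tuple

Event = Tuple[float, Sequence[int]]  # (timestamp, indexes of steps this event matches)


def funnel_progress(
    events: Sequence[Event], num_steps: int, window: float
) -> Tuple[int, List[float]]:
    """Return (steps reached, time between consecutive steps) for one target.

    `events` must already be sorted by timestamp.
    """
    # progress[i] holds the step timestamps of the latest attempt that reached step i.
    progress: List[Optional[List[float]]] = [None] * num_steps
    best: List[float] = []

    for ts, steps in events:
        # Walk matching steps from last to first so a single event
        # cannot be counted for two consecutive steps.
        for step in sorted(steps, reverse=True):
            if step == 0:
                progress[0] = [ts]
            else:
                prev = progress[step - 1]
                # Advance only if the attempt started within the conversion window.
                if prev is not None and ts - prev[0] <= window:
                    progress[step] = prev + [ts]
            cur = progress[step]
            if cur is not None and len(cur) > len(best):
                best = cur

    timings = [b - a for a, b in zip(best, best[1:])]
    return len(best), timings


events = [(0, [0]), (10, [1]), (20, [0]), (25, [2])]
reached, timings = funnel_progress(events, num_steps=3, window=100)
```

Each event is visited once, so the work per target grows linearly with the number of events rather than with combinations of steps.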

At the end, we use ClickHouse to do a couple of aggregations on breakdowns and to calculate averages.
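In the PR this final step happens in ClickHouse. Purely to illustrate the arithmetic, the sketch below does the equivalent in Python: it takes hypothetical per-target results of the form `(breakdown, steps_reached, timings)` and produces per-breakdown step counts and average step-to-step times. The name `summarize` and the input shape are assumptions.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Optional, Tuple

Result = Tuple[str, int, List[float]]  # (breakdown, steps reached, step-to-step times)


def summarize(results: Iterable[Result], num_steps: int) -> Dict[str, dict]:
    """Count targets reaching each step and average the step-to-step times."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0] * num_steps)
    times: Dict[str, List[List[float]]] = defaultdict(
        lambda: [[] for _ in range(num_steps - 1)]
    )
    for breakdown, reached, timings in results:
        row = counts[breakdown]  # ensures the breakdown appears even with 0 steps
        for i in range(reached):
            row[i] += 1
        for i, t in enumerate(timings):
            times[breakdown][i].append(t)

    summary: Dict[str, dict] = {}
    for breakdown, row in counts.items():
        avg: List[Optional[float]] = [
            mean(ts) if ts else None for ts in times[breakdown]
        ]
        summary[breakdown] = {"counts": row, "avg_times": avg}
    return summary


summary = summarize(
    [("chrome", 3, [10, 15]), ("chrome", 1, []), ("safari", 2, [4])],
    num_steps=3,
)
```

A step with no conversions gets `None` as its average here instead of zero, so "nobody converted" is not confused with "converted instantly".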

At least locally, it runs much faster than the existing queries.

Product Questions (about strict and unordered funnels)

While working on this, I dug a bit into usage of strict and unordered funnels. Both have shockingly low usage, at 0.3% and 1.2% respectively.
select coalesce(di.filters->>'funnel_order_type', 'ordered') as fot, count(*)
from posthog_dashboarditem as di
where di.filters->>'insight' = 'FUNNELS'
group by fot

select coalesce(di.filters->>'funnel_order_type', 'ordered') as fot,
       coalesce(di.filters->>'funnel_viz_type', 'steps') as viz,
       count(*)
from posthog_dashboarditem as di
where di.filters->>'insight' = 'FUNNELS'
group by (fot, viz)
order by count(*) desc

Strict mode is supposed to not allow any events between steps in the funnel. In reality, this doesn't make sense - if you start tracking something new, it could break all your funnel data. Strict was launched in July of 2021, so it's not exactly new. The fact the usage is so low isn't a great sign.

Unordered mode launched around the same time. It may have a plausible use case, but I think it needs product focus and changes if we're going to support it. One improvement that would make it more useful: allow sets of events to be unordered, rather than making unordered a global toggle for the whole funnel. That would let people track out-of-order user behavior, potentially gated on things like "checkout", instead of a somewhat aimless unordered funnel that only tells you how many steps users completed.

I think we should consider deprecating these modes. The next step would be to check whether any of these strict or unordered funnels get frequent traffic. Thoughts?

Further Questions

How does this work at scale? It runs quickly (much quicker than the old code) locally, but how does it scale for the largest funnels we have?

Follow up

Assuming that it works at scale, the follow-up items are:

  • Add support for (or remove) unordered mode
  • Port actors to the new queries (should we always calculate actors so that we can return it faster?)
  • Performance improvements. Optimize the tight loop / rewrite it in a faster language (C, Rust)
  • Clean up legacy code

Does this work well for both Cloud and self-hosted?

It might take some work to figure out how to deploy this for Cloud.

How did you test this code?

Unit testing. Local testing comparing the old funnel to the new funnel on dev.

@posthog-bot

📸 UI snapshots have been updated

1 snapshot changes in total. 0 added, 1 modified, 0 deleted:

  • chromium: 0 added, 1 modified, 0 deleted (diff for shard 2)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@aspicer aspicer merged commit 7fa73a8 into master Sep 5, 2024
93 checks passed
@aspicer aspicer deleted the aspicer/udf branch September 5, 2024 18:40
timgl pushed a commit that referenced this pull request Sep 10, 2024
…lag (#23587)

Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>