feat: only send product_intent events if not activated #25889

Merged
raquelmsmith merged 20 commits into master from feat/product-intents-recored-activation
Nov 1, 2024

@raquelmsmith raquelmsmith commented Oct 29, 2024

Problem

I went to make an activation query for data warehouse and fell down a rabbit hole.

We don't expect most users to show intent for our data warehouse product when they sign up (though we are getting a fair amount of people clicking it in onboarding!). So we need signals from inside the app that say someone is interested in a product. That's easy - just record a ProductIntent when someone clicks a certain button.

But then:

  • This button is part of normal use of the product
  • Thus, if we use this for our activation queries, already activated teams will re-enter our activation funnel
  • We'd get inflated activation numbers

So I needed a solution to allow us to use app signals before activation, but not after successful activation, and send events for product intents only if they haven't activated yet.

Changes

Basically, this starts calculating activation for the data warehouse product using data stored in our database, not event data. This is fine for data warehouse. I'm not 100% sure how well it will extend to the other products, but that's a bridge I can cross when I need to (e.g., we can (cautiously) theoretically query team 2 event data).

If a ProductIntent has activated_at, then we don't send any events for user showed product intent, which is what our activation queries are based off of.
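
A minimal sketch of that gating, not the code in this PR. `ProductIntentSketch`, `register`, and the `has_activated` callback are hypothetical names used only for illustration; the real model is a Django model, and how and when it checks activation is an assumption here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, Optional


@dataclass
class ProductIntentSketch:
    """Hypothetical stand-in for a ProductIntent row (not the real Django model)."""

    product_type: str
    activated_at: Optional[datetime] = None
    sent_events: List[str] = field(default_factory=list)

    def register(self, has_activated: Callable[[], bool]) -> bool:
        """Record an intent signal; return True only if an event was sent."""
        # Already activated: this click is regular usage, so send nothing.
        if self.activated_at is not None:
            return False
        # Assumption: activation is checked from database state at this point.
        if has_activated():
            self.activated_at = datetime.now(timezone.utc)
            return False
        # Not activated yet: the signal still belongs in the activation funnel.
        self.sent_events.append("user showed product intent")
        return True


intent = ProductIntentSketch("data_warehouse")
intent.register(lambda: False)  # not activated: event sent
intent.register(lambda: True)   # activation detected: activated_at set, no event
intent.register(lambda: False)  # already activated: suppressed
print(intent.sent_events)
```

With this shape, the same button can keep calling `register` during normal use without pushing activated teams back into the funnel.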

Does this work well for both Cloud and self-hosted?

How did you test this code?

Need to write tests and do migrations

@raquelmsmith raquelmsmith changed the title from "and not product_intent.activated_at" to "feat: only send product_intent events if not activated" Oct 29, 2024

github-actions bot commented Oct 29, 2024

Size Change: 0 B

Total Size: 1.15 MB

ℹ️ View Unchanged
Filename Size
frontend/dist/toolbar.js 1.15 MB

compressed-size-action

@zlwaterfield zlwaterfield left a comment

Thus, if we use this for our activation queries, already activated teams will re-enter our activation funnel

I might be missing something, but why can't we just make the queries distinct and only count the first time they do it, so they don't re-enter the funnel and get counted more than once?

I'm also not sure I understand why we have the product intent table instead of events and then queries in PostHog? Is it so we can use the information to change the product experience? If so, is this related to the funnel or just state management and the activation funnel is kinda unrelated?

    if insight.query and insight.query.get("source", {}).get("query"):
        query_text = insight.query["source"]["query"].lower()
        # Check if query doesn't contain any of the excluded tables after 'from'
        has_excluded_table = any(f"from {table}" in query_text.replace("\\", "") for table in excluded_tables)

I don't understand this part. Why are these tables excluded? Are they just the stock ones provided by PostHog? So if not excluded, then it's assumed to be one of their tables?

This could be optimized so it's all done in the query:

    import operator
    from datetime import UTC, datetime
    from functools import reduce

    from django.db.models import Q

    # Fragment: the rest runs inside the method that has access to self.team
    excluded_tables = ["events", "persons", "sessions", "person_distinct_ids"]

    # Create a list of Q objects for each excluded table
    excluded_patterns = [
        Q(query__source__query__icontains=f"from {table}")
        for table in excluded_tables
    ]

    # Combine all exclusion patterns with OR operator
    exclusion_filter = reduce(operator.or_, excluded_patterns)

    # Single optimized query that:
    # 1. Filters by team and date
    # 2. Filters by query kind
    # 3. Excludes queries containing any excluded table
    # 4. Checks if any matching record exists
    return Insight.objects.filter(
        team=self.team,
        created_at__gte=datetime(2024, 6, 1, tzinfo=UTC),
        query__kind="DataVisualizationNode",
    ).exclude(
        exclusion_filter
    ).filter(
        query__source__query__isnull=False
    ).exists()

This query is what Eric gave me for data warehouse activation. I'll look at your optimization! I haven't tested that this works at all yet haha.
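
The substring check in the snippet under review can be tried without a database. The sketch below is an illustration only: `references_excluded_table` is a hypothetical helper name, and the table list is taken from the suggestion above on the assumption that it matches the list in the PR.

```python
from typing import Sequence

# Assumed to match the list used in the PR (taken from the review suggestion).
EXCLUDED_TABLES = ["events", "persons", "sessions", "person_distinct_ids"]


def references_excluded_table(
    query_text: str, excluded_tables: Sequence[str] = EXCLUDED_TABLES
) -> bool:
    """Return True if the query text contains 'from <table>' for an excluded table."""
    # Same normalization as the snippet: lowercase, then drop backslashes.
    normalized = query_text.lower().replace("\\", "")
    return any(f"from {table}" in normalized for table in excluded_tables)


print(references_excluded_table("SELECT * FROM events"))
print(references_excluded_table("select * from stripe_charges"))
# Plain substring matching also flags any table whose name starts with an excluded name.
print(references_excluded_table("select * from events_backup"))
# A newline between FROM and the table name is not matched.
print(references_excluded_table("select *\nfrom\nevents"))
```

The last two calls show the limits of plain substring matching: a warehouse table whose name begins with an excluded name is treated as excluded, and a query that formats `FROM` and the table name on separate lines is not caught.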

can use the `activated_at` field to know if we should continue to update the product
intent row, or if we should stop because it's just regular usage.

The `activated_at` field is set by checking against certain criteria that differs for

This is clever.

@raquelmsmith

I might be missing something, but why can't we just make the queries distinct and only count the first time they do it, so they don't re-enter the funnel and get counted more than once?

It's possible that someone shows intent, but then finds out they don't need the product right now. In 6 months, they have more data, etc., so they can use the product, and they show intent again. If we only used the first event, we wouldn't "see" them in our activation funnel 6 months later.

A concrete example is:

  • Someone signs up, says they're interested in DW
  • They have to get product analytics set up first and foremost, and that takes a while (>30 days).
  • Now they want to go set up DW, but because their first intent was longer ago than the conversion window for the funnel, we never see them in our funnel.
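
The example above can be worked through with dates. This is a sketch under assumptions: the 30-day conversion window is inferred from the ">30 days" in the example, and `converts_in_funnel` is a hypothetical helper, not how PostHog funnels are computed.

```python
from datetime import date, timedelta
from typing import List

# Assumed conversion window, inferred from the ">30 days" in the example.
CONVERSION_WINDOW = timedelta(days=30)


def converts_in_funnel(intent_dates: List[date], activation_date: date, first_only: bool) -> bool:
    """True if activation falls within the window of at least one counted intent event."""
    # With first_only, only the earliest intent event can start the funnel.
    starts = [min(intent_dates)] if first_only else intent_dates
    return any(
        timedelta(0) <= activation_date - start <= CONVERSION_WINDOW
        for start in starts
    )


intents = [date(2024, 1, 1), date(2024, 3, 1)]  # intent at signup, then again later
activated = date(2024, 3, 10)

print(converts_in_funnel(intents, activated, first_only=True))   # False
print(converts_in_funnel(intents, activated, first_only=False))  # True
```

Counting only the first intent hides the later conversion, while counting every intent is what lets already activated teams re-enter the funnel, hence stopping the events once `activated_at` is set.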

I'm also not sure I understand why we have the product intent table instead of events and then queries in PostHog? Is it so we can use the information to change the product experience? If so, is this related to the funnel or just state management and the activation funnel is kinda unrelated?

Right now events + queries is enough (though with this post-onboarding flow it might not be, because we can't store that an org has already activated). BUT in the future I want to use this info in the UI and in emails. I.e., if someone shows intent for a product by clicking on a certain button, then we can know that in the db and show tailored info to them in the quickstart panel, highlight areas of the app, etc.

@posthog-bot

📸 UI snapshots have been updated

2 snapshot changes in total. 0 added, 2 modified, 0 deleted:

  • chromium: 0 added, 2 modified, 0 deleted (diff for shard 1)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@zlwaterfield zlwaterfield left a comment

Given your descriptions I think this solution makes sense.

Side note: this is an interesting case where we want analytics in the application scope to use those events to customize the experience. I think this is a very popular thing and a feature we should build for our customers. Like create views in DW that cache data that they can query in their API to load context in for a customer. I think it's probably possible now but convoluted and hacky.

@raquelmsmith raquelmsmith enabled auto-merge (squash) November 1, 2024 04:03
@raquelmsmith raquelmsmith merged commit 54efcdc into master Nov 1, 2024
96 checks passed
@raquelmsmith raquelmsmith deleted the feat/product-intents-recored-activation branch November 1, 2024 16:36

sentry-io bot commented Nov 4, 2024

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ Team.DoesNotExist: Team matching query does not exist. posthog.models.product_intent.product_intent.ca... View Issue
  • ‼️ OperationalError: connection failed: connection to server at "172.20.49.244", port 6543 failed: server closed the c... posthog.models.product_intent.product_intent.ca... View Issue
  • ‼️ OperationalError: Error -3 connecting to posthog-solo.txwsvb.ng.0001.use1.cache.amazonaws.com:6379. Temporary failu... /api/environments/{id}/ View Issue
  • ‼️ OperationalError: connection timeout expired posthog.models.product_intent.product_intent.ca... View Issue
  • ‼️ OperationalError: consuming input failed: query_wait_timeout posthog.models.product_intent.product_intent.ca... View Issue
