Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HogQL Insights #130

Open
3 of 48 tasks
thmsobrmlr opened this issue Sep 11, 2023 · 4 comments
Open
3 of 48 tasks

HogQL Insights #130

thmsobrmlr opened this issue Sep 11, 2023 · 4 comments

Comments

@thmsobrmlr
Copy link
Contributor

thmsobrmlr commented Sep 11, 2023

HogQL Insights

Current state of the HogQL conversion for insights and moving from filters-based insights to query-based insights.

What are we doing and why?

We are rewriting all our insights in HogQL, instead of raw ClickHouse SQL, which allows us to implement performance improvements and feature toggles (e.g. PoE modes) on this intermediate layer. This also allows us to expose the query to the end user, so they debug issues themselves or adapt queries to less frequent use cases.

In addition to the changes on the SQL layer we also change the way we store the insight configuration. Currently we have a mixin-based filters format (flat key-value structure) that became hard to maintain and doesn't allow reusage of sub-parts. The new query format (nested json) should allow copy-pasting parts and nesting "sources" in other queries to allow re-using the results throughout PostHog.

High-level plan of remaining steps

  1. Remove all filters from the frontend and use the frontend-side filterToQueryNode function to convert all api responses to the new query format (when fetching) and the queryNodeToFilter function to convert them back to filters for saving/duplicating/etc.
  2. Toggle the feature flag query-based-insights-saving, that then sends insights back with query, instead of filters for saving/duplicating/etc.
  3. Optional: Use the backend side filter_to_query function in the insights serializer to return a query from the backend (just this endpoint, as this is only for testing the filter_to_query function works as expected).
  4. Optional: Any changes we want to make to the query schema e.g. camel casing key that are still snake cased.
  5. If everything works, migrate the insights backend side (iterate over them and use the backend side filter_to_query function to replace filters with query).
  6. Migrate other entities that have insights e.g. activity log or notebooks.
  7. Convert experiments to HogQL.
  8. Convert cohorts to HogQL.
  9. Cleanup.

Remove frontend side dependency on filters

We can get rid of filters frontend side first by using the backend side filter_to_query function to return only queries from insights (and any other places that might return filters) and adapting the frontend so that it only handles queries. For saving insights we can use the frontend side queryNodeToFilters function to send finally send filters to the backend.

  • Use the backend side filter_to_query function to return only queries, not filters feat(hogql): replace filters with query backend side posthog#21945
  • Fix any places that might still rely on filters:
    • Make ActionFilter and entityFilterLogic based on series, not actions and events
    •  Remove filter based helpers from frontend/src/scenes/insights/sharedUtils.ts e.g. isTrendsFilter or isFilterWithDisplay
    • Remove usage of filtersToQueryNode, queryNodeToFilters, etc. in as many places as possible
    • Remove all usages of the cleanFilter function
    • Make experiments use HogQL queries
    • Migrate: activity log, insights, notebook nodes
    • Cleanup: Remove all getQueryBasedInsightModel

After migrating to backend side filters

For insights

  • replace InsightModel in subscriptionsLogic.test.ts
  • replace InsightModel in funnelDataLogic.test.ts
  • replace InsightModel in insightVizDataLogic.test.ts
  • replace InsightModel in trendsDataLogic.test.ts
  • convert all mocked insights in tests, stories etc. to be query based

For the activity log

For notebooks

Experiments backend

Experiments use the PA code backend side to generate trends/funnel results. We should swap out the legacy implementation for the HogQL one there as well.

  • Replace funnels in experiments backend side

Finalize query schema

At some point we want to run a migration to replace filters with queries. After that migration it will be harder to make changes to the query schema, meaning we should clean up the schema as good as we can now.

Unfortunately this got complicated by the fact that notebooks already save insights as queries and not filters. Thus they need additional handling in https://github.com/PostHog/posthog/blob/master/frontend/src/scenes/notebooks/Notebook/migrations/migrate.ts and we need to come up with a way to clean up tech debt there. The queries are stored both in the notebook nodes and in the activity log from which the user can go back in time.

Some proposed changes to the current query schema:

  • Camel case all properties
    • Camel case dateRange properties date_from and date_to
    • Camel case breakdownFilter properties
    • Camel case math properties in entities
    • Camel case and fix hidden_legend_indexes / hidden_legend_keys
      Currently the legend items can be hidden by index or by key depending on the insight and a bug prevents the hidden entries from being saved. We should agree on a single way to hide the entries and fix saving them with the query.
    • Camel case aggregation_group_type_index
    • Remove/fix breakdown_histogram_bin_count in trends filter properties
    • Replace legacy entities in retention insights with new Event/ActionNodes

Related bugs

Trends

Funnels

Retention

Cleanup

  • Update mocks to query based mocks

Make it flippin' amazing

@mariusandra mariusandra changed the title HogQL Queries HogQL Insights TODOs Dec 7, 2023
@mariusandra mariusandra changed the title HogQL Insights TODOs HogQL Insights Dec 7, 2023
@thmsobrmlr
Copy link
Contributor Author

thmsobrmlr commented Mar 1, 2024

Consolidation 1.3.2024

Step 1: Make it work

PostHog/posthog#17295 & PostHog/posthog#17414 & PostHog/posthog#17440

To have a scaffold for other queries and to enable discussion on implementation details, we want to have the lifecycle query ported over to HogQL from frontend to the backend and back i.e.

  • The frontend sends a LifecycleQuery node to the backend
  • The backend takes this query node and converts it to a HogQL query
  • Lifecycle test account filters
  • Global filters
  • The frontend displays the lifecycle insight from this response

Step 2. Make it good

Step 3. Make it complete

image

@MarconLP
Copy link
Member

@MarconLP
Copy link
Member

@thmsobrmlr
Copy link
Contributor Author

Consolidated post-HogQL-insights points on 2024-04-23 & 2024-06-05

Trends

Known issues with the HogQL implementation of trends:

Funnels

Known issues with the HogQL implementation of funnels:

Correlation

Lifecycle

Paths

  • Blocking: Writing and running a comparision script.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants