Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(insights): HogQL calculation of saved legacy insights v2 #21595

Merged
merged 26 commits into from
Apr 22, 2024

Conversation

Twixes
Copy link
Member

@Twixes Twixes commented Apr 17, 2024

Changes

Revert of #21590, with one change two changes added:

  1. insight serialization will now only use cached results, to match the behavior of the legacy system, preventing ClickHouse overload (5c295c8)
  2. dashboard filters are now taken into account, with proper validation

@Twixes Twixes requested review from timgl and a team April 17, 2024 07:49
@Twixes Twixes force-pushed the hogql-calculation-bis branch from bb87101 to 7a8db16 Compare April 17, 2024 13:29
@Twixes Twixes requested a review from webjunkie April 17, 2024 17:27
@Twixes Twixes force-pushed the hogql-calculation-bis branch from aa0de3e to f4db229 Compare April 17, 2024 20:34
@PostHog PostHog deleted a comment from posthog-bot Apr 17, 2024
@PostHog PostHog deleted a comment from posthog-bot Apr 17, 2024
@PostHog PostHog deleted a comment from github-actions bot Apr 17, 2024
posthog/api/insight.py Dismissed Show dismissed Hide dismissed
package.json Outdated
@@ -34,7 +34,8 @@
"build:esbuild": "node frontend/build.mjs",
"schema:build": "pnpm run schema:build:json && pnpm run schema:build:python",
"schema:build:json": "ts-node bin/build-schema.mjs && prettier --write frontend/src/queries/schema.json",
"schema:build:python": "datamodel-codegen --class-name='SchemaRoot' --collapse-root-models --target-python-version 3.10 --disable-timestamp --use-one-literal-as-default --use-default --use-default-kwarg --use-subclass-enum --input frontend/src/queries/schema.json --input-file-type jsonschema --output posthog/schema.py --output-model-type pydantic_v2.BaseModel && ruff format posthog/schema.py",
"schema:build:python": "datamodel-codegen --class-name='SchemaRoot' --collapse-root-models --target-python-version 3.10 --disable-timestamp --use-one-literal-as-default --use-default --use-default-kwarg --use-subclass-enum --input frontend/src/queries/schema.json --input-file-type jsonschema --output posthog/schema.py --output-model-type pydantic_v2.BaseModel && ruff format posthog/schema.py && pnpm schema:build:python:fix-up-enum",
"schema:build:python:fix-up-enum": "sed -i '' -e 's/Optional\\[PropertyOperator\\] = \\(\"[A-Za-z_]*\"\\)/Optional[PropertyOperator] = PropertyOperator(\\1)/g' posthog/schema.py # THIS SED INVOCATION IS A MASSIVE HACK - remove when datamodel-codegen properly supports enums",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Context: We need EventPropertyFilter.operator to have a default value, to support the historic assumption that "exact" is the default operator. But that default value needs to be an enum member (PropertyOperator.exact), which datamodel-codegen doesn't support

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we move that into a shell file? (See build:json above where I also moved to a file cause of more logic)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's definitely a good idea at this point! Done

@PostHog PostHog deleted a comment from posthog-bot Apr 18, 2024
@PostHog PostHog deleted a comment from posthog-bot Apr 18, 2024
@PostHog PostHog deleted a comment from github-actions bot Apr 18, 2024
Copy link
Contributor

github-actions bot commented Apr 18, 2024

Size Change: 0 B

Total Size: 1.01 MB

ℹ️ View Unchanged
Filename Size
frontend/dist/toolbar.js 1.01 MB

compressed-size-action

@Twixes Twixes force-pushed the hogql-calculation-bis branch from ce321cd to 79a61d3 Compare April 18, 2024 16:50
Twixes added 4 commits April 19, 2024 00:01
Wow, this was a pain to figure out, only was an issue in CI, because the trigger was `TestCohort::test_creating_update_and_calculating_with_new_cohort_query` running prior to `TestInsight:: test_insight_refreshing_query` – had to use trial and error.
@Twixes Twixes force-pushed the hogql-calculation-bis branch from 281d302 to 4770169 Compare April 18, 2024 22:01
@PostHog PostHog deleted a comment from posthog-bot Apr 18, 2024
@PostHog PostHog deleted a comment from posthog-bot Apr 18, 2024
Copy link
Contributor

@webjunkie webjunkie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems alright in general. Some notes though.

Another thing I'm wondering for later:
If I go to an insight to view it, and it has been updated "1 day ago". I can click refresh there and it'll make just the request to /query and then tell me "Computed a few seconds ago". But since we don't store this in the insight, upon refresh it'll be a day old again.
Not sure how to tie this in together... we probably cannot be sure the query is exactly the umodified insight and such 😩

package.json Outdated
@@ -34,7 +34,8 @@
"build:esbuild": "node frontend/build.mjs",
"schema:build": "pnpm run schema:build:json && pnpm run schema:build:python",
"schema:build:json": "ts-node bin/build-schema.mjs && prettier --write frontend/src/queries/schema.json",
"schema:build:python": "datamodel-codegen --class-name='SchemaRoot' --collapse-root-models --target-python-version 3.10 --disable-timestamp --use-one-literal-as-default --use-default --use-default-kwarg --use-subclass-enum --input frontend/src/queries/schema.json --input-file-type jsonschema --output posthog/schema.py --output-model-type pydantic_v2.BaseModel && ruff format posthog/schema.py",
"schema:build:python": "datamodel-codegen --class-name='SchemaRoot' --collapse-root-models --target-python-version 3.10 --disable-timestamp --use-one-literal-as-default --use-default --use-default-kwarg --use-subclass-enum --input frontend/src/queries/schema.json --input-file-type jsonschema --output posthog/schema.py --output-model-type pydantic_v2.BaseModel && ruff format posthog/schema.py && pnpm schema:build:python:fix-up-enum",
"schema:build:python:fix-up-enum": "sed -i '' -e 's/Optional\\[PropertyOperator\\] = \\(\"[A-Za-z_]*\"\\)/Optional[PropertyOperator] = PropertyOperator(\\1)/g' posthog/schema.py # THIS SED INVOCATION IS A MASSIVE HACK - remove when datamodel-codegen properly supports enums",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we move that into a shell file? (See build:json above where I also moved to a file cause of more logic)

],
)
self.assertEqual(response["last_refresh"], "2012-01-16T05:01:34Z")
self.assertEqual(response["last_modified_at"], "2012-01-15T04:01:34Z") # did not change
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, does that mean it didn't change from the non-dashboard one? Shouldn't it be different? Or it didn't change since creation? In the latter case we might adjust the case to make it clear we get a different not-change for dashboards 🙈

Copy link
Member Author

@Twixes Twixes Apr 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is pretty valid, because the underlying definition of the insight hasn't changed. The dashboard config is just overlaid on top and kind of ephemeral. Unless I misunderstood?

@@ -313,8 +335,12 @@ def run(self, refresh_requested: Optional[bool] = None) -> CachedQueryResponse:
return cached_response
else:
QUERY_CACHE_HIT_COUNTER.labels(team_id=self.team.pk, cache_hit="stale").inc()
if execution_mode == ExecutionMode.CACHE_ONLY:
return cached_response
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to set is_cached = True here as well? 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yes, reworked and hopefully clarified this code overall

@Twixes Twixes force-pushed the hogql-calculation-bis branch from 0fdf1e2 to af5670b Compare April 19, 2024 12:44
Clarifies the `cached_response.is_cached = True` situation.
@Twixes Twixes force-pushed the hogql-calculation-bis branch from af5670b to e457fd4 Compare April 19, 2024 13:15
@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

2 snapshot changes in total. 0 added, 2 modified, 0 deleted:

  • chromium: 0 added, 0 modified, 0 deleted
  • webkit: 0 added, 2 modified, 0 deleted (diff for shard 2)

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@Twixes Twixes force-pushed the hogql-calculation-bis branch from 239b23b to 0d86acb Compare April 19, 2024 18:13
@Twixes
Copy link
Member Author

Twixes commented Apr 19, 2024

But since we don't store this in the insight, upon refresh it'll be a day old again.

This is exactly what should be addressed here. Just added a test to make sure the two endpoints indeed use the same cache, unlike previously.

@Twixes Twixes merged commit 3e2d28f into master Apr 22, 2024
106 checks passed
@Twixes Twixes deleted the hogql-calculation-bis branch April 22, 2024 08:30
Copy link

sentry-io bot commented Apr 22, 2024

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ CHQueryErrorUnknownIdentifier: Missing columns: '--events__override.person_id' '--events__override.distinct_id' while processi... /api/projects/{parent_lookup_team_id}/insights/... View Issue
  • ‼️ ValidationError: 2 validation errors for PropertyGroupFilter /api/projects/{parent_lookup_team_id}/dashboard... View Issue
  • ‼️ ValidationError: 2 validation errors for PropertyGroupFilter /api/projects/{parent_lookup_team_id}/insights/... View Issue
  • ‼️ ValidationError: 2 validation errors for PropertyGroupFilter /api/projects/{parent_lookup_team_id}/dashboard... View Issue
  • ‼️ NotImplementedError: EventsQuery does not support dashboard filters out of the box /api/projects/{parent_lookup_team_id}/insights/... View Issue

Did you find this useful? React with a 👍 or 👎

Twixes added a commit that referenced this pull request Apr 22, 2024
Twixes added a commit that referenced this pull request Apr 22, 2024
Revert "fix(insights): HogQL calculation of saved legacy insights v2 (#21595)"

This reverts commit 3e2d28f.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants