
feat(hogql): implement basic caching for hogql insight queries #17483

Merged
merged 26 commits into master from hogql-query-cache
Sep 19, 2023

Conversation

thmsobrmlr
Contributor

@thmsobrmlr thmsobrmlr commented Sep 16, 2023

Problem

We're not caching calls to /query.

Changes

This PR implements a basic caching mechanism for insight queries on the /query endpoint.

Compared to the current implementation:

  • Does not handle insight results. At the moment these still respond with a filter-based calculation, since we do not have the query on the backend. The game plan here is to (1) write a backend-side conversion function from filters to a query and (2) make an internal request to the query endpoint with that query to obtain the cached or fresh result (the current implementation behaves differently depending on which API endpoint you call, and I'd like to avoid that).
  • Does not sync the caching state to Postgres (InsightCachingState); TBD whether we still want background refreshes going forward.
  • Adds the team's timezone to the cache key payload, so that results are invalidated when the timezone changes.
  • Is not as robust against cache key changes as it could be. For example, properties: null and properties: [] produce different cache keys. We can improve on that later; the best way would be to move schema generation to the backend so that we can use Pydantic validators (see the sketch right after this list).
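
For illustration of that last point, here's a minimal, hypothetical sketch (not code from this PR) of how a Pydantic v2 validator could normalize the properties field so that null and [] serialize identically and therefore hash to the same cache key:

from typing import Any, Optional

from pydantic import BaseModel, field_validator


class ExampleQuery(BaseModel):
    kind: str = "ExampleQuery"
    properties: Optional[list[Any]] = None

    @field_validator("properties", mode="before")
    @classmethod
    def normalize_properties(cls, value: Optional[list[Any]]) -> list[Any]:
        # Treat a missing property list the same as an empty one.
        return value or []


# Both variants serialize to the same JSON, so a cache key derived from
# model_dump_json() would be identical for them.
assert ExampleQuery(properties=None).model_dump_json() == ExampleQuery(properties=[]).model_dump_json()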

In detail, this PR:

  • Sends a refresh param with api.query for HogQL queries
  • Adds a QueryResponse interface with is_cached and last_refresh attributes
  • Makes QueryRunner an abstract base class, adds methods for caching, and adds tests (a rough sketch of this shape follows this list)
    • The cache key is based on Pydantic v2 model_dump_json
    • Instruments cache reads and writes with Prometheus
    • Makes process_query accept an optional request to determine whether we want to refresh
    • Implements caching behaviour similar to what we have for insights, minus everything that needs the insight context, e.g. a longer TTL for shared insights
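
A rough, hypothetical sketch of that shape, with a plain dict standing in for Redis and the Prometheus counters left out (names are illustrative, not the exact code in this PR):

from abc import ABC, abstractmethod
from datetime import datetime, timezone
from hashlib import md5
from typing import Any, Optional

from pydantic import BaseModel


class QueryResponse(BaseModel):
    results: Any = None
    is_cached: bool = False
    last_refresh: Optional[str] = None


_CACHE: dict[str, QueryResponse] = {}  # stand-in for Redis


class QueryRunner(ABC):
    def __init__(self, query: BaseModel, team_id: int, team_timezone: str):
        self.query = query
        self.team_id = team_id
        self.team_timezone = team_timezone

    @abstractmethod
    def calculate(self) -> QueryResponse:
        """Run the query and build a fresh response."""

    def cache_key(self) -> str:
        # Serialize the query with Pydantic v2 and include the team and its
        # timezone, so changing the timezone invalidates cached results.
        payload = f"query_{self.query.model_dump_json()}_{self.team_id}_{self.team_timezone}"
        return f"cache_{md5(payload.encode('utf-8')).hexdigest()}"

    def run(self, refresh: bool = False) -> QueryResponse:
        key = self.cache_key()
        if not refresh and key in _CACHE:
            return _CACHE[key].model_copy(update={"is_cached": True})
        response = self.calculate()
        response.last_refresh = datetime.now(timezone.utc).isoformat()
        _CACHE[key] = response
        return response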

Todos:

  • Add more tests

How did you test this code?

Added tests

@mariusandra mariusandra mentioned this pull request Sep 18, 2023
Base automatically changed from hogql-extend-schema to master September 18, 2023 13:05
@thmsobrmlr thmsobrmlr marked this pull request as ready for review September 19, 2023 14:41
@thmsobrmlr
Contributor Author

This is ready for a review. I'm going to add some more tests for the caching behaviour later.

@thmsobrmlr thmsobrmlr changed the title from "feat(hogql): add cache_key for hogql queries" to "feat(hogql): implement basic caching for hogql insight queries" on Sep 19, 2023
Collaborator

@mariusandra mariusandra left a comment


This looks great, and the game plan sounds solid to me. However, a thought occurred to me: what if we moved the caching to the AST/HogQL/SQL level? Would that make anything simpler (more predictable), or would that cause problems with future plans such as partial reloading? Currently we'd lose time as the query still needs to be parsed and generated, but assuming those things get taken care of, would there be any point in moving this caching up (or down? 🙃) a layer?
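
Purely as an illustration of the question being raised here (not something this PR does), keying the cache one layer down on the generated ClickHouse SQL could look roughly like this; the function is hypothetical:

from hashlib import sha256
from typing import Any


def sql_cache_key(team_id: int, clickhouse_sql: str, params: dict[str, Any]) -> str:
    # Query nodes that only differ cosmetically (e.g. properties: null vs
    # properties: []) would produce identical SQL and share one key.
    payload = f"{team_id}:{clickhouse_sql}:{sorted(params.items())}"
    return f"sql_cache_{sha256(payload.encode('utf-8')).hexdigest()}"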

@@ -16,15 +20,12 @@

class LifecycleQueryRunner(QueryRunner):
    query: LifecycleQuery
    query_type = LifecycleQuery
Collaborator


nice

Comment on lines +255 to +275
def _is_stale(self, cached_result_package):
date_to = self.query_date_range.date_to()
interval = self.query_date_range.interval_name
return is_stale(self.team, date_to, interval, cached_result_package)

def _refresh_frequency(self):
date_to = self.query_date_range.date_to()
date_from = self.query_date_range.date_from()
interval = self.query_date_range.interval_name

delta_days: Optional[int] = None
if date_from and date_to:
delta = date_to - date_from
delta_days = ceil(delta.total_seconds() / timedelta(days=1).total_seconds())

refresh_frequency = BASE_MINIMUM_INSIGHT_REFRESH_INTERVAL
if interval == "hour" or (delta_days is not None and delta_days <= 7):
# The interval is shorter for short-term insights
refresh_frequency = REDUCED_MINIMUM_INSIGHT_REFRESH_INTERVAL

return refresh_frequency
Collaborator


This feels like something we could standardise across all queries in query_runner.py?
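
As a hypothetical sketch (not part of this PR), a shared helper in query_runner.py could look something like this, with illustrative constant values:

from datetime import datetime, timedelta
from math import ceil
from typing import Optional

# Values are illustrative only.
BASE_MINIMUM_INSIGHT_REFRESH_INTERVAL = timedelta(minutes=15)
REDUCED_MINIMUM_INSIGHT_REFRESH_INTERVAL = timedelta(minutes=3)


def refresh_frequency_for_range(
    date_from: Optional[datetime], date_to: Optional[datetime], interval: Optional[str]
) -> timedelta:
    delta_days: Optional[int] = None
    if date_from and date_to:
        delta = date_to - date_from
        delta_days = ceil(delta.total_seconds() / timedelta(days=1).total_seconds())

    if interval == "hour" or (delta_days is not None and delta_days <= 7):
        # Short-range or hourly insights can be refreshed more often.
        return REDUCED_MINIMUM_INSIGHT_REFRESH_INTERVAL
    return BASE_MINIMUM_INSIGHT_REFRESH_INTERVAL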

Contributor Author


Yep. There are a few special cases, e.g. RetentionFilter uses period instead of interval, so I thought it's best to look at each query individually first and then unify the handling. Easy to forget this place otherwise.

Also, should we cache other HogQL queries e.g. those based on a date range?

def is_stale_filter(
team: Team, filter: Filter | RetentionFilter | StickinessFilter | PathFilter, cached_result: Any
) -> bool:
interval = filter.period.lower() if isinstance(filter, RetentionFilter) else filter.interval
Collaborator


😬

@thmsobrmlr
Contributor Author

This looks great, and the game plan sounds solid to me. However, a thought occurred to me: what if we moved the caching to the AST/HogQL/SQL level?

You mean cache the ClickHouse response instead of the serialized output? I've thought about it before as a way to make persons modal responses more reliable (by caching the event query base - doesn't work, cache gets blown up). We could cache the final query output, but I don't see where that would improve things (I imagine we almost have a 1-to-1 mapping of query node to ClickHouse query). Wouldn't be difficult if there's a good reason to do it though.

@thmsobrmlr thmsobrmlr merged commit c61a9d0 into master Sep 19, 2023
81 checks passed
@thmsobrmlr thmsobrmlr deleted the hogql-query-cache branch September 19, 2023 18:59
@mariusandra
Collaborator

I imagine we almost have a 1-to-1 mapping of query node to ClickHouse query

Yup, I imagine this as well. The case where it's not 1:1 is when different query nodes, e.g. one with properties: null and one with properties: [], both map to the same ClickHouse query. Caching at the query level could help unify these. 🤷
