feat(performance): query improvements for trends (load less people) #23135

Merged
merged 91 commits into from
Jun 25, 2024
22c538d
group pdi
aspicer Jun 20, 2024
07ac16a
fixes
aspicer Jun 20, 2024
bf33493
Update query snapshots
github-actions[bot] Jun 20, 2024
160b1c5
Update UI snapshots for `chromium` (2)
github-actions[bot] Jun 20, 2024
ee3b5dc
Update query snapshots
github-actions[bot] Jun 20, 2024
1878fbc
Update UI snapshots for `chromium` (2)
github-actions[bot] Jun 20, 2024
50964dc
person ids
aspicer Jun 20, 2024
d377853
fix
aspicer Jun 20, 2024
5da1b47
Update UI snapshots for `chromium` (2)
github-actions[bot] Jun 20, 2024
14c0caf
fix mypy
aspicer Jun 20, 2024
35e2cb2
fix bug
aspicer Jun 20, 2024
f2b3926
Update UI snapshots for `chromium` (2)
github-actions[bot] Jun 20, 2024
6486f1e
Merge branch 'aspicer/group_pdi' of github.com:PostHog/posthog into a…
aspicer Jun 20, 2024
4ca7bbd
Update UI snapshots for `chromium` (2)
github-actions[bot] Jun 21, 2024
ae94436
Update query snapshots
github-actions[bot] Jun 21, 2024
9149173
Update query snapshots
github-actions[bot] Jun 21, 2024
c76a96a
Update query snapshots
github-actions[bot] Jun 21, 2024
cb5af95
don't cache if not using cte
aspicer Jun 21, 2024
64f160f
Update query snapshots
github-actions[bot] Jun 21, 2024
814766d
Merge branch 'aspicer/group_pdi' of github.com:PostHog/posthog into a…
aspicer Jun 21, 2024
a1266e8
Merge branch 'master' into aspicer/group_pdi
aspicer Jun 21, 2024
e61b9fc
Update UI snapshots for `chromium` (2)
github-actions[bot] Jun 21, 2024
a58194c
Update query snapshots
github-actions[bot] Jun 21, 2024
583e223
Update query snapshots
github-actions[bot] Jun 21, 2024
d897d3d
capture_queries
aspicer Jun 21, 2024
61b0555
Merge branch 'aspicer/group_pdi' of github.com:PostHog/posthog into a…
aspicer Jun 21, 2024
f02c16b
Update query snapshots
github-actions[bot] Jun 21, 2024
9d22b62
Update query snapshots
github-actions[bot] Jun 21, 2024
b58a61e
value
aspicer Jun 21, 2024
f290733
make cohorts work with lazy tables and make conjoined work with types
aspicer Jun 21, 2024
fb33fe6
Update query snapshots
github-actions[bot] Jun 21, 2024
52a6a6f
Update query snapshots
github-actions[bot] Jun 21, 2024
e363641
Update query snapshots
github-actions[bot] Jun 21, 2024
d4a7415
Update query snapshots
github-actions[bot] Jun 21, 2024
b445519
Update query snapshots
github-actions[bot] Jun 21, 2024
99e9166
pass tests
aspicer Jun 21, 2024
55cdc89
Update UI snapshots for `chromium` (2)
github-actions[bot] Jun 21, 2024
a8599fd
printer
aspicer Jun 21, 2024
bad350a
Merge branch 'aspicer/group_pdi' of github.com:PostHog/posthog into a…
aspicer Jun 21, 2024
d2e0276
Update query snapshots
github-actions[bot] Jun 21, 2024
900a3a6
Update query snapshots
github-actions[bot] Jun 21, 2024
7cf5d07
Update UI snapshots for `chromium` (2)
github-actions[bot] Jun 21, 2024
76af95d
Update query snapshots
github-actions[bot] Jun 21, 2024
fce4e66
Update query snapshots
github-actions[bot] Jun 21, 2024
e245720
merge
aspicer Jun 21, 2024
99a92cc
snapshots
aspicer Jun 21, 2024
6f386df
Merge remote-tracking branch 'origin/master' into aspicer/group_pdi
aspicer Jun 21, 2024
88a3cea
changes
aspicer Jun 22, 2024
e60c7da
all tests pass
aspicer Jun 22, 2024
0a5534e
Update query snapshots
github-actions[bot] Jun 22, 2024
a082165
work on all queries
aspicer Jun 22, 2024
7af1b6f
Merge branch 'aspicer/group_pdi' of github.com:PostHog/posthog into a…
aspicer Jun 22, 2024
324ea5a
all queries
aspicer Jun 22, 2024
5373668
Update query snapshots
github-actions[bot] Jun 22, 2024
38939d6
Update query snapshots
github-actions[bot] Jun 22, 2024
5aa25e1
Update query snapshots
github-actions[bot] Jun 22, 2024
39d6661
Update UI snapshots for `chromium` (2)
github-actions[bot] Jun 22, 2024
7543446
Update UI snapshots for `chromium` (2)
github-actions[bot] Jun 22, 2024
08e77ec
source id generalization
aspicer Jun 22, 2024
20abe64
Merge branch 'aspicer/group_pdi' of github.com:PostHog/posthog into a…
aspicer Jun 22, 2024
e93ae5d
Update query snapshots
github-actions[bot] Jun 23, 2024
b9d9ada
mmm
aspicer Jun 24, 2024
25475ab
groups failing
aspicer Jun 24, 2024
0101a34
refactor
aspicer Jun 24, 2024
2e2c0e1
merge snapshot
aspicer Jun 24, 2024
54fa578
fixed typo
aspicer Jun 24, 2024
81351ca
xmlsec
aspicer Jun 24, 2024
5ec2768
test actors
aspicer Jun 24, 2024
2fe6258
mypy
aspicer Jun 24, 2024
8d46f61
flap
aspicer Jun 24, 2024
e5cc291
join on
aspicer Jun 25, 2024
a8d1f32
hm
aspicer Jun 25, 2024
140d6d0
refactor
aspicer Jun 25, 2024
29ff40b
working
aspicer Jun 25, 2024
f9eb633
tests and cleanup
aspicer Jun 25, 2024
38116a4
don't allow no table name
aspicer Jun 25, 2024
e3e7f27
don't make a new table if no promotions
aspicer Jun 25, 2024
1bf03d7
cleanup
aspicer Jun 25, 2024
a4a4a91
remove tests that check for query caceh
aspicer Jun 25, 2024
cdb838f
and
aspicer Jun 25, 2024
432ec1a
minor reorder
aspicer Jun 25, 2024
0cfe7d6
conflict insights
aspicer Jun 25, 2024
3b57efb
Update query snapshots
github-actions[bot] Jun 25, 2024
5450c98
Update query snapshots
github-actions[bot] Jun 25, 2024
be0a2a3
Update query snapshots
github-actions[bot] Jun 25, 2024
f3caafe
Update query snapshots
github-actions[bot] Jun 25, 2024
6b8c378
resolve test fix
aspicer Jun 25, 2024
94dc197
Merge branch 'aspicer/group_pdi' of github.com:PostHog/posthog into a…
aspicer Jun 25, 2024
1d43f69
Update query snapshots
github-actions[bot] Jun 25, 2024
811446e
Update query snapshots
github-actions[bot] Jun 25, 2024
436617b
Update query snapshots
github-actions[bot] Jun 25, 2024
2 changes: 1 addition & 1 deletion frontend/src/lib/components/CommandBar/commandBarLogic.ts
@@ -54,7 +54,7 @@ export const commandBarLogic = kea<commandBarLogicType>([
 if (shouldIgnoreInput(event)) {
     return
 }
-if ((event.ctrlKey || event.metaKey) && event.key === 'k') {
+if ((event.ctrlKey || event.metaKey) && (event.key === 'k' || event.key === 'K')) {
     event.preventDefault()
     if (event.shiftKey) {
         // cmd+shift+k opens actions
8 changes: 4 additions & 4 deletions posthog/api/test/__snapshots__/test_query.ambr
@@ -486,7 +486,7 @@
 FROM person_distinct_id2
 WHERE equals(person_distinct_id2.team_id, 2)
 GROUP BY person_distinct_id2.distinct_id
-HAVING ifNull(equals(argMax(person_distinct_id2.is_deleted, person_distinct_id2.version), 0), 0)) AS events__pdi ON equals(events.distinct_id, events__pdi.distinct_id)
+HAVING ifNull(equals(argMax(person_distinct_id2.is_deleted, person_distinct_id2.version), 0), 0) SETTINGS optimize_aggregation_in_order=1) AS events__pdi ON equals(events.distinct_id, events__pdi.distinct_id)
 LEFT JOIN
 (SELECT person.id AS id,
 replaceRegexpAll(nullIf(nullIf(JSONExtractRaw(person.properties, 'email'), ''), 'null'), '^"|"$', '') AS properties___email
@@ -496,7 +496,7 @@
 FROM person
 WHERE equals(person.team_id, 2)
 GROUP BY person.id
-HAVING and(ifNull(equals(argMax(person.is_deleted, person.version), 0), 0), ifNull(less(argMax(person.created_at, person.version), plus(now64(6, 'UTC'), toIntervalDay(1))), 0)))), 0)) SETTINGS optimize_aggregation_in_order=1) AS events__pdi__person ON equals(events__pdi.events__pdi___person_id, events__pdi__person.id)
+HAVING and(ifNull(equals(argMax(person.is_deleted, person.version), 0), 0), ifNull(less(argMax(toTimeZone(person.created_at, 'UTC'), person.version), plus(now64(6, 'UTC'), toIntervalDay(1))), 0)))), 0)) SETTINGS optimize_aggregation_in_order=1) AS events__pdi__person ON equals(events__pdi.events__pdi___person_id, events__pdi__person.id)
 WHERE and(equals(events.team_id, 2), ifNull(equals(events__pdi__person.properties___email, '[email protected]'), 0), less(toTimeZone(events.timestamp, 'UTC'), toDateTime64('2020-01-10 12:14:05.000000', 6, 'UTC')), greater(toTimeZone(events.timestamp, 'UTC'), toDateTime64('2020-01-09 12:14:00.000000', 6, 'UTC')))
 ORDER BY events.event ASC
 LIMIT 101
@@ -526,7 +526,7 @@
 FROM person_distinct_id2
 WHERE equals(person_distinct_id2.team_id, 2)
 GROUP BY person_distinct_id2.distinct_id
-HAVING ifNull(equals(argMax(person_distinct_id2.is_deleted, person_distinct_id2.version), 0), 0)) AS events__pdi ON equals(events.distinct_id, events__pdi.distinct_id)
+HAVING ifNull(equals(argMax(person_distinct_id2.is_deleted, person_distinct_id2.version), 0), 0) SETTINGS optimize_aggregation_in_order=1) AS events__pdi ON equals(events.distinct_id, events__pdi.distinct_id)
 LEFT JOIN
 (SELECT person.id AS id,
 nullIf(nullIf(person.pmat_email, ''), 'null') AS properties___email
@@ -536,7 +536,7 @@
 FROM person
 WHERE equals(person.team_id, 2)
 GROUP BY person.id
-HAVING and(ifNull(equals(argMax(person.is_deleted, person.version), 0), 0), ifNull(less(argMax(person.created_at, person.version), plus(now64(6, 'UTC'), toIntervalDay(1))), 0)))), 0)) SETTINGS optimize_aggregation_in_order=1) AS events__pdi__person ON equals(events__pdi.events__pdi___person_id, events__pdi__person.id)
+HAVING and(ifNull(equals(argMax(person.is_deleted, person.version), 0), 0), ifNull(less(argMax(toTimeZone(person.created_at, 'UTC'), person.version), plus(now64(6, 'UTC'), toIntervalDay(1))), 0)))), 0)) SETTINGS optimize_aggregation_in_order=1) AS events__pdi__person ON equals(events__pdi.events__pdi___person_id, events__pdi__person.id)
 WHERE and(equals(events.team_id, 2), ifNull(equals(events__pdi__person.properties___email, '[email protected]'), 0), less(toTimeZone(events.timestamp, 'UTC'), toDateTime64('2020-01-10 12:14:05.000000', 6, 'UTC')), greater(toTimeZone(events.timestamp, 'UTC'), toDateTime64('2020-01-09 12:14:00.000000', 6, 'UTC')))
 ORDER BY events.event ASC
 LIMIT 101
4 changes: 2 additions & 2 deletions posthog/api/test/test_cohort.py
@@ -33,7 +33,7 @@
 class TestCohort(TestExportMixin, ClickhouseTestMixin, APIBaseTest, QueryMatchingTest):
     # select all queries for snapshots
     def capture_select_queries(self):
-        return self.capture_queries(("INSERT INTO cohortpeople", "SELECT", "ALTER", "select", "DELETE"))
+        return self.capture_queries_startswith(("INSERT INTO cohortpeople", "SELECT", "ALTER", "select", "DELETE"))

     def _get_cohort_activity(
         self,
@@ -101,7 +101,7 @@ def test_creating_update_and_calculating(self, patch_sync_execute, patch_calcula
             },
         )

-        with self.capture_queries("INSERT INTO cohortpeople") as insert_statements:
+        with self.capture_queries_startswith("INSERT INTO cohortpeople") as insert_statements:
             response = self.client.patch(
                 f"/api/projects/{self.team.id}/cohorts/{response.json()['id']}",
                 data={
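The rename above suggests that the test helper matches captured queries by prefix. A minimal sketch of that idea follows, assuming a plain in-memory query log; `executed_queries` and `capture_by_prefix` are hypothetical names for illustration and are not PostHog's helper or its implementation.

```python
from contextlib import contextmanager
from typing import Iterator, Union

# Hypothetical stand-in for whatever log a test harness would append to.
executed_queries: list[str] = []


@contextmanager
def capture_by_prefix(prefixes: Union[str, tuple[str, ...]]) -> Iterator[list[str]]:
    """Collect queries run inside the block that start with any given prefix."""
    captured: list[str] = []
    start = len(executed_queries)
    try:
        yield captured
    finally:
        # str.startswith accepts a single string or a tuple of strings,
        # so both call shapes seen in the diff are covered by one check.
        captured.extend(q for q in executed_queries[start:] if q.startswith(prefixes))


with capture_by_prefix(("INSERT INTO cohortpeople", "SELECT")) as statements:
    executed_queries.append("SELECT 1")
    executed_queries.append("ALTER TABLE example")
    executed_queries.append("INSERT INTO cohortpeople VALUES (1)")
```

In this sketch the list is filled when the block exits, so it should be inspected after the `with` statement.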
6 changes: 5 additions & 1 deletion posthog/hogql/database/database.py
@@ -47,7 +47,11 @@
     PersonDistinctIdsTable,
     RawPersonDistinctIdsTable,
 )
-from posthog.hogql.database.schema.persons import PersonsTable, RawPersonsTable, join_with_persons_table
+from posthog.hogql.database.schema.persons import (
+    PersonsTable,
+    RawPersonsTable,
+    join_with_persons_table,
+)
 from posthog.hogql.database.schema.session_replay_events import (
     RawSessionReplayEventsTable,
     SessionReplayEventsTable,
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
 from posthog.hogql.ast import SelectQuery
+from posthog.hogql.constants import HogQLQuerySettings
 from posthog.hogql.context import HogQLContext

 from posthog.hogql.database.argmax import argmax_select
@@ -32,13 +33,15 @@ def select_from_person_distinct_id_overrides_table(requested_fields: dict[str, l
     # Always include "person_id", as it's the key we use to make further joins, and it'd be great if it's available
     if "person_id" not in requested_fields:
         requested_fields = {**requested_fields, "person_id": ["person_id"]}
-    return argmax_select(
+    select = argmax_select(
         table_name="raw_person_distinct_id_overrides",
         select_fields=requested_fields,
         group_fields=["distinct_id"],
         argmax_field="version",
         deleted_field="is_deleted",
     )
+    select.settings = HogQLQuerySettings(optimize_aggregation_in_order=True)
+    return select


 def join_with_person_distinct_id_overrides_table(
5 changes: 4 additions & 1 deletion posthog/hogql/database/schema/person_distinct_ids.py
@@ -1,4 +1,5 @@
 from posthog.hogql.ast import SelectQuery
+from posthog.hogql.constants import HogQLQuerySettings
 from posthog.hogql.context import HogQLContext

 from posthog.hogql.database.argmax import argmax_select
@@ -32,13 +33,15 @@ def select_from_person_distinct_ids_table(requested_fields: dict[str, list[str |
     # Always include "person_id", as it's the key we use to make further joins, and it'd be great if it's available
     if "person_id" not in requested_fields:
         requested_fields = {**requested_fields, "person_id": ["person_id"]}
-    return argmax_select(
+    select = argmax_select(
         table_name="raw_person_distinct_ids",
         select_fields=requested_fields,
         group_fields=["distinct_id"],
         argmax_field="version",
         deleted_field="is_deleted",
     )
+    select.settings = HogQLQuerySettings(optimize_aggregation_in_order=True)
+    return select


 def join_with_person_distinct_ids_table(
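The subselects above group by `distinct_id`, keep the values from the row with the highest `version`, and drop groups whose latest row is deleted; that is the `GROUP BY ... HAVING argMax(is_deleted, version) = 0` shape visible in the query snapshots. The sketch below models those semantics in plain Python, assuming a simple in-memory row type. `PdiRow` and `latest_person_ids` are hypothetical names, and this is not how `argmax_select` itself is implemented.

```python
from typing import NamedTuple


class PdiRow(NamedTuple):
    # Hypothetical in-memory stand-in for one row of the distinct-id table.
    distinct_id: str
    person_id: str
    version: int
    is_deleted: int


def latest_person_ids(rows: list[PdiRow]) -> dict[str, str]:
    """Map each distinct_id to the person_id of its highest-version row."""
    latest: dict[str, PdiRow] = {}
    for row in rows:
        current = latest.get(row.distinct_id)
        # Equivalent of argMax(..., version): keep the row with the largest version.
        if current is None or row.version > current.version:
            latest[row.distinct_id] = row
    # Equivalent of HAVING argMax(is_deleted, version) = 0.
    return {key: row.person_id for key, row in latest.items() if row.is_deleted == 0}


rows = [
    PdiRow("a", "p1", 1, 0),
    PdiRow("a", "p2", 2, 0),  # newer version wins for "a"
    PdiRow("b", "p3", 1, 0),
    PdiRow("b", "p3", 2, 1),  # latest row for "b" is deleted, so "b" is dropped
]
result = latest_person_ids(rows)  # {"a": "p2"}
```

The `optimize_aggregation_in_order` setting added in the diff does not change this result; it is a ClickHouse setting that lets the aggregation take advantage of the table's sort order.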
67 changes: 64 additions & 3 deletions posthog/hogql/database/schema/persons.py
@@ -1,7 +1,9 @@
-from typing import cast
+from typing import cast, Optional
+from typing_extensions import Self
 import posthoganalytics

-from posthog.hogql.ast import SelectQuery, And
+from posthog.hogql.ast import SelectQuery, And, CompareOperation, CompareOperationOp, Field, JoinExpr
+from posthog.hogql.base import Expr
 from posthog.hogql.constants import HogQLQuerySettings
 from posthog.hogql.context import HogQLContext
 from posthog.hogql.database.argmax import argmax_select
@@ -21,6 +23,7 @@
 from posthog.hogql.database.schema.util.where_clause_extractor import WhereClauseExtractor
 from posthog.hogql.database.schema.persons_pdi import PersonsPDITable, persons_pdi_join
 from posthog.hogql.errors import ResolutionError
+from posthog.hogql.visitor import clone_expr
 from posthog.models.organization import Organization
 from posthog.schema import PersonsArgMaxVersion

@@ -38,7 +41,13 @@
 }


-def select_from_persons_table(join_or_table: LazyJoinToAdd | LazyTableToAdd, context: HogQLContext, node: SelectQuery):
+def select_from_persons_table(
+    join_or_table: LazyJoinToAdd | LazyTableToAdd,
+    context: HogQLContext,
+    node: SelectQuery,
+    *,
+    filter: Optional[Expr] = None,
+):
     version = context.modifiers.personsArgMaxVersion
     if version == PersonsArgMaxVersion.AUTO:
         version = PersonsArgMaxVersion.V1
@@ -67,6 +76,8 @@ def select_from_persons_table(join_or_table: LazyJoinToAdd | LazyTableToAdd, con
             ),
         )
         select.settings = HogQLQuerySettings(optimize_aggregation_in_order=True)
+        if filter is not None:
+            cast(ast.SelectQuery, cast(ast.CompareOperation, select.where).right).where = filter

         for field_name, field_chain in join_or_table.fields_accessed.items():
             # We need to always select the 'id' field for the join constraint. The field name here is likely to
@@ -88,6 +99,11 @@ def select_from_persons_table(join_or_table: LazyJoinToAdd | LazyTableToAdd, con
             timestamp_field_to_clamp="created_at",
         )
         select.settings = HogQLQuerySettings(optimize_aggregation_in_order=True)
+        if filter is not None:
+            if select.where:
+                select.where = And(exprs=[select.where, filter])
+            else:
+                select.where = filter

     if context.modifiers.optimizeJoinedFilters:
         extractor = WhereClauseExtractor(context)
@@ -162,10 +178,55 @@ def to_printed_hogql(self):
         return "raw_persons"


+# Persons is a lazy table that allows you to insert a where statement inside of the person subselect
+# It pulls any "persons.id in ()" statement inside of the argmax subselect
+# This is useful when executing a query for a large team.
 class PersonsTable(LazyTable):
     fields: dict[str, FieldOrTable] = PERSONS_FIELDS
+    filter: Optional[Expr] = None
+
+    @staticmethod
+    def _is_promotable_expr(expr, alias: Optional[str] = None):
+        return (
+            isinstance(expr, CompareOperation)
+            and expr.op == CompareOperationOp.In
+            and isinstance(expr.left, Field)
+            and expr.left.chain == [alias or "persons", "id"]
+        )
+
+    @staticmethod
+    def _partition_exprs(exprs, alias: Optional[str] = None):
+        not_promotable = []
+        promotable = []
+        for expr in exprs:
+            if PersonsTable._is_promotable_expr(expr, alias):
+                # Erase "persons" from the chain before bringing inside
+                expr.left = Field(chain=["id"])
+                promotable.append(expr)
+            else:
+                not_promotable.append(expr)
+
+        return promotable, not_promotable
+
+    # If the join has a clause we can bring inside the subselect, create a new table that represents that
+    def create_new_table_with_filter(self, join: JoinExpr) -> Self:
+        if join.constraint is not None and isinstance(join.constraint.expr, And):
+            exprs = cast(And, join.constraint.expr).exprs
+            promotable, not_promotable = PersonsTable._partition_exprs(exprs, join.alias)
+            if len(promotable) == 0:
+                return self
+            join.constraint.expr.exprs = not_promotable
+            p = self.model_copy()
+            if len(promotable) == 1:
+                p.filter = promotable[0]
+            elif len(promotable) > 1:
+                p.filter = And(exprs=promotable)
+            return p
+        return self

     def lazy_select(self, table_to_add: LazyTableToAdd, context, node):
+        if self.filter is not None:
+            return select_from_persons_table(table_to_add, context, node, filter=clone_expr(self.filter, True, True))
         return select_from_persons_table(table_to_add, context, node)

     def to_printed_clickhouse(self, context):
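The partitioning step in `PersonsTable` can be illustrated in isolation. The sketch below uses small stand-in dataclasses (`Field`, `Compare`) rather than the real HogQL AST nodes, so every name in it is an assumption made for illustration. It splits the conjuncts of a join constraint into `persons.id IN (...)` clauses, which can move inside the person subselect, and everything else, which stays on the join. Unlike the diff, which rewrites `expr.left` in place, this sketch builds a new node and leaves the input untouched.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Field:
    # Stand-in for an AST field reference such as persons.id
    chain: list[str]


@dataclass
class Compare:
    # Stand-in for a comparison node; op is a plain string here ("in", "==", ...)
    op: str
    left: object
    right: object


def is_promotable(expr: object, alias: Optional[str] = None) -> bool:
    """True for `<alias or persons>.id IN (...)` comparisons."""
    return (
        isinstance(expr, Compare)
        and expr.op == "in"
        and isinstance(expr.left, Field)
        and expr.left.chain == [alias or "persons", "id"]
    )


def partition_exprs(exprs: list, alias: Optional[str] = None) -> tuple[list, list]:
    """Split conjuncts into (promotable, not_promotable)."""
    promotable, not_promotable = [], []
    for expr in exprs:
        if is_promotable(expr, alias):
            # Drop the table prefix so the clause is valid inside the subselect.
            promotable.append(Compare(op=expr.op, left=Field(chain=["id"]), right=expr.right))
        else:
            not_promotable.append(expr)
    return promotable, not_promotable


constraint = [
    Compare("==", Field(["events", "person_id"]), Field(["persons", "id"])),
    Compare("in", Field(["persons", "id"]), ["p1", "p2"]),
]
inner, outer = partition_exprs(constraint)
# inner holds the rewritten `id IN (...)` clause; outer keeps the equality join condition.
```

Copying the node rather than mutating it is a choice made for this sketch only; it avoids changing the caller's expression list as a side effect.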
5 changes: 4 additions & 1 deletion posthog/hogql/database/schema/persons_pdi.py
@@ -1,6 +1,7 @@
 import posthoganalytics

 from posthog.hogql.ast import SelectQuery
+from posthog.hogql.constants import HogQLQuerySettings
 from posthog.hogql.context import HogQLContext
 from posthog.hogql.database.argmax import argmax_select
 from posthog.hogql.database.models import (
@@ -21,13 +22,15 @@ def persons_pdi_select(requested_fields: dict[str, list[str | int]]):
     # Always include "person_id", as it's the key we use to make further joins, and it'd be great if it's available
     if "person_id" not in requested_fields:
         requested_fields = {**requested_fields, "person_id": ["person_id"]}
-    return argmax_select(
+    select = argmax_select(
         table_name="raw_person_distinct_ids",
         select_fields=requested_fields,
         group_fields=["distinct_id"],
         argmax_field="version",
         deleted_field="is_deleted",
     )
+    select.settings = HogQLQuerySettings(optimize_aggregation_in_order=True)
+    return select


 # :NOTE: We already have person_distinct_ids.py, which most tables link to. This persons_pdi.py is a hack to