Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: Speed up filtering persons #25604

Merged
merged 92 commits into from
Oct 21, 2024
Merged
Show file tree
Hide file tree
Changes from 59 commits
Commits
Show all changes
92 commits
Select commit Hold shift + click to select a range
d97020e
perf: Speed up filtering persons
timgl Oct 15, 2024
5d4800a
test
timgl Oct 15, 2024
f18f976
fix
timgl Oct 16, 2024
21165a2
Update query snapshots
github-actions[bot] Oct 16, 2024
2bf8231
Update query snapshots
github-actions[bot] Oct 16, 2024
2c39341
fix and add tests
timgl Oct 16, 2024
54f96a2
Update query snapshots
github-actions[bot] Oct 16, 2024
02d6717
Update query snapshots
github-actions[bot] Oct 16, 2024
51761c8
fix
timgl Oct 16, 2024
d7f4b21
remove personsArgMax and optimizedJoin modifiers
timgl Oct 16, 2024
ae4c773
fix negated queries
timgl Oct 16, 2024
fb7ea54
fix
timgl Oct 16, 2024
275f425
Update query snapshots
github-actions[bot] Oct 16, 2024
b8c8aed
Update query snapshots
github-actions[bot] Oct 16, 2024
29b2f47
Update query snapshots
github-actions[bot] Oct 16, 2024
de2423b
fix
timgl Oct 16, 2024
1547c9b
fix
timgl Oct 17, 2024
20ed939
Update query snapshots
github-actions[bot] Oct 17, 2024
ab77dc1
Update UI snapshots for `chromium` (1)
github-actions[bot] Oct 17, 2024
65a42d6
Update query snapshots
github-actions[bot] Oct 17, 2024
46d1b2e
Update query snapshots
github-actions[bot] Oct 17, 2024
3d431ab
fix
timgl Oct 17, 2024
1a7652f
fix
timgl Oct 17, 2024
de66639
Merge branch 'master' into speed-up-person-queries
timgl Oct 17, 2024
7ec2625
fix
timgl Oct 17, 2024
580543a
Update query snapshots
github-actions[bot] Oct 17, 2024
cc31913
Update UI snapshots for `chromium` (1)
github-actions[bot] Oct 17, 2024
43fad40
Update query snapshots
github-actions[bot] Oct 17, 2024
a5d2eeb
Update UI snapshots for `chromium` (1)
github-actions[bot] Oct 17, 2024
48a4186
refactor(batch-exports): Buffer batches while async flushing (#25631)
tomasfarias Oct 17, 2024
cc7f3d7
feat(alerts): Relative alerts (#25423)
timgl Oct 17, 2024
0b6c96f
feat(cdp): prominently show source code (#25639)
mariusandra Oct 17, 2024
e2a7d61
fix(notebooks): bind logic for hogql insights (#25623)
skoob13 Oct 17, 2024
2b97ef6
fix(LemonInputSelect): remove all escaped commas (#25646)
skoob13 Oct 17, 2024
794a974
chore: implement store & resolver in cymbal logic flow (#25592)
daibhin Oct 17, 2024
28f6ec8
chore(deps): Update posthog-js to 1.172.0 (#25648)
posthog-bot Oct 17, 2024
f5e0af4
feat(replay): allow triggering session recording based on urls (#25451)
richard-better Oct 17, 2024
ab7b1d3
chore(data-warehouse): update mssql setup readme (#25630)
EDsCODE Oct 17, 2024
9211e50
chore(data-warehouse): add connect_args to sql import (#25637)
EDsCODE Oct 17, 2024
01816b5
chore: add crossOrigin='anonymous' to snippet script (#25645)
daibhin Oct 17, 2024
8cee817
feat: publish es5 bundle on CDN (#25647)
pauldambra Oct 17, 2024
2c97f0b
feat(experiments HogQL): calculate statistics for Funnels (#25626)
jurajmajerik Oct 17, 2024
648ba8e
chore: even more taxonomy fangling (#25650)
pauldambra Oct 17, 2024
6bf6c82
fix(surveys): do not reset survey iteration count if value is nil (#2…
Phanatic Oct 17, 2024
a471925
Merge branch 'master' into speed-up-person-queries
timgl Oct 17, 2024
f2a2573
Update UI snapshots for `chromium` (1)
github-actions[bot] Oct 17, 2024
b525972
Update query snapshots
github-actions[bot] Oct 17, 2024
47e891b
chore(bi): Dont allow multiple insight variables with the same code_n…
Gilbert09 Oct 17, 2024
fab24df
fix(experiments): remove Edit button (#25644)
jurajmajerik Oct 17, 2024
d57b6c4
fix(flags): update payloads object when removing variants (#25580)
dmarticus Oct 17, 2024
9f2d558
Merge branch 'master' into speed-up-person-queries
timgl Oct 17, 2024
eb8426a
Update UI snapshots for `chromium` (1)
github-actions[bot] Oct 17, 2024
66d4f2b
Update query snapshots
github-actions[bot] Oct 17, 2024
3d0dbc1
Update UI snapshots for `chromium` (1)
github-actions[bot] Oct 17, 2024
57d4f02
Update UI snapshots for `chromium` (1)
github-actions[bot] Oct 17, 2024
bdf8f10
Update UI snapshots for `chromium` (1)
github-actions[bot] Oct 18, 2024
9bb74d6
Update UI snapshots for `chromium` (1)
github-actions[bot] Oct 18, 2024
be3a78a
fix negation not
timgl Oct 18, 2024
f699b33
Update query snapshots
github-actions[bot] Oct 18, 2024
92e1b7f
fix: Use Hogqlmapping
timgl Oct 21, 2024
eb398df
Update UI snapshots for `chromium` (1)
github-actions[bot] Oct 21, 2024
676153a
chore(deps): Update posthog-js to 1.174.0 (#25667)
posthog-bot Oct 17, 2024
c44b3ef
feat(Surveys): Add a survey preview to global settings page (#25603)
Phanatic Oct 18, 2024
2599db1
feat(bi): Added support for number and bool vars (#25633)
Gilbert09 Oct 18, 2024
1f305bb
fix: add command to delete persons with no distinct id's (#25657)
oliverb123 Oct 18, 2024
f89bf3a
chore(deps): Update posthog-js to 1.174.2 (#25677)
posthog-bot Oct 18, 2024
96ec39c
feat(bi): Adds support for list type variables in BI (#25651)
Gilbert09 Oct 18, 2024
811c94e
fix(data-warehouse): Increase timeout for creating external data mode…
Gilbert09 Oct 18, 2024
3a7d182
fix(cdp): allow falsy default values (#25672)
MarconLP Oct 18, 2024
49bc8e9
feat(bi): Add the ability to update variables (#25659)
Gilbert09 Oct 18, 2024
160dc36
fix(data-warehouse): source deletion fix (#25664)
EDsCODE Oct 18, 2024
5b24503
chore(environments): Make `team.project_id` officially non-null (#25597)
Twixes Oct 18, 2024
4c03ae8
fix(alerts): alert monitoring + trendlines only for absolute alerts (…
anirudhpillai Oct 18, 2024
0da6fd1
fix(data-warehouse): Split out the create model activity (#25683)
Gilbert09 Oct 18, 2024
8cc2ab3
fix: warming and filter (#25674)
aspicer Oct 18, 2024
3f7ed19
fix(insights): handle negation property in cohort property filters (#…
skoob13 Oct 18, 2024
49bc2cd
feat(data-warehouse): edit SQL based import configs (#25685)
EDsCODE Oct 18, 2024
e54795f
feat(hog): print ast in console (#25608)
mariusandra Oct 21, 2024
aeed74e
feat(cdp): add google ads integration warning (#25698)
MarconLP Oct 21, 2024
f8927f0
feat(max): Answer feedback buttons (#25670)
Twixes Oct 21, 2024
44e140e
feat(cdp): add klaviyo templates (#25693)
MarconLP Oct 21, 2024
a905711
chore: use meta columns in data visualization table (#25697)
daibhin Oct 21, 2024
9e51d80
fix: smaller cohort resource usage for replay filters (#25699)
pauldambra Oct 21, 2024
1690a36
feat(errors): Use exception list and remove stack trace raw (#25655)
neilkakkar Oct 21, 2024
e5b4546
feat(max): Show query summary (#25696)
timgl Oct 21, 2024
8599fd4
Merge branch 'master' into speed-up-person-queries
timgl Oct 21, 2024
9ee4ffb
Update UI snapshots for `chromium` (1)
github-actions[bot] Oct 21, 2024
fe96ba1
Update query snapshots
github-actions[bot] Oct 21, 2024
4d01e25
Update UI snapshots for `chromium` (1)
github-actions[bot] Oct 21, 2024
e7e4f29
Update UI snapshots for `chromium` (2)
github-actions[bot] Oct 21, 2024
6e061ff
Update UI snapshots for `chromium` (1)
github-actions[bot] Oct 21, 2024
ddf47d4
Update UI snapshots for `chromium` (2)
github-actions[bot] Oct 21, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view

Large diffs are not rendered by default.

7 changes: 0 additions & 7 deletions frontend/src/queries/schema.json
Original file line number Diff line number Diff line change
Expand Up @@ -6533,13 +6533,6 @@
"enum": ["auto", "legacy_null_as_string", "legacy_null_as_null", "disabled"],
"type": "string"
},
"optimizeJoinedFilters": {
"type": "boolean"
},
"personsArgMaxVersion": {
"enum": ["auto", "v1", "v2"],
"type": "string"
},
"personsJoinMode": {
"enum": ["inner", "left"],
"type": "string"
Expand Down
2 changes: 0 additions & 2 deletions frontend/src/queries/schema.ts
Original file line number Diff line number Diff line change
Expand Up @@ -227,10 +227,8 @@ export interface HogQLQueryModifiers {
| 'person_id_no_override_properties_on_events'
| 'person_id_override_properties_on_events'
| 'person_id_override_properties_joined'
personsArgMaxVersion?: 'auto' | 'v1' | 'v2'
inCohortVia?: 'auto' | 'leftjoin' | 'subquery' | 'leftjoin_conjoined'
materializationMode?: 'auto' | 'legacy_null_as_string' | 'legacy_null_as_null' | 'disabled'
optimizeJoinedFilters?: boolean
dataWarehouseEventsModifiers?: DataWarehouseEventsModifier[]
debug?: boolean
s3TableUseInvalidColumns?: boolean
Expand Down
33 changes: 1 addition & 32 deletions frontend/src/scenes/debug/Modifiers.tsx
Original file line number Diff line number Diff line change
Expand Up @@ -47,22 +47,7 @@ export function Modifiers<Q extends { response?: Record<string, any>; modifiers?
value={query.modifiers?.personsOnEventsMode ?? response?.modifiers?.personsOnEventsMode}
/>
</LemonLabel>
<LemonLabel className={labelClassName}>
<div>Persons ArgMax:</div>
<LemonSelect
options={[
{ value: 'v1', label: 'V1' },
{ value: 'v2', label: 'V2' },
]}
onChange={(value) =>
setQuery({
...query,
modifiers: { ...query.modifiers, personsArgMaxVersion: value },
})
}
value={query.modifiers?.personsArgMaxVersion ?? response?.modifiers?.personsArgMaxVersion}
/>
</LemonLabel>

<LemonLabel className={labelClassName}>
<div>In Cohort Via:</div>
<LemonSelect
Expand Down Expand Up @@ -99,22 +84,6 @@ export function Modifiers<Q extends { response?: Record<string, any>; modifiers?
value={query.modifiers?.materializationMode ?? response?.modifiers?.materializationMode}
/>
</LemonLabel>
<LemonLabel className={labelClassName}>
<div>Optimize joined filters:</div>
<LemonSelect
options={[
{ value: true, label: 'true' },
{ value: false, label: 'false' },
]}
onChange={(value) =>
setQuery({
...query,
modifiers: { ...query.modifiers, optimizeJoinedFilters: value },
})
}
value={query.modifiers?.optimizeJoinedFilters ?? response?.modifiers?.optimizeJoinedFilters}
/>
</LemonLabel>
<LemonLabel className={labelClassName}>
<div>Property Groups:</div>
<LemonSelect
Expand Down
5 changes: 0 additions & 5 deletions mypy-baseline.txt
Original file line number Diff line number Diff line change
Expand Up @@ -528,11 +528,6 @@ posthog/hogql/test/test_modifiers.py:0: error: Unsupported right operand type fo
posthog/hogql/test/test_modifiers.py:0: error: Unsupported right operand type for in ("str | None") [operator]
posthog/hogql/test/test_modifiers.py:0: error: Unsupported right operand type for in ("str | None") [operator]
posthog/hogql/test/test_modifiers.py:0: error: Unsupported right operand type for in ("str | None") [operator]
posthog/hogql/test/test_modifiers.py:0: error: Unsupported right operand type for in ("str | None") [operator]
posthog/hogql/test/test_modifiers.py:0: error: Unsupported right operand type for in ("str | None") [operator]
posthog/hogql/test/test_modifiers.py:0: error: Unsupported right operand type for in ("str | None") [operator]
posthog/hogql/test/test_modifiers.py:0: error: Unsupported right operand type for in ("str | None") [operator]
posthog/hogql/test/test_modifiers.py:0: error: Unsupported right operand type for in ("str | None") [operator]
posthog/hogql/test/_test_parser.py:0: error: Invalid base class [misc]
posthog/hogql/test/_test_parser.py:0: error: Argument "table" to "JoinExpr" has incompatible type "Placeholder"; expected "SelectQuery | SelectUnionQuery | Field | None" [arg-type]
posthog/hogql/test/_test_parser.py:0: error: Item "None" of "JoinExpr | None" has no attribute "table" [union-attr]
Expand Down
34 changes: 20 additions & 14 deletions posthog/api/test/__snapshots__/test_query.ambr
Original file line number Diff line number Diff line change
Expand Up @@ -442,14 +442,17 @@
HAVING ifNull(equals(argMax(person_distinct_id_overrides.is_deleted, person_distinct_id_overrides.version), 0), 0) SETTINGS optimize_aggregation_in_order=1) AS events__override ON equals(events.distinct_id, events__override.distinct_id)
LEFT JOIN
(SELECT person.id AS id,
replaceRegexpAll(nullIf(nullIf(JSONExtractRaw(person.properties, 'email'), ''), 'null'), '^"|"$', '') AS properties___email
argMax(replaceRegexpAll(nullIf(nullIf(JSONExtractRaw(person.properties, 'email'), ''), 'null'), '^"|"$', ''), person.version) AS properties___email
FROM person
WHERE and(equals(person.team_id, 2), ifNull(in(tuple(person.id, person.version),
(SELECT person.id AS id, max(person.version) AS version
FROM person
WHERE equals(person.team_id, 2)
GROUP BY person.id
HAVING and(ifNull(equals(argMax(person.is_deleted, person.version), 0), 0), ifNull(less(argMax(toTimeZone(person.created_at, 'UTC'), person.version), plus(now64(6, 'UTC'), toIntervalDay(1))), 0)))), 0)) SETTINGS optimize_aggregation_in_order=1) AS events__person ON equals(if(not(empty(events__override.distinct_id)), events__override.person_id, events.person_id), events__person.id)
WHERE and(equals(person.team_id, 2), in(person.id,
(SELECT person.id AS id
FROM person
WHERE and(equals(person.team_id, 2), notIn(person.id,
(SELECT person.id AS id
FROM person
WHERE and(equals(person.team_id, 2), equals(person.is_deleted, 1)))), ifNull(equals(replaceRegexpAll(nullIf(nullIf(JSONExtractRaw(person.properties, 'email'), ''), 'null'), '^"|"$', ''), '[email protected]'), 0)))), ifNull(equals(replaceRegexpAll(nullIf(nullIf(JSONExtractRaw(person.properties, 'email'), ''), 'null'), '^"|"$', ''), '[email protected]'), 0))
GROUP BY person.id
HAVING and(ifNull(equals(argMax(person.is_deleted, person.version), 0), 0), ifNull(less(argMax(toTimeZone(person.created_at, 'UTC'), person.version), plus(now64(6, 'UTC'), toIntervalDay(1))), 0)) SETTINGS optimize_aggregation_in_order=1) AS events__person ON equals(if(not(empty(events__override.distinct_id)), events__override.person_id, events.person_id), events__person.id)
WHERE and(equals(events.team_id, 2), ifNull(equals(events__person.properties___email, '[email protected]'), 0), less(toTimeZone(events.timestamp, 'UTC'), toDateTime64('2020-01-10 12:14:05.000000', 6, 'UTC')), greater(toTimeZone(events.timestamp, 'UTC'), toDateTime64('2020-01-09 12:14:00.000000', 6, 'UTC')))
ORDER BY events.event ASC
LIMIT 101
Expand Down Expand Up @@ -480,14 +483,17 @@
HAVING ifNull(equals(argMax(person_distinct_id_overrides.is_deleted, person_distinct_id_overrides.version), 0), 0) SETTINGS optimize_aggregation_in_order=1) AS events__override ON equals(events.distinct_id, events__override.distinct_id)
LEFT JOIN
(SELECT person.id AS id,
nullIf(nullIf(person.pmat_email, ''), 'null') AS properties___email
argMax(nullIf(nullIf(person.pmat_email, ''), 'null'), person.version) AS properties___email
FROM person
WHERE and(equals(person.team_id, 2), ifNull(in(tuple(person.id, person.version),
(SELECT person.id AS id, max(person.version) AS version
FROM person
WHERE equals(person.team_id, 2)
GROUP BY person.id
HAVING and(ifNull(equals(argMax(person.is_deleted, person.version), 0), 0), ifNull(less(argMax(toTimeZone(person.created_at, 'UTC'), person.version), plus(now64(6, 'UTC'), toIntervalDay(1))), 0)))), 0)) SETTINGS optimize_aggregation_in_order=1) AS events__person ON equals(if(not(empty(events__override.distinct_id)), events__override.person_id, events.person_id), events__person.id)
WHERE and(equals(person.team_id, 2), in(person.id,
(SELECT person.id AS id
FROM person
WHERE and(equals(person.team_id, 2), notIn(person.id,
(SELECT person.id AS id
FROM person
WHERE and(equals(person.team_id, 2), equals(person.is_deleted, 1)))), ifNull(equals(nullIf(nullIf(person.pmat_email, ''), 'null'), '[email protected]'), 0)))), ifNull(equals(nullIf(nullIf(person.pmat_email, ''), 'null'), '[email protected]'), 0))
GROUP BY person.id
HAVING and(ifNull(equals(argMax(person.is_deleted, person.version), 0), 0), ifNull(less(argMax(toTimeZone(person.created_at, 'UTC'), person.version), plus(now64(6, 'UTC'), toIntervalDay(1))), 0)) SETTINGS optimize_aggregation_in_order=1) AS events__person ON equals(if(not(empty(events__override.distinct_id)), events__override.person_id, events.person_id), events__person.id)
WHERE and(equals(events.team_id, 2), ifNull(equals(events__person.properties___email, '[email protected]'), 0), less(toTimeZone(events.timestamp, 'UTC'), toDateTime64('2020-01-10 12:14:05.000000', 6, 'UTC')), greater(toTimeZone(events.timestamp, 'UTC'), toDateTime64('2020-01-09 12:14:00.000000', 6, 'UTC')))
ORDER BY events.event ASC
LIMIT 101
Expand Down
12 changes: 12 additions & 0 deletions posthog/hogql/ast.py
Original file line number Diff line number Diff line change
Expand Up @@ -631,6 +631,18 @@ class CompareOperationOp(StrEnum):
NotIRegex = "!~*"


NEGATED_COMPARE_OPS: list[CompareOperationOp] = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bit of a nitpick, but wouldn't this be marginally faster as a set/frozenset for it's use in where_clause_extractor? (I didn't actually benchmark this myself.)

CompareOperationOp.NotEq,
CompareOperationOp.NotLike,
CompareOperationOp.NotILike,
CompareOperationOp.NotIn,
CompareOperationOp.GlobalNotIn,
CompareOperationOp.NotInCohort,
CompareOperationOp.NotRegex,
CompareOperationOp.NotIRegex,
]


@dataclass(kw_only=True)
class CompareOperation(Expr):
left: Expr
Expand Down
130 changes: 68 additions & 62 deletions posthog/hogql/database/schema/persons.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,6 @@
from posthog.hogql.base import Expr
from posthog.hogql.constants import HogQLQuerySettings
from posthog.hogql.context import HogQLContext
from posthog.hogql.database.argmax import argmax_select
from posthog.hogql.database.models import (
BooleanDatabaseField,
DateTimeDatabaseField,
Expand All @@ -24,7 +23,6 @@
from posthog.hogql.errors import ResolutionError
from posthog.hogql.visitor import clone_expr
from posthog.models.organization import Organization
from posthog.schema import PersonsArgMaxVersion

PERSONS_FIELDS: dict[str, FieldOrTable] = {
"id": StringDatabaseField(name="id"),
Expand All @@ -46,72 +44,73 @@ def select_from_persons_table(
node: SelectQuery,
*,
filter: Optional[Expr] = None,
is_join: Optional[bool] = None,
):
version = context.modifiers.personsArgMaxVersion
if version == PersonsArgMaxVersion.AUTO:
version = PersonsArgMaxVersion.V1
# If selecting properties, use the faster v2 query. Otherwise, v1 is faster.
for field_chain in join_or_table.fields_accessed.values():
if field_chain[0] == "properties":
version = PersonsArgMaxVersion.V2
break

if version == PersonsArgMaxVersion.V2:
from posthog.hogql import ast
from posthog.hogql.parser import parse_select

select = cast(
from posthog.hogql import ast
from posthog.hogql.parser import parse_select

select = cast(
ast.SelectQuery,
parse_select(
"""
SELECT id
FROM raw_persons
GROUP BY id
HAVING equals(argMax(raw_persons.is_deleted, raw_persons.version), 0)
AND argMax(raw_persons.created_at, raw_persons.version) < now() + interval 1 day
"""
),
)
select.settings = HogQLQuerySettings(optimize_aggregation_in_order=True)

# This bit optimizes the query by first selecting all IDs for all persons (regardless of whether it's the latest version), and only then aggregating the results
# We only do this if there are where clauses, _and_ WhereClauseExtractor can extract them
if node.where:
inner_select = cast(
ast.SelectQuery,
parse_select(
"""
SELECT id FROM raw_persons WHERE (id, version) IN (
SELECT id, max(version) as version
FROM raw_persons
GROUP BY id
HAVING equals(argMax(raw_persons.is_deleted, raw_persons.version), 0)
AND argMax(raw_persons.created_at, raw_persons.version) < now() + interval 1 day
)
SELECT id
FROM raw_persons
WHERE
-- Much faster to pre-select out any deleted persons than doing it in aggregation
-- This is correct because there are no instances where we'd un-delete a person (ie there are no cases where one version has is_deleted=1 and a later version has is_deleted = 0)
id NOT IN (select id from raw_persons where is_deleted = 1)
timgl marked this conversation as resolved.
Show resolved Hide resolved
"""
),
)
select.settings = HogQLQuerySettings(optimize_aggregation_in_order=True)
if filter is not None:
cast(ast.SelectQuery, cast(ast.CompareOperation, select.where).right).where = filter

for field_name, field_chain in join_or_table.fields_accessed.items():
# We need to always select the 'id' field for the join constraint. The field name here is likely to
# be "persons__id" if anything, but just in case, let's avoid duplicates.
if field_name != "id":
select.select.append(
ast.Alias(
alias=field_name,
expr=ast.Field(chain=field_chain),
)
)
else:
select = argmax_select(
table_name="raw_persons",
select_fields=join_or_table.fields_accessed,
group_fields=["id"],
argmax_field="version",
deleted_field="is_deleted",
timestamp_field_to_clamp="created_at",
)
select.settings = HogQLQuerySettings(optimize_aggregation_in_order=True)
if filter is not None:
if select.where:
select.where = And(exprs=[select.where, filter])
else:
select.where = filter

if context.modifiers.optimizeJoinedFilters:
extractor = WhereClauseExtractor(context)
extractor = WhereClauseExtractor(context, is_join=is_join)
extractor.add_local_tables(join_or_table)
where = extractor.get_inner_where(node)
if where and select.where:
select.where = And(exprs=[select.where, where])
elif where:
select.where = where

if where and inner_select.where:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't inner_select.where always be truthy since it's added on L78 above?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be empty if we didn't manage to extract any clauses, in which case we don't want to use this optimization (as you'd just be doing a subquery with all persons).

inner_select.where = ast.And(exprs=[inner_select.where, where])
select.where = ast.And(
exprs=[
ast.CompareOperation(
left=ast.Field(chain=["id"]), right=inner_select, op=ast.CompareOperationOp.In
),
where, # Technically, adding the where clause here is duplicative, because the outer node filters this out _again_. However, if you're trying to debug the results stay consistent throughout the query (otherwise old versions might pop up again in this subquery)
]
)
if filter is not None:
if select.where:
cast(ast.SelectQuery, cast(ast.CompareOperation, select.where).right).where = ast.And(
exprs=[select.where, filter]
)
Comment on lines +98 to +100
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand what this cast(ast.CompareOperation, select.where) is for. It seems like select.where is only set on L88 above (the instantiation on L52 doesn't include a WHERE clause) but ast.And from that line isn't a subtype of ast.CompareOperation, so it doesn't seem like this cast would be valid. Does this line actually have coverage and I'm just missing something? I'm not sure if we have reporting for that.

else:
select.where = filter

for field_name, field_chain in join_or_table.fields_accessed.items():
# We need to always select the 'id' field for the join constraint. The field name here is likely to
# be "persons__id" if anything, but just in case, let's avoid duplicates.
if field_name != "id":
select.select.append(
ast.Alias(
alias=field_name,
expr=ast.Call(name="argMax", args=[ast.Field(chain=field_chain), ast.Field(chain=["version"])]),
)
)

return select

Expand All @@ -125,7 +124,7 @@ def join_with_persons_table(

if not join_to_add.fields_accessed:
raise ResolutionError("No fields requested from persons table")
join_expr = ast.JoinExpr(table=select_from_persons_table(join_to_add, context, node))
join_expr = ast.JoinExpr(table=select_from_persons_table(join_to_add, context, node, is_join=True))

organization: Organization = context.team.organization if context.team else None
# TODO: @raquelmsmith: Remove flag check and use left join for all once deletes are caught up
Expand Down Expand Up @@ -224,9 +223,16 @@ def create_new_table_with_filter(self, join: JoinExpr) -> Self:
return self

def lazy_select(self, table_to_add: LazyTableToAdd, context, node):
# assume that if the select_from is not persons table we're doing a join
try:
is_join = not isinstance(node.select_from.type.table, PersonsTable)
except AttributeError:
is_join = False
if self.filter is not None:
return select_from_persons_table(table_to_add, context, node, filter=clone_expr(self.filter, True))
return select_from_persons_table(table_to_add, context, node)
return select_from_persons_table(
table_to_add, context, node, filter=clone_expr(self.filter, True), is_join=is_join
)
return select_from_persons_table(table_to_add, context, node, is_join=is_join)

def to_printed_clickhouse(self, context):
return "person"
Expand Down
Loading
Loading