Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Support reading from property groups on events #24171

Merged
merged 17 commits into from
Aug 19, 2024

Conversation

tkaemming
Copy link
Contributor

@tkaemming tkaemming commented Aug 2, 2024

Problem

See #24152 for a thorough description of what problem we're trying to solve here and an overview of what property groups are and why we expect them to improve query performance for queries that reference currently non-materialized properties.

Changes

This enables reading applicable properties from materialized property groups when the usePropertyGroups query modifier is provided. (The modifier itself can be enabled for a team via the Team.modifiers field added in #22048, so no need to add a bespoke feature flag here for testing right now.)

  • Adds logic to PropertyGroupManager for determining what property groups contain a provided property key.
  • Adds logic for reading properties from property groups instead of extracting them from the JSON serialized fields.

Expected Impact

Property access for parts that have had the property group materialized should be significantly faster, since those queries will now read less data, avoid JSON parsing, and skip processing property values with replaceRegexpAll(nullIf(nullIf(value, ''), 'null'), '^\"|\"$', '') since string values in the property group mapping do not contain leading or trailing quotes.

Property access for parts that have not had the property group materialized may also be faster — especially for queries that reference multiple properties in the same group, since the properties JSON will only be parsed once during query execution per property group — but I didn't test this directly. Like above, the replaceRegexpAll call is also skipped in this case which can have a notable impact.

The queries generated here cannot utilize the bloom filter indexes yet on the materialized columns — support for that will come in a follow-up change to special case comparison operations that include a property group as one of the operands so that the bloom filter indexes can be used and/or the number of map subcolumns that need to be read can be reduced.

Does this work well for both Cloud and self-hosted?

Yes.

How did you test this code?

  • Added test.
  • Updated debug view to opt-in/out of property group usage for manual testing.

@tkaemming tkaemming force-pushed the property-groups-reading branch 2 times, most recently from e188d4d to f961fb6 Compare August 3, 2024 04:04
# for this property group, an empty string (the default
# value for the `String` type) is returned. Since that
# is a valid property value, we need to check it here.
materialized_property_sql = f"has({printed_column}, {printed_property_name}) ? {printed_column}[{printed_property_name}] : null"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self for next week: this has expression causes the bloom filter indices to be skipped when a property ends up in the WHERE clause of a query:

EXPLAIN indexes = 1
SELECT count()
FROM events
WHERE (events.team_id = 1) AND ifNull(if(has(events.properties_group_custom, 'file_type'), events.properties_group_custom['file_type'], NULL) = 'audio/vorbis', 0)
LIMIT 0, 101

Query id: d4bf9ebb-5cb0-4f4e-a218-a163621af497

┌─explain────────────────────────────────────────────┐
│ Expression ((Projection + Before ORDER BY))        │
│   Limit (preliminary LIMIT (without OFFSET))       │
│     Aggregating                                    │
│       Expression (Before GROUP BY)                 │
│         ReadFromMergeTree (default.sharded_events) │
│         Indexes:                                   │
│           MinMax                                   │
│             Condition: true                        │
│             Parts: 10/10                           │
│             Granules: 13/13                        │
│           Partition                                │
│             Condition: true                        │
│             Parts: 10/10                           │
│             Granules: 13/13                        │
│           PrimaryKey                               │
│             Keys:                                  │
│               team_id                              │
│             Condition: (team_id in [1, 1])         │
│             Parts: 10/10                           │
│             Granules: 13/13                        │
└────────────────────────────────────────────────────┘

20 rows in set. Elapsed: 0.007 sec.

versus

EXPLAIN indexes = 1
SELECT count()
FROM events
WHERE (events.team_id = 1) AND ((events.properties_group_custom['file_type']) = 'audio/vorbis')
LIMIT 0, 101

Query id: 772d0b63-a13a-4f6f-878e-c42f553b608d

┌─explain─────────────────────────────────────────────┐
│ Expression ((Projection + Before ORDER BY))         │
│   Limit (preliminary LIMIT (without OFFSET))        │
│     Aggregating                                     │
│       Expression (Before GROUP BY)                  │
│         ReadFromMergeTree (default.sharded_events)  │
│         Indexes:                                    │
│           MinMax                                    │
│             Condition: true                         │
│             Parts: 10/10                            │
│             Granules: 13/13                         │
│           Partition                                 │
│             Condition: true                         │
│             Parts: 10/10                            │
│             Granules: 13/13                         │
│           PrimaryKey                                │
│             Keys:                                   │
│               team_id                               │
│             Condition: (team_id in [1, 1])          │
│             Parts: 10/10                            │
│             Granules: 13/13                         │
│           Skip                                      │
│             Name: properties_group_custom_keys_bf   │
│             Description: bloom_filter GRANULARITY 1 │
│             Parts: 9/10                             │
│             Granules: 12/13                         │
│           Skip                                      │
│             Name: properties_group_custom_values_bf │
│             Description: bloom_filter GRANULARITY 1 │
│             Parts: 2/9                              │
│             Granules: 2/12                          │
└─────────────────────────────────────────────────────┘

30 rows in set. Elapsed: 0.012 sec. 

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The draft for optimizing these expressions that impact property group index usage is here: #24381

Base automatically changed from property-groups to master August 6, 2024 21:48
@tkaemming tkaemming force-pushed the property-groups-reading branch 2 times, most recently from 634b006 to bdfa5b0 Compare August 8, 2024 20:56
@tkaemming tkaemming force-pushed the property-groups-reading branch from f63fa8a to 73d0083 Compare August 8, 2024 22:57
@tkaemming tkaemming changed the base branch from master to property-groups-migrations August 8, 2024 22:57
@tkaemming tkaemming force-pushed the property-groups-reading branch from 5ce0e23 to b1161db Compare August 9, 2024 01:09
Copy link
Contributor

github-actions bot commented Aug 9, 2024

Size Change: 0 B

Total Size: 1.07 MB

ℹ️ View Unchanged
Filename Size
frontend/dist/toolbar.js 1.07 MB

compressed-size-action

@posthog-bot

This comment was marked as outdated.

@tkaemming tkaemming changed the title feat(clickhouse): Start reading from property groups on events feat(clickhouse): Support reading from property groups on events Aug 9, 2024
@tkaemming tkaemming marked this pull request as ready for review August 9, 2024 21:17
@tkaemming tkaemming changed the title feat(clickhouse): Support reading from property groups on events feat: Support reading from property groups on events Aug 9, 2024
@posthog-bot

This comment was marked as outdated.

@posthog-bot

This comment was marked as outdated.

@posthog-bot

This comment was marked as outdated.

@posthog-bot

This comment was marked as outdated.

@posthog-bot

This comment was marked as outdated.

Base automatically changed from property-groups-migrations to master August 12, 2024 18:26
@tkaemming tkaemming requested a review from a team as a code owner August 12, 2024 18:26
@tkaemming tkaemming force-pushed the property-groups-reading branch from 7f69528 to 4d60b20 Compare August 12, 2024 18:30
@posthog-bot

This comment was marked as outdated.

@tkaemming

This comment was marked as resolved.

@posthog-bot

This comment was marked as outdated.

@tkaemming tkaemming requested a review from mariusandra August 14, 2024 21:15
@tkaemming
Copy link
Contributor Author

@mariusandra - I added you here as a reviewer as the "general HogQL guy" but feel free to redirect this to somebody else if you'd like. My confidence my understanding of the AST is pretty tenuous, so those parts could probably use a bit of extra attention.

@posthog-bot

This comment was marked as outdated.

@posthog-bot

This comment was marked as outdated.

@@ -204,6 +204,7 @@ export interface HogQLQueryModifiers {
personsJoinMode?: 'inner' | 'left'
bounceRatePageViewMode?: 'count_pageviews' | 'uniq_urls'
sessionTableVersion?: 'auto' | 'v1' | 'v2'
propertyGroupsMode?: 'enabled' | 'disabled'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Representing a boolean as an Enum reads a bit silly here, but there is a good reason this is the way it is: this is to support adding a third value that enables property groups with expression rewriting for performance reasons (#24381) without adding yet another modifier or needing to change the type of this field (since changing it could cause problems for any teams that had values set explicitly in Team.modifiers.)

I left auto out of this field (in contrast with some of the above values) since this field is already nullable. I also left this out of set_default_modifier_values, since adding new values there invalidates the entire query cache (see b2050df.)

@posthog-bot

This comment was marked as outdated.

@tkaemming tkaemming force-pushed the property-groups-reading branch from 0972e7f to 9d573d1 Compare August 19, 2024 19:23
@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

1 snapshot changes in total. 0 added, 1 modified, 0 deleted:

  • chromium: 0 added, 1 modified, 0 deleted (diff for shard 2)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

3 snapshot changes in total. 0 added, 3 modified, 0 deleted:

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

2 snapshot changes in total. 0 added, 2 modified, 0 deleted:

  • chromium: 0 added, 2 modified, 0 deleted (diff for shard 1)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

2 snapshot changes in total. 0 added, 2 modified, 0 deleted:

  • chromium: 0 added, 2 modified, 0 deleted (diff for shard 1)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

2 snapshot changes in total. 0 added, 2 modified, 0 deleted:

  • chromium: 0 added, 2 modified, 0 deleted (diff for shard 1)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

1 snapshot changes in total. 0 added, 1 modified, 0 deleted:

  • chromium: 0 added, 1 modified, 0 deleted (diff for shard 2)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

1 snapshot changes in total. 0 added, 1 modified, 0 deleted:

  • chromium: 0 added, 1 modified, 0 deleted (diff for shard 2)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@tkaemming tkaemming merged commit 3e28b4d into master Aug 19, 2024
93 checks passed
@tkaemming tkaemming deleted the property-groups-reading branch August 19, 2024 22:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants