feat: Support nullable materialized columns using native types #26448
materialize("events", "withmat_nullable", is_nullable=True)
self.assertEqual(
    self._expr("properties.withmat_nullable.json.yet", context),
    "replaceRegexpAll(nullIf(nullIf(JSONExtractRaw(events.mat_withmat_nullable, %(hogql_val_0)s, %(hogql_val_1)s), ''), 'null'), '^\"|\"$', '')",
Going to change this so it uses JSONExtract(events.mat_withmat_nullable, %(hogql_val_0)s, %(hogql_val_1)s, 'Nullable(String)') too (this will also apply to the assertion above), but I'll do that separately, as it's not really coupled to this specific change and will cause a bunch of snapshot updates.
class MaterializedColumn(Protocol):
    name: ColumnName
    is_nullable: bool
This is necessary because the interface is defined in ee/* but used throughout the non-ee/ codebase, and I don't want to wrap everything in typing.TYPE_CHECKING guards.
A lot of this would be simpler if we moved it to non-ee/, but I'm not sure what the history is here, so I'm just working around it for now.
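For readers unfamiliar with the pattern, here is a minimal sketch of how a Protocol lets non-ee/ code depend on the shape of the column object without importing the concrete class. It is an illustration only: EnterpriseMaterializedColumn and describe are hypothetical names, and ColumnName is replaced with plain str to keep the block self-contained.

```python
from dataclasses import dataclass
from typing import Protocol


class MaterializedColumn(Protocol):
    # Structural interface: any object exposing these attributes conforms,
    # so callers never need to import the concrete class.
    name: str  # assumption: stands in for the ColumnName alias
    is_nullable: bool


@dataclass
class EnterpriseMaterializedColumn:
    # Hypothetical stand-in for the concrete class that lives under ee/.
    name: str
    is_nullable: bool


def describe(column: MaterializedColumn) -> str:
    # Hypothetical consumer in the non-ee/ codebase; it relies only on the Protocol.
    suffix = "nullable" if column.is_nullable else "not nullable"
    return f"{column.name} ({suffix})"


print(describe(EnterpriseMaterializedColumn("mat_withmat_nullable", True)))
# prints: mat_withmat_nullable (nullable)
```

The concrete class does not inherit from the Protocol; a type checker accepts it because the attributes match.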
@@ -40,15 +38,36 @@
 }


-class MaterializedColumn(NamedTuple):
+@dataclass
+class MaterializedColumn:
     name: ColumnName
     details: MaterializedColumnDetails
I'll probably consolidate these two classes (MaterializedColumn & MaterializedColumnDetails) at some point (there's not much benefit to them being separate at this point), but I'm not in a big hurry to do that.
-    ADD COLUMN IF NOT EXISTS {self.column.name} VARCHAR
-    MATERIALIZED {TRIM_AND_EXTRACT_PROPERTY.format(table_column=self.column.details.table_column)}
-    """,
+    f"ADD COLUMN IF NOT EXISTS {self.column.name} {self.column.type} MATERIALIZED {expression}",
:chef-kiss:
posthog/tasks/usage_report.py
Outdated
lib_materialized_column = get_materialized_column_for_property("events", "properties", "$lib")
lib_expression = (
    lib_materialized_column.name if lib_materialized_column is not None else "JSONExtractString(properties, '$lib')"
)
Use the get_property_string_expr method here; that's the only feedback I have.
FYI, the generated query is slightly different than the original, but not in a way that I think really matters all that much: 422d678
Problem
With the current implementation, properties materialized as columns cannot distinguish between null values and the string value "null", and reads require extra processing that lossily converts these values to nulls:
posthog/posthog/hogql/printer.py
Lines 1350 to 1354 in cf39f4b
Needing to transform these values when reading makes using data skipping indexes more difficult as the analyzer will not use the indexes for these expressions.
Additionally, property values are materialized as columns using the output of JSONExtractRaw with leading and trailing quotes stripped, which can result in unnecessarily escaped values being stored.
Also see #19461 for details.
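The two problems above can be reproduced with a small Python simulation. This is only a sketch of the behaviour described, not ClickHouse itself: legacy_materialize and legacy_read are hypothetical helpers that mimic "raw JSON text with the outer quotes stripped" at write time and nullIf(nullIf(col, ''), 'null') at read time.

```python
import json
import re
from typing import Optional


def legacy_materialize(properties: str, key: str) -> str:
    """Mimic storing the raw JSON text of a property with outer quotes stripped."""
    data = json.loads(properties)
    # Raw JSON text of the value, or '' when the key is absent (assumed behaviour).
    raw = json.dumps(data[key]) if key in data else ""
    return re.sub(r'^"|"$', "", raw)


def legacy_read(stored: str) -> Optional[str]:
    """Mimic nullIf(nullIf(col, ''), 'null') applied when reading."""
    return None if stored in ("", "null") else stored


doc = json.dumps({"a": None, "b": "null", "c": 'say "hi"'})

# A JSON null and the string "null" are stored identically, so the read-side
# conversion cannot tell them apart.
print(legacy_materialize(doc, "a") == legacy_materialize(doc, "b"))  # prints: True
print(legacy_read(legacy_materialize(doc, "b")))  # prints: None

# Inner quotes stay escaped because the raw JSON text is never decoded.
print(legacy_materialize(doc, "c"))  # prints: say \"hi\"
```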
Changes
Adds the ability to materialize properties into Nullable(String) types. New columns added via the management command are created as nullable by default.
This uses the JSONExtract(column, key, 'Nullable(String)') form to avoid escaping errors (as is already done for property groups):
Compare to:
Nullable columns are only used within HogQL queries, as the change could affect existing legacy queries in undefined ways due to the format change as well as the introduction of null values.
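As a rough illustration of why the nullable form changes what readers see, the following Python sketch simulates decoding a property into an optional string. It is an assumption-laden model, not ClickHouse: nullable_materialize is a hypothetical helper, it assumes that a missing key and a JSON null both yield NULL, and it deliberately does not model non-string values.

```python
import json
from typing import Optional


def nullable_materialize(properties: str, key: str) -> Optional[str]:
    """Mimic extracting a property as an optional, fully decoded string."""
    data = json.loads(properties)
    value = data.get(key)  # a missing key and a JSON null both become None
    if value is None or isinstance(value, str):
        return value
    # Numbers, booleans, arrays and objects are outside the scope of this sketch.
    raise NotImplementedError("non-string values are not modelled here")


doc = json.dumps({"a": None, "b": "null", "c": 'say "hi"'})

print(nullable_materialize(doc, "a"))  # prints: None
print(nullable_materialize(doc, "b"))  # prints: null
print(nullable_materialize(doc, "c"))  # prints: say "hi"
```

Under this model a real null and the string "null" stay distinct and no read-time conversion is needed, which is also why queries written against the older stored format could see different values.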
Does this work well for both Cloud and self-hosted?
Yes.
How did you test this code?
Updated existing tests to account for nullable columns where relevant.