feat(hogql): Allow lazy joins on lazy tables with requested fields #20731
Size Change: 0 B Total Size: 815 kB
Overall looks good. I did add some 🤔 inline, and I'm also wondering if there's a reason source_table_key_hogql and joining_table_key_hogql can't just be rolled into source_table_key and joining_table_key?
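To make the question concrete, here is a minimal sketch of what a single key field could look like, assuming the key holds either a plain column ("event") or a dotted field chain ("properties.email"). JoinKey and fold_legacy_key are hypothetical names, not PostHog's models, and the naive split only covers plain dotted chains, not arbitrary HogQL expressions (those would need the real HogQL parser).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JoinKey:
    """Hypothetical single join key: a plain column or a dotted field chain."""

    raw: str

    def chain(self) -> List[str]:
        # Naive split: handles "event" or "properties.email", not general HogQL
        return self.raw.split(".")


def fold_legacy_key(key: str, key_hogql: Optional[str]) -> JoinKey:
    """Collapse the two-field form (key="$hogql" plus key_hogql=...) into one JoinKey."""
    if key == "$hogql":
        if not key_hogql:
            raise ValueError('key_hogql is required when key is "$hogql"')
        return JoinKey(key_hogql)
    return JoinKey(key)


print(fold_legacy_key("$hogql", "properties.email").chain())  # ['properties', 'email']
print(fold_legacy_key("event", None).chain())  # ['event']
```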
```python
@pytest.mark.usefixtures("unittest_snapshot")
def test_lazy_join_on_lazy_table_with_person_properties(self):
    DataWarehouseJoin(
        team=self.team,
        source_table_name="persons",
        source_table_key="$hogql",
        source_table_key_hogql="properties.email",
        joining_table_name="events",
        joining_table_key="event",
        field_name="events",
    ).save()

    printed = self._print_select("select events.event from persons")
    assert printed == self.snapshot
```
This example seems more artificial than the others. 🤔
Also, it might be good to run these tests with both materialized and non-materialized properties.
```python
    and node.type is not None
    and node.type.tables.get(join_to_add_table_name) is not None
):
    node.type.tables.pop(join_to_add_table_name)
```
Could this cause problems in some cases? Is there any time when a legitimate join with the same name could have been done to a table?
E.g.
```sql
select event, person.properties.email as real_email
from events
left join person as person
    on (person.properties.email = events.properties.tr00_mail)
left join person as person2
    on (person2.properties.email = events.properties.fake_mail)
```
I haven't followed the code to verify if this would be an issue or not, but would this run?
I believe that in this example this would be done on the alias and not on the table itself (e.g. person and person2). But I'll test this to verify.
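A toy model of the reviewer's example, assuming (as the reply suggests) that the per-query tables mapping is keyed by alias rather than by the underlying table name. JoinRef and ScopeTables are hypothetical stand-ins for node.type.tables, not HogQL's types; the point is only that dropping "person" leaves the second join to the same table, "person2", untouched.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class JoinRef:
    table_name: str  # underlying table, e.g. "person"
    alias: str  # name the query uses, e.g. "person" or "person2"


@dataclass
class ScopeTables:
    # Hypothetical stand-in for node.type.tables, keyed by alias rather than table name
    tables: Dict[str, JoinRef] = field(default_factory=dict)

    def add(self, ref: JoinRef) -> None:
        if ref.alias in self.tables:
            raise ValueError(f"alias already in use: {ref.alias}")
        self.tables[ref.alias] = ref

    def drop(self, alias: str) -> None:
        # Removes only the entry registered under this alias
        self.tables.pop(alias, None)


scope = ScopeTables()
scope.add(JoinRef("person", "person"))
scope.add(JoinRef("person", "person2"))
scope.drop("person")
print(sorted(scope.tables))  # ['person2']
```

If the mapping were keyed by table name instead, both joins would collide on "person", which is the failure mode the question is probing.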
@mariusandra Gonna re-request your review: I managed to redo this without having to resolve types twice etc. It feels nicer now, and we even support table aliases 😱
📸 UI snapshots have been updated. 3 snapshot changes in total: 0 added, 3 modified, 0 deleted. Triggered by this commit.
📸 UI snapshots have been updated. 2 snapshot changes in total: 0 added, 2 modified, 0 deleted. Triggered by this commit.
📸 UI snapshots have been updated. 1 snapshot change in total: 0 added, 1 modified, 0 deleted. Triggered by this commit.
Some comments, but overall this makes sense. Good catch on the aliasing.
```python
def visit_field(self, node: ast.Field):
    for constraint in self.overrides:
        if node.chain == constraint.chain_to_replace:
```
If this is traversing the entire query again, is it possible that chain_to_replace matches something unintentionally? It's a super nit edge case, but if an outer query reuses an alias/table name that exists in an inner query, could it get overwritten by this override?
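One way to rule out the edge case is to scope each override to the select query that registered it, so a chain in an outer query can never rewrite a same-named chain in a subquery (or vice versa). This is a minimal sketch on a toy AST; Field, Select and apply_overrides are hypothetical and not the PR's visitor.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class Field:
    chain: List[str]


@dataclass
class Select:
    columns: List[Union[Field, Select]] = field(default_factory=list)


Chain = Tuple[str, ...]


def apply_overrides(select: Select, overrides: Dict[int, Dict[Chain, List[str]]]) -> None:
    """Rewrite field chains using only the overrides registered for the enclosing Select.

    `overrides` maps id(select) -> {old_chain: new_chain} (hypothetical structure).
    """
    local = overrides.get(id(select), {})
    for column in select.columns:
        if isinstance(column, Select):
            # Recurse with the subquery's own overrides; the outer ones don't leak in
            apply_overrides(column, overrides)
        else:
            replacement = local.get(tuple(column.chain))
            if replacement is not None:
                column.chain = list(replacement)


# Outer query reuses the name "email" that also exists in the inner query
inner = Select([Field(["email"])])
outer = Select([Field(["email"]), inner])
apply_overrides(outer, {id(outer): {("email",): ["person", "email"]}})
print(outer.columns[0].chain)  # ['person', 'email']
print(inner.columns[0].chain)  # ['email']
```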
1b99050 to bf80a20
📸 UI snapshots have been updated. 2 snapshot changes in total: 0 added, 2 modified, 0 deleted. Triggered by this commit.
📸 UI snapshots have been updated. 1 snapshot change in total: 0 added, 1 modified, 0 deleted. Triggered by this commit.
📸 UI snapshots have been updated. 1 snapshot change in total: 0 added, 1 modified, 0 deleted. Triggered by this commit.
It got hard to follow in the lazy_table.py part, but I'm going to assume all is good and/or you'll fix it if not 😅.
This change is concerning
```python
# When joining a lazy table to another lazy table, the joined table doesn't get resolved
# Doing another pass solves this for us
if self.lazy_finder_counter < 20:
    lazy_finder = LazyFinder()
    lazy_finder.visit(node)
    if lazy_finder.found_lazy:
        self.lazy_finder_counter = self.lazy_finder_counter + 1
        self.visit_select_query(node)
```
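The snippet above recurses into visit_select_query and stops re-visiting once lazy_finder_counter reaches 20, without raising. A loop-based sketch of the same bounded fixed-point idea follows; resolve_once and has_lazy are hypothetical callbacks standing in for the resolver pass and LazyFinder, and raising on the cap is a design choice for illustration, not what the PR does.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def resolve_until_stable(
    node: T,
    resolve_once: Callable[[T], None],
    has_lazy: Callable[[T], bool],
    max_passes: int = 20,
) -> int:
    """Run resolve_once until has_lazy reports nothing left; return the pass count.

    Raises instead of silently giving up when the cap is hit.
    """
    passes = 0
    while has_lazy(node):
        if passes >= max_passes:
            raise RuntimeError(f"still unresolved after {max_passes} passes")
        resolve_once(node)
        passes += 1
    return passes


def resolve_one_level(state: dict) -> None:
    # Toy stand-in: each pass exposes and resolves one more nested lazy table
    state["unresolved"] -= 1


state = {"unresolved": 3}
print(resolve_until_stable(state, resolve_one_level, lambda s: s["unresolved"] > 0))  # 3
```

Surfacing the cap as an error makes a runaway chain of lazy joins visible instead of silently producing a partially resolved query.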
This feels like code nobody will dare touch for years 😅. I wonder if there's a smarter way to just flag tables that should be revisited, instead of doing it for everything?
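One possible shape for the "flag tables that should be revisited" suggestion is a worklist: resolving a lazy table enqueues only the lazy tables it exposes, instead of re-walking the whole query. This sketch uses a hypothetical `introduces` map in place of real type resolution; the `seen` set also keeps it terminating on cyclic joins without a fixed pass cap.

```python
from collections import deque
from typing import Dict, List


def resolve_with_worklist(roots: List[str], introduces: Dict[str, List[str]]) -> List[str]:
    """Resolve lazy tables by visiting only newly flagged ones.

    `introduces[t]` lists the lazy tables that only become visible once `t` is
    resolved (hypothetical stand-in for joining a lazy table to another lazy table).
    Returns tables in the order they were resolved.
    """
    queue = deque(roots)
    seen = set()
    resolved: List[str] = []
    while queue:
        table = queue.popleft()
        if table in seen:
            continue  # already resolved; cycles stop here
        seen.add(table)
        resolved.append(table)
        # Flag just the tables this resolution exposed
        queue.extend(introduces.get(table, []))
    return resolved


print(resolve_with_worklist(["persons"], {"persons": ["events"], "events": ["persons"]}))
# ['persons', 'events']
```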
Problem
With a LazyJoin onto persons that uses any constraint field other than person.id, the lazy_tables resolver doesn't include the constraint fields in the persons LazyTable subquery, which causes the lazy join to fail field resolution.

Changes
- Passes an alias into the subquery to cater for selecting the likes of properties.field, and updates the same alias in the join function constraint
- Rather than doing this in lazy_tables, I've pushed this functionality down to the individual join_function. This isn't ideal, as each new lazy join may need to implement the same logic, but the complexity of finding the right ast.Fields and their associated types from within lazy_tables to update felt hackier and more prone to error
- Edit: lazy_tables now also runs on TableAliasType nodes with a child of either LazyJoinType or LazyTableType. Turns out lazy tables just wasn't running on anything that was aliased before; god knows how this managed to run without erroring (or maybe it just returned bad data...)
- The LazyJoin from_field field now supports full field chains (e.g. [properties.email]), and I've added an optional to_field to help support the data warehouse joins. It is only used in the override logic and is only needed when a dot-notated field is used on the right side of the join constraint

How did you test this code?