Enhance the nested type access for Generic and DuckDB dialect #1541
Closed
5 commits
- 221b9dc extract `support_period_map_access_key` config (goldmedal)
- 2778211 handle the chain of the subscript and map accesses for generic and du… (goldmedal)
- cd7b567 Merge branch 'main' into feature/1533-dereference-expr-2 (goldmedal)
- a4a5448 fix the doc test (goldmedal)
- dc5e540 fix doc (goldmedal)
Ah, to your comment: I'm thinking it makes sense to merge the subscript behavior/representation into `MapAccess` already in this PR? That looks like it would resolve both issues, whereas adding a new dialect flag and extending the two code paths seems to compound the problem.
Thanks @iffyio. Indeed, if we merge them in this PR, we can fix many things. It could be a big refactor 🤔
I have two candidate proposals for it:
1. Merge `Subscript` into `MapAccess` and rename `MapAccess`.
   - `Expr::Subscript`: add a new `MapAccessSyntax::Slice` for `[1:5]` SQL.
   - `MapAccess`: rename to `ElementAccess` for element access of `Map` and `Array`.
2. Remove `MapAccess` and integrate with `CompositeAccess`.
   - `CompositeAccess` is a syntax structure for `expr1.expr2`. I think we can use it to represent the period map access. We can use `CompositeAccess` and `Subscript` to represent an access chain like `expr1.expr2[1].expr3`. Then we don't need `MapAccess` for the chain.
What do you think? Which one do you prefer?
The first option to merge subscript into mapaccess sounds reasonable! I'm thinking we could skip the rename at least to start with to keep the breakage minimal and I'm imagining it shouldn't be as large of a change in that case, wdyt?
I think removing `Subscript` causes more breakage than removing `MapAccess`. 🤔
Do you know how many downstream projects use `MapAccess`? I found that `DataFusion` hasn't implemented it, but `Subscript` is used to handle array syntax. I'm not sure which one is better.
I drafted a version for option 2: #1551.
It still has some issues with `Expr::Method` parsing, but I think it preserves `Subscript` and avoids a significant breaking change for downstream projects.
By the way, #1551 can also support the syntax like
I prefer to keep the nested representation for the following reasons:
- … (`a[1]`) and maps (`a['field']`).
- With `Subscript` and `CompositeAccess` (and possibly `Method`, if needed 🤔), we can cover the entire syntax of access chains without requiring users to introduce additional `Expr` variants. This makes the SQL syntax more stable. Some examples include:
I was primarily thinking it would be easier and more efficient to traverse the AST if the above variants are expressed as a chain without nesting. Currently, both the parser and downstream crates tend to struggle with the recursion when there's a lot of nesting in large or complex SQL.
Offhand, I'm not sure I have a full picture of either approach though. It's not clear to me what the disadvantage of a linear representation would be, or whether there are advantages to a nested one.
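To make the nesting concern concrete, here is a minimal sketch using a simplified, hypothetical `Expr` enum (an assumption for illustration, not the real sqlparser-rs type). Each access wraps the previous expression in a `Box`, so reaching the root of a chain with n accesses takes n levels of recursion.

```rust
// Simplified, hypothetical AST for illustration only; not the sqlparser-rs `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Identifier(String),
    Number(i64),
    // `expr[index]`
    Subscript { expr: Box<Expr>, index: Box<Expr> },
    // `expr.key`
    CompositeAccess { expr: Box<Expr>, key: String },
}

/// Number of recursive steps needed to reach the root of an access chain.
fn nesting_depth(e: &Expr) -> usize {
    match e {
        Expr::Identifier(_) | Expr::Number(_) => 0,
        Expr::Subscript { expr, .. } | Expr::CompositeAccess { expr, .. } => {
            1 + nesting_depth(expr)
        }
    }
}

/// Builds `a.b[1].c` in the nested form: the last access is the outermost node.
fn nested_example() -> Expr {
    let a = Expr::Identifier("a".to_string());
    let a_b = Expr::CompositeAccess { expr: Box::new(a), key: "b".to_string() };
    let a_b_1 = Expr::Subscript {
        expr: Box::new(a_b),
        index: Box::new(Expr::Number(1)),
    };
    Expr::CompositeAccess { expr: Box::new(a_b_1), key: "c".to_string() }
}
```

In this form the root identifier `a` is the innermost node, so any visitor has to recurse through every access before it sees what is being accessed.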
It's just my assumption. If I'm wrong, please correct me.
I mean, if we accept the nested representation, we don't need to change `Subscript` and won't make breaking changes for the array or map syntax. I'm not sure about other downstream projects. At least DataFusion won't be broken. If it would cause a big breaking change, I think we should reconsider it 🤔
When working on this issue, I found we implemented many different `Expr` variants for similar purposes (access chains). For example:
- `MapAccess` for `a[1][2][3]` or `a[1].b.c[3]`
- `Subscript` for `a[1]` or `a[1][2]`, ...
- `JsonAccess` for `a.b[0].c`, ...
- `CompoundIdentifier` for `a.b.c`
- `CompositeAccess` for `( .. ).a`
- `Method` for `func1().func2().func3()`
They are the same thing, with different combinations and orderings, and maybe for various dialects. I think it also means we have various code paths for the same purpose.
I hope to use some basic components to represent all of them.
I think it's possible to use a linear representation to do it, but it could be a huge breaking change.
It makes sense to me. Indeed, the complex nested representation is a potential issue for performance or usage 🤔
I tried to draft a new linear representation like:
I think it can cover many cases of the access chain. I'm not sure about the naming, but I'd prefer not to keep using `Expr::Subscript` or `Expr::MapAccess`, because their meaning would have changed. I'd prefer to remove both of them.
What do you think?
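For illustration only, a linear access chain could be sketched roughly as below. The `CompoundExpr` and `AccessExpr` types, their fields, and the `render` helper are hypothetical names assumed for this example; they are neither the draft posted in this thread nor the sqlparser-rs API.

```rust
// Hypothetical types for illustration; not the draft from this thread
// and not the sqlparser-rs API.
#[derive(Debug, Clone, PartialEq)]
enum AccessExpr {
    Dot(String),  // `.field`
    Index(i64),   // `[1]`
    Key(String),  // `['field']`
}

#[derive(Debug, Clone, PartialEq)]
struct CompoundExpr {
    root: String,           // the expression being accessed, simplified to a name
    chain: Vec<AccessExpr>, // accesses in source order, no nesting
}

/// Renders the chain back to SQL-like text with a single loop, no recursion.
fn render(e: &CompoundExpr) -> String {
    let mut out = e.root.clone();
    for step in &e.chain {
        match step {
            AccessExpr::Dot(name) => {
                out.push('.');
                out.push_str(name);
            }
            AccessExpr::Index(i) => out.push_str(&format!("[{}]", i)),
            AccessExpr::Key(k) => out.push_str(&format!("['{}']", k)),
        }
    }
    out
}

/// Builds `a.b[1].c` as a flat chain.
fn linear_example() -> CompoundExpr {
    CompoundExpr {
        root: "a".to_string(),
        chain: vec![
            AccessExpr::Dot("b".to_string()),
            AccessExpr::Index(1),
            AccessExpr::Dot("c".to_string()),
        ],
    }
}
```

With this shape, the dot, subscript, and map-key forms listed above become elements of one `Vec`, so a consumer iterates over the chain instead of recursing through boxed nodes.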
Nice, yeah, I think the `CompoundExpr` example to represent the different variants would make a lot of sense!
Thanks. I'll follow this design in #1551. Let's close this PR 👍