Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support path semantic in non-recursive-path #4405

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

excaliburwyj
Copy link
Contributor

@excaliburwyj excaliburwyj commented Oct 22, 2024

Description

For non-recursive paths,such as "match (a)-[b]-(c)-[d]-(e) return e;", path semantics do not work.
support has been added:

  1. If a path name is not explicitly defined, it will be implicitly created for use in the path semantics filter.
  2. An additional path semantic filter plan, such as no_equals of nodes or relationships, will append on top of the hash join or cross process.

Fixes # (issue)

Contributor agreement

Copy link
Contributor

@andyfengHKU andyfengHKU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @excaliburwyj,

Thanks for the contribution. I think the overall idea make sense. Though I'm in middle of migrating recursive join to a different parallel computation framework (#4404). I'll review this in detail during the weekend. Hope this works fine for you.

@andyfengHKU
Copy link
Contributor

Hi @excaliburwyj,

Thanks for the contribution. I think the overall idea make sense. Though I'm in middle of migrating recursive join to a different parallel computation framework (#4404). I'll review this in detail during the weekend. Hope this works fine for you.

Whoops, forgot to review this PR. Let me do it today. Sorry about this.

Copy link
Contributor

@andyfengHKU andyfengHKU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @excaliburwyj,

As I'm reviewing this PR, another idea pops into my mind. Instead of first finding a join operator on top of recursive join and then append a filter (which is the logic you implemented in PathSemanticRewriter), I wonder if the following approach will be simpler.

First we add a scalar function hasDuplication in the system. It will be a function that take arbitrary number of nodes, or relations (a relation can be recursive), e.g.

MATCH (a)-[e]->(b)-[]->(c) WHERE NOT hasDuplication(a, b, c)

The above is an equivalent form of

MATCH p = (a)-[e]->(b)-[]->(c)  WHERE is_acyclic(p)

The difference is that it doesn't require a path variable.

Once this is done, for a given MATCH query, we can check if the current semantic is trail or acyclic. If so, and binding stage, we directly add a predicate expression hasDuplication with either nodes or relations depends on the semantic.

In this approach, we can limited change in function module and binder module. And we don't need to worry about planning and optimization.

Let me know what you think.

@excaliburwyj
Copy link
Contributor Author

Hi @andyfengHKU,
I added the PathSemanticRewriter in the Optimizer mainly for the following reason: From the user's perspective, when I set call recursive_pattern_semantic='TRAIL'/'ACYCLIC'. I would naturally expect the following query to produce different results compared to the 'WALK' mode.
MATCH (a)-[e]->(b)-[]->(c) return c
If must modify the query code by manually adding a filter function, although it can still achieve the path semantics, it could potentially cause confusion. I am curious to know whether you think it is reasonable.

@andyfengHKU
Copy link
Contributor

Hi @excaliburwyj,

I totally agree with the idea that we should generate a different result when the semantic is set to TRAIL/ACYCLIC.

What I'm suggesting is that, at implementation level, we construct an additional filter in the binder and this is hidden from the user. For example, user inputs are

CALL recursive_pattern_semantic='TRAIL';
MATCH (a)-[e1]->(b)-[e2]->(c) return c

Internally we perform a rewrite to the second query as

MATCH (a)-[e1]->(b)-[e2]->(c) WHERE NOT has_duplication(e1, e2) return c

User is not aware of this rewrite but we can achieve a trail semantic output without the need of a PathSemanticRewriter

@excaliburwyj
Copy link
Contributor Author

Hi @andyfengHKU
Indeed, implementing it at the binder stage makes more sense than PathSemanticRewriter in Optimizer. I will modify it according to your suggestion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants