Scp4 #2898
base: master
Conversation
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@ Coverage Diff @@
##              master    #2898      +/-   ##
============================================
- Coverage      77.87%   77.86%   -0.01%
+ Complexity     13578    13577       -1
============================================
  Files           1015     1015
  Lines          59308    59310       +2
  Branches        6835     6837       +2
============================================
- Hits           46184    46181       -3
- Misses         10817    10821       +4
- Partials        2307     2308       +1

☔ View full report in Codecov by Sentry.
Thanks for putting this together, Michael. I think this is a great suggestion to push into our docs. The Gremlin language is currently stuck in a weird limbo where the specification is partially defined by docs, partially by tests, and partially by the reference implementation. This leaves many areas unclear as to what should be part of the language specification and what is simply an implementation detail.
I agree that the language should not enforce any evaluation model on providers, and that users should use explicit steps to enforce a particular evaluation order when desired.
Unfortunately, I don't think we currently have a perfect tool to force lazy evaluation in all cases (detailed below). We will either need more explicit documentation detailing the nuances of forcing a lazy evaluation order on bulked traversers, or a dedicated step with the explicit purpose of enforcing lazy evaluation.
=== Introduction ===
Gremlin comes with conventions and mechanisms to control the flow strategy for traversal processing: _lazy evaluation_ is conceptually a depth-first evaluation paradigm that follows as a natural result from the pull-based stacked iterator model (as implemented in the Apache Tinkerpop OLTP engine), whereas _eager evaluation_ enforces a Gremlin step to process all its incoming traversers before passing any results to the subsequent step.
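The lazy-versus-eager distinction can be illustrated with a toy Python model (purely a conceptual sketch, not TinkerPop code — the step names `source`, `double`, and `barrier` are illustrative): a pull-based iterator chain is naturally depth-first, while inserting a full materialization point makes the upstream step eager.

```python
# Toy model of a traversal pipeline: generators give pull-based (lazy)
# evaluation; a barrier drains everything before emitting, making the
# upstream portion eager. The log records the order of side effects.
log = []

def source():
    for v in [1, 2, 3]:
        log.append("emit %d" % v)
        yield v

def double(incoming):
    # Lazy step: pulls one traverser at a time from upstream.
    for v in incoming:
        log.append("double %d" % v)
        yield v * 2

def barrier(incoming):
    # Eager step: drains ALL incoming traversers before emitting any.
    yield from list(incoming)

list(double(source()))
lazy_log = list(log)     # emit/double interleave: depth-first
log.clear()

list(double(barrier(source())))
eager_log = list(log)    # every emit happens before any double
```

Under the lazy chain each emitted value is processed immediately; with the barrier in place, the source is fully drained before the downstream step sees anything.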
Nit: "Apache Tinkerpop" should be "Apache TinkerPop".
- Gremlin comes with conventions and mechanisms to control the flow strategy for traversal processing: _lazy evaluation_ is conceptually a depth-first evaluation paradigm that follows as a natural result from the pull-based stacked iterator model (as implemented in the Apache Tinkerpop OLTP engine), whereas _eager evaluation_ enforces a Gremlin step to process all its incoming traversers before passing any results to the subsequent step.
+ Gremlin comes with conventions and mechanisms to control the flow strategy for traversal processing: _lazy evaluation_ is conceptually a depth-first evaluation paradigm that follows as a natural result from the pull-based stacked iterator model (as implemented in the Apache TinkerPop OLTP engine), whereas _eager evaluation_ enforces a Gremlin step to process all its incoming traversers before passing any results to the subsequent step.
# By wrapping the groupCount() and select() into a local() step, users can enforce lazy
# execution behavior:
gremlin> g.V().hasLabel('person').local(groupCount('x').select('x'))
I don't believe local() as it currently exists is a perfect tool to enforce lazy evaluation. When running in TinkerGraph, it does indeed produce the intended result for this example; however, it can break down when dealing with bulked traversers. The local() step processes one (possibly bulked) traverser at a time, which means it can sometimes operate on multiple values at once.
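The bulking issue can be modeled with a small Python sketch (a conceptual model of traverser bulking, not TinkerPop internals — a bulked traverser is represented here as a `(value, bulk)` pair, and `local_group_count` stands in for a `local(groupCount(...))` child):

```python
# Toy model: a local()-style step runs its child traversal once per
# (possibly bulked) traverser, so a bulk of 3 is counted in a single shot
# rather than incrementally.
from collections import Counter

def local_group_count(traversers):
    counts = Counter()
    for value, bulk in traversers:
        counts[value] += bulk      # the whole bulk lands at once
        yield dict(counts)         # snapshot of the side-effect map

# Without an upstream barrier, v1 arrives as three separate traversers:
lazy_view = list(local_group_count([("v1", 1), ("v1", 1), ("v1", 1)]))

# After a barrier(), the three traversers are bulked into one ("v1", 3):
bulked_view = list(local_group_count([("v1", 3)]))
```

The unbulked stream yields three incremental snapshots, while the bulked traverser jumps straight to a count of 3 — the same jump visible in the barrier() example below.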
With LazyBarrierStrategy disabled (to avoid hidden barrier() steps), the following example works as expected with lazy evaluation:
gremlin> g.withoutStrategies(LazyBarrierStrategy).V().both().hasLabel('person').local(groupCount('x').select('x'))
==>[v[2]:1]
==>[v[2]:1,v[4]:1]
==>[v[1]:1,v[2]:1,v[4]:1]
==>[v[1]:2,v[2]:1,v[4]:1]
==>[v[1]:2,v[2]:1,v[4]:2]
==>[v[1]:2,v[2]:1,v[4]:2,v[6]:1]
==>[v[1]:3,v[2]:1,v[4]:2,v[6]:1]
==>[v[1]:3,v[2]:1,v[4]:3,v[6]:1]
However, if a barrier is injected prior to the local() step, the result is a mix of lazy and eager evaluation:
gremlin> g.withoutStrategies(LazyBarrierStrategy).V().both().hasLabel('person').barrier().local(groupCount('x').select('x'))
==>[v[2]:1]
==>[v[2]:1,v[4]:3]
==>[v[2]:1,v[4]:3]
==>[v[2]:1,v[4]:3]
==>[v[1]:3,v[2]:1,v[4]:3]
==>[v[1]:3,v[2]:1,v[4]:3]
==>[v[1]:3,v[2]:1,v[4]:3]
==>[v[1]:3,v[2]:1,v[4]:3,v[6]:1]
Unfortunately, flatMap() also does not produce the intended results in this case, as it pushes a single value from a bulked traverser through the child traversal and then reapplies the bulk to the result, instead of processing each value individually:
gremlin> g.withoutStrategies(LazyBarrierStrategy).V().both().hasLabel('person').barrier().flatMap(groupCount('x').select('x'))
==>[v[2]:1]
==>[v[2]:1,v[4]:1]
==>[v[2]:1,v[4]:1]
==>[v[2]:1,v[4]:1]
==>[v[1]:1,v[2]:1,v[4]:1]
==>[v[1]:1,v[2]:1,v[4]:1]
==>[v[1]:1,v[2]:1,v[4]:1]
==>[v[1]:1,v[2]:1,v[4]:1,v[6]:1]
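The flatMap() behavior described above can be sketched in the same toy model (again a conceptual model, not TinkerPop internals — `child_group_count` is an illustrative stand-in for the child traversal's aggregation):

```python
# Toy model: a flatMap()-style step feeds ONE representative value of a
# bulked traverser through the child traversal, then stamps the original
# bulk back onto the child's output, so a child-side aggregation only
# ever sees a bulk of 1.
from collections import Counter

counts = Counter()

def child_group_count(value):
    counts[value] += 1                 # the child only ever sees bulk 1
    return dict(counts)

def flat_map(traversers):
    for value, bulk in traversers:
        result = child_group_count(value)  # one representative value
        yield (result, bulk)               # original bulk reapplied

# A bulked traverser ("v4", 3) produces a single under-counted result
# carrying bulk 3 — which is why the console prints the same map three times.
out = list(flat_map([("v4", 3)]))
```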
I'm not aware of any steps in TinkerPop that address this specific concern, as I don't believe there have been significant attempts to control evaluation ordering in the past. To produce the intended behaviour here, we would need a step which works similarly to flatMap(), but which executes the child traversal on each individual value rather than applying the bulk to the result. This would be inefficient and unhelpful for all cases except those which have some sort of aggregation in the child traversal.
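A hypothetical "per-value flatMap" of the kind described — no such step exists in TinkerPop today — could look like this in the toy model (all names illustrative):

```python
# Hypothetical sketch: execute the child traversal once per individual
# value in the bulk, emitting each result with bulk 1. Correct for
# child-side aggregations, but it deliberately defeats the efficiency
# that bulking provides.
from collections import Counter

counts = Counter()

def child_group_count(value):
    counts[value] += 1
    return dict(counts)

def per_value_flat_map(traversers):
    for value, bulk in traversers:
        for _ in range(bulk):          # unroll the bulk
            yield (child_group_count(value), 1)

out = list(per_value_flat_map([("v4", 3)]))
```

Unlike the flatMap() model, the bulk of 3 is unrolled into three child executions, so the aggregation sees every value and the count reaches 3 incrementally.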