Scp4 #2898
base: master
Conversation
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@ Coverage Diff @@
##              master    #2898      +/-   ##
============================================
- Coverage      77.87%   77.86%   -0.01%
+ Complexity     13578    13577       -1
============================================
  Files           1015     1015
  Lines          59308    59310       +2
  Branches        6835     6837       +2
============================================
- Hits           46184    46181       -3
- Misses         10817    10821       +4
- Partials        2307     2308       +1

☔ View full report in Codecov by Sentry.
Thanks for putting this together, Michael. I think this is a great suggestion to push into our docs. The Gremlin language is currently stuck in a weird limbo where the specification is partially defined by docs, partially by tests, and partially by the reference implementation. This leaves many areas unclear as to what should be part of the language specification and what is simply an implementation detail.
I agree that the language should not enforce any evaluation model on providers, and that users should use explicit steps to enforce a particular evaluation order when desired.
Unfortunately, I don't think we currently have a perfect tool to force lazy evaluation in all cases (detailed below). We will either need more explicit documentation detailing the nuances of forcing a lazy evaluation order on bulked traversers, or a dedicated step with the explicit purpose of enforcing lazy evaluation.
=== Introduction ===
Gremlin comes with conventions and mechanisms to control the flow strategy for traversal processing: _lazy evaluation_ is conceptually a depth-first evaluation paradigm that follows as a natural result from the pull-based stacked iterator model (as implemented in the Apache Tinkerpop OLTP engine), whereas _eager evaluation_ enforces a Gremlin step to process all its incoming traversers before passing any results to the subsequent step.
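The lazy-versus-eager distinction can be illustrated with a toy Python model (purely a conceptual sketch, not TinkerPop code — the step names `source`, `double`, and `barrier` are illustrative): a pull-based iterator chain is naturally depth-first, while inserting a full materialization point makes the upstream step eager.

```python
# Toy model of a traversal pipeline: generators give pull-based (lazy)
# evaluation; a barrier drains everything before emitting, making the
# upstream portion eager. The log records the order of side effects.
log = []

def source():
    for v in [1, 2, 3]:
        log.append("emit %d" % v)
        yield v

def double(incoming):
    # Lazy step: pulls one traverser at a time from upstream.
    for v in incoming:
        log.append("double %d" % v)
        yield v * 2

def barrier(incoming):
    # Eager step: drains ALL incoming traversers before emitting any.
    yield from list(incoming)

list(double(source()))
lazy_log = list(log)     # emit/double interleave: depth-first
log.clear()

list(double(barrier(source())))
eager_log = list(log)    # every emit happens before any double
```

Under the lazy chain each emitted value is processed immediately; with the barrier in place, the source is fully drained before the downstream step sees anything.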
Nit: "Apache Tinkerpop" should be "Apache TinkerPop".
- Gremlin comes with conventions and mechanisms to control the flow strategy for traversal processing: _lazy evaluation_ is conceptually a depth-first evaluation paradigm that follows as a natural result from the pull-based stacked iterator model (as implemented in the Apache Tinkerpop OLTP engine), whereas _eager evaluation_ enforces a Gremlin step to process all its incoming traversers before passing any results to the subsequent step.
+ Gremlin comes with conventions and mechanisms to control the flow strategy for traversal processing: _lazy evaluation_ is conceptually a depth-first evaluation paradigm that follows as a natural result from the pull-based stacked iterator model (as implemented in the Apache TinkerPop OLTP engine), whereas _eager evaluation_ enforces a Gremlin step to process all its incoming traversers before passing any results to the subsequent step.
# By wrapping the groupCount() and select() into a local() step, users can enforce lazy
# execution behavior:
gremlin> g.V().hasLabel('person').local(groupCount('x').select('x'))
I don't believe local() as it currently exists is a perfect tool to enforce lazy evaluation. When running in TinkerGraph, it does indeed produce the intended result for this example; however, it can break down when dealing with bulked traversers. The local() step processes one (possibly bulked) traverser at a time, which means it can sometimes operate on multiple values at once.
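The bulking issue can be modeled with a small Python sketch (a conceptual model of traverser bulking, not TinkerPop internals — a bulked traverser is represented here as a `(value, bulk)` pair, and `local_group_count` stands in for a `local(groupCount(...))` child):

```python
# Toy model: a local()-style step runs its child traversal once per
# (possibly bulked) traverser, so a bulk of 3 is counted in a single shot
# rather than incrementally.
from collections import Counter

def local_group_count(traversers):
    counts = Counter()
    for value, bulk in traversers:
        counts[value] += bulk      # the whole bulk lands at once
        yield dict(counts)         # snapshot of the side-effect map

# Without an upstream barrier, v1 arrives as three separate traversers:
lazy_view = list(local_group_count([("v1", 1), ("v1", 1), ("v1", 1)]))

# After a barrier(), the three traversers are bulked into one ("v1", 3):
bulked_view = list(local_group_count([("v1", 3)]))
```

The unbulked stream yields three incremental snapshots, while the bulked traverser jumps straight to a count of 3 — the same jump visible in the barrier() example below.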
With LazyBarrierStrategy disabled (to avoid hidden barrier() steps), the following example works as expected with lazy evaluation:
gremlin> g.withoutStrategies(LazyBarrierStrategy).V().both().hasLabel('person').local(groupCount('x').select('x'))
==>[v[2]:1]
==>[v[2]:1,v[4]:1]
==>[v[1]:1,v[2]:1,v[4]:1]
==>[v[1]:2,v[2]:1,v[4]:1]
==>[v[1]:2,v[2]:1,v[4]:2]
==>[v[1]:2,v[2]:1,v[4]:2,v[6]:1]
==>[v[1]:3,v[2]:1,v[4]:2,v[6]:1]
==>[v[1]:3,v[2]:1,v[4]:3,v[6]:1]
However, if a barrier is injected prior to the local() step, the result is a mix of lazy and eager evaluation:
gremlin> g.withoutStrategies(LazyBarrierStrategy).V().both().hasLabel('person').barrier().local(groupCount('x').select('x'))
==>[v[2]:1]
==>[v[2]:1,v[4]:3]
==>[v[2]:1,v[4]:3]
==>[v[2]:1,v[4]:3]
==>[v[1]:3,v[2]:1,v[4]:3]
==>[v[1]:3,v[2]:1,v[4]:3]
==>[v[1]:3,v[2]:1,v[4]:3]
==>[v[1]:3,v[2]:1,v[4]:3,v[6]:1]
Unfortunately, flatMap() also does not produce the intended results in this case, as it pushes a single value from a bulked traverser through the child traversal and then reapplies the bulk to the result, instead of processing each value individually:
gremlin> g.withoutStrategies(LazyBarrierStrategy).V().both().hasLabel('person').barrier().flatMap(groupCount('x').select('x'))
==>[v[2]:1]
==>[v[2]:1,v[4]:1]
==>[v[2]:1,v[4]:1]
==>[v[2]:1,v[4]:1]
==>[v[1]:1,v[2]:1,v[4]:1]
==>[v[1]:1,v[2]:1,v[4]:1]
==>[v[1]:1,v[2]:1,v[4]:1]
==>[v[1]:1,v[2]:1,v[4]:1,v[6]:1]
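The flatMap() behavior described above can be sketched in the same toy model (again a conceptual model, not TinkerPop internals — `child_group_count` is an illustrative stand-in for the child traversal's aggregation):

```python
# Toy model: a flatMap()-style step feeds ONE representative value of a
# bulked traverser through the child traversal, then stamps the original
# bulk back onto the child's output, so a child-side aggregation only
# ever sees a bulk of 1.
from collections import Counter

counts = Counter()

def child_group_count(value):
    counts[value] += 1                 # the child only ever sees bulk 1
    return dict(counts)

def flat_map(traversers):
    for value, bulk in traversers:
        result = child_group_count(value)  # one representative value
        yield (result, bulk)               # original bulk reapplied

# A bulked traverser ("v4", 3) produces a single under-counted result
# carrying bulk 3 — which is why the console prints the same map three times.
out = list(flat_map([("v4", 3)]))
```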
I'm not aware of any steps in TinkerPop that address this specific concern, as I don't believe there have been significant attempts to control evaluation ordering in the past. To produce the intended behaviour here, we would need a step which works similarly to flatMap(), but which executes the child traversal on each individual value rather than applying the bulk to the result. This would be inefficient and unhelpful for all cases except those which have some sort of aggregation in the child traversal.
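A hypothetical "per-value flatMap" of the kind described — no such step exists in TinkerPop today — could look like this in the toy model (all names illustrative):

```python
# Hypothetical sketch: execute the child traversal once per individual
# value in the bulk, emitting each result with bulk 1. Correct for
# child-side aggregations, but it deliberately defeats the efficiency
# that bulking provides.
from collections import Counter

counts = Counter()

def child_group_count(value):
    counts[value] += 1
    return dict(counts)

def per_value_flat_map(traversers):
    for value, bulk in traversers:
        for _ in range(bulk):          # unroll the bulk
            yield (child_group_count(value), 1)

out = list(per_value_flat_map([("v4", 3)]))
```

Unlike the flatMap() model, the bulk of 3 is unrolled into three child executions, so the aggregation sees every value and the count reaches 3 incrementally.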