
Scp4 #2898

Open
wants to merge 2 commits into master

Conversation

mschmidt00

No description provided.

@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.86%. Comparing base (cfd6889) to head (69fb750).
Report is 18 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #2898      +/-   ##
============================================
- Coverage     77.87%   77.86%   -0.01%     
+ Complexity    13578    13577       -1     
============================================
  Files          1015     1015              
  Lines         59308    59310       +2     
  Branches       6835     6837       +2     
============================================
- Hits          46184    46181       -3     
- Misses        10817    10821       +4     
- Partials       2307     2308       +1     

☔ View full report in Codecov by Sentry.

Contributor

@Cole-Greer Cole-Greer left a comment

Thanks for putting this together, Michael. I think this is a great suggestion to push into our docs. The Gremlin language is currently stuck in a weird limbo where the specification is partially defined by docs, partially by tests, and partially by the reference implementation. This leaves many areas unclear as to what should be part of the language specification and what is simply an implementation detail.

I agree that the language should not enforce any evaluation model on providers, and that users should use explicit steps to enforce a particular evaluation order when desired.
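
For example, barrier() is one such explicit step (a minimal sketch assuming the TinkerPop modern toy graph; the output shown is illustrative): injected between groupCount('x') and select('x'), it drains every incoming traverser through the aggregation before any result is emitted, so each select('x') observes the complete map:

# barrier() forces eager evaluation at this point: all four person vertices pass through
# groupCount('x') before the first select('x') is executed.
gremlin> g.V().hasLabel('person').groupCount('x').barrier().select('x')
==>[v[1]:1,v[2]:1,v[4]:1,v[6]:1]
==>[v[1]:1,v[2]:1,v[4]:1,v[6]:1]
==>[v[1]:1,v[2]:1,v[4]:1,v[6]:1]
==>[v[1]:1,v[2]:1,v[4]:1,v[6]:1]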

Unfortunately I don't think we currently have a perfect tool to force lazy evaluation in all cases (detailed below). We will either need to add more explicit documentation detailing the nuances of forcing a lazy evaluation order on bulked traversers, or add a dedicated step whose explicit purpose is to enforce lazy evaluation.


=== Introduction ===

Gremlin comes with conventions and mechanisms to control the flow strategy for traversal processing: _lazy evaluation_ is conceptually a depth-first evaluation paradigm that follows as a natural result from the pull-based stacked iterator model (as implemented in the Apache Tinkerpop OLTP engine), whereas _eager evaluation_ enforces a Gremlin step to process all its incoming traversers before passing any results to the subsequent step.
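
For instance, under lazy evaluation each traverser is pulled through the entire chain of steps before the next traverser is processed, whereas an explicit barrier() forces all traversers to accumulate at that point before the following step runs (a minimal sketch assuming the modern toy graph and a Gremlin Console session with closures enabled; the printed labels are purely illustrative):

# Lazy (depth-first): the "first"/"second" lines interleave per vertex, because each
# traverser flows through both sideEffect() steps before the next vertex is pulled.
gremlin> g.V().sideEffect{println "first: ${it}"}.sideEffect{println "second: ${it}"}.iterate()

# Eager at the barrier: all "first" lines are printed before any "second" line, because
# barrier() collects every incoming traverser before passing results downstream.
gremlin> g.V().sideEffect{println "first: ${it}"}.barrier().sideEffect{println "second: ${it}"}.iterate()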
Contributor

Nit:

Suggested change
-Gremlin comes with conventions and mechanisms to control the flow strategy for traversal processing: _lazy evaluation_ is conceptually a depth-first evaluation paradigm that follows as a natural result from the pull-based stacked iterator model (as implemented in the Apache Tinkerpop OLTP engine), whereas _eager evaluation_ enforces a Gremlin step to process all its incoming traversers before passing any results to the subsequent step.
+Gremlin comes with conventions and mechanisms to control the flow strategy for traversal processing: _lazy evaluation_ is conceptually a depth-first evaluation paradigm that follows as a natural result from the pull-based stacked iterator model (as implemented in the Apache TinkerPop OLTP engine), whereas _eager evaluation_ enforces a Gremlin step to process all its incoming traversers before passing any results to the subsequent step.


# By wrapping the groupCount() and select() into a local() step, users can enforce lazy
# execution behavior:
gremlin> g.V().hasLabel('person').local(groupCount('x').select('x'))
Contributor

I don't believe local() as it currently exists is a perfect tool to enforce lazy evaluation. When running in TinkerGraph it does indeed produce the intended result for this example; however, it can break down when dealing with bulked traversers. The local() step processes one (possibly bulked) traverser at a time, which means it can effectively operate on multiple values at once.

With LazyBarrierStrategy disabled (to avoid hidden barrier() steps), the following example works as expected with a lazy evaluation:

gremlin> g.withoutStrategies(LazyBarrierStrategy).V().both().hasLabel('person').local(groupCount('x').select('x'))
==>[v[2]:1]
==>[v[2]:1,v[4]:1]
==>[v[1]:1,v[2]:1,v[4]:1]
==>[v[1]:2,v[2]:1,v[4]:1]
==>[v[1]:2,v[2]:1,v[4]:2]
==>[v[1]:2,v[2]:1,v[4]:2,v[6]:1]
==>[v[1]:3,v[2]:1,v[4]:2,v[6]:1]
==>[v[1]:3,v[2]:1,v[4]:3,v[6]:1]

However, if a barrier is injected prior to the local() step, the result is a mix of lazy and eager evaluation:

gremlin> g.withoutStrategies(LazyBarrierStrategy).V().both().hasLabel('person').barrier().local(groupCount('x').select('x'))
==>[v[2]:1]
==>[v[2]:1,v[4]:3]
==>[v[2]:1,v[4]:3]
==>[v[2]:1,v[4]:3]
==>[v[1]:3,v[2]:1,v[4]:3]
==>[v[1]:3,v[2]:1,v[4]:3]
==>[v[1]:3,v[2]:1,v[4]:3]
==>[v[1]:3,v[2]:1,v[4]:3,v[6]:1]

Unfortunately, flatMap() also does not produce the intended results in this case, as it pushes a single value from a bulked traverser through the child traversal and then reapplies the bulk to the result, instead of processing each value individually:

gremlin> g.withoutStrategies(LazyBarrierStrategy).V().both().hasLabel('person').barrier().flatMap(groupCount('x').select('x'))
==>[v[2]:1]
==>[v[2]:1,v[4]:1]
==>[v[2]:1,v[4]:1]
==>[v[2]:1,v[4]:1]
==>[v[1]:1,v[2]:1,v[4]:1]
==>[v[1]:1,v[2]:1,v[4]:1]
==>[v[1]:1,v[2]:1,v[4]:1]
==>[v[1]:1,v[2]:1,v[4]:1,v[6]:1]

I'm not aware of any steps in TinkerPop that address this specific concern, as I don't believe there have been significant attempts to control evaluation ordering in the past. In order to produce the intended behaviour here, we would need a step which works similarly to flatMap() but actually executes the child traversal on each individual value rather than applying the bulk to the result. This would be inefficient and unhelpful for all cases except those which have some sort of aggregation in the child traversal.
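
A purely hypothetical sketch of how such a step might be used follows; the name perValueFlatMap() is invented here for illustration and does not exist in TinkerPop:

# Hypothetical step name for illustration only: a flatMap()-like step that would split each
# bulked traverser and run the child traversal once per contained value, so the side-effect
# map grows per value even after a barrier().
gremlin> g.withoutStrategies(LazyBarrierStrategy).V().both().hasLabel('person').barrier().perValueFlatMap(groupCount('x').select('x'))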
