Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aggregations #218

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
110 changes: 110 additions & 0 deletions cip/1.accepted/CIP2017-04-13-Aggregations.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,110 @@
= CIP2017-04-13 Aggregations
:numbered:
:toc:
:toc-placement: macro
:source-highlighter: codemirror

*Author:* Tobias Lindaaker <[email protected]>

toc::[]

== Aggregations

=== Syntax

[source, ebnf]
----
Return = 'RETURN', ['DISTINCT'], ReturnBody, Filter ;
With = 'WITH', ['DISTINCT'], ReturnBody, Filter ;
SingleValueReturn = 'RETURN', (Expression | ProjectedMap | Aggregation), Filter ;

ReturnBody = ('*' | ReturnItem), {',', ReturnItem} ;
ReturnItem = (Expression, ['AS', Variable])
| (Aggregation, ['AS', Variable])
| (ProjectedMap, 'AS', Variable)
;

Filter = [Where], [Order], [Skip], [Limit] ;

Aggregation = Aggregator, 'OF', Expression ;
Aggregator = SymbolicName | ExtensionName ;
ExtensionName = {SymbolicName, '.'}-, SymbolicName ;
----

=== Examples

[source, cypher]
.Aggregation using `avg`
----
MATCH (employee:Employee)
RETURN avg OF employee.salary
----

[source, cypher]
.Aggregation using `collect`
----
MATCH (person:Person)-[:FRIEND]-(friend)
RETURN person.email, collect OF friend {.name,.email} AS friends
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the OF syntax, but collect is the operation I would like to change if we proceed with OF. collect is a verb, but a noun fits better here, which makes me think we should rename this to collection, or even list:

MATCH (person:Person {name: $name})
RETURN list OF person.age

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having just read through, I was thinking exactly the same thing, both "nounifying" the term and switching to list. It strikes me that this also highlights the return type of the operation.

----

[source, cypher]
.Aggregation using `count`
----
MATCH (nodes) RETURN count OF nodes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For count(*), are you considering count OF *?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had not thought of that, but that seems sensible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been wondering if we shouldn't just mandate always writing a kind-of list comprehension syntax for aggregation, to make more visible what's going on

collect [ n.prop | * ]
max [ n.weight | * ]

this would allow re-using aggregation function calling syntax with regular lists!

count [ n | n IN [ 1, 2, 3, 4 ] ]

or even shorter

count [ n | 1, 2, 3, 4 ]

I agree, we should just use aggrF(args) ... to pass in args to the aggregation function.

But I think it's important to have syntax that allows us to call aggregation functions for aggregations as well as over ordinary lists.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That syntax seems pretty far away from what this CIP suggests. Could we adapt it to a better fit?

Just allow any expression after the OF, where only property expressions and expressions that evaluate to a list are valid:

max OF n.weight 
max OF [1, 2, 3, 4] // semantics depend on expression value

Although this is not ideal when considering list properties...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some similar ideas are discussed in the Python generator expression PEP -> https://www.python.org/dev/peps/pep-0289/

----

[source, cypher]
.Aggregation using `max`
----
MATCH (person:Person)-[:LIVES_IN]->({country:$country})
RETURN max OF person.age
----

[source, cypher]
.Aggregation using `min`
----
MATCH (movie:Movie)
RETURN min OF movie.duration
----

[source, cypher]
.Aggregation using `percentileCont`
----
BREAKS DOWN IN THIS SYNTAX
Copy link
Member

@Mats-SX Mats-SX Apr 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we could consider allowing arguments to the aggregator on the left-hand side of the OF?

MATCH (n)
RETURN percentileCont($percentile) OF n.property

Copy link

@petraselmer petraselmer Apr 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice ^^

How about - so we have a few options to give food for thought - a parentheses-free version:

MATCH (n)
RETURN percentileCont OF n.property GIVEN $percentile

I also played around with using WITH instead of GIVEN, but that is way too overloaded.

If we ever have aggregations with more than one argument of this type, e.g. myAgg(expression, arg1, arg2) in today's syntax, these are the options we'd have (continuing with the same examples as above):

  1. ... RETURN myAgg($arg1, @arg2) OF n.property
  2. ... RETURN myAgg OF n.property GIVEN $arg1, @arg2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I thought about similar things on my way home after having pushed this. What I thought of then was that, at least in this case, the parameter describes which particular aggregate value to get, so perhaps a subscript operator would be appropriate RETURN percentileCont[0.4] OF n.property

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the general - or multi-arg - case would be RETURN myAgg[$arg1, $arg2]? Or, is this not really 'general-izable'?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you bracket the expression on the RHS of OF? For example:

RETURN map OF (key, value)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@technige Interesting idea. Are you considering the case when both key and value are parameters that change across records, or when only one of them are?

I kind of like using OF as a separator between arguments that are read once versus arguments that are the actual substance of the aggregation.

----

[source, cypher]
.Aggregation using `percentileDisc`
----
BREAKS DOWN IN THIS SYNTAX
----

[source, cypher]
.Aggregation using `stDev`
----
MATCH (person:Person)
RETURN stDev OF n.age
----

[source, cypher]
.Aggregation using `stDevP`
----
MATCH (person:Person)
WHERE person.name IN $names
RETURN stDevP OF n.age
----

[source, cypher]
.Aggregation using `sum`
----
MATCH (sale:Sale)
WHERE date({quarterOf:date()}) = date({quarterOf:sale.date})
RETURN sum OF sale.value AS `sales this quarter`
----

[source, cypher]
.Aggregation using user-defined aggregator
----
MATCH (node)
RETURN org.thobe.FancyComputation OF node
----