Skip to content
This repository has been archived by the owner on Apr 4, 2021. It is now read-only.

How to calculate and produce count of empty aggregations #318

Open
szarnyasg opened this issue Feb 23, 2018 · 5 comments
Open

How to calculate and produce count of empty aggregations #318

szarnyasg opened this issue Feb 23, 2018 · 5 comments
Assignees
Milestone

Comments

@szarnyasg
Copy link
Member

Handling aggregations incrementally is tricky. For example, the simple query of

MATCH (n)
RETURN count(n)

should return a single row (containing 0) for an empty database. So, the philosophical question is, for an empty aggregation set, do we return an 0 or nothing?

The issue can be demonstrated with a PostgreSQL console:

postgres=#
  select count(a)
  from (select 1 as a) as subq
  where a = 2;

count
-------
0
(1 row)

However, if we aggregate for a:

postgres=#
  select count(a)
  from (select 1 as a) as subq
  where a = 2
  group by a;

 count 
-------
(0 rows)

Or, if you think aggregating for a is ugly, we can aggregate for b:

postgres=#
  select count(a)
  from (select 1 as a, 1 as b) as subq
  where a = 2
  group by b;

 count 
-------
(0 rows)

The issues of handling nulls for OPTIONAL MATCH clauses is also related...

@szarnyasg
Copy link
Member Author

It's worth checking the Postgres docs:

If a query contains aggregate function calls, but no GROUP BY clause, grouping still occurs: the result is a single group row (or perhaps no rows at all, if the single row is then eliminated by HAVING). The same is true if it contains a HAVING clause, even without any aggregate function calls or GROUP BY clause.

end of 7.2.3: https://www.postgresql.org/docs/9.6/static/queries-table-expressions.html#QUERIES-GROUP

via @jmarton

@szarnyasg szarnyasg added this to the spring 2018 milestone Feb 26, 2018
@szarnyasg szarnyasg self-assigned this Mar 2, 2018
@szarnyasg
Copy link
Member Author

szarnyasg commented Mar 4, 2018

Related literature (thanks to @bergmanngabor): https://dl.acm.org/citation.cfm?id=137852

@szarnyasg
Copy link
Member Author

szarnyasg commented Mar 4, 2018

This causes BI Q7 to break.

A simplified version shows the issue:

MATCH (message2:Message)
OPTIONAL MATCH (message2:Message)<-[like:LIKES]-(p3:Person)
RETURN message2.id AS m, count(like) AS likes
ingraph results: List(ArrayBuffer((likes,1), (m,44)), ArrayBuffer((likes,1), (m,88)))
neo4j results: List(ArrayBuffer((likes,1), (m,44)), ArrayBuffer((likes,1), (m,88)), ArrayBuffer((likes,0), (m,99)))

@szarnyasg
Copy link
Member Author

The problem can be pinpointed to the following condition in AggregationNodes:

if (oldValues != newValues)

Of course, this line is there for a reason...

@szarnyasg
Copy link
Member Author

Currently, this breaks BI Q21.
image

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant