[Draft] [Star Tree] [Search] Resolving Date histogram with metric aggregation using star-tree #16674
base: main
Conversation
❌ Gradle check result for d7fbc39: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
@@ -136,6 +140,35 @@ public void collect(int doc, long bucket) throws IOException {
public LeafBucketCollector getStarTreeCollector(LeafReaderContext ctx, LeafBucketCollector sub, CompositeIndexFieldInfo starTree)
    throws IOException {
    final CompensatedSum kahanSummation = new CompensatedSum(sums.get(0), 0);
    if (parent != null && subAggregators.length == 0) {
Any reason why this can't be the default collector regardless of parent collectors? When there is only a metric aggregation, there will be a single bucket; otherwise, multiple buckets.
@Override
public void collect(int doc, long bucket) throws IOException {
    sub.collect(doc, bucket);
Can we throw UnsupportedOperationException here - star tree collectors can strictly enforce this.
Makes sense.
@Override
public void setScorer(Scorable s) throws IOException {
    sub.setScorer(s);
Even here we should be able to throw UnsupportedOperationException, since there is no scorer for the star tree.
Right now I am not using StarTreeLeafBucketCollectorBase at all. I am thinking of getting rid of it if it does not serve much utility.
Earlier I thought I might need to delegate operations to the original collector if required.
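For illustration, a minimal sketch of the enforcement being discussed, assuming the collector is only ever driven through star-tree entries (the class name StarTreeOnlyCollector and its exact shape are assumptions, not code from this PR):

import java.io.IOException;

import org.apache.lucene.search.Scorable;
import org.opensearch.search.aggregations.LeafBucketCollector;

// Sketch only: a star-tree collector base that rejects the document-driven
// Collector entry points, since star-tree resolution never visits Lucene documents.
abstract class StarTreeOnlyCollector extends LeafBucketCollector {

    @Override
    public void collect(int doc, long bucket) throws IOException {
        // No per-document collect() calls are expected when resolving via the star tree.
        throw new UnsupportedOperationException("star-tree collectors do not collect Lucene documents");
    }

    @Override
    public void setScorer(Scorable scorer) throws IOException {
        // There is no scorer when resolving via the star tree.
        throw new UnsupportedOperationException("star-tree collectors have no scorer");
    }

    // The entry point actually used during star-tree resolution (mirrors the PR's collectStarEntry).
    public abstract void collectStarEntry(int starTreeEntry, long bucket) throws IOException;
}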
long bucketOrd = bucketOrds.add(0, dimensionValue);
if (bucketOrd < 0) {
    bucketOrd = -1 - bucketOrd;
    collectStarTreeBucket((StarTreeBucketCollector) sub, metricValue, bucketOrd, bit);
Do we need to use the collectExistingStarBucket / collectStarBucket paradigm here?
Can refactor to it if required. The two utilities differ only by grow(bucketOrd + 1), which I had called manually for now during the POC but will abstract in a better way.
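For reference, a rough sketch of how that refactor could look, assuming the pair mirrors the existing collectBucket / collectExistingBucket split and lives in BucketsAggregator alongside the collectStarTreeBucket helper quoted further down; the method names follow the reviewer's comment and are not final:

// Sketch only: star-tree analogues of collectBucket / collectExistingBucket.
// The only difference between the two is that the "new bucket" variant grows
// the bucket ordinals first, via grow(bucketOrd + 1).
public final void collectStarBucket(StarTreeBucketCollector subCollector, long docCount, long bucketOrd, int entryBit)
    throws IOException {
    grow(bucketOrd + 1);                                   // make room for a freshly created bucket
    collectExistingStarBucket(subCollector, docCount, bucketOrd, entryBit);
}

public final void collectExistingStarBucket(StarTreeBucketCollector subCollector, long docCount, long bucketOrd, int entryBit)
    throws IOException {
    if (docCounts.increment(bucketOrd, docCount) == docCount) {
        multiBucketConsumer.accept(0);                     // first time this bucket becomes non-empty
    }
    subCollector.collectStarEntry(entryBit, bucketOrd);    // let the sub-collector consume this entry
}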
@@ -164,7 +172,8 @@ private static StarTreeResult traverseStarTree(StarTreeValues starTreeValues, Ma

String childDimension = dimensionNames.get(dimensionId + 1);
StarTreeNode starNode = null;
if (globalRemainingPredicateColumns == null || !globalRemainingPredicateColumns.contains(childDimension)) {
if (globalRemainingPredicateColumns == null
if ((globalRemainingPredicateColumns == null || !globalRemainingPredicateColumns.contains(childDimension))
    && !remainingGroupByColumns.contains(childDimension))
Predicate conditions should be grouped together.
CompositeDataCubeFieldType compositeIndexFieldInfo,
AggregatorFactory aggregatorFactory
) {
    if (aggregatorFactory instanceof DateHistogramAggregatorFactory && aggregatorFactory.getSubFactories().getFactories().length == 1) {
We could probably have a recursive method which checks whether the parent aggregator is star-tree supported, whether its child aggregators are supported, and so on.
We could also make the set of supported aggregators configurable.
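A hedged sketch of such a recursive check, assuming a configurable allow-list of factory types (the set contents, class and method names, and visibility are illustrative, not from the PR):

import java.util.Set;

import org.opensearch.search.aggregations.AggregatorFactory;
import org.opensearch.search.aggregations.bucket.histogram.DateHistogramAggregatorFactory;

// Sketch only: recursively verify that a factory and all of its sub-factories
// belong to a configurable set of star-tree-supported aggregation types.
final class StarTreeSupportCheck {

    // Illustrative contents; in practice this set would be configurable.
    private static final Set<Class<? extends AggregatorFactory>> SUPPORTED = Set.of(DateHistogramAggregatorFactory.class);

    static boolean supportsStarTree(AggregatorFactory factory) {
        if (SUPPORTED.stream().noneMatch(type -> type.isInstance(factory))) {
            return false;                                   // this aggregation itself is not supported
        }
        for (AggregatorFactory sub : factory.getSubFactories().getFactories()) {
            if (!supportsStarTree(sub)) {
                return false;                               // an unsupported sub-aggregation disqualifies the whole tree
            }
        }
        return true;
    }
}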
public abstract class StarTreeBucketCollector extends LeafBucketCollector {

    public abstract void collectStarEntry(int starTreeEntry, long bucket) throws IOException;
It would be good to have a comment around what a star-tree entry is, unless it is already specified somewhere.
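Something along these lines could address this, assuming an "entry" is one pre-aggregated star-tree document in the segment (the wording is a suggestion, not text from the PR):

/**
 * Collects one star-tree entry into the given bucket.
 *
 * A star-tree entry is a single pre-aggregated document of the segment's star-tree index,
 * identified here by its ordinal within the star-tree values; its pre-aggregated metric
 * values are folded into the bucket identified by {@code bucket}.
 *
 * @param starTreeEntry ordinal of the pre-aggregated star-tree document within the segment
 * @param bucket        ordinal of the bucket that the entry's values should be counted towards
 */
public abstract void collectStarEntry(int starTreeEntry, long bucket) throws IOException;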
}

@Override
public void collectStarEntry(int starTreeEntry, long bucket) throws IOException {}
Does this mean the parent aggregator will always have the entire context to traverse over the star tree? What if there is a filter sub-aggregation which needs to further reduce the number of applicable dimension values?
Overall, I don't like the deepening association with the Collector interface, given that we're not collecting documents from a Lucene query. Instead, could we try to work at the Aggregator level more?
I think the place where you should hook more of the star tree stuff in would be the getLeafCollector(LeafReaderContext) method in AggregatorBase:
OpenSearch/server/src/main/java/org/opensearch/search/aggregations/AggregatorBase.java, line 202 in c0852f8:
public final LeafBucketCollector getLeafCollector(LeafReaderContext ctx) throws IOException {
I think that's where you could (per-segment) make your choice about whether you are going to delegate to the aggregation logic to return a real LeafBucketCollector, or you're going to throw a CollectionTerminatedException. You can even pass the subAggregators to that logic so the parent has easy access to its children.
Essentially, in the star tree case, you should never need to return a collector. You can just read the values directly from the segment.
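A loose sketch of that suggestion, under the assumption that helpers such as tryPrecomputeWithStarTree and the fallback method exist (those names are placeholders, not anything from AggregatorBase):

// Sketch only: a per-segment decision inside AggregatorBase#getLeafCollector(LeafReaderContext).
public final LeafBucketCollector getLeafCollector(LeafReaderContext ctx) throws IOException {
    // Hypothetical hook: if this aggregation (and its subAggregators) can be answered from
    // the star tree, read the pre-aggregated values directly from the segment here...
    if (tryPrecomputeWithStarTree(ctx, subAggregators)) {
        // ...and tell the searcher that no documents need to be collected for this leaf.
        throw new CollectionTerminatedException();
    }
    // Otherwise fall back to the existing path that builds a real LeafBucketCollector
    // from the sub-aggregators, exactly as AggregatorBase does today.
    return delegateToDefaultLeafCollector(ctx);   // placeholder for the existing delegation
}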
public final void collectStarTreeBucket(StarTreeBucketCollector subCollector, long docCount, long bucketOrd, int entryBit)
    throws IOException {
    if (docCounts.increment(bucketOrd, docCount) == docCount) {
        multiBucketConsumer.accept(0);
    }
    subCollector.collectStarEntry(entryBit, bucketOrd);
}
I've been thinking through this and it mostly makes sense.
I guess my one objection is the use of the word "collect", since it's not really related to collectors (or rather, it doesn't need to be).
That said, see my comment above -- I think you could just intercept the whole thing earlier and not have to do a custom collector.
Hi @msfroh @sandeshkr419, currently, let's say we have a nested aggregation such as a date histogram with a metric sub-aggregation.
Totally makes sense to throw CollectionTerminatedException; it works well if there are no nested collectors. But I am just wondering how to handle nested aggs: for the above example, we need the sub-aggregation to be resolved via the star tree as well. Do we need to return a new interface/class specific to star tree in that case for aggregation?
Because we can't throw CollectionTerminatedException there. And if we don't have a new interface/class, then subCollector is still of type LeafBucketCollector. Please correct me if my understanding is wrong.
@msfroh @bharath-techie Refactored the changes to not extend LeafBucketCollectors. However, I have introduced a new interface, StarTreeCollector, with a new pre-computation method. Introducing this new method can complement Froh's idea on unifying pre-computations; I'll leave comments on that PR itself to make it friendlier while pre-computing sub-aggregations.
❌ Gradle check result for e1898f1: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
@sandeshkr419 Can we please validate
import java.io.IOException;

public interface StarTreeCollector {
Maybe call this StarTreeAggregator, since it's expected that Aggregator subclasses would implement this interface?
// Resolve via star-tree
// if (queryStarTree(ctx, leafCollector)) {
//     return;
// }
So, in practice, you won't need any more changes in ContextIndexSearcher in this PR, right?
    multiBucketConsumer.accept(0);
}
// Only collect own bucket & not sub-aggregator buckets
// subCollector.collectStarEntry(entryBit, bucketOrd);
This line won't exist, right?
Could we also do this by calling collectExistingBucket with LeafBucketCollector.NO_OP_COLLECTOR?
Comparing the 2 solutions proposed in 2 commits:
Response (redacted):
Now, to update the sum bucket, we'd have to iterate over the star-tree entries again. Basically, a collector-like interface which has hold of the sub-collectors (where we recursively call collectStarTreeEntry, an analogous method to the collect/collectBucket we have for Lucene documents) might simplify the solution, which @bharath-techie also proposed earlier. Let me know your thoughts on this.
I don't understand what benefit you get from extending LeafBucketCollector.
Hi @sandeshkr419,
Will this work? Basically, the parent must check whether all subAggregators are of the star-tree collector type, so all aggregators will have the star-tree collection method available. [We can have better interfaces as well, but the idea is to enable this within our interfaces and make it work similar to the existing leafCollector interface.] Edit: [25/12]
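A minimal sketch of the delegation pattern being described, assuming the StarTreeCollector interface exposes a collectStarEntry-style method (the interface shape, method names, and error handling here are assumptions):

// Sketch only: the parent bucket aggregator verifies its children are star-tree aware
// and then drives them per star-tree entry, instead of per Lucene document.
private void collectStarTreeEntry(int entryBit, long bucketOrd) throws IOException {
    for (Aggregator sub : subAggregators) {
        if (!(sub instanceof StarTreeCollector)) {
            // Any child that cannot be pre-computed forces the normal collect() path.
            throw new IllegalStateException("sub-aggregation cannot be pre-computed via the star tree");
        }
        // Analogous to collect(doc, bucket) for Lucene documents (illustrative method name).
        ((StarTreeCollector) sub).collectStarEntry(entryBit, bucketOrd);
    }
}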
Description
This is the draft set of changes which I am using to discuss the solution approach here.
The primary challenge in resolving a date histogram with a metric aggregation was to figure out how sub-aggregators will get resolved. When resolving a query via star-tree, we lose the need for Lucene documents and don't utilize the collect() calls which are used internally to delegate the collection of sub-aggregations to sub-collectors.
To mitigate this challenge, I have introduced a wrapper class, StarTreeBucketCollector, to introduce a collectStarEntry(int starTreeEntry, long bucket) method. This method is then overridden in metric aggregators and invoked from parent aggregators (here DateHistogramAggregator).
The benefit of this strategy is that it is easily extensible by other bucket aggregators in which metric aggregations will be nested. Also, other bucket-related utilities are reusable as-is, which saves the effort of maintaining a separate set of utilities for star-tree buckets, since the existing buckets are used here.
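To make the delegation concrete, here is a minimal sketch of what the metric side could look like under this design; readPreAggregatedSum, bigArrays, and the sums array are placeholders for however the pre-aggregated metric is actually read and stored, not this PR's code:

// Sketch only: a sum-style StarTreeBucketCollector. The parent (e.g. the date histogram)
// decides the bucket for each star-tree entry and invokes collectStarEntry; this collector
// simply folds the entry's pre-aggregated sum into that bucket.
StarTreeBucketCollector sumCollector = new StarTreeBucketCollector() {

    @Override
    public void collect(int doc, long bucket) {
        // Never driven by Lucene documents when resolving via the star tree.
        throw new UnsupportedOperationException();
    }

    @Override
    public void collectStarEntry(int starTreeEntry, long bucket) throws IOException {
        double preAggregatedSum = readPreAggregatedSum(starTreeEntry);  // placeholder accessor
        sums = bigArrays.grow(sums, bucket + 1);                        // one running sum per bucket
        sums.set(bucket, sums.get(bucket) + preAggregatedSum);
    }
};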
Want to take early feedback on this approach.
Note: Things are hard-coded for one example query shape right now
Related Issues
Resolves (#16552)
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.