Compute multiple float aggregations in one go #12547
base: main
void initializeValueCounters() {
  if (values == null) {
    values = new float[aggregationFunctions.size()][taxoReader.getSize()];
This seems like a scary amount of RAM we could end up requiring. Are we sure that all the labels in taxoReader will have nonzero values? I wonder if we ought to switch to a sparse approach.
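To put the concern in numbers: the dense `float[numAggregations][taxonomySize]` matrix costs 4 bytes per cell regardless of how many ordinals are actually hit. The minimal sketch below (class and method names are hypothetical, and the sizes are illustrative, not measured) computes that payload while ignoring array headers.

```java
// Hypothetical helper: estimates the payload of the dense
// float[aggregations][ordinals] matrix allocated in initializeValueCounters.
// Array object headers and per-row overhead are ignored.
class DenseFacetMemory {

    static long denseBytes(int numAggregations, int taxonomySize) {
        // Long arithmetic so large taxonomies do not overflow int.
        return (long) numAggregations * taxonomySize * Float.BYTES;
    }

    public static void main(String[] args) {
        // Illustrative numbers only: 3 aggregations over a 10M-ordinal taxonomy.
        long bytes = denseBytes(3, 10_000_000);
        System.out.println(bytes / (1024 * 1024) + " MiB"); // prints "114 MiB"
    }
}
```

The cost grows linearly with the number of aggregations, so every extra aggregation adds another full taxonomy-sized row.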
This is a great point. `IntTaxonomyFacets` has the ability to choose sparse values if the taxonomy is large and there aren't a lot of hits. We can have the same functionality in `FloatTaxonomyFacets`. This was also mentioned recently in another issue, which calls into question the way we decide between sparse and dense values.
Fundamentally, I think the user of this feature will have to decide whether they can accept the space-for-time tradeoff of computing multiple aggregations.
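One way to make the sparse/dense choice automatic is the kind of heuristic `IntTaxonomyFacets` applies: stay dense for small taxonomies and go sparse only when the hits are a small fraction of the index. This is a standalone sketch; the 1024-ordinal and 10% thresholds are assumptions modeled on what we recall of `IntTaxonomyFacets`, and `useSparseValues` is a hypothetical name.

```java
// Hypothetical sparse-vs-dense decision for a multi-aggregation facet.
// Thresholds are illustrative assumptions, not verified Lucene constants.
class SparseDecision {

    static final int MIN_TAXONOMY_SIZE_FOR_SPARSE = 1024;

    static boolean useSparseValues(int taxonomySize, long totalHits, long maxDoc) {
        if (taxonomySize < MIN_TAXONOMY_SIZE_FOR_SPARSE) {
            // Small taxonomy: a dense array is cheap and fastest to index.
            return false;
        }
        // Few matching docs relative to the index: most ordinals stay at zero,
        // so a sparse structure avoids paying for every ordinal in every row.
        return totalHits < maxDoc / 10;
    }

    public static void main(String[] args) {
        System.out.println(useSparseValues(500, 10, 1_000_000));             // false
        System.out.println(useSparseValues(5_000_000, 10, 1_000_000));       // true
        System.out.println(useSparseValues(5_000_000, 500_000, 1_000_000));  // false
    }
}
```

With several aggregations the dense side of this tradeoff gets proportionally more expensive, so a multi-aggregation implementation might want a lower sparse threshold than a single-aggregation one.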
Hmm, you mean `FloatTaxonomyFacets` today is never sparse in its aggregation?
That is correct. Compare `initializeValueCounters` in `IntTaxonomyFacets` and `FloatTaxonomyFacets`. I don't think there's a good reason for the two classes to differ here. Maybe sparse values just never got implemented for `FloatTaxonomyFacets`.
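For comparison with the dense `float[aggregations][ordinals]` layout, a sparse counterpart could key on ordinal and allocate one small per-aggregation array only for ordinals that are actually hit. This minimal sketch uses `java.util.HashMap` for clarity; a real implementation would more likely use a primitive-keyed hash map, and all names here are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sparse storage for multiple float aggregations:
// only ordinals that receive a value get an entry.
class SparseFloatValues {
    private final int numAggregations;
    private final Map<Integer, float[]> byOrdinal = new HashMap<>();

    SparseFloatValues(int numAggregations) {
        this.numAggregations = numAggregations;
    }

    // Returns the per-aggregation accumulators for an ordinal, creating them on first use.
    float[] accumulators(int ordinal) {
        return byOrdinal.computeIfAbsent(ordinal, o -> new float[numAggregations]);
    }

    // Absent ordinals read as 0, matching a freshly allocated dense array.
    float get(int aggregation, int ordinal) {
        float[] acc = byOrdinal.get(ordinal);
        return acc == null ? 0f : acc[aggregation];
    }

    int touchedOrdinals() {
        return byOrdinal.size();
    }

    public static void main(String[] args) {
        SparseFloatValues values = new SparseFloatValues(2);
        float[] acc = values.accumulators(42);
        acc[0] += 1.5f; // aggregation 0, e.g. a sum of values
        acc[1] += 1f;   // aggregation 1, e.g. a count
        System.out.println(values.get(0, 42) + " " + values.get(1, 42) + " " + values.get(0, 7)); // 1.5 1.0 0.0
        System.out.println(values.touchedOrdinals()); // 1
    }
}
```

Grouping all aggregations for an ordinal in one entry means a single hash lookup per ordinal per document, rather than one lookup per aggregation.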
Maybe open a spinoff issue to implement sparse values for `FloatTaxonomyFacets`? Let's not block this otherwise great change on it.
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!
Thanks @stefanvodita.
@stefanvodita do you think this change is still worth moving forward after the new sandbox faceting implementation was added? Being able to compute an arbitrary number of aggregations in one pass is part of what that implementation was designed to do, so I'm not sure it also makes sense to add this to the existing module. Curious what you think.
I'm not sure either. Since the new aggregation engine is in the sandbox, it makes sense to keep developing the old one. On the other hand, that's not very productive if we anticipate moving the new engine out of the sandbox. Personally, I like the new aggregation engine and would prefer to see it promoted out of the sandbox. This PR is also badly out of date after #12966.
Usually facets maintain a one-dimensional array, indexed by ordinal, that holds the values they compute.
The change here is simple in principle: use a two-dimensional array, indexed by aggregation and ordinal, so we can compute multiple aggregations at once.
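The core idea can be illustrated with a small standalone sketch: one row per aggregation, one column per ordinal, and a single pass over the (ordinal, value) pairs that updates every row. The `Aggregation` interface and the `SUM`/`MAX` functions are hypothetical stand-ins for the PR's aggregation functions, not the PR's actual code.

```java
import java.util.List;

// Hypothetical aggregation function: combines an existing accumulator
// value with a newly observed value.
interface Aggregation {
    float aggregate(float existing, float value);
}

class MultiAggregationSketch {
    static final Aggregation SUM = (existing, value) -> existing + value;
    // Assumes non-negative values, since accumulators start at 0.
    static final Aggregation MAX = Math::max;

    // values[a][ord] holds the result of aggregation a for ordinal ord.
    static float[][] aggregate(List<Aggregation> aggregations, int taxonomySize,
                               int[] ordinals, float[] docValues) {
        float[][] values = new float[aggregations.size()][taxonomySize];
        for (int i = 0; i < ordinals.length; i++) {
            int ord = ordinals[i];
            // One pass over the data updates every aggregation for this ordinal.
            for (int a = 0; a < aggregations.size(); a++) {
                values[a][ord] = aggregations.get(a).aggregate(values[a][ord], docValues[i]);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        int[] ordinals = {1, 2, 1};
        float[] docValues = {2f, 5f, 3f};
        float[][] values = aggregate(List.of(SUM, MAX), 4, ordinals, docValues);
        System.out.println(values[0][1] + " " + values[1][1]); // 5.0 3.0
    }
}
```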
For the methods that get top children, all children, specific values, and dims, the default is to return the values for the first aggregation, but a different aggregation can be specified.
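The defaulting rule can be sketched as a pair of overloads, where the short form delegates to aggregation 0. The real facet methods take dimension and path strings; this sketch uses bare ordinals and hypothetical names (`MultiAggFacets`, `getValue`) to show only how the default is chosen.

```java
// Hypothetical facet-result accessor: callers that do not name an
// aggregation get the first one (index 0).
class MultiAggFacets {
    private final float[][] values; // [aggregation][ordinal]

    MultiAggFacets(float[][] values) {
        this.values = values;
    }

    // Default: first aggregation.
    float getValue(int ordinal) {
        return getValue(ordinal, 0);
    }

    // Explicit aggregation, validated so a bad id fails clearly.
    float getValue(int ordinal, int aggregationId) {
        if (aggregationId < 0 || aggregationId >= values.length) {
            throw new IllegalArgumentException("no aggregation with id " + aggregationId);
        }
        return values[aggregationId][ordinal];
    }

    public static void main(String[] args) {
        MultiAggFacets facets = new MultiAggFacets(new float[][] {{5f, 1f}, {3f, 1f}});
        System.out.println(facets.getValue(0));    // 5.0
        System.out.println(facets.getValue(0, 1)); // 3.0
    }
}
```

Delegating the short form to the long one keeps existing single-aggregation callers source-compatible.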
There is one tricky bit when we aggregate using provided values sources. For each doc, we first advance the values sources and read the value for each aggregation, and only then iterate through the doc's ordinals. For each ordinal, we reuse the values we've already retrieved and update the corresponding accumulators.
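The two-phase order can be sketched per document: advance each values source once and cache its value, then walk the doc's ordinals and apply the cached values. `PerDocValues` is a hypothetical interface shaped like Lucene's `DoubleValues` (`advanceExact(int)` plus `doubleValue()`) but defined here so the sketch is self-contained; the other names are hypothetical as well.

```java
import java.util.List;

// Hypothetical per-document value supplier, shaped like Lucene's DoubleValues.
interface PerDocValues {
    boolean advanceExact(int doc);
    double doubleValue();
}

// Hypothetical aggregation function.
interface Aggregation {
    float aggregate(float existing, float value);
}

class ValuesSourceAggregationSketch {

    static void aggregateDoc(int doc, int[] docOrdinals, List<PerDocValues> sources,
                             List<Aggregation> functions, float[][] values) {
        int n = sources.size();
        float[] docValues = new float[n];
        boolean[] hasValue = new boolean[n];
        // Phase 1: advance each values source once and cache its value for this doc.
        for (int a = 0; a < n; a++) {
            PerDocValues source = sources.get(a);
            hasValue[a] = source.advanceExact(doc);
            if (hasValue[a]) {
                docValues[a] = (float) source.doubleValue();
            }
        }
        // Phase 2: walk the doc's ordinals, reusing the cached values.
        for (int ord : docOrdinals) {
            for (int a = 0; a < n; a++) {
                if (hasValue[a]) {
                    values[a][ord] = functions.get(a).aggregate(values[a][ord], docValues[a]);
                }
            }
        }
    }

    // Array-backed source for the example; docs past the end have no value.
    static PerDocValues fromArray(double[] perDoc) {
        return new PerDocValues() {
            private int current = -1;
            public boolean advanceExact(int doc) { current = doc; return doc < perDoc.length; }
            public double doubleValue() { return perDoc[current]; }
        };
    }

    public static void main(String[] args) {
        float[][] values = new float[2][3];
        List<PerDocValues> sources = List.of(fromArray(new double[] {2.0, 4.0}),
                                             fromArray(new double[] {10.0, 1.0}));
        Aggregation sum = (e, v) -> e + v;
        Aggregation max = Math::max;
        List<Aggregation> functions = List.of(sum, max);
        aggregateDoc(0, new int[] {1, 2}, sources, functions, values);
        aggregateDoc(1, new int[] {1}, sources, functions, values);
        System.out.println(values[0][1] + " " + values[1][1] + " " + values[0][2]); // 6.0 10.0 2.0
    }
}
```

Caching first means each source is advanced once per document rather than once per ordinal, which matters for multi-valued hierarchical fields where a doc has many ordinals.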
Addresses #12546