Compute multiple float aggregations in one go #12547

stefanvodita · 2023-09-09T21:18:35Z

Facets usually maintain a one-dimensional array, indexed by ordinal, that holds the values they compute.
The change here is simple in principle: use a two-dimensional array, indexed by aggregation and ordinal, so we can compute multiple aggregations at once.
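A minimal Java sketch of that layout follows. It assumes two hypothetical combining functions (sum and max) and hypothetical class names (`FloatAggregator`, `MultiAggregationCounters`). It mirrors the `new float[aggregationFunctions.size()][taxoReader.getSize()]` allocation from the PR, but it is an illustration, not the PR's code.

```java
import java.util.List;

/** Hypothetical combining function, standing in for a per-aggregation function. */
interface FloatAggregator {
  float apply(float existing, float incoming);

  FloatAggregator SUM = Float::sum;
  // Accumulators start at 0f (Java's array default), so MAX here assumes non-negative inputs.
  FloatAggregator MAX = Math::max;
}

/** Sketch: one dense row of accumulators per aggregation, indexed by ordinal. */
final class MultiAggregationCounters {
  private final List<FloatAggregator> aggregators;
  // values[a][ord] holds the running result of aggregation a for ordinal ord.
  private final float[][] values;

  MultiAggregationCounters(List<FloatAggregator> aggregators, int numOrdinals) {
    this.aggregators = aggregators;
    this.values = new float[aggregators.size()][numOrdinals];
  }

  /** Folds one incoming value into aggregation's accumulator for ordinal ord. */
  void accumulate(int aggregation, int ord, float value) {
    values[aggregation][ord] = aggregators.get(aggregation).apply(values[aggregation][ord], value);
  }

  float get(int aggregation, int ord) {
    return values[aggregation][ord];
  }
}
```

Each row is only as wide as the taxonomy, so adding an aggregation adds one more row of `taxonomySize` floats.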

For the methods that get top children, all children, specific values, and top dims, the default is to return the values for the first aggregation, but the aggregation can be specified.
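One way to express "first aggregation by default, or pick one explicitly" is a pair of overloads per accessor. This sketch uses hypothetical names (`MultiAggregationResults`, `getValue`, `topOrdinals`) rather than Lucene's actual `Facets` methods. It also sorts for brevity where a real implementation would more likely use a bounded priority queue.

```java
import java.util.stream.IntStream;

/** Hypothetical read-side view over values[aggregation][ordinal]. */
final class MultiAggregationResults {
  static final int DEFAULT_AGGREGATION = 0; // the first aggregation is the default

  private final float[][] values;

  MultiAggregationResults(float[][] values) {
    this.values = values;
  }

  float getValue(int ord) {
    return getValue(DEFAULT_AGGREGATION, ord);
  }

  float getValue(int aggregation, int ord) {
    return values[aggregation][ord];
  }

  int[] topOrdinals(int topN) {
    return topOrdinals(DEFAULT_AGGREGATION, topN);
  }

  /** Ordinals with the largest values for the chosen aggregation, best first; ties keep ordinal order. */
  int[] topOrdinals(int aggregation, int topN) {
    float[] row = values[aggregation];
    return IntStream.range(0, row.length)
        .boxed()
        .sorted((a, b) -> Float.compare(row[b], row[a]))
        .limit(topN)
        .mapToInt(Integer::intValue)
        .toArray();
  }
}
```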

There is one tricky bit when we aggregate using provided values sources. For each doc, we first advance the values sources and read the value for each aggregation, and only then iterate through the doc's ordinals. While iterating through the ordinals, we reuse the values we've already retrieved and update the corresponding accumulator.
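The per-doc flow can be sketched as follows. `PerDocValues` is a hypothetical stand-in shaped like Lucene's `DoubleValues` (`advanceExact`, then `doubleValue`), so the example compiles without Lucene. Documents and their ordinals are passed in as plain arrays, and sum is the only combining function.

```java
/** Hypothetical stand-in shaped like Lucene's DoubleValues. */
interface PerDocValues {
  boolean advanceExact(int doc);

  double doubleValue();
}

final class ValueSourceAggregation {
  /**
   * For each doc: advance every values source once and cache its value, then walk the
   * doc's ordinals and add the cached values to each aggregation's accumulator.
   */
  static float[][] sumPerOrdinal(
      int[] docs, int[][] ordinalsPerDoc, PerDocValues[] sources, int numOrdinals) {
    float[][] values = new float[sources.length][numOrdinals];
    float[] docValues = new float[sources.length];
    boolean[] hasValue = new boolean[sources.length];
    for (int i = 0; i < docs.length; i++) {
      int doc = docs[i];
      // Step 1: one advanceExact per source per doc, not one per ordinal.
      for (int a = 0; a < sources.length; a++) {
        hasValue[a] = sources[a].advanceExact(doc);
        docValues[a] = hasValue[a] ? (float) sources[a].doubleValue() : 0f;
      }
      // Step 2: reuse the cached values for every ordinal of this doc.
      for (int ord : ordinalsPerDoc[i]) {
        for (int a = 0; a < sources.length; a++) {
          if (hasValue[a]) {
            values[a][ord] += docValues[a];
          }
        }
      }
    }
    return values;
  }
}
```

Caching the values means each source is advanced once per document rather than once per ordinal. This also suits iterators that are typically meant to move forward through documents.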

Addresses #12546

msokolov · 2023-09-19T13:03:07Z

lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FloatTaxonomyFacets.java

+
+  void initializeValueCounters() {
+    if (values == null) {
+      values = new float[aggregationFunctions.size()][taxoReader.getSize()];


This seems like a scary amount of RAM we could end up requiring. Are we sure that all the labels in taxoReader will have nonzero values? I wonder if we ought to switch to a sparse approach.
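To put the concern in numbers, the dense layout allocates `numAggregations × taxonomySize` floats per request, whether or not the hits touch those ordinals. The hypothetical helper below counts only the 4-byte float payload and ignores array headers.

```java
final class DenseFootprint {
  /** Bytes of float payload in new float[numAggregations][taxonomySize], ignoring array headers. */
  static long denseFloatBytes(int numAggregations, int taxonomySize) {
    // Widen to long first so large taxonomies don't overflow int arithmetic.
    return (long) numAggregations * taxonomySize * Float.BYTES;
  }
}
```

For example, with an illustrative 3 aggregations over a 10-million-ordinal taxonomy (not figures from this PR), `denseFloatBytes(3, 10_000_000)` is 120,000,000 bytes, roughly 114 MiB, before a single document is collected.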

stefanvodita · 2023-09-23

This is a great point. IntTaxonomyFacets can choose sparse values when the taxonomy is large and there aren't many hits, and we could have the same functionality in FloatTaxonomyFacets. This also came up recently in another issue, which calls into question the way we decide between sparse and dense values.
Fundamentally, I think users of this feature will have to decide whether they can accept the space-for-time tradeoff of computing multiple aggregations.
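A sparse alternative stores only the ordinals that actually receive a value. This sketch keeps one small `float[]` of per-aggregation slots per touched ordinal in a `java.util.HashMap`. The class name is hypothetical, and sum is the only operation. A primitive int-keyed map would avoid the boxing this version pays for.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: only ordinals that actually receive a value get storage. */
final class SparseMultiAggregationCounters {
  private final int numAggregations;
  // ordinal -> one accumulator slot per aggregation
  private final Map<Integer, float[]> values = new HashMap<>();

  SparseMultiAggregationCounters(int numAggregations) {
    this.numAggregations = numAggregations;
  }

  /** Adds value to aggregation's running sum for ord, allocating the slot array on first touch. */
  void add(int aggregation, int ord, float value) {
    values.computeIfAbsent(ord, k -> new float[numAggregations])[aggregation] += value;
  }

  /** Untouched ordinals read as 0, as they would in a freshly allocated dense array. */
  float get(int aggregation, int ord) {
    float[] slots = values.get(ord);
    return slots == null ? 0f : slots[aggregation];
  }

  int touchedOrdinals() {
    return values.size();
  }
}
```

This is the space-for-time tradeoff discussed here. Each update pays for a hash lookup, but memory scales with the number of touched ordinals instead of the taxonomy size.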

mikemccand · 2023-10-31

Hmm, you mean FloatTaxonomyFacets today is never sparse in its aggregation?

stefanvodita · 2023-10-31

That is correct. Compare initializeValueCounters in IntTaxonomyFacets and FloatTaxonomyFacets. I don't think there's a good reason for Int/FloatTaxonomyFacets to differ here. Maybe sparse values just never got implemented for FloatTaxonomyFacets.
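The sparse-versus-dense decision can be sketched as a small heuristic: use a sparse accumulator only when the taxonomy is large and the hit count is small relative to the index. The thresholds below (1024 ordinals, hits under 10% of `maxDoc`) and the names `CounterLayout` and `useSparse` are assumptions for illustration. They may not match what IntTaxonomyFacets actually uses.

```java
final class CounterLayout {
  // Hypothetical thresholds, chosen for illustration only.
  static final int MIN_TAXONOMY_SIZE_FOR_SPARSE = 1024;
  static final int SPARSE_HIT_FRACTION_DENOMINATOR = 10;

  /** Returns true when a hash-based accumulator is likely cheaper than a dense float[][]. */
  static boolean useSparse(int taxonomySize, long totalHits, long maxDoc) {
    if (taxonomySize < MIN_TAXONOMY_SIZE_FOR_SPARSE) {
      return false; // small taxonomy: the dense array is cheap and the fastest to update
    }
    // Few matching docs relative to the index: most ordinals would stay at zero.
    return totalHits < maxDoc / SPARSE_HIT_FRACTION_DENOMINATOR;
  }
}
```

With several aggregations, the dense cost grows linearly with the number of aggregations, so a float variant might reasonably lean toward sparse sooner. That is a judgment call this sketch does not encode.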

mikemccand · 2024-02-05

Maybe open a spinoff issue to implement sparse values for FloatTaxonomyFacets? But let's not block this otherwise great change?

github-actions · 2024-01-08T12:23:40Z

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

mikemccand · 2024-02-05 (approved these changes)

Thanks @stefanvodita.

stefanvodita · 2024-02-15T15:04:00Z

Thank you for the approval! I want to leave this open for now while iterating on #12966. I'd prefer to land #12966 first: it's the more complicated change, and doing it first should let us have a more unified solution across int and float facets.

github-actions · 2024-03-01T00:19:57Z

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

gsmiller · 2024-09-27T16:24:05Z

@stefanvodita do you think this change is still worth moving forward now that the new sandbox faceting implementation has been added? Being able to compute an arbitrary number of aggregations in one pass is part of what that implementation was designed to do, so I'm not sure it makes sense to also add this to the existing module. Curious what you think.

stefanvodita · 2024-09-27T16:35:48Z

I'm not sure either. Since the new aggregation engine is in sandbox, it makes sense to keep developing the old aggregation engine. On the other hand, that's not very productive if we anticipate moving the new aggregation engine out of sandbox. Personally, I like the new aggregation engine and would prefer to see it promoted out of sandbox. This PR is also badly out of date after #12966.

github-actions · 2024-10-13T00:25:36Z

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

Compute multiple float aggregations in one go

be5f700

msokolov reviewed Sep 19, 2023


github-actions bot added the Stale label Jan 8, 2024

mikemccand approved these changes Feb 5, 2024


github-actions bot removed the Stale label Feb 6, 2024

github-actions bot added the Stale label Mar 1, 2024

github-actions bot removed the Stale label Sep 28, 2024

github-actions bot added the Stale label Oct 13, 2024


stefanvodita commented Sep 9, 2023

msokolov Sep 19, 2023

stefanvodita Sep 23, 2023

mikemccand Oct 31, 2023

stefanvodita Oct 31, 2023

mikemccand Feb 5, 2024

github-actions bot commented Jan 8, 2024

mikemccand left a comment

stefanvodita commented Feb 15, 2024

github-actions bot commented Mar 1, 2024

gsmiller commented Sep 27, 2024

stefanvodita commented Sep 27, 2024

github-actions bot commented Oct 13, 2024
