Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compute multiple float aggregations in one go #12547

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

stefanvodita
Copy link
Contributor

Usually facets maintain a one-dimensional array indexed by ordinal which keeps the values they're supposed to compute.
The change here is simple in principle - use a two-dimensional array, indexed by aggregation and ordinal, so we can do multiple aggregations at once.

For the methods that get top children / all children / spcific values / dims, the default is to get the values corresponding to the first aggregation, but the aggregation can be specified.

There is one tricky bit when we aggregate using provided values sources. In this case, we advance the values sources and get the values for each aggregation before iterating through the ordinals in each doc. When we iterate through the ordinals, we load each of the values we've already retrieved and update the corresponding accumulator.

Addresses #12546


void initializeValueCounters() {
if (values == null) {
values = new float[aggregationFunctions.size()][taxoReader.getSize()];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems like a scary amount of RAM we could end up requiring. Are we sure that all the labels in taxoReader will have nonzero values? I wonder if we ought to switch to a sparse approach?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great point. IntTaxonomyFacets has the ability to choose sparse values if the taxonomy is large and there aren't a lot of hits. We can have the same functionality in FloatTaxonomyFacets. This was also mentioned recently in another issue, which puts into question the way we decide between sparse and dense values.
Fundamentally, I think the user of this feature will have to decide if they can make the space for time tradeoff for computing multiple aggregations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, you mean FloatTaxonomyFacets today is never sparse in its aggregation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is correct. Compare initializeValueCounters for IntTaxonomyFacets and FloatTaxonomyFacets. I don't think there's a good reason for Int/FloatTaxonomyFacets to differ here. Maybe sparse values just never got implemented for FloatTaxonomyFacets.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe open a spinoff to implement sparse values for FloatTaxonomyFacets? But let's not block this otherwise great change?

Copy link

github-actions bot commented Jan 8, 2024

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Jan 8, 2024
Copy link
Member

@mikemccand mikemccand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @stefanvodita.

@github-actions github-actions bot removed the Stale label Feb 6, 2024
@stefanvodita
Copy link
Contributor Author

Thank you for the approval! I want to leave this open for now while iterating over #12966. I think I prefer doing #12966 first, since it's a more complicated change and doing it first should allow us to have a more unified solution across int/float facets.

Copy link

github-actions bot commented Mar 1, 2024

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Mar 1, 2024
@gsmiller
Copy link
Contributor

@stefanvodita do you think this change is still worth moving forward after the new sandbox faceting implementation was added? Being able to compute an arbitrary number of aggregations in one pass is part of what that implementation was designed to do, so I'm not sure if it also makes sense to add this into the existing module? Curious what you think.

@stefanvodita
Copy link
Contributor Author

I'm not sure either. Since the new aggregation engine is in sandbox, it makes sense to keep developing the old aggregation engine. On the other hand, that's not very productive if we anticipate moving the new aggregation engine out of sandbox. Personally, I like the new aggregation engine and would prefer to see it promoted out of sandbox. This PR is also badly out of date after #12966.

@github-actions github-actions bot removed the Stale label Sep 28, 2024
Copy link

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Oct 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants