Reduce duplication in taxonomy facets; always do counts #12966

Merged
merged 2 commits into from
Apr 5, 2024

Conversation

stefanvodita
Contributor

@stefanvodita stefanvodita commented Dec 22, 2023

Note

This is a large change, refactoring most of the taxonomy facets code and changing internal behavior, without changing the API. There are specific API changes this sets us up to do later, e.g. retrieving counts from aggregation facets.

What does this PR do well?

  1. Moves most of the responsibility from TaxonomyFacets implementations to TaxonomyFacets itself. This reduces code duplication and enables future development. Addresses genericity issue mentioned in [DISCUSS] Identifying Gaps in Lucene’s Faceting #12553.
  2. As a consequence, it introduces sparse values to FloatTaxonomyFacets, which previously used dense values always. This issue is part of Always collect sparsely in TaxonomyFacets & switch to dense if there are enough unique labels #12576.
  3. It computes counts for all taxonomy facets always, which enables us to add an API to retrieve counts for association facets in the future. Addresses Support getting counts from "association" facets [LUCENE-10246] #11282.
  4. As a consequence of having counts, we can check whether we encountered a label while faceting (count > 0), while previously we relied on the aggregation value to be positive. Closes Is it correct for facets to assume positive aggregation values? #12585.
  5. It introduces the idea of doing multiple aggregations in one go, with association facets doing the aggregation they were already doing, plus a count. We can extend to an arbitrary number of aggregations, as suggested in Compute multiple aggregations in one iteration of the match-set #12546.
  6. It doesn't change the API. The only change in behavior users should notice is the fix for non-positive aggregation values, which were previously discarded.
  7. It adds tests which were missing for sparse/dense values and non-positive aggregations.
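
To illustrate points 3 and 4, here is a minimal sketch, not the PR's actual code. A hypothetical `SumAndCount` accumulator keeps a count next to each aggregated value, so a label counts as "seen" when its count is positive, even if its aggregated value is zero or negative. All class and method names are assumptions made up for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical accumulator: one float sum and one count per ordinal. */
class SumAndCount {
  private final Map<Integer, Float> sums = new HashMap<>();
  private final Map<Integer, Integer> counts = new HashMap<>();

  /** Aggregate a value for an ordinal and always bump its count. */
  void add(int ord, float value) {
    sums.merge(ord, value, Float::sum);
    counts.merge(ord, 1, Integer::sum);
  }

  /** Old-style check: relies on the aggregated value being positive. */
  boolean seenByValue(int ord) {
    return sums.getOrDefault(ord, 0f) > 0f;
  }

  /** Count-based check: also works for zero and negative aggregations. */
  boolean seenByCount(int ord) {
    return counts.getOrDefault(ord, 0) > 0;
  }

  public static void main(String[] args) {
    SumAndCount acc = new SumAndCount();
    acc.add(7, -2.5f);
    acc.add(7, 1.0f);
    System.out.println(acc.seenByValue(7)); // false: the sum is -1.5
    System.out.println(acc.seenByCount(7)); // true: two values were aggregated
  }
}
```

With the value-based check, ordinal 7 would be dropped from the results; with the count-based check it is kept.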

What's not ideal about this approach?

  1. We could see some performance decreases. The more critical part of the work, aggregating, should be unaffected. There are a few extra method calls / dispatches / branches. Ranking and collecting results might be impacted because we are boxing / unboxing results to / from Number to avoid the primitive types.
  2. The way the TopOrdAndNumberQueues work is a bit awkward and inefficient. It required small changes to classes outside the scope of this change. Maybe we can come up with something better.

What is next?

  1. I'd like to know if the approach makes sense to others.
  2. We can try running some benchmarks to see if there are any performance changes.
  3. Is it important to preserve a default aggregation value of the right type in the results (i.e. -1 for int aggregations, -1f for float aggregations)? If not, we can make a small simplification to always return -1.

github-actions bot commented Jan 8, 2024

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Jan 8, 2024
Member

@mikemccand mikemccand left a comment

Net/net this looks like a great change to me -- removing tons of code dup, at a possible small perf hit due to added boxing/unboxing while collecting top N. I think the tradeoff is worth it, and we can watch the nightly benchy to see if facet performance was unduly impacted?

@@ -202,7 +202,7 @@ public FacetResult getTopChildren(int topN, String dim, String... path) throws I
}
reuse = q.insertWithOverflow(reuse);
if (q.size() == topN) {
- bottomCount = q.top().value;
+ bottomCount = (int) q.top().value;
Member

Hmm why is this cast necessary? Oh -- I see, this value is now a Number. Hence the warning about added boxing/unboxing in hotspots here... thanks.

@mikemccand
Member

3. Is it important to preserve a default aggregation value of the right type in the results (i.e. -1 for int aggregations, -1f for float aggregations)? If not, we can make a small simplification to always return -1.

Maybe defer this to a separate issue? I can see callers expecting a consistent type, though, if you cast (float) Number where Number is an int, the cast would be fine.
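
One caveat on the cast, offered as a hedged sketch rather than anything taken from the PR: in Java, casting a `Number` reference directly to `float` compiles as a checked cast to `Float` followed by unboxing, so it throws `ClassCastException` when the `Number` actually holds an `Integer`. `Number.floatValue()` converts regardless of the boxed type. The `NumberToFloat` class and its method names are hypothetical.

```java
class NumberToFloat {
  /** Safe for any Number subtype: uses the numeric conversion method. */
  static float viaFloatValue(Number n) {
    return n.floatValue();
  }

  /** Compiles, but behaves like (Float) n followed by unboxing. */
  static float viaCast(Number n) {
    return (float) n;
  }

  public static void main(String[] args) {
    Number defaultInt = -1; // boxed as an Integer
    System.out.println(viaFloatValue(defaultInt)); // -1.0
    try {
      viaCast(defaultInt);
    } catch (ClassCastException e) {
      // Reached because an Integer cannot be cast to Float.
      System.out.println("cast failed: Integer is not a Float");
    }
  }
}
```

So callers that want a consistent primitive type are probably better served by `floatValue()` / `intValue()` than by a primitive cast on the `Number`.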

@github-actions github-actions bot removed the Stale label Jan 9, 2024
@stefanvodita
Contributor Author

I found a fun HeisenBug in one of the tests. When we iterate cursors from IntFloatHashMap, the order is not deterministic. Float addition is not associative, so the result we get by aggregating the floats in the map can differ depending on the iteration order. For a particular seed, running the test produced an unfavorable ordering, while running it under the debugger produced a favorable one. The test is fixed in the latest commit and I've opened an issue to do Kahan summation over the floats instead, to reduce the error we're seeing.

For those who want to follow along, here are the exact numbers we are adding in the test in two orderings which produce different results:

class FloatSumIsNotAssociative {
    public static void main(String[] args) {
        float x = 177182.61f;
        float y = 238089.27f;
        float z = 255214.66f;
        float acc;

        // First ordering: (x + y) + z
        acc = 0;
        acc += x;
        acc += y;
        acc += z;
        System.out.println(acc);

        // Second ordering: (z + y) + x, which rounds differently
        acc = 0;
        acc += z;
        acc += y;
        acc += x;
        System.out.println(acc);
    }
}
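
The Kahan (compensated) summation mentioned above can be sketched as follows. This is a generic illustration of the technique, not the code proposed in the follow-up issue, and the class and method names are hypothetical. It carries a running compensation term that recovers the low-order bits lost in each addition, which makes the total far less sensitive to accumulated rounding error.

```java
import java.util.Arrays;

class KahanFloatSum {
  /** Plain left-to-right float summation, for comparison. */
  static float naiveSum(float[] values) {
    float sum = 0f;
    for (float v : values) {
      sum += v;
    }
    return sum;
  }

  /** Kahan summation: tracks the rounding error of each addition in c. */
  static float kahanSum(float[] values) {
    float sum = 0f;
    float c = 0f; // running compensation for lost low-order bits
    for (float v : values) {
      float y = v - c;   // apply the correction from the previous step
      float t = sum + y; // low-order bits of y may be lost here
      c = (t - sum) - y; // recover what was lost (negated)
      sum = t;
    }
    return sum;
  }

  public static void main(String[] args) {
    // One million copies of 0.1f: the exact total is close to 100000.
    float[] values = new float[1_000_000];
    Arrays.fill(values, 0.1f);
    System.out.println("naive: " + naiveSum(values));
    System.out.println("kahan: " + kahanSum(values));
  }
}
```

The naive total drifts noticeably from 100000, while the compensated total stays very close to it.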

@stefanvodita
Contributor Author

I've also run the benchmarks (python3 src/python/localrun.py -source wikimediumall). There is a measurable regression in the BrowseRandomLabelTaxoFacets task, but not in the other taxonomy tasks. The benchmarker also reports improvements in PKLookup, Wildcard, Respell, Fuzzy2, Fuzzy1.

The regression in the taxo task is explained in the profiler. Boxing is not cheap:
11.24% 10402M java.lang.Integer#valueOf()

@mikemccand (thank you for the review!) - how should I interpret the other tasks which show a significant change? Are they just noisy?

                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
     BrowseRandomLabelTaxoFacets        3.75      (1.8%)        3.53      (1.6%)   -6.0% (  -9% -   -2%) 0.000
          OrHighMedDayTaxoFacets        1.35      (7.4%)        1.31      (9.2%)   -2.7% ( -17% -   15%) 0.308
                          IntNRQ       21.64      (7.0%)       21.35      (7.4%)   -1.3% ( -14% -   14%) 0.561
                      AndHighLow      366.49     (11.2%)      362.21     (10.3%)   -1.2% ( -20% -   22%) 0.731
                    OrHighNotLow      271.40      (5.3%)      269.03      (4.5%)   -0.9% ( -10% -    9%) 0.573
                         LowTerm      604.77      (5.9%)      599.96      (4.8%)   -0.8% ( -10% -   10%) 0.640
                      TermDTSort      140.65      (2.3%)      139.58      (1.4%)   -0.8% (  -4% -    3%) 0.210
                     LowSpanNear        5.00      (2.8%)        4.96      (4.1%)   -0.7% (  -7% -    6%) 0.522
                    HighSpanNear        4.77      (3.0%)        4.74      (3.6%)   -0.7% (  -7% -    6%) 0.522
                     MedSpanNear       11.24      (2.1%)       11.18      (2.5%)   -0.6% (  -5% -    4%) 0.432
                       MedPhrase      242.61      (2.2%)      241.23      (2.0%)   -0.6% (  -4% -    3%) 0.386
                      HighPhrase       83.17      (2.1%)       82.75      (2.9%)   -0.5% (  -5% -    4%) 0.538
                   OrHighNotHigh      160.48      (4.5%)      159.81      (3.5%)   -0.4% (  -8% -    7%) 0.744
           HighTermDayOfYearSort      215.60      (2.2%)      214.81      (2.0%)   -0.4% (  -4% -    3%) 0.576
                 MedSloppyPhrase       14.07      (2.0%)       14.03      (2.4%)   -0.3% (  -4% -    4%) 0.655
                       LowPhrase       21.15      (1.3%)       21.09      (1.5%)   -0.3% (  -3% -    2%) 0.508
        AndHighHighDayTaxoFacets       10.49      (1.2%)       10.46      (1.6%)   -0.3% (  -3% -    2%) 0.547
                HighSloppyPhrase       13.80      (3.0%)       13.77      (3.1%)   -0.3% (  -6% -    5%) 0.791
                         MedTerm      479.88      (5.1%)      478.82      (4.8%)   -0.2% (  -9% -   10%) 0.887
                    OrHighNotMed      329.08      (4.5%)      328.39      (3.5%)   -0.2% (  -7% -    8%) 0.870
                        HighTerm      264.78      (5.3%)      264.27      (5.2%)   -0.2% ( -10% -   10%) 0.908
               HighTermMonthSort     1930.74      (4.4%)     1928.03      (5.2%)   -0.1% (  -9% -    9%) 0.926
                    OrNotHighMed      217.72      (2.9%)      217.51      (2.2%)   -0.1% (  -5% -    5%) 0.905
            MedTermDayTaxoFacets       16.72      (2.1%)       16.71      (1.7%)   -0.1% (  -3% -    3%) 0.892
       BrowseDayOfYearSSDVFacets        4.12      (2.7%)        4.11      (2.9%)   -0.1% (  -5% -    5%) 0.931
            BrowseDateTaxoFacets        4.68      (5.1%)        4.67      (4.6%)   -0.1% (  -9% -   10%) 0.970
                   OrNotHighHigh      231.09      (4.5%)      230.99      (3.5%)   -0.0% (  -7% -    8%) 0.975
         AndHighMedDayTaxoFacets       16.88      (1.1%)       16.88      (1.5%)   -0.0% (  -2% -    2%) 0.963
       BrowseDayOfYearTaxoFacets        4.76      (5.2%)        4.76      (4.6%)    0.0% (  -9% -   10%) 1.000
                    OrNotHighLow      464.54      (2.6%)      464.56      (2.3%)    0.0% (  -4% -    5%) 0.995
            HighIntervalsOrdered        1.81      (4.6%)        1.81      (5.0%)    0.0% (  -9% -   10%) 0.990
            HighTermTitleBDVSort        5.39      (4.8%)        5.40      (4.4%)    0.1% (  -8% -    9%) 0.968
           BrowseMonthSSDVFacets        4.40      (2.6%)        4.40      (2.6%)    0.1% (  -4% -    5%) 0.873
             MedIntervalsOrdered        1.84      (5.5%)        1.84      (5.8%)    0.2% ( -10% -   12%) 0.918
             LowIntervalsOrdered       32.12      (5.4%)       32.18      (5.6%)    0.2% ( -10% -   11%) 0.913
                       OrHighMed       67.77      (3.1%)       67.97      (3.4%)    0.3% (  -5% -    6%) 0.779
     BrowseRandomLabelSSDVFacets        2.89      (2.0%)        2.90      (1.4%)    0.3% (  -3% -    3%) 0.569
           BrowseMonthTaxoFacets        9.36     (10.9%)        9.40     (10.4%)    0.4% ( -18% -   24%) 0.896
               HighTermTitleSort      132.89      (1.9%)      133.56      (3.9%)    0.5% (  -5% -    6%) 0.600
                      OrHighHigh       20.24      (3.5%)       20.37      (3.9%)    0.6% (  -6% -    8%) 0.608
                      AndHighMed       81.65      (8.6%)       82.65      (9.8%)    1.2% ( -15% -   21%) 0.676
                 LowSloppyPhrase        4.92      (5.9%)        5.01      (6.4%)    1.6% ( -10% -   14%) 0.397
            BrowseDateSSDVFacets        1.20     (11.5%)        1.22      (9.1%)    2.1% ( -16% -   25%) 0.529
                         Prefix3      138.46      (4.9%)      141.54      (4.5%)    2.2% (  -6% -   12%) 0.138
                       OrHighLow      167.60      (7.5%)      171.65      (4.2%)    2.4% (  -8% -   15%) 0.211
                        PKLookup      169.39      (4.5%)      174.22      (4.5%)    2.9% (  -5% -   12%) 0.043
                     AndHighHigh       31.23      (9.5%)       32.15     (12.4%)    2.9% ( -17% -   27%) 0.399
                        Wildcard       66.79      (3.4%)       69.28      (3.6%)    3.7% (  -3% -   11%) 0.001
                         Respell       48.03      (2.0%)       50.35      (2.3%)    4.8% (   0% -    9%) 0.000
                          Fuzzy2       68.13      (1.3%)       71.67      (1.4%)    5.2% (   2% -    7%) 0.000
                          Fuzzy1       74.70      (1.5%)       79.47      (1.8%)    6.4% (   3% -    9%) 0.000

@mikemccand
Member

I found a fun HeisenBug in one of the tests.

Oh the joys of floating point math.

For those who want to follow along, here are the exact numbers we are adding in the test in two orderings which produce different results:

Thank you for diving deep here and making such a simple reproduction.

how should I interpret the other tasks which show a significant change? Are they just noisy?

Good question -- it makes no sense that e.g. Respell/Fuzzy1/2 got faster with this change, though the benchy seems to think it is significant (p=0.000). I'm not sure what to make of it!

@mikemccand
Member

The regression in the taxo task is explained in the profiler. Boxing is not cheap:
11.24% 10402M java.lang.Integer#valueOf()

Hmm this is sort of spooky -- should we aim to keep the specialization somehow (avoid the boxing)? Is there a middle ground where we can avoid the boxing but still remove much of / some of this duplicated code? Java is annoying sometimes :)

@stefanvodita
Contributor Author

What I've done is use boxing for genericity only when collecting results (getTop...), not while performing the aggregations themselves. Most of the taxonomy tasks are not showing a significant performance change. I wonder if the one that has slowed down spends more time collecting the aggregation values than calculating them.

/** Intermediate result to store top children for a given path before resolving labels, etc. */
record TopChildrenForPath(Number pathValue, int childCount, TopOrdAndNumberQueue childQueue) {}

private static class DimValue {

[nit] should we call this just Dim and String dimPath instead of String dim? I see later that we've used int dimValue and this is getting quickly overloaded?

Contributor Author

I think we called it dim and not dimPath because it's just one label in the path, just the dimension, so it doesn't feel right to call it a path.


/** Get the aggregation value for this ordinal. */
protected Number getAggregationValue(int ordinal) {
// By default, this is just the count.

Can the default implementation of this method and getValue be the same as in IntTaxonomyFacets and FloatTaxonomyFacets, to reduce duplication further? FastTaxonomyFacets can either extend IntTaxonomyFacets or apply this sort of count-based customisation to these methods.

Contributor Author

It's a good point, but I think it's better for the default behaviour to be getting counts. We need the getAggregationValue level of abstraction to be able to call getValue with different signatures for IntTaxonomyFacets and FloatTaxonomyFacets.

* the aggregation values, keeping aggregation efficient.
*/
protected void updateValueFromRollup(int ordinal, int childOrdinal) throws IOException {
setCount(ordinal, getCount(ordinal) + rollup(childOrdinal));

Shall we assume an aggregationFunction is passed in this parent class and implement this method similar to IntTaxonomyFacets and FloatTaxonomyFacets since this bit seems to be duplicated in both?

Further, FastTaxonomyFacetCounts can either override this with a count-based updateValueFromRollup, since it doesn't use an aggregation function, or continue to extend IntTaxonomyFacets.

@@ -67,6 +91,17 @@ public int compare(FacetResult a, FacetResult b) {
/** Maps an ordinal to its parent, or -1 if there is no parent (root node). */
final int[] parents;

/** Dense ordinal counts. */
int[] counts;

Can we make this Number[] values so that IntTaxonomyFacets and FloatTaxonomyFacets don't need to define their own values data structure and this class is generic?

Contributor Author

It's important that IntTaxonomyFacets and FloatTaxonomyFacets have their own data structures for efficiency. This array only keeps counts, not other aggregations.

/** Apply an aggregation to the two values and return the result. */
protected Number aggregate(Number existingVal, Number newVal) {
// By default, we are computing counts, so the values are interpreted as integers and summed.
return (int) existingVal + (int) newVal;

Can we use the concept of an aggregation function when combining values in this method? (This is in line with my previous comment about making the logic for IntTaxonomyFacets and FloatTaxonomyFacets the default.)

Contributor Author

This is a tricky bit. You'll see that when we override, we do use an aggregation function, but the default implementation is to count.

float currentValue = getValue(ord);
float newValue = aggregationFunction.aggregate(currentValue, value);
setValue(ord, newValue);
setCount(ord, getCount(ord) + 1);

Why do we want to always track counts too?

Contributor Author

I think it has some nice advantages, e.g. it will resolve #11282 and #12585.

return new FacetResult(dim, path, aggregatedValue, labelValues, ordinals.size());
}

private TopOrdAndNumberQueue.OrdAndValue insertIntoQueue(

This is great! This bit was often duplicated. Can we make this a utility method or maybe even a method like insert* method on the Queue so StringValueFacetCounts and AbstractSortedSetDocValue can use it too?

Contributor Author

Good point. Added to #13175, where we can target improvements related to the way we access these queues.

* Determine the top-n children for a specified dimension + path. Results are in an intermediate
* form.
*/
protected TopChildrenForPath getTopChildrenForPath(DimConfig dimConfig, int pathOrd, int topN)

Let's add an abstract signature for this method to the Facets class?

Contributor Author

I would like to avoid making API changes in this PR. It's an interesting question whether all Facets should have this.

Comment on lines 350 to 351
bottomCount = (int) q.top().value;
bottomOrd = q.top().ord;
Contributor

I wonder if we can remove these bottomX optimizations here and in other places, I think insertWithOverflow essentially does the same?

Contributor Author

Good point, opened #13175.

public abstract class TopOrdAndNumberQueue extends PriorityQueue<TopOrdAndNumberQueue.OrdAndValue> {

/** Holds a single entry. */
public static final class OrdAndValue {
Contributor

Instead of making this class final and lessThan abstract, maybe we should make this class abstract, with abstract compare method which we implement separately for floats/ints/multi-aggregations? This way we can use primitive types in OrdAndValue implementations and hopefully reduce some boxing costs?
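
A rough sketch of what this suggestion could look like, under stated assumptions: the class names (`OrdAndValueSketch`, `OrdAndIntSketch`, `OrdAndFloatSketch`) are hypothetical and this is not the code that was eventually merged. The idea is that each entry type stores its value in a primitive field and implements its own comparison, so boxing to `Number` only happens when the final result is built. Ties are broken by preferring the smaller ordinal, which is an assumption about the existing queue behavior.

```java
/** Hypothetical base entry: the comparison is abstract, the value lives in subclasses. */
abstract class OrdAndValueSketch {
  int ord;

  /** True if this entry ranks below the other one. */
  abstract boolean lessThan(OrdAndValueSketch other);

  /** Boxed view of the value, only needed when building the final result. */
  abstract Number getValue();
}

/** Int-valued entry: compares primitives, no boxing while ranking. */
final class OrdAndIntSketch extends OrdAndValueSketch {
  int value;

  OrdAndIntSketch(int ord, int value) {
    this.ord = ord;
    this.value = value;
  }

  @Override
  boolean lessThan(OrdAndValueSketch other) {
    OrdAndIntSketch o = (OrdAndIntSketch) other;
    if (value != o.value) {
      return value < o.value;
    }
    return ord > o.ord; // tie-break: the larger ordinal ranks lower
  }

  @Override
  Number getValue() {
    return value;
  }
}

/** Float-valued entry with the same shape. */
final class OrdAndFloatSketch extends OrdAndValueSketch {
  float value;

  OrdAndFloatSketch(int ord, float value) {
    this.ord = ord;
    this.value = value;
  }

  @Override
  boolean lessThan(OrdAndValueSketch other) {
    OrdAndFloatSketch o = (OrdAndFloatSketch) other;
    if (value < o.value) {
      return true;
    }
    if (value > o.value) {
      return false;
    }
    return ord > o.ord;
  }

  @Override
  Number getValue() {
    return value;
  }
}
```

A queue built on these entries would call `lessThan` in its hot loop and `getValue` only once per returned label.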


LabelAndValue[] labelValues = new LabelAndValue[q.size()];
int[] ordinals = new int[labelValues.length];
Number[] values = new Number[labelValues.length];
Contributor

I believe using Number here and in LabelAndValue is one of the things that limits our ability to add new types of facet results, for example, multi-aggregate facets that you've mentioned. I'd suggest that we use generic <T> (which may need to implement Comparable?) in these classes, and use Integer, Float, etc in TaxonomyFacets implementation. This would require API changes though...

Contributor Author

Maybe we can consider this separately? I hope we can avoid all API changes in this PR.

github-actions bot commented Feb 2, 2024

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

@stefanvodita
Contributor Author

Thank you all for reviewing! I confirmed that the performance impact was from result collection, not from the aggregations themselves, and I've managed to claw back the performance hit. Most of the improvement comes from the changes to getTopChildrenForPath, which no longer uses intermediary Numbers. I've also integrated the performance-related suggestions from @epotyom (thank you for those!). I'll address the rest of the comments too; I just wanted to get this out while it's fresh to see if you all have more feedback on the performance front.

python3 src/python/localrun.py -source wikimediumall

                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
            BrowseDateSSDVFacets        1.24      (6.6%)        1.21      (9.6%)   -2.5% ( -17% -   14%) 0.334
     BrowseRandomLabelTaxoFacets        3.76      (3.7%)        3.69      (3.5%)   -1.8% (  -8% -    5%) 0.120
                       MedPhrase       11.46      (2.8%)       11.30      (2.6%)   -1.3% (  -6% -    4%) 0.112
               HighTermMonthSort     2290.51      (4.4%)     2262.12      (4.2%)   -1.2% (  -9% -    7%) 0.360
                    OrHighNotMed      327.20      (3.3%)      323.36      (3.2%)   -1.2% (  -7% -    5%) 0.252
                    OrHighNotLow      318.99      (3.7%)      315.45      (4.2%)   -1.1% (  -8% -    7%) 0.377
                       LowPhrase        4.74      (3.1%)        4.69      (3.0%)   -1.0% (  -6% -    5%) 0.310
                   OrNotHighHigh      244.33      (3.1%)      242.52      (3.0%)   -0.7% (  -6% -    5%) 0.443
                   OrHighNotHigh      227.54      (2.9%)      225.86      (3.2%)   -0.7% (  -6% -    5%) 0.438
                    OrNotHighMed      333.78      (2.6%)      331.35      (2.8%)   -0.7% (  -5% -    4%) 0.391
                      HighPhrase       70.04      (3.2%)       69.53      (3.3%)   -0.7% (  -6% -    5%) 0.478
                     AndHighHigh       23.27      (7.9%)       23.11      (7.1%)   -0.7% ( -14% -   15%) 0.777
                        Wildcard       51.02      (4.3%)       50.71      (4.2%)   -0.6% (  -8% -    8%) 0.652
                     MedSpanNear       29.20      (3.0%)       29.05      (2.5%)   -0.5% (  -5% -    5%) 0.561
                        HighTerm      475.59      (4.1%)      473.22      (4.7%)   -0.5% (  -8% -    8%) 0.721
                        PKLookup      176.36      (3.0%)      175.50      (2.7%)   -0.5% (  -6% -    5%) 0.589
                    HighSpanNear       10.52      (2.7%)       10.47      (2.2%)   -0.4% (  -5% -    4%) 0.612
                         MedTerm      470.14      (4.4%)      468.33      (5.4%)   -0.4% (  -9% -    9%) 0.804
       BrowseDayOfYearSSDVFacets        4.08      (3.9%)        4.06      (4.2%)   -0.4% (  -8% -    8%) 0.775
                    OrNotHighLow      322.80      (2.9%)      321.71      (2.4%)   -0.3% (  -5% -    5%) 0.692
            HighIntervalsOrdered        3.60      (4.8%)        3.59      (4.8%)   -0.3% (  -9% -    9%) 0.868
                      AndHighMed       83.14      (3.5%)       82.93      (3.9%)   -0.2% (  -7% -    7%) 0.833
       BrowseDayOfYearTaxoFacets        4.69      (4.5%)        4.68      (4.4%)   -0.2% (  -8% -    9%) 0.902
            BrowseDateTaxoFacets        4.61      (4.5%)        4.60      (4.3%)   -0.1% (  -8% -    9%) 0.937
                         Respell       53.50      (2.2%)       53.46      (1.8%)   -0.1% (  -3% -    4%) 0.902
         AndHighMedDayTaxoFacets       43.57      (1.5%)       43.54      (1.6%)   -0.1% (  -3% -    3%) 0.891
                          Fuzzy1       66.17      (2.4%)       66.20      (2.0%)    0.0% (  -4% -    4%) 0.951
                      AndHighLow      525.57      (2.6%)      525.90      (4.2%)    0.1% (  -6% -    7%) 0.955
                       OrHighMed       76.00      (3.2%)       76.05      (3.9%)    0.1% (  -6% -    7%) 0.953
            HighTermTitleBDVSort        6.93      (7.3%)        6.94      (6.8%)    0.2% ( -13% -   15%) 0.943
             MedIntervalsOrdered        2.77      (3.6%)        2.78      (3.2%)    0.2% (  -6% -    7%) 0.883
                          Fuzzy2       43.83      (1.9%)       43.90      (1.7%)    0.2% (  -3% -    3%) 0.770
                     LowSpanNear        6.13      (2.1%)        6.14      (1.9%)    0.2% (  -3% -    4%) 0.785
                HighSloppyPhrase        5.52      (3.4%)        5.53      (3.7%)    0.2% (  -6% -    7%) 0.851
           BrowseMonthSSDVFacets        4.34      (5.1%)        4.35      (4.7%)    0.2% (  -9% -   10%) 0.891
                         Prefix3       68.56      (4.6%)       68.70      (6.0%)    0.2% (  -9% -   11%) 0.899
             LowIntervalsOrdered       18.33      (2.8%)       18.38      (2.5%)    0.3% (  -4% -    5%) 0.737
                 LowSloppyPhrase       20.67      (2.2%)       20.73      (1.9%)    0.3% (  -3% -    4%) 0.627
        AndHighHighDayTaxoFacets        7.57      (2.3%)        7.59      (2.5%)    0.3% (  -4% -    5%) 0.669
           HighTermDayOfYearSort      206.91      (2.9%)      207.68      (2.6%)    0.4% (  -5% -    6%) 0.670
               HighTermTitleSort      140.79      (1.6%)      141.32      (2.0%)    0.4% (  -3% -    3%) 0.508
                         LowTerm      438.67      (7.1%)      441.44      (7.9%)    0.6% ( -13% -   16%) 0.790
                 MedSloppyPhrase       21.78      (3.1%)       21.95      (3.4%)    0.8% (  -5% -    7%) 0.454
            MedTermDayTaxoFacets       21.51      (2.2%)       21.71      (1.6%)    0.9% (  -2% -    4%) 0.122
                      TermDTSort      118.13      (3.0%)      119.30      (3.4%)    1.0% (  -5% -    7%) 0.329
           BrowseMonthTaxoFacets        9.58      (8.6%)        9.68      (8.8%)    1.1% ( -14% -   20%) 0.691
     BrowseRandomLabelSSDVFacets        2.88      (2.3%)        2.91      (1.8%)    1.1% (  -2% -    5%) 0.093
                      OrHighHigh       33.81      (7.6%)       34.24      (8.4%)    1.3% ( -13% -   18%) 0.618
                       OrHighLow      319.44      (6.2%)      323.88      (3.9%)    1.4% (  -8% -   12%) 0.393
                          IntNRQ       27.52      (5.2%)       27.96      (5.9%)    1.6% (  -8% -   13%) 0.360
          OrHighMedDayTaxoFacets        2.83      (3.3%)        2.88      (5.2%)    1.6% (  -6% -   10%) 0.243

@stefanvodita
Contributor Author

@gsmiller - I know you may not have time to review, but I want to at least notify you, since this is a big change and you've been very involved in this area of the code.

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Mar 29, 2024
@stefanvodita
Contributor Author

Hi reviewers! This PR has become stale. Could anyone have a look at it? It has several nice improvements for taxonomy facets, with no API changes, and it sets us up to launch new features in a future release: multiple aggregations in one go and retrieving counts with aggregation facets.

@github-actions github-actions bot removed the Stale label Mar 30, 2024
Member

@mikemccand mikemccand left a comment

Looks great -- thank you for clawing back that performance loss by adding a bit of non-generics specialization back. I like this compromise.

I left a minor comment, not a blocker for merging.

@stefanvodita I think you should merge this in a day or two if there's no more feedback? Lazy consensus ...

@Override
public boolean lessThan(OrdAndValue other) {
OrdAndInt otherOrdAndInt = (OrdAndInt) other;
if (value < otherOrdAndInt.value) {
Member

You might use Integer.compare here -- not sure if it's actually faster. You'd still need to get the result and check if it's != 0 for the tiebreak (which could also be Integer.compare).

Contributor Author

This is how Integer.compare is implemented:

     public static int compare(int x, int y) {
        return (x < y) ? -1 : ((x == y) ? 0 : 1);
     }

And lessThan would become:

    public boolean lessThan(OrdAndValue other) {
      OrdAndInt otherOrdAndInt = (OrdAndInt) other;
      int cmp = Integer.compare(value, otherOrdAndInt.value);
      if (cmp == 0) {
        cmp = Integer.compare(otherOrdAndInt.ord, ord);
      }
      return cmp < 0;
    }

I think we end up doing more comparisons overall? I might be missing something though.
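
For anyone who wants to check that the two formulations agree, here is a small self-contained sketch. It assumes the existing branch-based version breaks ties with `ord > other.ord`, which is not shown in full in the snippet under review, and it uses plain ints instead of the real `OrdAndInt` class. The class and method names are hypothetical.

```java
class LessThanVariants {
  /** Branch-based comparison, with the assumed ord tie-break. */
  static boolean branching(int value, int ord, int otherValue, int otherOrd) {
    if (value < otherValue) {
      return true;
    }
    if (value > otherValue) {
      return false;
    }
    return ord > otherOrd;
  }

  /** Integer.compare-based comparison, tie-breaking on ord. */
  static boolean viaCompare(int value, int ord, int otherValue, int otherOrd) {
    int cmp = Integer.compare(value, otherValue);
    if (cmp == 0) {
      cmp = Integer.compare(otherOrd, ord);
    }
    return cmp < 0;
  }

  /** Exhaustively compares both variants for values and ords in [lo, hi]. */
  static boolean agreeOnRange(int lo, int hi) {
    for (int v = lo; v <= hi; v++) {
      for (int o = lo; o <= hi; o++) {
        for (int ov = lo; ov <= hi; ov++) {
          for (int oo = lo; oo <= hi; oo++) {
            if (branching(v, o, ov, oo) != viaCompare(v, o, ov, oo)) {
              return false;
            }
          }
        }
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(agreeOnRange(-3, 3)); // true
  }
}
```

The two variants return the same answer, so the choice comes down to readability and the number of comparisons executed per call.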

@stefanvodita
Contributor Author

Thank you for reviewing @mikemccand! I had to rebase after #12966. I'll push tomorrow maybe if there are no objections.

@stefanvodita
Contributor Author

I did another benchmark run after the rebase just to make sure I haven't broken anything when integrating the split taxo arrays change. I see no significant changes.

python3 src/python/localrun.py -source wikimediumall

                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
           BrowseMonthTaxoFacets        8.68      (8.6%)        8.41      (8.6%)   -3.1% ( -18% -   15%) 0.257
                      OrHighHigh       24.38      (4.8%)       24.09      (4.9%)   -1.2% ( -10% -    8%) 0.424
                     AndHighHigh       26.10      (4.6%)       25.80      (2.2%)   -1.1% (  -7% -    5%) 0.315
                        HighTerm      254.91      (7.0%)      252.20      (5.9%)   -1.1% ( -13% -   12%) 0.604
           HighTermDayOfYearSort      307.54      (2.0%)      305.21      (2.1%)   -0.8% (  -4% -    3%) 0.249
                    OrNotHighLow      506.28      (2.2%)      502.52      (2.6%)   -0.7% (  -5% -    4%) 0.327
                         LowTerm      497.25      (6.3%)      493.71      (5.7%)   -0.7% ( -11% -   12%) 0.709
                       OrHighMed      102.21      (3.8%)      101.52      (4.2%)   -0.7% (  -8% -    7%) 0.589
                         MedTerm      505.87      (6.8%)      502.44      (5.9%)   -0.7% ( -12% -   12%) 0.737
                      TermDTSort      130.10      (2.4%)      129.27      (2.0%)   -0.6% (  -4% -    3%) 0.359
                    OrHighNotLow      420.65      (3.9%)      418.28      (3.8%)   -0.6% (  -7% -    7%) 0.644
                      AndHighMed       89.03      (2.4%)       88.53      (1.4%)   -0.6% (  -4% -    3%) 0.365
     BrowseRandomLabelTaxoFacets        3.72      (1.8%)        3.70      (1.4%)   -0.5% (  -3% -    2%) 0.303
            HighTermTitleBDVSort       10.39      (4.7%)       10.34      (4.4%)   -0.4% (  -9% -    9%) 0.775
                         Prefix3      131.17      (2.0%)      130.64      (3.3%)   -0.4% (  -5% -    5%) 0.645
               HighTermTitleSort      155.59      (2.2%)      155.00      (2.2%)   -0.4% (  -4% -    4%) 0.590
          OrHighMedDayTaxoFacets        4.50      (5.4%)        4.49      (5.5%)   -0.4% ( -10% -   11%) 0.825
         AndHighMedDayTaxoFacets       17.89      (1.9%)       17.85      (1.5%)   -0.3% (  -3% -    3%) 0.636
            BrowseDateTaxoFacets        4.57      (1.8%)        4.56      (1.5%)   -0.3% (  -3% -    3%) 0.639
                      AndHighLow      677.34      (2.6%)      675.67      (1.8%)   -0.2% (  -4% -    4%) 0.729
                    OrHighNotMed      349.74      (3.7%)      348.93      (2.8%)   -0.2% (  -6% -    6%) 0.823
                   OrHighNotHigh      321.44      (3.1%)      320.71      (3.0%)   -0.2% (  -6% -    6%) 0.815
                   OrNotHighHigh      229.84      (2.9%)      229.33      (2.7%)   -0.2% (  -5% -    5%) 0.805
       BrowseDayOfYearTaxoFacets        4.63      (1.7%)        4.62      (1.5%)   -0.2% (  -3% -    3%) 0.675
                       OrHighLow      377.28      (1.3%)      376.48      (1.3%)   -0.2% (  -2% -    2%) 0.601
                       MedPhrase      447.55      (2.2%)      446.61      (2.6%)   -0.2% (  -4% -    4%) 0.781
        AndHighHighDayTaxoFacets        2.48      (3.9%)        2.47      (2.7%)   -0.2% (  -6% -    6%) 0.882
                    HighSpanNear        2.84      (2.2%)        2.84      (2.0%)   -0.1% (  -4% -    4%) 0.835
                        Wildcard      294.36      (2.4%)      293.99      (2.8%)   -0.1% (  -5% -    5%) 0.879
                          Fuzzy2       61.91      (1.2%)       61.85      (1.3%)   -0.1% (  -2% -    2%) 0.814
                     LowSpanNear       36.58      (1.9%)       36.56      (1.8%)   -0.1% (  -3% -    3%) 0.923
                       LowPhrase       41.87      (1.2%)       41.85      (1.6%)   -0.0% (  -2% -    2%) 0.925
            MedTermDayTaxoFacets       23.10      (2.5%)       23.10      (2.5%)    0.0% (  -4% -    5%) 0.991
                          Fuzzy1       88.20      (0.9%)       88.23      (1.3%)    0.0% (  -2% -    2%) 0.935
                         Respell       46.76      (1.8%)       46.77      (1.8%)    0.0% (  -3% -    3%) 0.950
                    OrNotHighMed      325.18      (2.3%)      325.71      (2.0%)    0.2% (  -4% -    4%) 0.811
                     MedSpanNear        6.23      (4.0%)        6.24      (3.8%)    0.2% (  -7% -    8%) 0.846
                      HighPhrase       20.42      (1.9%)       20.47      (2.8%)    0.3% (  -4% -    5%) 0.737
            HighIntervalsOrdered        9.90      (4.4%)        9.94      (2.9%)    0.4% (  -6% -    8%) 0.763
             LowIntervalsOrdered       14.11      (4.2%)       14.17      (2.4%)    0.4% (  -5% -    7%) 0.698
           BrowseMonthSSDVFacets        4.15      (1.5%)        4.17      (2.1%)    0.4% (  -3% -    4%) 0.438
                        PKLookup      190.68      (1.8%)      191.62      (1.7%)    0.5% (  -2% -    4%) 0.381
             MedIntervalsOrdered        4.54      (4.3%)        4.57      (2.9%)    0.5% (  -6% -    8%) 0.649
                HighSloppyPhrase       14.51      (2.0%)       14.62      (2.1%)    0.7% (  -3% -    4%) 0.243
     BrowseRandomLabelSSDVFacets        2.83      (6.1%)        2.85      (5.7%)    0.8% ( -10% -   13%) 0.674
                 LowSloppyPhrase       13.09      (2.1%)       13.20      (2.4%)    0.8% (  -3% -    5%) 0.231
               HighTermMonthSort     2155.96      (3.5%)     2177.02      (3.6%)    1.0% (  -5% -    8%) 0.382
       BrowseDayOfYearSSDVFacets        4.00      (2.2%)        4.05      (2.1%)    1.2% (  -3% -    5%) 0.073
                 MedSloppyPhrase       12.84      (4.2%)       13.04      (4.7%)    1.6% (  -7% -   10%) 0.260
            BrowseDateSSDVFacets        1.17      (9.3%)        1.19      (7.0%)    1.9% ( -13% -   20%) 0.458
                          IntNRQ       21.04     (26.3%)       22.13     (25.7%)    5.2% ( -37% -   77%) 0.531

@stefanvodita stefanvodita merged commit 9ba4af7 into apache:main Apr 5, 2024
3 checks passed
@stefanvodita
Contributor Author

I'm finding this difficult to port to 9.x because of the way the classes have diverged, and I'm not sure it's worthwhile: a lot of the benefits here are for future development and for supporting API changes that would go in Lucene 10. I'll move the CHANGES entries and milestones to Lucene 10 unless anyone thinks it's worth backporting.

@mikemccand
Member

Now that #12408 was backported in #13300 can we now backport this to 9.x? Or was it already done in an un-linked PR or so?

Remembering to backport is proving challenging and error-prone (it always has been), not just in all of us consistently agreeing on the criteria for backport (we should always aim to backport unless it breaks non-experimental/internal public APIs?), but also in actually remembering to do it after a PR is merged to main. I wish GH provided some stronger mechanisms for us here ...

stefanvodita added a commit to stefanvodita/lucene that referenced this pull request May 10, 2024
This is a large change, refactoring most of the taxonomy facets code and changing internal behaviour, without changing the API. There are specific API changes this sets us up to do later, e.g. retrieving counts from aggregation facets.

1. Move most of the responsibility from TaxonomyFacets implementations to TaxonomyFacets itself. This reduces code duplication and enables future development. Addresses genericity issue mentioned in apache#12553.
2. As a consequence, introduce sparse values to FloatTaxonomyFacets, which previously used dense values always. This issue is part of apache#12576.
3. Compute counts for all taxonomy facets always, which enables us to add an API to retrieve counts for association facets in the future. Addresses apache#11282.
4. As a consequence of having counts, we can check whether we encountered a label while faceting (count > 0), while previously we relied on the aggregation value to be positive. Closes apache#12585.
5. Introduce the idea of doing multiple aggregations in one go, with association facets doing the aggregation they were already doing, plus a count. We can extend to an arbitrary number of aggregations, as suggested in apache#12546.
6. Don't change the API. The only change in behaviour users should notice is the fix for non-positive aggregation values, which were previously discarded.
7. Add tests which were missing for sparse/dense values and non-positive aggregations.
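
Items 4 and 6 above can be illustrated with a small sketch: when aggregation values may be zero or negative, "value > 0" is not a reliable test for whether a label was encountered, while "count > 0" is. Everything below (`Agg`, `aggregate`, `seenByValue`, `seenByCount`) is hypothetical and only mirrors the idea; it is not the TaxonomyFacets implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CountVsValueDemo {
  /** Hypothetical per-label state: a running sum plus the number of times the label was seen. */
  static final class Agg {
    float sum;
    int count;
  }

  /** Sums the weights per label and counts occurrences in the same pass. */
  static Map<String, Agg> aggregate(String[] labels, float[] weights) {
    Map<String, Agg> result = new LinkedHashMap<>();
    for (int i = 0; i < labels.length; i++) {
      Agg agg = result.computeIfAbsent(labels[i], k -> new Agg());
      agg.sum += weights[i];
      agg.count++;
    }
    return result;
  }

  /** Old-style check: treats a label as seen only if its aggregated value is positive. */
  static boolean seenByValue(Agg agg) {
    return agg != null && agg.sum > 0;
  }

  /** Count-based check: treats a label as seen if it occurred at least once. */
  static boolean seenByCount(Agg agg) {
    return agg != null && agg.count > 0;
  }

  public static void main(String[] args) {
    String[] labels = {"a", "b", "b"};
    float[] weights = {2.0f, -1.5f, -0.5f};
    Map<String, Agg> aggs = aggregate(labels, weights);
    for (Map.Entry<String, Agg> e : aggs.entrySet()) {
      System.out.println(
          e.getKey()
              + " seenByValue=" + seenByValue(e.getValue())
              + " seenByCount=" + seenByCount(e.getValue()));
    }
  }
}
```

In this sketch label `b` sums to a negative value, so the value-based check would drop it even though it occurred twice; the count-based check keeps it.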
@stefanvodita
Contributor Author

I was just working on it today actually and finally got it in shape: #13358. Sorry it took so long!

stefanvodita added a commit that referenced this pull request May 14, 2024
#12966 (#13358)

Reduce duplication in taxonomy facets; always do counts (#12966)
@stefanvodita stefanvodita added this to the 9.11.0 milestone May 14, 2024
@stefanvodita
Contributor Author

I was skeptical this would work out at first, but I think we have a successful backport in the end, so the changes will go out with 9.11.

Successfully merging this pull request may close these issues.

Is it correct for facets to assume positive aggregation values?
4 participants