Should we consider adding support for taxonomy indices? #11355
Labels: discuss, enhancement, Search:Aggregations
Is your feature request related to a problem? Please describe.
In one of our OpenSearch Lucene Study Group Meetings, we talked about improvements made to Lucene's faceting with taxonomy indices. A question that came up was whether it makes sense to add support for taxonomy indices to OpenSearch. I promised to create an issue to discuss it, so here we are.
Background
More broadly, the question is not only whether to add support for taxonomy indices, but whether we should try leveraging Lucene's facets module: https://lucene.apache.org/core/9_8_0/demo/org/apache/lucene/demo/facet/package-summary.html.
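Roughly speaking, a taxonomy index gives each facet label a single shard-wide ordinal at index time, stored in a separate index that all segments share. The sketch below is a minimal, flat illustration of that idea under stated assumptions: the `Taxonomy` class and `countAcrossSegments` are hypothetical names, nothing from Lucene's facet API is used, and Lucene's real taxonomy (written via `DirectoryTaxonomyWriter`) is hierarchical and persisted as its own Lucene index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TaxonomySketch {
    // Hypothetical, flat stand-in for a taxonomy index: one shard-wide
    // dictionary that gives each distinct label a stable ordinal the first
    // time it is indexed, no matter which segment the document lands in.
    static final class Taxonomy {
        private final Map<String, Integer> ordByLabel = new HashMap<>();
        private final List<String> labelByOrd = new ArrayList<>();

        int addOrGet(String label) {
            Integer ord = ordByLabel.get(label);
            if (ord == null) {
                ord = labelByOrd.size(); // next unused ordinal
                ordByLabel.put(label, ord);
                labelByOrd.add(label);
            }
            return ord;
        }

        int size() {
            return labelByOrd.size();
        }

        String label(int ord) {
            return labelByOrd.get(ord);
        }
    }

    // Each int[] holds one segment's per-document taxonomy ordinals. Because
    // those ordinals are already shard-wide, counting is a direct array
    // increment with no per-segment-to-global remapping step.
    static int[] countAcrossSegments(Taxonomy taxo, List<int[]> segments) {
        int[] counts = new int[taxo.size()];
        for (int[] docOrds : segments) {
            for (int ord : docOrds) {
                counts[ord]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Taxonomy taxo = new Taxonomy();
        // Two segments written at different times share the one taxonomy.
        int[] seg0 = {taxo.addOrGet("red"), taxo.addOrGet("blue"), taxo.addOrGet("red")};
        int[] seg1 = {taxo.addOrGet("green"), taxo.addOrGet("blue")};

        int[] counts = countAcrossSegments(taxo, List.of(seg0, seg1));
        // Prints red=2, then blue=2, then green=1.
        for (int ord = 0; ord < counts.length; ord++) {
            System.out.println(taxo.label(ord) + "=" + counts[ord]);
        }
    }
}
```

The catch is that this dictionary lives outside the segments themselves, which is where the concern about managing two Lucene indices per shard comes from.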
Lucene's facets fill a similar niche to aggregations in OpenSearch. For historical reasons unknown to me, aggregations were implemented separately, unrelated to Lucene's facets module. Currently, OpenSearch depends on most Lucene modules, but not on facets.
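For contrast, terms-style aggregations typically start from per-segment doc-values ordinals, where each segment numbers its own sorted terms, and remap them to shard-wide global ordinals through an `OrdinalMap`. The sketch below is a hypothetical, uncompressed stand-in (`SimpleOrdinalMap` and `count` are illustrative names, not Lucene's API; the real `OrdinalMap` packs this data far more compactly) meant only to show why that mapping costs heap for every distinct value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

class GlobalOrdinalsSketch {
    // Hypothetical, uncompressed stand-in for Lucene's OrdinalMap:
    // maps each segment's local term ordinal to a shard-wide global ordinal.
    static final class SimpleOrdinalMap {
        final List<String> globalTerms; // global ordinal -> term, sorted
        final int[][] segmentToGlobal;  // [segment][segment ordinal] -> global ordinal

        SimpleOrdinalMap(List<List<String>> segmentDictionaries) {
            // Global ordinals are positions in the sorted union of all segment dictionaries.
            TreeSet<String> union = new TreeSet<>();
            for (List<String> dict : segmentDictionaries) {
                union.addAll(dict);
            }
            globalTerms = new ArrayList<>(union);

            // One int per (segment, term) pair: heap grows with the number
            // of distinct values, whether or not a query ever touches them.
            segmentToGlobal = new int[segmentDictionaries.size()][];
            for (int s = 0; s < segmentDictionaries.size(); s++) {
                List<String> dict = segmentDictionaries.get(s);
                segmentToGlobal[s] = new int[dict.size()];
                for (int segOrd = 0; segOrd < dict.size(); segOrd++) {
                    segmentToGlobal[s][segOrd] = Collections.binarySearch(globalTerms, dict.get(segOrd));
                }
            }
        }

        int entryCount() {
            int total = 0;
            for (int[] perSegment : segmentToGlobal) {
                total += perSegment.length;
            }
            return total;
        }
    }

    // Documents carry per-segment ordinals; every hit is remapped before counting.
    static int[] count(SimpleOrdinalMap map, List<int[]> segmentDocOrds) {
        int[] counts = new int[map.globalTerms.size()];
        for (int s = 0; s < segmentDocOrds.size(); s++) {
            for (int segOrd : segmentDocOrds.get(s)) {
                counts[map.segmentToGlobal[s][segOrd]]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Each segment has its own sorted dictionary: "red" is ordinal 1 in
        // segment 0, while "green" is ordinal 1 in segment 1.
        List<List<String>> dicts = List.of(List.of("blue", "red"), List.of("blue", "green"));
        SimpleOrdinalMap map = new SimpleOrdinalMap(dicts);

        List<int[]> docs = List.of(new int[] {1, 0, 1}, new int[] {1, 0});
        int[] counts = count(map, docs);
        System.out.println(map.globalTerms + " " + Arrays.toString(counts)); // [blue, green, red] [2, 1, 2]
        System.out.println("mapping entries: " + map.entryCount());          // mapping entries: 4
    }
}
```

The two approaches pay for shard-wide ordinals in different places: here, in heap when the map is built over existing segments; with a taxonomy, at index time, in a second index.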
Pros
OrdinalMap instances occupy heap for every possible value (mapping from per-segment values to a global ordinal value that works for the whole shard).
… OrdinalMap price instead).
Cons
… SearcherTaxonomyManager that simplifies some of this, but it wouldn't be a small change. I'm a little scared to think about what managing two Lucene indices per shard would mean for segment replication.
What's next?
The above are just my opinions on the arguments for and against using the Lucene facets module. I would love to move the heavy lifting of aggregations out of OpenSearch, but I also see it as a huge effort with potentially little payback.
What do y'all think?