`boolean` field type should have a parameter hinting at the more common value #11143
Labels
enhancement
Search:Performance
Search:Query Capabilities
Is your feature request related to a problem? Please describe.
The `boolean` field type is essentially just a specialized `keyword` field type, where the only possible values are `true` and `false`. In practice, there are many cases, though, where 90+% of documents have one value or the other: `is_deleted:false`, `is_visible:true`, etc.
Because Lucene skips through sparse values much more cheaply (even in a negation), we should (when possible) only index the less common term. A query matching the more common term would be rewritten as a NOT of the less common term. (That is, assuming your documents are almost all `is_visible:true`, then a query for `is_visible:true` becomes `NOT is_visible:false`, since the latter will only need to skip through a small number of matching docs to exclude them.)
Describe the solution you'd like
The `boolean` field type should accept a parameter that provides a hint saying "This field is usually (`true/false`)". I don't have a good name for the parameter -- maybe "usually".
For a field hinted as usually `true`, we would then only index (or write doc values for) `false` values. As mentioned above, a query for `is_visible:true` gets rewritten to `NOT is_visible:false`.
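To make the proposal concrete, here is a minimal toy model of the idea in Python. It is only a sketch, not OpenSearch or Lucene code: the class `SparseBooleanField`, the `usually` hint, and the `rare_docs` list are hypothetical names chosen for illustration. It also assumes every document has a value for the field; a document with a missing value would be indistinguishable from one holding the common value.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SparseBooleanField:
    """Toy boolean field that stores doc IDs only for the rarer value."""

    usually: bool                    # hypothetical hint: the more common value
    max_doc: int = 0                 # number of documents indexed so far
    rare_docs: List[int] = field(default_factory=list)  # doc IDs holding the rarer value

    def add(self, value: bool) -> int:
        """Index one document and return its doc ID."""
        doc_id = self.max_doc
        self.max_doc += 1
        if value != self.usually:
            # Only the less common value is written to the "index".
            self.rare_docs.append(doc_id)
        return doc_id

    def search(self, value: bool) -> List[int]:
        """Return the doc IDs matching field:value."""
        if value != self.usually:
            # Rare value: read the short doc ID list directly.
            return list(self.rare_docs)
        # Common value: rewritten as NOT <rare value>, so we only need
        # to exclude the few rare docs from the full doc ID range.
        excluded = set(self.rare_docs)
        return [d for d in range(self.max_doc) if d not in excluded]


is_visible = SparseBooleanField(usually=True)
for value in [True, True, False, True, True]:
    is_visible.add(value)

print(is_visible.rare_docs)      # [2]
print(is_visible.search(False))  # [2]
print(is_visible.search(True))   # [0, 1, 3, 4]
```

Only one of the five documents ends up in the stored list, and the query for the common value never touches postings for that value.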
Describe alternatives you've considered
I'm going to write an idea to the lucene-dev mailing list that would count the values when writing a segment and just write the less common value and a `DocsEnum` for its docs. As you merge segments, you always just write out the doc IDs for the less common value for the resulting segment.
That way, you don't need to provide a hint upfront and the rewrite could be done per segment. (If you have a closer to 50/50 split, you could also do a clever merge that splits the `true/false` values into different segments, so you have segments that are entirely `true` or entirely `false`, such that queries become match-all or match-none.)
Additional context
N/A