The Lucene FSTOrdPostingsFormat (Solr schema postingsFormat="FSTOrd50") is like FSTPostingsFormat but supports term ordinals, which most postings formats do not. In TermPrefixCursor.java I left a comment that it could be more efficient if we could use ordinals, and I think this might be true. Instead of eagerly reading & caching the postings (list of docIDs), we could just capture the ordinal (an int). This would replace some of the "IntsRef" with this integer ordinal, and TermPrefixCursor (TPC) wouldn't need docIdsCache either. Later, when we resolve it in getDocIds(), that's when we'd do the actual work, which is perhaps not expensive. Sometimes we're never even asked to do that, which saves some time: the tag may have been eliminated due to overlapping, or it may have effectively been cached at a higher level (TaggerRequestHandler transforms to the uniqueKey values then caches that).
I'm not sure how much benefit this would bring; it could be a net loss; hard to be sure.
The downside is we'd basically be limited to this PostingsFormat. At least the PostingsWriterBase aspect of this one is pluggable (kinda), should we want some future improvement to allow a totally in-memory option. To ameliorate this downside, we could support any PF by grabbing the TermState instead; presumably the TermState of FSTOrdPostingsFormat is effectively the ordinal.
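To make the lazy idea concrete, here is a rough sketch in plain Java. It is not the tagger's actual code and uses no Lucene classes: `LazyDocIds` and `postingsLoader` are made-up names, and the loader merely stands in for whatever would seek by ordinal (or restore a TermState) and read the postings.

```java
import java.util.function.IntFunction;

/** Hypothetical sketch: holds a term ordinal and resolves doc IDs only on demand. */
final class LazyDocIds {
    private final int termOrd;                       // captured cheaply while scanning terms
    private final IntFunction<int[]> postingsLoader; // assumed stand-in for "seek by ord, read postings"
    private int[] docIds;                            // stays null until first resolved

    LazyDocIds(int termOrd, IntFunction<int[]> postingsLoader) {
        this.termOrd = termOrd;
        this.postingsLoader = postingsLoader;
    }

    int termOrd() {
        return termOrd;
    }

    /** Does the expensive work at most once, and only if a caller asks. */
    int[] getDocIds() {
        if (docIds == null) {
            docIds = postingsLoader.apply(termOrd);
        }
        return docIds;
    }
}

final class LazyDocIdsDemo {
    public static void main(String[] args) {
        int[][] fakePostings = { {1, 4}, {2}, {3, 5, 8} }; // fake postings, indexed by term ordinal
        int[] loads = {0};                                  // counts how often postings are read
        IntFunction<int[]> loader = ord -> { loads[0]++; return fakePostings[ord]; };

        LazyDocIds kept = new LazyDocIds(2, loader);
        LazyDocIds dropped = new LazyDocIds(0, loader); // e.g. a tag eliminated by overlap: never resolved

        System.out.println(kept.getDocIds().length + " docs, loads=" + loads[0]);
    }
}
```

The point is only that `dropped` never pays for a postings read. Whether that beats eager reading plus docIdsCache in practice is exactly the open question above.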
Upon further inspection of FSTOrdPostingsFormat (actually FSTOrdTermsReader), it has TODOs for ord(), which is bizarre. Why does this postingsFormat even exist if it doesn't yet support ords?
I filed an issue: https://issues.apache.org/jira/browse/LUCENE-8285