Impact index slower than BM25 scoring #1561

LuisPB7 · 2023-06-26T20:13:26Z

Hi everyone, thank you for this amazing library.

I am running experiments on MS MARCO v1. I indexed the collection, searched with BM25, and got the expected MRR@10 of 0.184. On my machine, search took around 2 minutes.

Then I switched to an impact index. The only change was adding the "-impact" flag to my indexing and search scripts; everything else is the same. Search now takes slightly more than 3 minutes.

In other impact experiments, where I increase the TF of certain terms in a document according to the weights output by a neural model (e.g., SPLADE), the gap in search time between impact scoring and BM25 scoring becomes very large. Note that I'm not doing query/document expansion with new terms, just reweighting the existing terms, i.e., changing their query/document term frequencies.
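To make the setup concrete, here is a minimal sketch of how I understand the two scoring modes. It is not the library's actual code: the toy postings, the function names `bm25_scores` and `impact_scores`, and the parameter values are all made up for illustration, and the BM25 formula is a textbook variant.

```python
import math
from collections import defaultdict

# Hypothetical toy postings: term -> list of (doc_id, stored_value).
# In the BM25 index the stored value is the raw term frequency; in the
# impact index it is a precomputed integer weight (e.g., from a neural model).
postings_tf = {"impact": [(0, 1), (1, 2)], "index": [(0, 1), (2, 1)]}
postings_impact = {"impact": [(0, 37), (1, 80)], "index": [(0, 12), (2, 55)]}
doc_len = {0: 2, 1: 2, 2: 1}


def bm25_scores(query_terms, postings, doc_len, k1=0.9, b=0.4):
    """Textbook BM25: the score is computed from tf, idf and document length."""
    n_docs = len(doc_len)
    avg_len = sum(doc_len.values()) / n_docs
    scores = defaultdict(float)
    for term in query_terms:
        plist = postings.get(term, [])
        df = len(plist)
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, tf in plist:
            norm = tf + k1 * (1 - b + b * doc_len[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / norm
    return dict(scores)


def impact_scores(query_weights, postings):
    """Impact scoring: sum of query weight times the stored document impact."""
    scores = defaultdict(int)
    for term, q_weight in query_weights.items():
        for doc_id, impact in postings.get(term, []):
            scores[doc_id] += q_weight * impact
    return dict(scores)


bm25_result = bm25_scores(["impact", "index"], postings_tf, doc_len)
impact_result = impact_scores({"impact": 1, "index": 1}, postings_impact)
```

In this sketch both functions walk exactly the same postings, and the impact version does less arithmetic per posting, which is why the slowdown surprises me.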

What is the intuition behind search taking longer with an impact index? Am I missing something that happens under the hood? The postings lists do not get longer; only the stored TFs differ. Thanks!

lintool (Maintainer) · 2023-06-26T21:46:31Z

I think this will answer your question: https://dl.acm.org/doi/10.1145/3576922

Please read it first, then ask if you have follow-up questions. @JMMackenzie @andrewtrotman
