forked from jason-wolfe/search-index-benchmark-game
Use same BM25 k1/b parameters across engines. #45
Comments
jpountz added a commit to jpountz/search-benchmark-game that referenced this issue Sep 25, 2023
Currently, different engines use different parameters for BM25: Tantivy and Lucene use (k1=1.2, b=0.75) while PISA uses (k1=0.9, b=0.4). Robertson et al. had initially suggested that 1.2/0.75 would make good defaults for BM25, but Trotman et al. later suggested that 0.9/0.4 would make better defaults, and this seems to be the consensus nowadays. The ranking function matters because it affects which hits may be skipped via dynamic pruning, which in turn affects search performance. Closes quickwit-oss#45
To get a sense of the influence of these parameters on query performance, I compared Lucene-9.8 with 1.2/0.75 against 0.9/0.4 on the
So it's not huge but significant and extremely consistent:
The k1 and b parameters of BM25 can influence what hits may be dynamically pruned and thus performance numbers, so it would be good to use the same values across engines. Currently it looks like engines use their own defaults, which seem to be k1=0.9 and b=0.4 for PISA, and k1=1.2 and b=0.75 for Lucene and Tantivy.
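To make the dependence on k1 and b concrete, here is a minimal Python sketch of the textbook BM25 per-term score and of the per-term score upper bound that dynamic pruning relies on. It is an illustration only, not the code of Lucene, Tantivy, or PISA; engines may differ in constant factors and in the idf variant. The function names and the example statistics are hypothetical.

```python
import math


def bm25_idf(doc_freq, num_docs):
    # A common smoothed idf variant (assumption: engines may use another one).
    return math.log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_term_score(tf, doc_len, avg_doc_len, doc_freq, num_docs, k1, b):
    # Textbook BM25 contribution of one query term to one document.
    # b controls how strongly document length is normalized,
    # k1 controls how quickly the term frequency saturates.
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return bm25_idf(doc_freq, num_docs) * tf * (k1 + 1.0) / (tf + norm)


def bm25_term_upper_bound(doc_freq, num_docs, k1):
    # As tf grows, tf * (k1 + 1) / (tf + norm) approaches k1 + 1 from below,
    # so idf * (k1 + 1) bounds the term's score in any document.
    return bm25_idf(doc_freq, num_docs) * (k1 + 1.0)


if __name__ == "__main__":
    # Hypothetical statistics, chosen only for illustration.
    stats = dict(tf=3, doc_len=120, avg_doc_len=100, doc_freq=1_000, num_docs=1_000_000)
    for k1, b in [(1.2, 0.75), (0.9, 0.4)]:
        score = bm25_term_score(k1=k1, b=b, **stats)
        bound = bm25_term_upper_bound(stats["doc_freq"], stats["num_docs"], k1)
        print(f"k1={k1} b={b} score={score:.4f} upper_bound={bound:.4f}")
```

Because both the scores and the bounds change with k1 and b, the set of documents a pruning algorithm can skip changes too, which is why engines configured with different parameters are not doing the same amount of work.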