Use BM25 for simple search #158

Open
lefnire opened this issue Jun 24, 2023 · 0 comments
Labels
🔍Search Search and question-answering

Comments

@lefnire
Collaborator

lefnire commented Jun 24, 2023

The search bar currently does the following:

  1. If the query has fewer than n_words words (currently 3, I think?), do a literal, case-insensitive substring search: `entries.filter(e => e.texts.toLowerCase().includes(search.toLowerCase()))`.
  2. If the query is longer, do a cosine-similarity search using `sentence_transformers.util.semantic_search`.
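The two-mode routing above can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: the entry shape (dicts with hypothetical `text` and `vec` keys), the `embed` callable, and sending a query of exactly `N_WORDS` words to the semantic branch are all choices made here. In the real system the embeddings would come from a SentenceTransformer model and the ranking from `util.semantic_search`; cosine similarity is written out by hand so the sketch runs without downloading a model.

```python
import math

# Hypothetical threshold mirroring "n_words (3 currently I think?)".
N_WORDS = 3


def literal_search(entries, search):
    """Case-insensitive substring match, like the JS filter in step 1."""
    needle = search.lower()
    return [e for e in entries if needle in e["text"].lower()]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(entries, query_vec, top_k=5):
    """Rank entries by cosine similarity of their precomputed embeddings."""
    scored = [(cosine(query_vec, e["vec"]), e) for e in entries]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [e for _, e in scored[:top_k]]


def search(entries, query, embed):
    """Route short queries to literal search, longer ones to semantic search.

    `embed` is any callable mapping text to a vector; in production this
    would presumably be a SentenceTransformer model's `encode` (assumption).
    A query of exactly N_WORDS words goes to semantic search (assumption).
    """
    if len(query.split()) < N_WORDS:
        return literal_search(entries, query)
    return semantic_search(entries, embed(query))
```

For example, a one-word query like `"hiking"` takes the literal branch and never calls `embed`, which keeps short lookups cheap.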

I was previously using Haystack for this, which has a few built-in conditionals that would make this ticket easier. But it turned out a lot beefier than I anticipated for too little payoff, so I went back to basics. So, if we want a more sophisticated search pipeline, we can consider:

  1. Try Haystack or Jina again. I'd lean towards Jina; they're making strides.
  2. Consider using a vector database. This would mean re-architecting the current system, which uses PyArrow with vectors stored on S3 as Parquet files. My current solution is infinitely scalable and cheap. But a true vector database would add a lot of utility: BM25 search, plus ML utilities such as sentiment analysis and question answering. It would, however, require a Docker setup and ongoing maintenance.
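For a sense of what BM25 scoring itself involves, independent of which tool ends up providing it, here is a minimal, dependency-free Okapi BM25 sketch. The parameters `k1=1.5` and `b=0.75` are common defaults rather than values chosen for this project, the tokenizer is deliberately naive (lowercase plus whitespace split, no stemming), and the IDF uses the Lucene-style `ln(1 + (N - df + 0.5) / (df + 0.5))` variant, which stays positive even for very common terms.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer: lowercase + whitespace split (no stemming or stopwords).
    return text.lower().split()


class BM25:
    """Okapi BM25 over an in-memory list of documents (sketch)."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1 = k1
        self.b = b
        self.docs = [tokenize(d) for d in docs]
        self.tfs = [Counter(d) for d in self.docs]  # term frequencies per doc
        self.n = len(self.docs)
        self.avgdl = sum(len(d) for d in self.docs) / self.n if self.n else 0.0
        df = Counter()  # number of documents containing each term
        for tf in self.tfs:
            df.update(tf.keys())
        # Lucene-style IDF, always > 0.
        self.idf = {
            t: math.log(1 + (self.n - f + 0.5) / (f + 0.5)) for t, f in df.items()
        }

    def score(self, query, i):
        """BM25 score of document i for the query."""
        tf = self.tfs[i]
        dl = len(self.docs[i])
        s = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            f = tf[term]
            # Length normalization: longer-than-average docs are penalized.
            denom = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            s += self.idf[term] * f * (self.k1 + 1) / denom
        return s

    def search(self, query, top_k=5):
        """Return (doc_index, score) pairs for matching docs, best first."""
        scores = [(self.score(query, i), i) for i in range(self.n)]
        ranked = sorted(scores, key=lambda p: p[0], reverse=True)
        return [(i, sc) for sc, i in ranked[:top_k] if sc > 0]
```

Unlike the literal substring filter, this ranks matches by term rarity and frequency while normalizing for entry length, so a short query returns the most relevant entries first instead of every entry that happens to contain the string. Off-the-shelf packages such as `rank_bm25` (`BM25Okapi`) implement the same scoring if writing it by hand isn't worthwhile.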
@lefnire lefnire added the 🔍Search Search and question-answering label Jun 24, 2023
@lefnire lefnire added this to Gnothi Jun 24, 2023
@github-project-automation github-project-automation bot moved this to Next in Gnothi Jun 24, 2023
@lefnire lefnire moved this from Next to Later in Gnothi Jun 24, 2023
Projects
Status: Pie in the sky
Development

No branches or pull requests

1 participant