The search bar currently does the following:
If the user typed fewer than `n_words` words (currently 3, I think?), do a literal search: `entries.filter(e => e.texts.toLowerCase().includes(search.toLowerCase()))`.
If the user typed more than `n_words` words, do a cosine similarity search using `sentence_transformers.util.semantic_search`.
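To make the two branches concrete, here is a minimal Python sketch of the dispatch, not the project's actual code. `search_entries`, `toy_embed`, `VOCAB`, and `top_k` are hypothetical names made up for illustration, and the toy bag-of-words embedding stands in for a real sentence-transformers model. The description above doesn't say what happens at exactly `n_words` words, so this sketch assumes that case goes to the semantic branch.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]

# Hypothetical toy vocabulary; a real setup would use a sentence-embedding model.
VOCAB = ["dog", "walk", "park", "work", "meeting", "tired"]


def toy_embed(text: str) -> List[float]:
    """Stand-in embedding: counts of each VOCAB word in the text."""
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_entries(
    query: str,
    texts: Sequence[str],
    embed: Callable[[str], Vector],
    n_words: int = 3,
    top_k: int = 5,
) -> List[int]:
    """Return indices of matching entries.

    Short queries use a case-insensitive substring match (document order).
    Longer queries are ranked by cosine similarity, best match first.
    """
    if len(query.split()) < n_words:
        needle = query.lower()
        return [i for i, text in enumerate(texts) if needle in text.lower()]

    # Assumption: queries of exactly n_words words also land here.
    query_vec = embed(query)
    scored = [
        (cosine_similarity(query_vec, embed(text)), i)
        for i, text in enumerate(texts)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:top_k]]
```

Calling `search_entries("dog", texts, toy_embed)` takes the literal branch, while a three-word query like `"tired after work"` is ranked by similarity. In the real pipeline the entry embeddings would be precomputed rather than recomputed per query, and the ranking step is what `sentence_transformers.util.semantic_search` handles.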
I was previously using Haystack for this, which has a few conditionals that would make this ticket easier. But the project turned out to be a lot beefier than I anticipated, for too little payoff, so I went back to basics. If we want a more sophisticated search pipeline, we can consider:
Try Haystack or Jina again. I'd lean towards Jina; they're making strides.
Consider using a vector database. This would mean re-architecting the current system, which uses PyArrow with vectors stored on S3 as Parquet files. My solution is cheap and scales about as far as S3 does. But a true vector database would add a lot of utility, like BM25 searching and ML utilities such as sentiment analysis, question-answering, etc. It would require a Docker setup and ongoing maintenance, though.
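For a sense of what BM25 searching would add over the literal substring match, here is a rough pure-Python sketch of Okapi BM25 scoring. It's an illustration only: a vector database would ship its own implementation, and `bm25_scores`, `tokenize`, and the defaults `k1=1.5`, `b=0.75` are assumptions picked for the example, not anything from this project. The IDF uses the common non-negative variant, `ln((N - df + 0.5) / (df + 0.5) + 1)`.

```python
import math
from collections import Counter
from typing import List, Sequence


def tokenize(text: str) -> List[str]:
    """Naive whitespace tokenizer; real engines also stem and drop stop words."""
    return text.lower().split()


def bm25_scores(
    query: str,
    documents: Sequence[str],
    k1: float = 1.5,
    b: float = 0.75,
) -> List[float]:
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []

    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    query_terms = tokenize(query)

    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in query_terms:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue  # term absent from this document
            df = doc_freq[term]
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
            # Longer-than-average documents are penalized, scaled by b.
            length_norm = 1.0 - b + b * len(toks) / avg_len
            score += idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)
        scores.append(score)
    return scores
```

Unlike the substring filter, this ranks matches: rarer terms count for more, repeated terms count with diminishing returns, and long entries don't win just by being long.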