hackerllama/blog/posts/sentence_embeddings/ #4
Comments
Hello, author, many thanks for your explanation. You said "We’ll start using all-MiniLM-L6-v2. It’s not the best open-source embedding model". Which model is the best, and where can I find a ranked list of models? I am new to this, so sorry for the basic question. Thank you very much! |
Hi @songxujay, the author covers this in the "Selecting and evaluating models" section, so have a look at it. One of the main sources is still the MTEB Leaderboard: https://huggingface.co/spaces/mteb/leaderboard |
Hi! Thank you for the great article. To better understand the differences between word2vec- and Transformer-based embeddings, could you elaborate on how the masked language modelling objective of BERT differs from the CBOW objective in word2vec (which, as I understand it, is also about "filling in a blank")? Is it that the objectives are similar but the neural network architectures differ, which allows BERT to add contextual information? |
Hey @arnoldlayne0! Overall you're right, BERT and CBOW objectives have some similarities. Here are some differences
|
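To make the question above concrete, here is a minimal sketch of how the training examples for the two objectives are typically built. It is an illustration under stated assumptions, not the author's answer and not the actual word2vec or BERT code: the function names `cbow_examples` and `mlm_example` are hypothetical, the tokenizer is a plain whitespace split, and the masking is simplified (BERT also sometimes keeps the original token or swaps in a random one instead of `[MASK]`).

```python
import random

MASK = "[MASK]"


def cbow_examples(tokens, window=2):
    """Build (context_bag, target) pairs in the style of word2vec CBOW.

    The context is the set of words within `window` positions of the
    target. It is sorted here to stress that CBOW ignores word order:
    the context vectors are simply averaged or summed.
    """
    examples = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        examples.append((sorted(left + right), target))
    return examples


def mlm_example(tokens, mask_prob=0.15, seed=0):
    """Build one (masked_sequence, labels) pair in the style of BERT's MLM.

    Simplified assumption: every selected token becomes [MASK].
    The model receives the whole ordered sequence and must predict
    the original token at each masked position.
    """
    rng = random.Random(seed)
    masked = list(tokens)
    labels = {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked[i] = MASK
            labels[i] = tok
    if not labels:
        # Guarantee at least one prediction target for short inputs.
        i = rng.randrange(len(tokens))
        masked[i] = MASK
        labels[i] = tokens[i]
    return masked, labels


if __name__ == "__main__":
    sentence = "the bank of the river".split()
    for context, target in cbow_examples(sentence, window=1):
        print("CBOW:", context, "->", target)
    masked, labels = mlm_example(sentence)
    print("MLM: ", masked, "->", labels)
```

In this sketch the two "fill in the blank" tasks look alike, which matches the intuition in the question. The difference lies in what the model does with the input: CBOW collapses a small, unordered window into one averaged vector and ends up with a single static vector per word, while a Transformer encoder attends over the full ordered sequence, so the representation of each token depends on its surrounding sentence.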
I think something has changed in the Quora dataset used in the Colab example. I'm getting this error:
TypeError: http_get() got an unexpected keyword argument 'displayed_filename' |
Just what I needed as I enter the world of LLMs, thanks a lot! |
Hi, I would like to know which open-source sentence embedding methods I can choose from other than Sentence Transformers (SBERT). I found two other options, InferSent and Google's USE, but InferSent seems to be unmaintained now and USE is not widely used either. In 2024 I don't think I should use Doc2Vec or Word2Vec, right? So why has SBERT taken over from the other sentence embedding methods? |
hackerllama - Sentence Embeddings
Everything you wanted to know about sentence embeddings (and maybe a bit more)
https://osanseviero.github.io/hackerllama/blog/posts/sentence_embeddings/