[FEATURE] enable handling synonyms in neural ingestion pipelines #545

Open
asfoorial opened this issue Jan 22, 2024 · 1 comment

Comments

@asfoorial

Handling synonyms is a basic requirement for semantic search, but the current neural search plugin does not handle them.

I suggest a feature that enables this out of the box, without the expense of fine-tuning an embedding model. One simple way to do it is to replace a word with both the word and its synonyms before generating the vector. The source text stays the same, but the embedding reflects the synonyms. The option could be disabled when the embedding model already understands synonyms.

Example:

src = "OS is amazing"
dst = "[[Open Search]]os is amazing"
vec = getEmbedding(dst)

POST test/_doc/1
{
  "content": src,
  "vec": vec
}
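
To make the idea above concrete, here is a minimal Python sketch of the expansion step. It is not part of the neural search plugin: `expand_synonyms`, `build_document`, the `SYNONYMS` table, and the `enabled` flag are hypothetical names used only for illustration, and `embed` stands in for whatever embedding call is used. The sketch appends synonyms in parentheses rather than using the `[[...]]` notation from the example; the exact expansion format is an assumption.

```python
import re
from typing import Callable, Dict, List, Sequence

# Hypothetical synonym table (lowercase key -> synonyms); not a plugin API.
SYNONYMS: Dict[str, List[str]] = {"os": ["Open Search"]}


def expand_synonyms(text: str, synonyms: Dict[str, List[str]]) -> str:
    """Return `text` with each known word followed by its synonyms."""

    def replace(match: re.Match) -> str:
        word = match.group(0)
        extra = synonyms.get(word.lower())
        if not extra:
            return word  # no synonyms known: leave the word untouched
        return f"{word} ({', '.join(extra)})"

    return re.sub(r"\w+", replace, text)


def build_document(
    src: str,
    embed: Callable[[str], Sequence[float]],
    synonyms: Dict[str, List[str]] = SYNONYMS,
    enabled: bool = True,
) -> dict:
    """Embed the expanded text but store the original text unchanged."""
    dst = expand_synonyms(src, synonyms) if enabled else src
    return {"content": src, "vec": list(embed(dst))}
```

Calling `build_document("OS is amazing", embed)` stores the original sentence in `content` while the vector is computed from the expanded text. Passing `enabled=False` skips the expansion, which corresponds to the case where the embedding model already understands the synonyms.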

@heemin32
Collaborator

@asfoorial This is quite interesting! Do you happen to have any benchmark tests that demonstrate how handling synonyms improves search accuracy in semantic search?

Projects
Status: Backlog