Dynamically apply the right analyser based on language detected at index time, possible? #49
Hi,
With the removal of specifying `_analyzer` in the query (in elastic/elasticsearch#9279), automatically selecting the analyzer for a field doesn't really make sense as far as I can tell. Each field has only a single analyzer associated with it, so you can't analyze on the fly based on the detected language. So either you put your content into a field that is agnostic about the analyzer and use language detection only to filter on, or you make one call to determine the language of your content and then index your data into the field that has the appropriate analyzer. For instance, we have separate fields like:
I can implement the following. The scenario is like this: First, configure a mapping with languages you want to detect in
In this example, submitting a text
It is up to the user to configure the field analyzers and the
Another issue is indexing multilingual text into a single field. Here I recommend the ICU analyzer. ICU can apply normalization, folding, and tokenization based on Unicode scripts, which is the best way to search multilingual text in a single field. Stemming is not applied.
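A minimal sketch of what such a single multilingual field could look like, written as a Python dict for a create-index request body. It assumes the analysis-icu plugin is installed and a recent Elasticsearch (7.x or later: typeless mappings and the `text` type; 2.x-era indices used typed mappings and `string`). The analyzer name `multilingual` and the field name `body` are hypothetical, and this is not necessarily the exact configuration recommended above.

```python
import json

# Hypothetical names: "multilingual" (analyzer) and "body" (field).
# icu_normalizer, icu_tokenizer and icu_folding come from the analysis-icu plugin.
ICU_INDEX_BODY = {
    "settings": {
        "analysis": {
            "analyzer": {
                "multilingual": {
                    "type": "custom",
                    "char_filter": ["icu_normalizer"],  # Unicode normalization
                    "tokenizer": "icu_tokenizer",       # Unicode word boundaries (UAX #29)
                    "filter": ["icu_folding"],          # Unicode case and accent folding
                }
            }
        }
    },
    "mappings": {
        # No stemmer: the same analysis chain is applied to every language/script.
        "properties": {"body": {"type": "text", "analyzer": "multilingual"}}
    },
}

if __name__ == "__main__":
    # Body for a create-index request, e.g. PUT /<index>
    print(json.dumps(ICU_INDEX_BODY, indent=2))
```

Because no stemming filter is in the chain, recall on inflected forms is lower than with per-language analyzers; the trade-off is that one field handles text in any script.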
Released version 2.4.4.1 with the
@jprante, for language detection, can you provide a default/fallback
Hi jprante,
A small question that might be useful to other people too, I guess.
Is there a way, at index time, to apply the right analyser based on the result of the language detection? If yes, could you provide us with a code example?
Thanks in advance,
F
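A minimal sketch of the index-time routing approach described in the first reply: one text field per language, each mapped with its own analyzer, plus a language-agnostic fallback field. The client detects the language first and then puts the text into the matching field. Everything here is an assumption for illustration: the field naming scheme (`content_en`, `content_de`, ...) is hypothetical, `toy_detect` is a crude stopword counter standing in for a real language detector (such as the plugin's detection), and the mapping uses Elasticsearch's built-in `english`, `german` and `french` analyzers with the 5.x+ `text` type.

```python
from typing import Callable, Dict

# Hypothetical scheme: one field per language, each mapped with one of
# Elasticsearch's built-in language analyzers, plus a language-agnostic fallback.
LANGUAGE_ANALYZERS: Dict[str, str] = {"en": "english", "de": "german", "fr": "french"}
FALLBACK_FIELD = "content"


def field_for(lang: str) -> str:
    """Per-language field name such as 'content_en'; unknown languages use the fallback."""
    return f"{FALLBACK_FIELD}_{lang}" if lang in LANGUAGE_ANALYZERS else FALLBACK_FIELD


def build_mapping() -> dict:
    """Mapping properties: one analyzed field per language, a fallback, and a language keyword."""
    props = {field_for(lang): {"type": "text", "analyzer": analyzer}
             for lang, analyzer in LANGUAGE_ANALYZERS.items()}
    props[FALLBACK_FIELD] = {"type": "text", "analyzer": "standard"}
    props["language"] = {"type": "keyword"}  # detected language, handy for filtering
    return {"properties": props}


def toy_detect(text: str) -> str:
    """Stand-in for a real language detector: counts a few common stopwords."""
    stopwords = {
        "en": {"the", "and", "is", "of"},
        "de": {"der", "die", "und", "ist"},
        "fr": {"le", "la", "et", "est"},
    }
    words = text.lower().split()
    scores = {lang: sum(w in sw for w in words) for lang, sw in stopwords.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def route_document(text: str, detect: Callable[[str], str]) -> dict:
    """Detect the language first, then place the text in the matching field."""
    lang = detect(text)
    return {"language": lang, field_for(lang): text}


if __name__ == "__main__":
    print(build_mapping())
    print(route_document("the cat is on the mat", toy_detect))
```

The returned dict is the document you would send in the index request. At query time you would typically search across the per-language fields (for example with a `multi_match` query) or only the field that matches the query's language.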