The esperanto_stemmer is an Elasticsearch filter that provides stemming for the Esperanto language.
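As a rough illustration of how the filter might be wired into an analyzer, here is a minimal sketch of index settings built as a Python dictionary. It assumes the plugin exposes the filter under the name esperanto_stemmer so that it can be listed directly in an analyzer's filter chain, and that the separate analysis-icu plugin is installed to provide icu_folding. The names esperanto_safe_folding and esperanto_text are hypothetical, chosen only for this example.

```python
import json

# Letters that must stay distinct from their unaccented base letters.
ESPERANTO_LETTERS = "ĉĝĥĵŝŭĈĜĤĴŜŬ"

settings = {
    "analysis": {
        "filter": {
            # Hypothetical name. icu_folding comes from the analysis-icu
            # plugin; unicode_set_filter excludes the listed characters
            # from folding.
            "esperanto_safe_folding": {
                "type": "icu_folding",
                "unicode_set_filter": f"[^{ESPERANTO_LETTERS}]",
            },
        },
        "analyzer": {
            # Hypothetical analyzer name.
            "esperanto_text": {
                "type": "custom",
                "tokenizer": "standard",
                # Assumption: the plugin registers "esperanto_stemmer"
                # as a ready-to-use token filter name.
                "filter": [
                    "lowercase",
                    "esperanto_safe_folding",
                    "esperanto_stemmer",
                ],
            },
        },
    },
}

# Print a JSON body that could be sent when creating an index.
print(json.dumps({"settings": settings}, ensure_ascii=False, indent=2))
```

The lowercase filter is listed before folding because the excluded letters are skipped by icu_folding entirely, so their uppercase forms would otherwise pass through unchanged. The filter order is a judgment call for this sketch, not something the plugin prescribes.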
- Folding: If you use generic folding (ICU folding conveniently handles both combining and precomposed diacritics), be sure not to fold Ĉ/ĉ, Ĝ/ĝ, Ĥ/ĥ, Ĵ/ĵ, Ŝ/ŝ, and Ŭ/ŭ, which should be kept distinct from C/c, G/g, H/h, J/j, S/s, and U/u. See more about Esperanto orthography on Wikipedia.
- Transliterations: The stemmer does not support H-system or X-system transliterations (for example, "ch" or "cx" written in place of "ĉ"). This affects stemming exceptions and number recognition.
- The original implementation in Java is "Esperanto Stemmer", by Declan Whitford Jones.
- That stemmer was wrapped into this Elasticsearch plugin by Trey Jones to provide the esperanto_stemmer filter.
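Because the stemmer does not handle transliterated text, one possible workaround is to convert X-system input to proper Esperanto letters before it reaches the analyzer. The following is a minimal sketch of such a pre-processing step. It is not part of the plugin, and the function name x_system_to_unicode is hypothetical. The H-system is left out because its digraphs (such as "sh") are ambiguous with ordinary letter sequences.

```python
import re

# X-system digraphs: a base letter followed by "x" stands for the
# corresponding Esperanto letter with a diacritic.
X_SYSTEM = {
    "c": "ĉ", "g": "ĝ", "h": "ĥ", "j": "ĵ", "s": "ŝ", "u": "ŭ",
    "C": "Ĉ", "G": "Ĝ", "H": "Ĥ", "J": "Ĵ", "S": "Ŝ", "U": "Ŭ",
}

# Match one of the six base letters (either case) followed by x or X.
_X_DIGRAPH = re.compile(r"([cghjsuCGHJSU])[xX]")


def x_system_to_unicode(text: str) -> str:
    """Replace X-system digraphs with the Esperanto letters they encode.

    The case of the result follows the case of the base letter.
    """
    return _X_DIGRAPH.sub(lambda m: X_SYSTEM[m.group(1)], text)


print(x_system_to_unicode("cxirkaux"))  # ĉirkaŭ
print(x_system_to_unicode("SXANGXO"))   # ŜANĜO
```

Since "x" is not part of the Esperanto alphabet, this conversion is safe for native words, but it would also rewrite foreign names that happen to contain one of these letters followed by "x" (for example, "Linux"), so it may need an exception list in practice.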