Change french stemmer in order to have better results #8500

Open: wants to merge 1 commit into base: main

f-necas commented Nov 15, 2024

This PR fixes issues with Elasticsearch requests on French-language fields.

As I understand it, a stemmer is used to simplify word matching when requests are executed: words that share the same root are reduced to that root. For example, "continua" and "continuait" are both reduced to "continu".

The lighter the stemmer, the closer it is to a dictionary stemmer (all words, with all their inflections...), but it is also slower.
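To see what a given stemmer does to the words above, a request body like the following could be sent to the Elasticsearch `_analyze` endpoint. This is only a sketch: the `standard` tokenizer and `lowercase` filter are assumptions for illustration, not necessarily the analysis chain used by the project, and `light_french` is just one of the French variants accepted by the `stemmer` token filter (the others being `french` and `minimal_french`).

```json
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    { "type": "stemmer", "language": "light_french" }
  ],
  "text": "continua continuait Etablissements"
}
```

Swapping the `language` value and comparing the returned tokens should show how aggressively each variant truncates the same words.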

For a record containing the French word "Etablissements", some requests did not return it. The same requests work for every other language and for the default field, but not for French.

I created a record with the same title translation, "Etablissements de Cleaux", for several languages and tried these ES requests:

{
  "query": {
    "bool" : {
      "should": [
        { "prefix": { "resourceTitleObject.langfre": "etabliss"}}
      ]
    }
  },
  "_source": [
    "resourceTitleObject"
  ],
  "from": 0,
  "size": 20
}

OR

{
  "query": {
    "match_phrase_prefix" : {
      "resourceTitleObject.langfre": {
         "query": "etabliss"
      }
    }
  },
  "_source": [
    "resourceTitleObject"
  ],
  "from": 0,
  "size": 20
}
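A likely explanation for the difference between languages is that a `prefix` query is a term-level query: the string "etabliss" is not analyzed and is compared directly with the terms stored in the index. If the French stemmer stores a stem that does not start with "etabliss", the query cannot match. One way to check this, as a sketch, is to ask Elasticsearch which terms the field's analyzer produces for the title. The body below would be sent to the `_analyze` endpoint of the index holding the records (the index name is not given here, so it is left out):

```json
{
  "field": "resourceTitleObject.langfre",
  "text": "Etablissements de Cleaux"
}
```

Running the same request against another language field of `resourceTitleObject` should make it easy to compare the stored terms with the "etabliss" prefix used in the queries above.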

Checklist

  • I have read the contribution guidelines
  • Pull request provided for main branch, backports managed with label
  • Good housekeeping of code, cleaning up comments, tests, and documentation
  • Clean commit history broken into understandable chunks, avoiding big commits with hundreds of files, cautious of reformatting and whitespace changes
  • Clean commit messages, longer verbose messages are encouraged
