Change french stemmer in order to have better results #8500

f-necas · 2024-11-15T12:16:17Z

This PR fixes issues with some Elasticsearch requests on French fields.

A stemmer (as I understand it) is used to optimize word matching when requests are executed. Words that share the same root are reduced to that root. So words like "continua" and "continuait" are both reduced to "continu".

The lighter the stemmer, the closer it comes to a dictionary stemmer (all words, with all their inflected forms...), but it is also slower.
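As a rough illustration of where the stemmer is chosen, here is a minimal sketch of index settings that plug a French stemmer into a custom analyzer. The names `french_stemmer` and `french_minimal` are hypothetical and are not taken from this PR. The `stemmer` token filter and its `light_french` and `minimal_french` language values are standard Elasticsearch options, but which value this change actually uses is an assumption here.

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "french_stemmer": {
          "type": "stemmer",
          "language": "minimal_french"
        }
      },
      "analyzer": {
        "french_minimal": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "french_stemmer"]
        }
      }
    }
  }
}
```

Switching the `language` value is the only difference between a lighter and a more aggressive French stemmer in a setup like this one. The rest of the analyzer chain stays the same.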

For a record containing the French word "Etablissements", some requests did not work. They worked for all other languages and for the default field, but not for French.

I created a record with the same translation, "Etablissements de Cleaux", for several languages and tried these ES requests:

{
  "query": {
    "bool" : {
      "should": [
        { "prefix": { "resourceTitleObject.langfre": "etabliss"}}
      ]
    }
  },
  "_source": [
    "resourceTitleObject"
  ],
  "from": 0,
  "size": 20
}

OR

{
  "query": {
    "match_phrase_prefix" : {
      "resourceTitleObject.langfre": {
         "query": "etabliss"
      }
    }
  },
  "_source": [
    "resourceTitleObject"
  ],
  "from": 0,
  "size": 20
}
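To see why a prefix such as "etabliss" can miss, it may help to inspect the tokens a given stemmer produces. The body below is a sketch of a request to the Elasticsearch `_analyze` API (for example `GET /_analyze`). It assumes the French field uses a standard tokenizer, a lowercase filter, and a `light_french` stemmer, which may not match the real field mapping.

```json
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    { "type": "stemmer", "language": "light_french" }
  ],
  "text": "Etablissements de Cleaux"
}
```

The input of a `prefix` query is not analyzed, so it only matches when "etabliss" is a literal prefix of an indexed token. If the stemmer reduces "etablissements" to something shorter than "etabliss", the query finds nothing. Running the same body with `minimal_french` allows the two outputs to be compared.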

Checklist

I have read the contribution guidelines
Pull request provided for main branch, backports managed with label
Good housekeeping of code, cleaning up comments, tests, and documentation
Clean commit history broken into understandable chunks, avoiding big commits with hundreds of files, cautious of reformatting and whitespace changes
Clean commit messages, longer verbose messages are encouraged

CLAassistant · 2024-12-08T03:41:00Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

fix: change french stemmer in order to have better results

1bac306

f-necas mentioned this pull request Nov 18, 2024

feat: set stemmer to minimal georchestra/geonetwork_minimal_datadir#12

Merged
