This repository has been archived by the owner on Oct 31, 2023. It is now read-only.
I see that it is possible to use MUSS with other languages:
If you are going to add a new language to this project, download the files for the target language from https://huggingface.co/edugp/kenlm/tree/main/wikipedia into the folder resources/models/language_models/wikipedia. These language models are used to filter high-quality sentences in the paraphrase mining phase.
But what if the target language is not listed in the kenlm repository? I would like to try this system on Italian.
KenLM is only used to clean the Common Crawl data, if I remember correctly.
You can probably clean the data with other heuristics instead, or skip cleaning entirely, at the risk of worse performance.
Another option is to use the ChatGPT API, which is very good at text simplification in multiple languages.
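As a rough illustration of what "other heuristics" could look like when no KenLM model is available, here is a minimal sketch of a rule-based sentence filter. It is not part of MUSS: the function names (`letter_ratio`, `looks_clean`, `filter_sentences`) and every threshold are assumptions chosen for illustration, and they would need tuning on real Italian data.

```python
import re


def letter_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are letters."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c.isalpha() for c in chars) / len(chars)


def looks_clean(
    sentence: str,
    min_tokens: int = 5,            # assumed threshold, not from MUSS
    max_tokens: int = 60,           # assumed threshold, not from MUSS
    min_letter_ratio: float = 0.8,  # assumed threshold, not from MUSS
    max_repeat_ratio: float = 0.3,  # assumed threshold, not from MUSS
) -> bool:
    """Return True if the sentence passes a few simple quality checks."""
    tokens = sentence.split()
    # Drop fragments and very long run-on lines.
    if not (min_tokens <= len(tokens) <= max_tokens):
        return False
    # Drop lines that are mostly digits, punctuation, or symbols.
    if letter_ratio(sentence) < min_letter_ratio:
        return False
    # Drop lines dominated by one repeated token (menus, spam).
    most_common = max(tokens.count(t) for t in set(tokens))
    if most_common / len(tokens) > max_repeat_ratio:
        return False
    # Drop lines containing URLs.
    if re.search(r"https?://|www\.", sentence):
        return False
    return True


def filter_sentences(sentences):
    """Keep only the sentences that pass looks_clean."""
    return [s for s in sentences if looks_clean(s)]
```

Calling `filter_sentences` on the crawled sentences before paraphrase mining would stand in for the perplexity filter. Unlike a language model, these rules only catch obvious noise, so some disfluent text will still get through.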