-
Yes, there is.
As you may already guess, handling compounds in Voikko is a bit complicated. In fact, we would likely not even have Voikko if, 17 years ago, it had been possible to use any open source tool that existed at the time to build a reasonably working Finnish morphology for spell checkers. Handling inflection, while also complicated, was not the problem. Compounds were. So Voikko has compounding rules that work well for spell checking, and that involves a lot of compromises. One of them is that we have largely blocked the use of short and rare words in automatic compounding. "itu" is one such word, so we need to add real-world compounds containing "itu" as separate entries in the vocabulary. Sadly we are missing "ituhippi", which I will add once our vocabulary management application https://joukahainen.puimula.org/ is restored (it is currently undergoing a technology migration from mod_python to Flask).
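To make the compromise above concrete, here is a toy sketch in Python. It is not Voikko's actual data model or algorithm: the word sets, the function names, and the rule "a compound is any sequence of two or more freely compounding parts" are all assumptions made up for illustration. It only shows why a blocked word like "itu" is accepted on its own while "ituhippi" is rejected until it is listed as a separate entry.

```python
# Toy model only. These sets and rules are illustrative assumptions,
# not Voikko's real vocabulary or compounding logic.
FREELY_COMPOUNDING = frozenset({"ruisku", "hippi"})  # may join any compound
BLOCKED_FROM_COMPOUNDING = frozenset({"itu"})        # valid alone, never auto-compounded


def split_into_parts(word, parts):
    """Return one segmentation of `word` into items of `parts`, or None."""
    if word == "":
        return []
    for end in range(1, len(word) + 1):
        head = word[:end]
        if head in parts:
            rest = split_into_parts(word[end:], parts)
            if rest is not None:
                return [head] + rest
    return None


def is_accepted(word, explicit_compounds=frozenset()):
    """Accept standalone words, explicit entries, and automatic compounds."""
    standalone = FREELY_COMPOUNDING | BLOCKED_FROM_COMPOUNDING | explicit_compounds
    if word in standalone:
        return True
    # Blocked words are deliberately left out of the automatic compound parts.
    parts = FREELY_COMPOUNDING | explicit_compounds
    segmentation = split_into_parts(word, parts)
    return segmentation is not None and len(segmentation) >= 2


print(is_accepted("itu"))                                # standalone word
print(is_accepted("ruiskuhippi"))                        # automatic compound
print(is_accepted("ituhippi"))                           # rejected: "itu" is blocked
print(is_accepted("ituhippi", frozenset({"ituhippi"})))  # accepted once listed
```

The design trade-off in this sketch is the same one described above: blocking short words removes many false acceptances, at the cost of having to list their real compounds one by one.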
Yes, libreoffice-voikko does that. But with your example it will hyphenate "ruis-kui-tu", which I guess is correct, as the ambiguity here is mostly theoretical.
-
As for an example where both ambiguous hyphenations are avoided, I looked this up in our test suite: "pöytähienosto" can be interpreted either as "pöytä-hienosto" or as "pöytä-hien-osto", so we only hyphenate it as "pöy-tä-hienos-to".
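One way to picture that result is to hyphenate each interpretation fully and keep only the break points that all interpretations share. The sketch below does exactly that. It is a simplified illustration, not Voikko's implementation: the function names are hypothetical, and the two fully hyphenated inputs ("pöy-tä-hie-nos-to" and "pöy-tä-hien-os-to") are written by hand as assumptions rather than produced by Voikko.

```python
def hyphen_positions(hyphenated):
    """Return the character offsets at which `hyphenated` has a break."""
    positions = set()
    offset = 0
    for chunk in hyphenated.split("-")[:-1]:
        offset += len(chunk)
        positions.add(offset)
    return positions


def safe_hyphenation(candidates):
    """Keep only the break points common to every candidate hyphenation."""
    word = candidates[0].replace("-", "")
    if any(c.replace("-", "") != word for c in candidates):
        raise ValueError("all candidates must spell the same word")
    common = set.intersection(*(hyphen_positions(c) for c in candidates))
    pieces = []
    start = 0
    for pos in sorted(common):
        pieces.append(word[start:pos])
        start = pos
    pieces.append(word[start:])
    return "-".join(pieces)


# Hand-written full hyphenations of the two interpretations (assumed input).
print(safe_hyphenation(["pöy-tä-hie-nos-to", "pöy-tä-hien-os-to"]))
# prints pöy-tä-hienos-to
```

The breaks after "hie" and after "hien" each appear in only one interpretation, so both are dropped, which leaves the conservative "pöy-tä-hienos-to".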
-
I think that Voikko is a great project. Some 30 years ago I was thinking of developing something similar, but in the end I specialized in other types of software.
https://oikofix.com/analysis is a great service. Is anything similar also available via the command line, say, on Debian GNU/Linux?
I was curious how it would interpret this sentence with 3 ambiguous compound words, playing on the ambiguity between "ruis-kui-tu" and "ruis-ku-i-tu":
For the 3-part compound word, the alternative "ruisku-itu-hippi" is not recognized.
How are compound words handled? Is there some built-in database?
Also, I wonder if there is any hyphenation interface with (say) LibreOffice, to avoid hyphenations that would mislead the reader (say, avoiding both "ruisku-itu" and "ruiskui-tu" to play it safe).