[hunspell dictionary] "iconv * 0" lines cause any Latin-characters-words to be treated as correctly spelled #306

er13 · 2022-07-11T08:15:52Z

iconv * 0 lines in distr/hunspell/header/affix_header.txt (from L12 to L73) cause any word consisting of Latin alphabet characters to be converted to a sequence of zeros, which in turn causes the original word to be treated as correctly spelled.
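The effect described above can be reproduced with a small toy model. This is only a sketch, not hunspell itself: it assumes each rule maps one ASCII Latin letter to the digit "0", and that a token consisting only of digits is accepted as a number (which is what the report implies). The names `ICONV_RULES`, `apply_iconv` and `toy_check` are made up for illustration.

```python
import string

# Assumption: one rule per ASCII Latin letter, each mapping the letter to "0".
ICONV_RULES = {letter: "0" for letter in string.ascii_letters}


def apply_iconv(word):
    """Apply the input-conversion table character by character."""
    return "".join(ICONV_RULES.get(ch, ch) for ch in word)


def toy_check(word, known_words):
    """Toy stand-in for the spellchecker.

    The word is converted first; the result is accepted if it is in the
    word list or (assumption) if it consists only of digits.
    """
    converted = apply_iconv(word)
    return converted.isdigit() or converted in known_words


if __name__ == "__main__":
    known = {"гранітний"}
    for w in ["recieve", "Wikipedia", "гранітний", "гранитний"]:
        print(w, "->", apply_iconv(w), "accepted" if toy_check(w, known) else "flagged")
```

In this model any purely Latin word, misspelled or not, turns into a run of zeros and is accepted, while Cyrillic words are still checked against the word list.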

This behavior is incorrect - words consisting of Latin characters are not correct Ukrainian words and should thus be marked as misspelled. It is also very annoying in multi-language documents, as the Ukrainian dictionary effectively disables spell-checking for languages based on the Latin alphabet.

Could you (@arysin?) please

- check if these lines are really necessary and remove them if not,
- or convert them to iconv * \0 (not sure if the syntax is correct) if you actually wanted to remove Latin characters instead of converting them to a sequence of zero digits.

arysin · 2022-07-11T14:41:02Z

This was intentional. Ukrainian texts often contain lots of words in Latin script (particularly English proper nouns, abbreviations and terms), and highlighting them creates a lot of noise without much value. The idea is for the Ukrainian spellchecker to concentrate on Ukrainian words that are misspelled. With this rule the spellchecker should still catch errors where Ukrainian words contain Latin letters (e.g. Latin «i» in «гранiтний»), which is a very common error.

er13 · 2022-07-11T23:16:24Z

Hmm, do all Ukrainians always write foreign-language words (used untranslated in Ukrainian) properly, without errors or typos? Wouldn't it be better to let the corresponding language dictionary spellcheck these words instead of marking them as properly spelled and effectively ignoring the errors in them?

As mentioned before, my actual use case is multi-language texts containing English, German, Russian, and Ukrainian words at the same time. This behavior of the Ukrainian dictionary causes English and German words to be marked as properly spelled (even if they are not), which is not very helpful.

Could you please think about your decision again and maybe change it? Thanks!

P.S. I would say Latin «i» instead of Ukrainian «і» in words like «гранiтний» would be caught perfectly well even without these iconv rules.
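Both claims about the mixed-script case can be checked against a toy model. The sketch below is not hunspell: it assumes each rule maps one ASCII Latin letter to "0", that digit-only tokens are accepted as numbers, and that the word list contains only the correctly spelled word. `check` and its `use_iconv` flag are hypothetical names used for illustration.

```python
import string

# Assumption: one rule per ASCII Latin letter, each mapping the letter to "0".
ICONV_RULES = {letter: "0" for letter in string.ascii_letters}

CORRECT = "гран\u0456тний"              # Cyrillic "і" (U+0456)
MIXED = CORRECT.replace("\u0456", "i")  # same word typed with a Latin "i"
KNOWN = {CORRECT}


def check(word, known_words, use_iconv):
    """Toy check: optionally convert Latin letters to "0", then look the word up."""
    if use_iconv:
        word = "".join(ICONV_RULES.get(ch, ch) for ch in word)
        if word.isdigit():  # assumption: pure digit strings count as numbers
            return True
    return word in known_words


if __name__ == "__main__":
    for use_iconv in (True, False):
        print("iconv rules on" if use_iconv else "iconv rules off")
        for w in (CORRECT, MIXED, "Wikipedia"):
            print("  ", w, "accepted" if check(w, KNOWN, use_iconv) else "flagged")
```

Under these assumptions the mixed-script word is flagged in both configurations; the rules only change the outcome for purely Latin words.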

arysin · 2022-07-12T00:53:51Z

I am open to discussion. But this was a matter of practicality: a long time ago, when we had just created this hunspell dictionary, we didn't have this option, and with mixed-language texts (e.g. on Wikipedia) it was extremely annoying to see everything in Latin script marked red.
In general, there are two use cases for hunspell. The first is powerful text processors like LibreOffice/OpenOffice: there you can mark words with the appropriate language, and they will then be checked with the appropriate hunspell dictionary, so the current behavior does not get in the way.
The other case is simple text editors: text fields in Firefox and other open-source software. They usually don't operate on multiple languages, and in this case you just want to concentrate on your main language without extra noise coming from words spelled in Latin script.
Also, when I work with big multilingual texts I usually use LibreOffice and LanguageTool. The first gives me a way to mark text chunks with the appropriate language, and the second provides grammar checking, which is much more powerful than a simple dictionary-based check.
If you can describe your case where this logic does not work, maybe we can come up with a solution.
Alternatively, we could also create a separate hunspell dictionary with this option off. Sometimes I need to check dictionaries with Russian words in them, and the standard Russian hunspell does not work because these texts have accented characters. So I modify their hunspell dictionary to include the IGNORE option (of course I also have to convert it from koi8-r to utf-8, but it's worth the effort).

er13 · 2022-07-12T08:07:50Z

My main use case is Notepad++ with the DSpellCheck plugin, i.e. a text editor without a special file format for text documents like .doc or .odt and thus no way to save the annotated language. The logic applied in the DSpellCheck plugin is the following:

- check every word against every activated dictionary,
- if any of the dictionaries considers the word properly spelled, then the word is properly spelled.
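The two rules above can be sketched as follows. This is a toy model, not DSpellCheck's code: the dictionaries are plain functions, the English word list is tiny, and the "lenient" Ukrainian checker simply accepts any word made only of ASCII letters to imitate the effect of the iconv rules. All names here are hypothetical.

```python
def english(word):
    """Toy English dictionary with a tiny word list."""
    return word.lower() in {"receive", "spelling"}


def ukrainian_strict(word):
    """Toy Ukrainian dictionary that only knows its own words."""
    return word in {"гранітний", "слово"}


def ukrainian_lenient(word):
    """Imitates the reported behavior: purely Latin words are always accepted."""
    if word.isascii() and word.isalpha():
        return True
    return ukrainian_strict(word)


def is_correct(word, dictionaries):
    """A word is fine if at least one active dictionary accepts it."""
    return any(check(word) for check in dictionaries)


if __name__ == "__main__":
    typo = "recieve"
    print(is_correct(typo, [english, ukrainian_strict]))   # English typo is flagged
    print(is_correct(typo, [english, ukrainian_lenient]))  # typo slips through
```

With an "any dictionary accepts" policy, a single dictionary that accepts everything in Latin script is enough to hide every English or German typo.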

I like this logic much more than the behavior of MS Office/LibreOffice/OpenOffice, which forces me to explicitly annotate each word with the language it is written in and so increases the effort needed to get the text spellchecked.

But even in LibreOffice/OpenOffice (your main use case) the proper way to do spellchecking (as you write yourself) is to annotate the words coming from languages other than Ukrainian with their original language, instead of marking the whole text as Ukrainian and expecting the Ukrainian dictionary to ignore the foreign words for you. This is kinda absurd: a dictionary intended to support spellchecking actively ignores spelling errors. The main reason for this behavior is, as you said, that it would otherwise produce a lot of noise, but from my point of view it was more a matter of laziness: for you, the effort needed to annotate every foreign-language word was not worth the added value of having every foreign-language word properly spellchecked.

As for spellchecking in Firefox: people use multi-language spellchecking there too, see e.g. this bug report

By the way, neither the English nor the German dictionaries have options to ignore words written in Cyrillic, Greek, or any other non-Latin alphabet.

arysin · 2022-07-12T12:37:23Z

Multilingual spellchecking was added very recently there, and this comment confirms the practicality of our original approach. The multilingual check is still broken in Firefox and will only be fixed in 103.
I'll consider adjusting this for the next release of Ukrainian hunspell dictionaries.
I think the solution for your case is simple: open C:\Users\anrysi\AppData\Roaming\Notepad++\plugins\Config\Hunspell, open uk_UA.aff, remove offending ICONV lines and adjust ICONV count.
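The manual edit described above could also be scripted. The sketch below makes several assumptions: the table in uk_UA.aff starts with a header line `ICONV <count>`, the offending entries have the form `ICONV <latin letter> 0`, and any other ICONV entries should be kept. `strip_latin_iconv` is a hypothetical helper, and the sample text is made up rather than taken from the real file.

```python
import string

LATIN_LETTERS = set(string.ascii_letters)


def strip_latin_iconv(aff_text):
    """Drop 'ICONV <latin letter> 0' entries and fix the ICONV count header."""
    kept = []          # lines that go into the output
    header_idx = None  # position in `kept` of the 'ICONV <count>' header
    remaining = 0      # ICONV entries that survive the filter
    for line in aff_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "ICONV":
            if len(parts) == 2 and parts[1].isdigit():
                header_idx = len(kept)  # header line, count is rewritten later
            elif len(parts) >= 3 and parts[1] in LATIN_LETTERS and parts[2] == "0":
                continue                # offending entry: skip it
            elif len(parts) >= 3:
                remaining += 1          # unrelated entry: keep and count it
        kept.append(line)
    if header_idx is not None:
        if remaining:
            kept[header_idx] = "ICONV {}".format(remaining)
        else:
            del kept[header_idx]        # no entries left, drop the header too
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Made-up sample, not the real uk_UA.aff contents.
    sample = "SET UTF-8\nICONV 3\nICONV a 0\nICONV B 0\nICONV ` '\nTRY abc\n"
    print(strip_latin_iconv(sample))
```

To apply it to a real file, read uk_UA.aff as text (assuming it is UTF-8), pass the contents through `strip_latin_iconv`, and write the result back after making a backup copy.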

er13 · 2022-07-13T07:42:40Z

> I think the solution for your case is simple: open C:\Users\${user}\AppData\Roaming\Notepad++\plugins\Config\Hunspell, open uk_UA.aff, remove offending ICONV lines and adjust ICONV count.

Yeah, works perfectly.

> I'll consider adjusting this for the next release of Ukrainian hunspell dictionaries.

Thanks a lot, looking forward to having it in the official release.

arysin · 2022-07-13T14:58:31Z

BTW the plugin seems to pull a pretty old version of hunspell_uk; the newest is here: https://github.com/brown-uk/dict_uk/releases/tag/v5.8.0
We may want to update their location with the latest version.

er13 · 2022-07-13T15:29:04Z

As far as I understand the DSpellCheck plugin has only one source for all dictionaries and that is LibreOffice. It simply doesn't support a separate source per dictionary, which is to be honest quite understandable.

Do you have a process for pushing the updates of the Ukrainian dictionary to LibreOffice (see LibreOffice/dictionaries@06a28cf, LibreOffice/dictionaries@cbda6f4), or do you expect the LibreOffice developers to pull the updates themselves from time to time?

In either case, thanks for pointing out the location of the most up-to-date Ukrainian dictionary. I can of course (and will) update it manually.

arysin · 2022-07-13T15:41:02Z

I upload a LibreOffice extension with each release here: https://extensions.libreoffice.org/en/extensions/show/ukrainian-spelling-dictionary-and-thesaurus
I suspect the developers of that page pull the updates from time to time (not sure where their primary source is though). So we may just need to ping them so they update it now.

er13 · 2022-07-13T16:09:16Z

Hmm, based on the commit log I would say LibreOffice (even the as-of-now most recent version 7.4.0.1) unfortunately still contains 5.3.1 and not the 5.8.0 you mentioned.

Would be great if you could ping the LibreOffice developers and clarify the dictionary update process with them.

arysin · 2022-07-13T16:15:32Z

I've created LibreOffice/dictionaries#41
... and https://bugs.documentfoundation.org/show_bug.cgi?id=149980

er13 mentioned this issue Jul 11, 2022

mult lang spell don't work Predelnik/DSpellCheck#270

Open

ghost mentioned this issue Oct 7, 2022

English spelling stops working when it's used together with a Ukrainian dictionary sublimehq/sublime_text#5570

Closed
