[hunspell dictionary] "iconv * 0" lines cause any Latin-characters-words to be treated as correctly spelled #306
This was intentional. Ukrainian texts often contain lots of words in Latin script (particularly English proper nouns, abbreviations, and terms), and highlighting them creates a lot of noise without much value. The idea is for the Ukrainian spellchecker to concentrate on misspelled Ukrainian words. With this rule the spellchecker should still catch errors where Ukrainian words contain Latin letters (e.g. Latin «i» in «гранiтний»), which is a very common error.
Hmm, do all Ukrainians always write foreign-language words (used untranslated in Ukrainian) properly, without errors or typos? Wouldn't it be better to let the corresponding language's dictionary spellcheck these words instead of marking them as correctly spelled and effectively ignoring the errors in them? As mentioned before, my actual use case is multi-language texts containing English, German, Russian, and Ukrainian words at the same time. This behavior of the Ukrainian dictionary causes English and German words to be marked as correctly spelled (even if they are not), which is not very helpful. Could you please reconsider your decision and maybe change it? Thanks! P.S. I would say a Latin «i» instead of the Ukrainian «і» in words like «гранiтний» would be caught perfectly well even without these iconv rules.
I am open to discussion. But this was a matter of practicality: a long time ago, when we first created this hunspell dictionary, we didn't have this option, and with mixed-language texts (e.g. on Wikipedia) it was extremely annoying to see every Latin word marked in red.
My main use case is Notepad++ with the DSpellCheck plugin, i.e. a text editor without a special file format for text documents like .doc or .odt, and thus with no way to save the annotated language. The logic applied in the DSpellCheck plugin is the following:
I like this logic much more than the behavior of MS Office/LibreOffice/OpenOffice, which forces me to explicitly annotate each word with the language it is written in and thereby increases the effort needed to get the text spellchecked. But even in LibreOffice/OpenOffice (your main use case), the proper way to spellcheck (as you write yourself) is to annotate words from languages other than Ukrainian with their original language, instead of marking the whole text as Ukrainian and expecting the Ukrainian dictionary to ignore the foreign words for you. This is kind of absurd: a dictionary intended to support spellchecking actively ignores spelling errors. The main reason for this behavior is, as you said, that it would otherwise produce a lot of noise, but from my point of view it was more your laziness: for you, the effort needed to annotate every foreign-language word was not worth the added value of having every foreign-language word properly spellchecked.

As for spellchecking in Firefox: people use multi-language spellchecking in Firefox as well, see e.g. this bug report.

By the way, neither the English nor the German dictionary has an option to ignore words written in Cyrillic, Greek, or any other non-Latin alphabet.
Multilingual spellchecking was added there only very recently, and this comment confirms the practicality of our original approach. Multilingual checking is still broken in Firefox and will only be fixed in version 103.
Yeah, works perfectly.
Thanks a lot, looking forward to having it in the official release.
BTW, the plugin seems to pull a pretty old version of hunspell_uk; the newest is here: https://github.com/brown-uk/dict_uk/releases/tag/v5.8.0
As far as I understand, the DSpellCheck plugin has only one source for all dictionaries, and that is LibreOffice. It simply doesn't support a separate source per dictionary, which is, to be honest, quite understandable. Do you have a process for pushing updates of the Ukrainian dictionary to LibreOffice (see LibreOffice/dictionaries@06a28cf, LibreOffice/dictionaries@cbda6f4), or do you expect the LibreOffice developers to pull the updates themselves from time to time? In either case, thanks for pointing out the location of the most up-to-date Ukrainian dictionary. I can of course (and will) update it manually.
I am uploading the LibreOffice extension with each release here: https://extensions.libreoffice.org/en/extensions/show/ukrainian-spelling-dictionary-and-thesaurus
Hmm, based on the commit log I would say LibreOffice (even the most recent version as of now, 7.4.0.1) unfortunately still contains 5.3.1 rather than the 5.8.0 you mentioned. It would be great if you could ping the LibreOffice developers and clarify the dictionary update process with them.
I've created LibreOffice/dictionaries#41
The "iconv * 0" lines in distr/hunspell/header/affix_header.txt (lines 12 to 73) cause any word consisting of Latin alphabet characters to be converted to a sequence of zeros, which in turn causes the original word to be treated as correctly spelled. This behavior is incorrect: words consisting of Latin characters are not correct Ukrainian words and should thus be marked as misspelled. It is also very annoying in multi-language documents, as the Ukrainian dictionary effectively disables spellchecking for Latin-alphabet languages.
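The effect described above can be illustrated with a small Python model. This is a simplified sketch, not Hunspell's code, and it rests on several assumptions: the header is taken to contain one conversion rule per Latin letter mapping it to "0", the dictionary is a hypothetical two-word toy list, and Hunspell's acceptance of number-like tokens is reduced to "digits only". The names ICONV_TABLE, DICTIONARY, apply_iconv, and is_correct are illustrative only.

```python
import re

# Assumption: one input-conversion rule per Latin letter, each mapping the
# letter to the digit "0" (the real rule list in the header is not copied here).
LATIN_LETTERS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
ICONV_TABLE = {letter: "0" for letter in LATIN_LETTERS}

# Hypothetical toy stand-in for the Ukrainian .dic word list.
# "\u0456" is the Cyrillic letter "і".
DICTIONARY = {"гран\u0456тний", "слово"}

# Simplified number check: Hunspell accepts number-like tokens; this model
# only accepts plain runs of digits.
NUMBER_RE = re.compile(r"[0-9]+")


def apply_iconv(word: str) -> str:
    """Apply the (assumed) single-letter conversion table to each character."""
    return "".join(ICONV_TABLE.get(ch, ch) for ch in word)


def is_correct(word: str) -> bool:
    """Convert the word first, then accept it if it is a number or a known word."""
    converted = apply_iconv(word)
    if NUMBER_RE.fullmatch(converted):
        # Every all-Latin word lands here, typos included.
        return True
    return converted in DICTIONARY


if __name__ == "__main__":
    latin_i_word = "гран" + "i" + "тний"  # Ukrainian word with a Latin "i"
    for word in ["Strasse", "teh", "гран\u0456тний", latin_i_word]:
        print(word, "->", apply_iconv(word), is_correct(word))
```

Under this model, a typo such as "teh" passes because it becomes "000", while «гранiтний» with a Latin «i» becomes «гран0тний», which is neither a number nor a dictionary word and is therefore still flagged. This matches both sides of the discussion: mixed-script Ukrainian words are caught, but misspelled English or German words are silently accepted.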
Could you (@arysin?) please remove these lines, or change them to
iconv * \0
(not sure if the syntax is correct) if you actually wanted to remove Latin characters instead of converting them to a sequence of zero digits?