Multi Language Support #6
Motivation:
When developing software with a UI language other than English, the spell checker needs to check both English and the UI language, because many programmers still use English for comments, error messages etc.

Implemented Features:
- Support for spell checking multiple languages. The checker checks for correct spelling in multiple configurable languages in the same text.
- Exclusion of email addresses.

Todo:
Currently the languages are statically configured in the SpellingTagger.Langauge string. I've never implemented a Visual Studio extension, so I don't know how to create a configuration dialog in the Visual Studio settings.
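The email exclusion mentioned in the description could look roughly like the following. This is only an illustrative Python sketch, not the extension's code: the `EMAIL` and `WORD` patterns and the `words_to_check` helper are assumptions made for the example.

```python
import re

# Deliberately loose email pattern; an assumption for illustration only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# A word is a run of letters, optionally with one inner apostrophe ("don't").
WORD = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)?")

def words_to_check(text):
    """Return the words of `text`, skipping anything inside an email address."""
    # Blank out addresses with spaces so the offsets of the other words stay unchanged.
    masked = EMAIL.sub(lambda m: " " * len(m.group()), text)
    return [m.group() for m in WORD.finditer(masked)]

print(words_to_check("Mail jdoe@example.com for help"))  # ['Mail', 'for', 'help']
```

Masking instead of deleting keeps character positions stable, which matters if misspellings are later mapped back to spans in the editor.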
I don't like this approach, sorry. The WPF bits of spellchecking are the most resource intensive, and I've had tons of complaints about it. This makes it at least twice as expensive (more, if you add other languages). It has the net effect of using the union of both dictionaries as valid words; that works for mixed-language cases, but will cause false negatives in non-mixed cases (e.g. words that are misspelled in one language but properly spelled in the other; if Spanish were supported, then "si" wouldn't be a misspelling, even if the file was English and that was a misspelling of "is"). I think a better approach is to:
Downsides include that being more, only being triggerable if there are misspellings (likely, if the language is wrong), being really expensive to switch (hopefully only once), and not supporting mixed-language projects. If you are interested in trying that out, go for it :) If not, I may get to it in the next couple of weeks.
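To make the false-negative concern concrete, here is a tiny Python sketch of what "union of both dictionaries" means. The word lists are toy data and `misspelled` is a hypothetical helper, not part of the extension.

```python
# Toy word lists, purely hypothetical.
ENGLISH = {"is", "this", "a", "test", "yes"}
SPANISH = {"si", "es", "una", "prueba"}

def misspelled(words, *dictionaries):
    """Flag words that none of the given dictionaries accept (union semantics)."""
    accepted = set().union(*dictionaries)
    return [w for w in words if w.lower() not in accepted]

sentence = "this si a test".split()
print(misspelled(sentence, ENGLISH))           # ['si']
print(misspelled(sentence, ENGLISH, SPANISH))  # []
```

With only the English list the typo is caught; once the Spanish list is added, the same typo is silently accepted.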
Hi,
The problem is, as a German developer, I can say that mixed languages in text are very common. In a .cs file, comments and strings are in different languages most of the time. Even strings, if one does not use Resources/localization, are in English for error messages and German for the UI. A spell checker supporting only one language is barely usable in this scenario. Also, you usually won't need more than 2 languages.

If the languages are configurable in an easy way, as you proposed in the smart tag provider, then this feature can easily be turned off by specifying only one language.

For performance, I think the approach is not twice as expensive, because it first checks spelling with the current language, and only on a spelling error does it need to check the other languages.

You're right about the false negatives. Maybe there would be a way to set the language for individual strings or comments, and for whole HTML files, as HTML usually has only one language. This could work in such a way that when the current language changes within a string, the minority-language words would have to be checked again in the majority language. This way the checker could automatically determine the language.

As I see it, you always pass one word to the TextBox for spell checking. Would it improve performance to pass the whole text to check?

The approach you propose might have better performance, but I don't think the need to switch languages is acceptable if one has different files with different languages open. This should be an automatic process.

Greetings & Blessings,
Simon Egli
different language TextBoxes are only assigned text when really needed.
Pushed another version, with performance improvements.
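The idea of only involving the other languages' text boxes when really needed can be sketched as a lazy fallback: ask the primary language first and consult the remaining languages only for words it rejects. The Python below is a minimal illustration under that assumption; `CountingChecker` and `check` are hypothetical names, and a set lookup stands in for the WPF TextBox call.

```python
class CountingChecker:
    """Stand-in for one language's spell checker; counts how often it is asked."""

    def __init__(self, language, words):
        self.language = language
        self.words = set(words)
        self.calls = 0

    def is_correct(self, word):
        self.calls += 1
        return word.lower() in self.words


def check(words, primary, others):
    """Return the words that no checker accepts, asking `others` only on demand."""
    misspelled = []
    for word in words:
        if primary.is_correct(word):
            continue  # the common case: one lookup, other languages untouched
        if any(c.is_correct(word) for c in others):
            continue  # accepted by a secondary language
        misspelled.append(word)
    return misspelled


en = CountingChecker("en-US", {"the", "window", "is", "open"})
de = CountingChecker("de-DE", {"fenster"})
print(check("the Fenster is opne".split(), en, [de]))  # ['opne']
print(en.calls, de.calls)                              # 4 2
```

The secondary checker is called only for the two words the primary one rejects, so the extra cost grows with the number of misspellings rather than with the length of the text.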
Thanks Simon, inlined:
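> The problem is, as a German developer, I can say that mixed languages in text are very common. In a .cs file, comments and strings are in different languages most of the time. Even strings, if one does not use Resources/localization, are in English for error messages and German for the UI. A spell checker supporting only one language is barely usable in this scenario. Also, you usually won't need more than 2 languages.

Fair enough, though this was never a use case I was aiming for. I can see how single-language spellchecking would be frustrating.

> If the languages are configurable in an easy way, as you proposed in the smart tag provider, then this feature can easily be turned off by specifying only one language.

I'd prefer it to be, in this order:

> For performance, I think the approach is not twice as expensive, because it first checks spelling with the current language, and only on a spelling error does it need to check the other languages.

Ah, my apologies. I believe I was confused by the code, which may be wrong(?). It looks like, on a misspelling, it checks the other languages to see if they have misspellings, but it doesn't seem to be populating them first with the text to check. In any event, as you say, it is only twice as expensive for the number of misspelled words.

> You're right about the false negatives. Maybe there would be a way to set the language for individual strings or comments, and for whole HTML files, as HTML usually has only one language. This could work in such a way that when the current language changes within a string, the minority-language words would have to be checked again in the majority language. This way the checker could automatically determine the language.

That starts to sound really complicated. I'm fine with the language union case, as long as the user explicitly picks it.

> As I see it, you always pass one word to the TextBox for spell checking. Would it improve performance to pass the whole text to check?

Sadly, exactly the opposite. WPF's natural language processing is incredibly processor intensive, and I got a 3-4x performance improvement by pre-splitting words (and removing ignored words).

> The approach you propose might have better performance, but I don't think the need to switch languages is acceptable if one has different files with different languages open. This should be an automatic process.

Agreed, setting the default for multi-language scenarios would be a pain. If you have a solution that offers the three things I listed above (default English, allow selecting a single other language, allow selecting multiple languages), that would be ideal. I'm thinking, then:

That's still kinda ugly, but it would let a user (slowly) right-click to pick multiple languages and/or which languages are used. Thoughts?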
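As a rough picture of the pre-splitting mentioned above, the Python sketch below tokenizes the text up front and drops tokens that should never reach the expensive checker. The ignore rules here (user ignore list, digits, underscores, all-caps acronyms) are assumptions chosen for illustration; the thread does not spell out the extension's actual rules.

```python
import re

TOKEN = re.compile(r"[A-Za-z0-9_']+")

def is_ignored(token, ignore_list):
    """Hypothetical ignore rules: listed words, digits, identifiers, acronyms."""
    return (
        token.lower() in ignore_list
        or any(ch.isdigit() for ch in token)
        or "_" in token
        or (len(token) > 1 and token.isupper())
    )

def presplit(text, ignore_list=frozenset()):
    """Split `text` into words and keep only those worth spell checking."""
    return [t for t in TOKEN.findall(text) if not is_ignored(t, ignore_list)]

print(presplit("HTTP retry_count set to 3 by vsix loader", {"vsix"}))
# ['set', 'to', 'by', 'loader']
```

Only the surviving words would then be handed to the checker one at a time, so the costly natural-language pass never sees identifiers or ignored words.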
… window.
Implemented Features:
- Support for spell checking multiple languages. The checker checks for correct spelling in multiple configurable languages in the same text.
- Configurable language support through smart tags.
- Configuration of custom dictionaries.
The spell checker now determines the language on a per-sentence basis. It tolerates foreign language in a sentence if it forms a group of at least SpellingTagger.MinForeignWordSequence words, which is currently set to 3. If only one language is enabled, performance is identical to the old algorithm; with more languages, it decreases in proportion to the number of misspellings.
Hi,
I pushed a new, totally revised version. It still has some minor bugs; the final version will take about another half day of work. This version has language settings support in the smart tags. One can enable or disable any language, use custom dictionaries (not fully tested), and supply default custom dictionaries for languages in the vsix; it supports the .lex and .dic formats (rudimentarily).
The spell checker is smart and determines the language on a per-sentence basis. It allows foreign language in a sentence, but only sequences of a minimal length, currently set to 3 words, to avoid false negatives. With many misspellings it is really rather slow, but this is all configurable; with only en-US enabled (which is the default) the speed is as before.
I'll be on holiday next week, so I can't finish it until 16 September.
Greetings & Blessings,
Simon Egli
No bugs found so far. Ready for testing, or even RTM. Implements multiple language feature. Languages are checked on a per sentence basis. Foreign language sequences of at least MinForeignWordSequence (3) words are allowed in a sentence, smaller sequences are treated as misspelled. Custom Dictionaries are also supported either in the ISpell dic or in the WPF lex format.
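The per-sentence rule can be illustrated with a short Python sketch. It is a simplified reading of the description, not the fork's actual algorithm: the sentence language is assumed to be the one whose dictionary accepts the most words, and a run of words unknown to that language is tolerated only if it is at least `MIN_FOREIGN_WORD_SEQUENCE` long and a single other language accepts the whole run. The word lists and function names are hypothetical.

```python
MIN_FOREIGN_WORD_SEQUENCE = 3  # mirrors MinForeignWordSequence (3) from the description

def sentence_language(words, dictionaries):
    """Pick the language that accepts the most words (ties: first one listed)."""
    return max(
        dictionaries,
        key=lambda lang: sum(w.lower() in dictionaries[lang] for w in words),
    )

def misspellings(words, dictionaries):
    """Return the words flagged as misspelled in one sentence (a list of words)."""
    main = sentence_language(words, dictionaries)
    flagged, run = [], []

    def flush():
        # Tolerate the run only if it is long enough and one other
        # language accepts every word in it; otherwise flag all of it.
        foreign_ok = len(run) >= MIN_FOREIGN_WORD_SEQUENCE and any(
            all(w.lower() in vocab for w in run)
            for lang, vocab in dictionaries.items()
            if lang != main
        )
        if not foreign_ok:
            flagged.extend(run)
        run.clear()

    for word in words + [None]:  # None is a sentinel that closes the last run
        if word is not None and word.lower() not in dictionaries[main]:
            run.append(word)
        else:
            flush()
    return flagged

# Toy dictionaries, purely hypothetical.
DICTS = {
    "en-US": {"the", "button", "label", "says", "in", "german", "is", "a", "test"},
    "de-DE": {"bitte", "hier", "klicken", "fenster"},
}
print(misspellings("the label says bitte hier klicken in german".split(), DICTS))  # []
print(misspellings("the fenster is a tset".split(), DICTS))  # ['fenster', 'tset']
```

The three-word German phrase is accepted inside the English sentence, while the single German word is flagged just like the genuine typo, which is how short foreign sequences avoid masking misspellings.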
Hello Noah,
Sorry, I couldn't reach you via mail and thought you weren't answering. I've checked this now, and I think the mails never left my mail server: I run my own, and it's still in its experimental stage, ugh.
As for Spellchecker, the thing was finished long ago, apart perhaps from some bugs and additional custom dictionaries for languages the WPF spell checker doesn't support, which could be included in the VSIX. I use it in my VS every day. A short summary:
Lacking features:
Unfortunately, the algorithm for determining the language became a bit complicated and not very pretty. But apart from being ugly, it is smart: it determines the language on a per-sentence basis, choosing from the active languages. It allows foreign language in a sentence, but only sequences of a minimal length, currently set to 3 words, to avoid false negatives. For single-language sentences, performance drops in proportion to the number of misspellings times the number of active languages; for mixed-language sentences there is additional overhead when the checker chooses a language and possibly has to check some words again in that language. With one active language, performance is approximately the same as with your solution.
A small bug:
TODO:
I didn't receive any mails from you, so if I can't get through to you, I wonder whether I should publish my fork in the VS Gallery myself. But I would want to avoid this since re-checking is lacking.
Greetings & Blessings,
Simon Egli
Hey Simon, good to hear from you :) I'll take another look at the changes ( Let's see, a few answers/thoughts:
I'll go back through the comparison this week. I hope to have some time at Thanks! -Noah On Mon, Nov 26, 2012 at 9:14 AM, simonegli8 [email protected]:
…globe image with one I hold the copyright to.
Hi,
I pushed some minor changes to Spellchecker: changed the formatting of all my sources to the standard style, and replaced the nations' flags globe with an image I hold the copyright to.
Greetings & Blessings,
Simon Egli
P.S.
…ation from Assembly.CodeBase.
Conflicts: SpellChecker.Implementation/Spelling/Configuration.cs
Basic Support for SpecFlow
Motivation:
When developing software with a UI language other than English, the spell checker needs to be able to check both English and the UI language,
because many programmers still use English for comments, error messages etc.
Implemented Features:
The checker checks for correct spelling in multiple configurable languages in the same text. Languages are
determined on a per-sentence basis. Foreign language sequences of fewer than 3 words are treated as misspellings.