Thanglish Word Issue #1
Comments
I don't know Tamil, so I can't really fix this problem. This must be a problem with the transliteration scheme. Pinging @Kishore96in since he's familiar with this. The "How To Write A Word" feature is actually reverse transliteration, so it shouldn't give output for English words. That was a bug, which I've fixed in subins2000/varnamd@7563a9e.
This issue should be fixed by the changes to the scheme file in varnamproject/libvarnam#152. This is what I get on my system (with the scheme file from that MR):
Thank you for confirming it, @Kishore96in. I have merged it into GoVarnam. The changes are now live at https://varnam.subinsb.com as well.
The issue still seems to be reproducible at https://varnam.subinsb.com. @subins2000 Are you not using any wordlist to train that instance for Tamil? If you are interested, I can provide the wordlist that I am using to train varnam.

The 'canonical' way to type 'என்ன' would be 'ennnna', but of course this is not intuitive. The 'root cause' is that, like many other Indian languages, Tamil has multiple sounds which would get mapped to the same English string 'na'. The workaround used in the scheme file for Tamil was to map these sounds to 'na', 'Na', 'nna', and so on (I don't know what the other languages do).

In an attempt to allow more 'natural' input, I modified the scheme file so that all these sounds also have 'na' as a 'secondary' transliteration (the ones inside the nested square brackets). Even with these changes, varnam only shows such suggestions if it is trained with a wordlist. Before the changes to the scheme, varnam would not show such suggestions even after learning from a wordlist. Is there some better way to implement this?

To summarize, completely fixing this issue requires both changes to the scheme file (already merged) and training with a wordlist.
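The interaction between secondary mappings and a wordlist can be illustrated with a minimal Python sketch. This is not varnam's actual algorithm or the real Tamil scheme file: `TOY_SCHEME`, `tokenize`, `candidates`, and `suggest` are hypothetical names, and the choice of which letter is listed first for each key is an assumption made only for illustration.

```python
from itertools import product

# Hypothetical toy table, NOT the real Tamil scheme file.
# The first entry of each list stands in for the primary mapping,
# the remaining entries for secondary mappings of the same Latin key.
TOY_SCHEME = {
    "e": ["எ"],
    "n": ["ந்", "ன்", "ண்"],   # pure consonants (letter + pulli)
    "na": ["ந", "ன", "ண"],     # consonants with the inherent 'a'
}


def tokenize(text):
    """Greedy longest-match split of text into keys of TOY_SCHEME."""
    tokens = []
    max_len = max(len(key) for key in TOY_SCHEME)
    i = 0
    while i < len(text):
        for size in range(max_len, 0, -1):
            piece = text[i:i + size]
            if piece in TOY_SCHEME:
                tokens.append(piece)
                i += size
                break
        else:
            raise ValueError(f"no mapping for {text[i]!r}")
    return tokens


def candidates(text):
    """Every combination of primary and secondary mappings."""
    options = [TOY_SCHEME[token] for token in tokenize(text)]
    return ["".join(parts) for parts in product(*options)]


def suggest(text, wordlist):
    """Prefer candidates seen in the wordlist, else the all-primary one."""
    all_candidates = candidates(text)
    learned = [word for word in all_candidates if word in wordlist]
    return learned if learned else all_candidates[:1]


print(suggest("enna", set()))        # untrained: only the all-primary guess
print(suggest("enna", {"என்ன"}))     # trained: the learned word is picked
```

In this toy model 'enna' expands to nine candidates, and only a wordlist can tell which of them is a real word, which mirrors why both the scheme change and the training step are needed.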
Thank you for the explanation. It makes more sense now. It's kind of difficult to understand since I don't know much about the language.
On the server at https://varnam.subinsb.com there were no words in the dictionary except for Malayalam. I have now imported about 1 lakh words for Tamil. The suggestion
How many sounds are there for 'na' in Tamil? In Malayalam there are (this mapping is the same in the Malayalam varnam scheme as well):
From looking at the Tamil scheme file |
In Tamil, for 'na', we have:

As far as I understand, it seems ந and ன are both denoted by ന in Malayalam. The double na-s which you mention would be written in Tamil as ன்ன and ண்ண, i.e. we don't have dedicated conjoined characters to represent those.
I don't completely understand the concept of 'chillu letters' in Malayalam, but it seems to be a variation of other letters that appears only at the end of words. If so, I don't think that concept exists in Tamil.
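The point that ன்ன is a sequence of ordinary letters rather than a dedicated conjoined character can be checked by listing the Unicode code points of 'என்ன'. The sketch below uses only Python's standard `unicodedata` module; `describe` is a hypothetical helper name.

```python
import unicodedata


def describe(word):
    """Return (code point, Unicode name) pairs for each character in word."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in word]


# 'என்ன' is stored as four separate code points: a vowel, a consonant,
# the pulli (virama) sign, and the same consonant again.
for code, name in describe("என்ன"):
    print(code, name)

# The three Tamil letters that compete for the Latin string 'na'.
for letter in "நனண":
    print(letter, describe(letter)[0])
```

The Unicode names of these letters (NA, NNNA, NNA) show the same kind of disambiguation by repeated 'n' that leads to the unintuitive canonical spelling 'ennnna'.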
enna = என்ன
en = என்
na = ன