Correction in Hindi Phrases #16

Open
bazingarj opened this issue Jul 17, 2019 · 1 comment

Comments

@bazingarj

मिठाई - mithai (coming up as mitha-i)
खुशबू - khushbu (coming up as khasaba)
लेना - lena (coming up as lana)
पैसे - paise (coming up as pasa)
अब - ab (coming up as aba)

@ausi
Owner

ausi commented Jul 18, 2019

Transliteration between scripts, such as Devanagari to Latin in this case, is performed by the ICU library, which uses data from the Unicode CLDR.

Internally, the Devanagari-Latin transform works in two steps: it first converts Devanagari to InterIndic and then converts InterIndic to Latin.

Taking “अब” as an example, “अ” is transformed to \uE005 in Devanagari-InterIndic.xml:20 and “ब” to \uE02C in Devanagari-InterIndic.xml:59.
The code points \uE005 and \uE02C are assigned to $wa in InterIndic-Latin.xml:21 and $ba in InterIndic-Latin.xml:60, respectively.
Finally, $wa is transformed to “a” in InterIndic-Latin.xml:446 and $ba to “ba” in InterIndic-Latin.xml:298.

In short:

अ -> \uE005 -> $wa -> a
ब -> \uE02C -> $ba -> ba
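
The chain above can be imitated with two plain lookup tables, which may make it easier to see where the trailing “a” comes from. This is only a toy sketch: the tables contain just the two entries quoted in this comment, the function name `twoStageTransliterate` is made up for illustration, and the real CLDR rule files contain many more rules than a simple lookup.

```php
<?php
// Toy lookup tables holding ONLY the two mappings quoted in this comment.
// They are an illustration, not a copy of the CLDR rule files.
$devanagariToInterIndic = [
    'अ' => "\u{E005}",
    'ब' => "\u{E02C}",
];
$interIndicToLatin = [
    "\u{E005}" => 'a',  // assigned to $wa in InterIndic-Latin.xml
    "\u{E02C}" => 'ba', // assigned to $ba in InterIndic-Latin.xml
];

// Hypothetical helper: apply the first table, then the second.
function twoStageTransliterate(string $text, array $first, array $second): string
{
    $interIndic = strtr($text, $first);  // Devanagari -> InterIndic
    return strtr($interIndic, $second);  // InterIndic -> Latin
}

echo twoStageTransliterate('अब', $devanagariToInterIndic, $interIndicToLatin), PHP_EOL; // prints "aba"
```

With only these two rules, “ब” always yields “ba”, which matches the reported “aba” output for “अब”.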

As I have no knowledge of Devanagari, I can’t tell at which point the transformations go wrong.
It would be great if you could file a ticket directly with the CLDR project: http://cldr.unicode.org/index/bug-reports

You can reproduce the issue with a single line of PHP code:

echo \Transliterator::create('Deva-Latn')->transliterate('अब');
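
For a CLDR ticket it may help to list all reported words next to the actual output. The following is a minimal sketch that assumes the `intl` extension is loaded. The helper name `compareWithIcu` is hypothetical, and the expected values are taken from the issue description above, not from any official romanization table.

```php
<?php
// Words reported in this issue: Devanagari input => Latin form the reporter expects.
$reported = [
    'मिठाई' => 'mithai',
    'खुशबू' => 'khushbu',
    'लेना' => 'lena',
    'पैसे' => 'paise',
    'अब' => 'ab',
];

// Hypothetical helper: returns one row per word with the actual ICU output.
// Returns an empty array if the intl extension or the transform is unavailable.
function compareWithIcu(array $reported): array
{
    if (!class_exists('Transliterator')) {
        return [];
    }
    $transliterator = \Transliterator::create('Deva-Latn');
    if ($transliterator === null) {
        return [];
    }
    $rows = [];
    foreach ($reported as $input => $expected) {
        $rows[] = [
            'input' => $input,
            'expected' => $expected,
            'actual' => $transliterator->transliterate($input),
        ];
    }
    return $rows;
}

foreach (compareWithIcu($reported) as $row) {
    printf("%s: expected %s, got %s\n", $row['input'], $row['expected'], $row['actual']);
}
```

The actual output depends on the ICU version bundled with PHP, so it is not shown here.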
