Update README.md (commit 5149623, parent 97f26d2), authored by wikipediahakimi97 on Dec 5, 2024. Changes 1 file, README.md: 8 additions and 6 deletions.

# MediaWiki Malay Latin script (ms-Latn) to Malay Arabic script (ms-Arab)
Currently, four types of converters have been developed for converting the main text of Wikipedia articles (they can also be used for other Wikimedia-related projects):

* penukar-rumi-jawi-aksara.js: Converts ms-Latn to ms-Arab on a letter-by-letter basis. While it offers the fastest conversion speed, its results are largely erroneous.
* penukar-rumi-jawi-kamus.js: Converts ms-Latn to ms-Arab on a word-by-word basis. It is the most accurate of the four converters but also the slowest.
* penukar-rumi-jawi-hibrid.js: Converts ms-Latn to ms-Arab on a word-by-word basis into intermediary letter-number codepoints, which are then converted into Arabic letters on a letter-by-letter basis. This lets it keep accuracy high with the smallest dictionary database while still converting every letter. It combines the accuracy of the kamus converter with the complete letter coverage of the aksara converter. Theoretically, it should also be faster than the kamus converter, because hybriddictionaryforconverter.js stores fewer entries than fulldictionaryforconverter.js.
* penukar-rumi-jawi-wikidata.js: Converts ms-Latn to ms-Arab on a word-by-word basis, using the Wikidata Query Service to fetch Wikidata's lexicographical data (pages with the namespace prefix Lexeme:). The SPARQL query has been updated to generate, before conversion starts, a list of ms forms with their corresponding ms-Arab forms. This converter may be faster than the hibrid converter, but its accuracy depends heavily on Wikidata's lexicographical data.
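
The letter-by-letter approach of the aksara converter can be illustrated with the minimal sketch below. The table is a hypothetical, deliberately incomplete excerpt of seven letters, not the table used in penukar-rumi-jawi-aksara.js, and convertLetterByLetter is an illustrative name.

```javascript
// Hypothetical, deliberately incomplete Latin -> Jawi letter table
// (not the real table from penukar-rumi-jawi-aksara.js).
const LETTER_MAP = {
  a: "\u0627", // alif
  b: "\u0628", // ba
  k: "\u06A9", // kaf (U+06A9, the form commonly used in Jawi)
  m: "\u0645", // mim
  n: "\u0646", // nun
  t: "\u062A", // ta
  u: "\u0648", // wau
};

// Map each character independently, with no knowledge of whole words.
// Anything without an entry (spaces, punctuation, unmapped letters)
// is copied through unchanged.
function convertLetterByLetter(text) {
  let out = "";
  for (const ch of text.toLowerCase()) {
    out += LETTER_MAP[ch] ?? ch;
  }
  return out;
}
```

For "batu" the sketch yields ba, alif, ta, wau, which matches the usual Jawi spelling. For "makan" it yields mim, alif, kaf, alif, nun, whereas the usual Jawi spelling has no alif after the kaf; errors of this kind are why letter-by-letter output is fast but unreliable.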
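
The query-then-convert flow of the wikidata converter might look roughly like the sketch below. It is not the SPARQL code used in penukar-rumi-jawi-wikidata.js. It assumes that wd:Q9237 is the Wikidata item for the Malay language and that Jawi representations carry the language tag ms-arab, and all function names are hypothetical. It relies on the standard SPARQL JSON results format returned by the public endpoint and on the global fetch function available in browsers and recent Node.js versions.

```javascript
// Public SPARQL endpoint of the Wikidata Query Service.
const WDQS_ENDPOINT = "https://query.wikidata.org/sparql";

// Assumptions: wd:Q9237 is the item for Malay, and Jawi representations
// of a lexeme form are tagged "ms-arab". Not the project's actual query.
const QUERY = `
SELECT ?latn ?arab WHERE {
  ?lexeme dct:language wd:Q9237 ;
          ontolex:lexicalForm ?form .
  ?form ontolex:representation ?latn ;
        ontolex:representation ?arab .
  FILTER(LANG(?latn) = "ms" && LANG(?arab) = "ms-arab")
}`;

function buildQueryUrl() {
  return `${WDQS_ENDPOINT}?format=json&query=${encodeURIComponent(QUERY)}`;
}

// Turn SPARQL JSON results into a lowercase Latin -> Jawi lookup table.
// If a Latin form has several Jawi forms, the first one returned wins.
function parseFormPairs(json) {
  const pairs = new Map();
  for (const row of json.results.bindings) {
    const latn = row.latn.value.toLowerCase();
    if (!pairs.has(latn)) pairs.set(latn, row.arab.value);
  }
  return pairs;
}

// Fetch the whole list once, before any text is converted.
async function fetchFormPairs() {
  const response = await fetch(buildQueryUrl(), {
    headers: { Accept: "application/sparql-results+json" },
  });
  if (!response.ok) throw new Error(`WDQS request failed: ${response.status}`);
  return parseFormPairs(await response.json());
}

// Word-by-word conversion; words with no matching form stay in Latin script.
function convertWordByWord(text, pairs) {
  return text.replace(/[A-Za-z]+/g, (word) => pairs.get(word.toLowerCase()) ?? word);
}
```

In a user script, one would await fetchFormPairs() once and then call convertWordByWord on each piece of text. Keeping the list in a Map makes each word a single lookup, and it also makes the dependence on Wikidata visible: any word without a lexeme form is simply left unconverted.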

In addition to the converters, two dictionary databases are available:

* fulldictionaryforconverter.js: Stores the dictionary for penukar-rumi-jawi-kamus.js.
* hybriddictionaryforconverter.js: Stores the dictionary for penukar-rumi-jawi-hibrid.js.
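
How a two-stage hibrid pipeline could use a smaller dictionary is sketched below, under loudly hypothetical assumptions. The letter-number codes (a1, k1, m1, n1), the three tables, and convertHybrid are invented for illustration; the real contents of hybriddictionaryforconverter.js are not reproduced. The sketch also assumes the dictionary only needs entries for words that the letter-by-letter stage would misspell, which is one way a hybrid dictionary could stay smaller than a full one.

```javascript
// Hypothetical letter-number codes; each stands for exactly one Jawi letter.
const CODE_TO_JAWI = new Map([
  ["a1", "\u0627"], // alif
  ["k1", "\u06A9"], // kaf (U+06A9)
  ["m1", "\u0645"], // mim
  ["n1", "\u0646"], // nun
]);

// Hypothetical letter-by-letter fallback for letters outside dictionary words.
const LETTER_TO_JAWI = new Map([
  ["a", "\u0627"], ["b", "\u0628"], ["k", "\u06A9"], ["m", "\u0645"],
  ["n", "\u0646"], ["t", "\u062A"], ["u", "\u0648"],
]);

// Hypothetical hybrid dictionary: only words whose Jawi spelling differs
// from the letter-by-letter result get an entry, stored as a code string.
const HYBRID_DICTIONARY = new Map([
  ["makan", "m1a1k1n1"], // Jawi spelling has no alif after kaf
]);

// Stage 2 matches known codes first, then any single Latin letter.
const TOKEN_PATTERN = new RegExp([...CODE_TO_JAWI.keys()].join("|") + "|[A-Za-z]", "g");

function convertHybrid(text) {
  return text
    // Stage 1 (word-by-word): replace dictionary words with their code strings.
    .replace(/[A-Za-z]+/g, (word) => HYBRID_DICTIONARY.get(word.toLowerCase()) ?? word)
    // Stage 2 (letter-by-letter): turn codes, then leftover letters, into Jawi.
    .replace(TOKEN_PATTERN, (token) =>
      CODE_TO_JAWI.get(token) ?? LETTER_TO_JAWI.get(token.toLowerCase()) ?? token);
}
```

With these tables, "makan batu" becomes mim, alif, kaf, nun (from the dictionary) followed by ba, alif, ta, wau (from the letter fallback). Input that already contains a sequence such as a1 would be misread as a code, so a real intermediary scheme would need codes that cannot occur in ordinary text.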
