Remove Sanscript due to Active Record dependency
wyugue committed Sep 9, 2020
1 parent 126c60b commit 9eff924
Showing 3 changed files with 3 additions and 26 deletions.
9 changes: 1 addition & 8 deletions README.md
@@ -6,7 +6,6 @@ Basically a wrapper for more specific gems. It uses unicode/scripts to detect th
The romanization of the following scripts is currently supported:
- Chinese Characters, with the gem [ChinesePinyin](https://github.com/flyerhzm/chinese_pinyin)
- Arabic, with a conversion table based on [URoman](https://github.com/isi-nlp/uroman)
-- Devanagari, Ghurmukhi, Gujarati, Malayalam, Telugu and Tamil with the gem [Sanscript](https://github.com/ubcsanskrit/sanscript.rb). Since Sanscript is optmized for Sanskrit, southern Bhramic script support may be incomplete
- Cyrillic, with the gem [Translit](https://github.com/tjbladez/translit)
- Japanese, with the gems [Romaji](https://github.com/makimoto/romaji) and [Mecab Standalone](https://github.com/wyugue/mecab_standalone) (a Ruby wrapper for Mecab). Mecab is also used to ensure correct kanji readings and tokenization.

@@ -28,23 +27,17 @@ Latinizer.t('漢語,又稱中文、唐話、華語为整个汉语族,')
Latinizer.t('اللُّغَة العَرَبِيّة هي أكثر اللغات السامية تحدثاً')
=> "allughaa al'arabiya hy akthr allghat alsamya thdtha"
-Latinizer.t('हिन्दी विश्व की एक प्रमुख भाषा है एवं भारत की राजभाषा है')
-=> "hindī viśva kī eka pramukha bhāṣā hai evaṃ bhārata kī rājabhāṣā hai"
Latinizer.t('平仮名は、日本語の表記に用いられる音節文字')
=> "hiragana ha, nihongo no hyouki ni mochii rareru onsetsu moji"
Latinizer.t('Ру́сский язы́к один из восточнославянских языков, национальный язык русского народа.')
=> "Rússkij qzýk odin iz wostochnoslawqnskih qzykow, nacional'nyj qzyk russkogo naroda."
```

-Use option `:ascii` for ascii only output. This will remove tones in Chinese, and force ITRANS romanization on Bhramic scripts:
+Use option `:ascii` for ascii only output. This will remove tones in Chinese:
```
Latinizer.t('漢語,又稱中文、唐話、華語为整个汉语族,', :ascii)
=> "han yu you cheng zhong wen tang hua hua yu wei zheng ge han yu zu"
-Latinizer.t('हिन्दी विश्व की एक प्रमुख भाषा है एवं भारत की राजभाषा है', :ascii)
-=> "hindI vishva kI eka pramukha bhAShA hai evaM bhArata kI rAjabhAShA hai"
```

Use option `:ja` to force Japanese romanization on kanji-only strings
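
The `:ascii` note in the README says tones are removed from the Chinese output. As a rough illustration of what that means for a pinyin string, the sketch below strips tone marks with Ruby's built-in Unicode normalization. It is not how Latinizer or ChinesePinyin implement the option (in this diff, `:ascii` simply passes empty options to `Pinyin.t`), and `strip_tone_marks` is a hypothetical helper name.

```ruby
# Hypothetical helper: remove pinyin tone marks from an already romanized string.
# Assumption: this only illustrates the effect of ASCII-only output; it is not
# the code path Latinizer uses.
def strip_tone_marks(pinyin)
  # NFD splits each accented letter into a base letter plus combining marks;
  # \p{Mn} (nonspacing marks) then matches and removes those marks.
  pinyin.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
end

puts strip_tone_marks('hàn yǔ') # prints "han yu"
```

Note that this approach also drops the diaeresis on `ü`, so it would be too blunt wherever that distinction matters.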
5 changes: 2 additions & 3 deletions latinizer.gemspec
@@ -1,9 +1,9 @@
Gem::Specification.new do |s|
s.name = 'latinizer'
-s.version = '0.1.3'
+s.version = '0.1.4'
s.date = '2020-09-08'
s.summary = 'latinizer'
-s.description = 'A simple general latinization / romanization / transliteration gem wrapping Mecab, Sanscript, Chinese Pinyin and other more specific romanization gems'
+s.description = 'A simple general latinization / romanization / transliteration gem wrapping Mecab, Chinese Pinyin and other more specific romanization gems'
s.authors = ['William Yugue']
s.email = '[email protected]'
s.license = 'MIT'
@@ -13,7 +13,6 @@ Gem::Specification.new do |s|
s.add_runtime_dependency 'chinese_pinyin', '~> 1.0'
s.add_runtime_dependency 'mecab_standalone', '~> 0.1', '>= 0.1.2'
s.add_runtime_dependency 'romaji', '~> 0.2'
-s.add_runtime_dependency 'sanscript', '~> 0.10'
s.add_runtime_dependency 'translit', '~> 0.1'
s.add_runtime_dependency 'unicode-scripts', '~> 1.6'
s.add_runtime_dependency 'babosa', '~> 1.0'
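
The dropped Sanscript dependency used the pessimistic operator (`'~> 0.10'`), the same style as the dependencies that remain. A small sketch of how RubyGems evaluates such constraints, using the standard `Gem::Requirement` and `Gem::Version` classes. The version numbers tested are arbitrary examples, not releases known to exist.

```ruby
require 'rubygems'

# Constraint copied from the mecab_standalone line in the gemspec.
mecab = Gem::Requirement.new('~> 0.1', '>= 0.1.2')

# '~> 0.1' allows >= 0.1 and < 1.0; '>= 0.1.2' raises the lower bound.
puts mecab.satisfied_by?(Gem::Version.new('0.1.1')) # false
puts mecab.satisfied_by?(Gem::Version.new('0.1.2')) # true
puts mecab.satisfied_by?(Gem::Version.new('1.0.0')) # false
```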
15 changes: 0 additions & 15 deletions lib/latinizer.rb
@@ -2,18 +2,15 @@ class Latinizer
require 'chinese_pinyin'
require 'mecab_standalone'
require 'romaji'
-require 'sanscript'
require 'translit'
require 'unicode/scripts'
require 'babosa'

def self.t(text, opt = nil)
scripts = Unicode::Scripts.scripts(text) - ['Common', 'Inherited', 'Latin']
-indic_options = :iast
pinyin_options = {tonemarks: true}

if opt == :ascii
-indic_options = :itrans
pinyin_options = {}
elsif opt == :ja
return romanize_japanese(text)
@@ -26,18 +23,6 @@ def self.t(text, opt = nil)
when 'Cyrillic'
latinized = Translit.convert(text, :english)
return opt == :ascii ? latinized.to_slug.to_ascii.to_s : latinized
-when 'Devanagari'
-return Sanscript.transliterate(text, :devanagari, indic_options)
-when 'Malayalam'
-return Sanscript.transliterate(text, :malayalam, indic_options)
-when 'Tamil'
-return Sanscript.transliterate(text, :tamil, indic_options)
-when 'Telugu'
-return Sanscript.transliterate(text, :telugu, indic_options)
-when 'Gurmukhi'
-return Sanscript.transliterate(text, :gurmukhi, indic_options)
-when 'Gujarati'
-return Sanscript.transliterate(text, :gujarati, indic_options)
when 'Han'
return Pinyin.t(text, pinyin_options)
end
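
With the six `Sanscript.transliterate` branches gone, `Latinizer.t` no longer has a dedicated case for Devanagari, Malayalam, Tamil, Telugu, Gurmukhi or Gujarati text. The rest of the method is not shown in this diff, so what it returns for such input is not clear from here. A caller that wants to detect those scripts up front could use something like the sketch below. It relies on Ruby's built-in regexp script properties rather than the `unicode-scripts` gem, and `REMOVED_INDIC_SCRIPTS` and `removed_indic_scripts` are hypothetical names, not part of Latinizer.

```ruby
# Hypothetical guard, not part of Latinizer: lists the scripts whose
# Sanscript-based branches were removed in this commit.
REMOVED_INDIC_SCRIPTS = {
  'Devanagari' => /\p{Devanagari}/,
  'Malayalam'  => /\p{Malayalam}/,
  'Tamil'      => /\p{Tamil}/,
  'Telugu'     => /\p{Telugu}/,
  'Gurmukhi'   => /\p{Gurmukhi}/,
  'Gujarati'   => /\p{Gujarati}/
}.freeze

# Returns the names of the removed scripts that occur anywhere in +text+.
def removed_indic_scripts(text)
  REMOVED_INDIC_SCRIPTS.select { |_name, pattern| text.match?(pattern) }.keys
end

p removed_indic_scripts('हिन्दी विश्व') # prints ["Devanagari"]
p removed_indic_scripts('漢語')         # prints []
```

A caller could check this list before calling `Latinizer.t` and route such strings to another transliterator, or raise a clear error.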
