Merge remote-tracking branch 'origin/develop'
mozillazg committed Jun 1, 2019
2 parents a72d7a6 + 70940bb commit c6eacfe
Showing 7 changed files with 61 additions and 45 deletions.
9 changes: 8 additions & 1 deletion CHANGELOG.md
@@ -1,5 +1,10 @@
# ChangeLog

## [0.8.0] (2019-06-01)

* Added `kanji.txt`, pinyin data for kanji coined in Japan, via [#32]. Thanks [@LuoZijun](https://github.com/LuoZijun)
* Removed several incorrect neutral-tone readings


## [0.7.0] (2019-03-31)

@@ -23,7 +28,7 @@

## [0.5.1] (2018-04-19)

* Corrected the pinyin data for ```` via [#26]. Thanks [shibingli](https://github.com/shibingli)
* Corrected the pinyin data for ```` via [#26]. Thanks [@shibingli](https://github.com/shibingli)
* Updated the pinyin data for `` via [#27]


@@ -80,6 +85,7 @@
[#27]: https://github.com/mozillazg/pinyin-data/pull/27
[68dc169]: https://github.com/mozillazg/pinyin-data/commit/68dc169c3f0f02cb9bf53290edab2d2d2463e0c5
[8802f31]: https://github.com/mozillazg/pinyin-data/commit/8802f31e0e65c6e34a497adb55993425741a9d41
[#32]: https://github.com/mozillazg/pinyin-data/pull/32

[0.2.0]: https://github.com/mozillazg/pinyin-data/compare/v0.1.0...v0.2.0
[0.3.0]: https://github.com/mozillazg/pinyin-data/compare/v0.2.0...v0.3.0
@@ -91,3 +97,4 @@
[0.6.1]: https://github.com/mozillazg/pinyin-data/compare/v0.6.0...v0.6.1
[0.6.2]: https://github.com/mozillazg/pinyin-data/compare/v0.6.1...v0.6.2
[0.7.0]: https://github.com/mozillazg/pinyin-data/compare/v0.6.2...v0.7.0
[0.8.0]: https://github.com/mozillazg/pinyin-data/compare/v0.7.0...v0.8.0
6 changes: 5 additions & 1 deletion README.md
@@ -29,14 +29,16 @@
* `kMandarin_overwrite.txt`: manual corrections for erroneous pinyin data in `kMandarin.txt` (**may be edited**)
* `GBK_PUA.txt`: Chinese characters in the [Private Use Area](https://en.wikipedia.org/wiki/Private_Use_Areas) that have pinyin; see [GB 18030 - Wikipedia, the free encyclopedia](https://zh.wikipedia.org/wiki/GB_18030#PUA) (**may be edited**)
* `nonCJKUI.txt`: characters that are not [CJK Unified Ideograph](https://en.wikipedia.org/wiki/CJK_Unified_Ideographs) but still have pinyin (**may be edited**)
* `kanji.txt`: pinyin data for [kanji coined in Japan](https://zh.wikipedia.org/wiki/%E6%97%A5%E6%9C%AC%E6%B1%89%E5%AD%97#7_%E6%97%A5%E6%9C%AC%E6%B1%89%E5%AD%97%E7%9A%84%E6%B1%89%E8%AF%AD%E6%99%AE%E9%80%9A%E8%AF%9D%E8%A7%84%E8%8C%83%E8%AF%BB%E9%9F%B3%E8%A1%A8) (**may be edited**)
* `kMandarin_8105.txt`: the single most common reading of each of the 8105 characters in the [《通用规范汉字表》 (Table of General Standard Chinese Characters)](https://zh.wikipedia.org/wiki/通用规范汉字表) (2013 edition) (**may be edited**)
* `overwrite.txt`: manually corrected pinyin data (**may be edited**)
* `pinyin.txt`: the pinyin data produced by merging the files above
* `zdic.txt`: pinyin data from [zdic.net (汉典网)](http://zdic.net)
* `zdic.txt`: pinyin data from [zdic.net (汉典网)](http://zdic.net) (**may be edited**)


## References

* [Scheme for the Chinese Phonetic Alphabet (汉语拼音方案)](http://www.moe.edu.cn/s78/A19/yxs_left/moe_810/s230/195802/t19580201_186000.html)
* [Unihan Database Lookup](http://www.unicode.org/charts/unihan.html)
* [Handian (汉典) zdic.net](http://www.zdic.net/)
* [Zisea (字海网, 叶典网)](http://zisea.com/)
@@ -45,6 +47,8 @@
* [GB 18030 - Wikipedia, the free encyclopedia](https://zh.wikipedia.org/wiki/GB_18030#PUA)
* [通用规范汉字表 (Table of General Standard Chinese Characters) - Wikipedia, the free encyclopedia](https://zh.wikipedia.org/wiki/%E9%80%9A%E7%94%A8%E8%A7%84%E8%8C%83%E6%B1%89%E5%AD%97%E8%A1%A8)
* [China’s 通用规范汉字表 (Tōngyòng Guīfàn Hànzìbiǎo)](https://blogs.adobe.com/CCJKType/2014/03/china-8105.html)
* [Standard Chinese Readings of Japanese Kanji (日本汉字的汉语读音规范)](http://www.moe.gov.cn/s78/A19/yxs_left/moe_810/s230/201001/t20100115_75698.html)
* [Table of Standard Mandarin Readings of Japanese Kanji (日本汉字的汉语普通话规范读音表) - Wikipedia](https://zh.wikipedia.org/wiki/%E6%97%A5%E6%9C%AC%E6%B1%89%E5%AD%97#7_%E6%97%A5%E6%9C%AC%E6%B1%89%E5%AD%97%E7%9A%84%E6%B1%89%E8%AF%AD%E6%99%AE%E9%80%9A%E8%AF%9D%E8%A7%84%E8%8C%83%E8%AF%BB%E9%9F%B3%E8%A1%A8)

[unihan]: http://www.unicode.org/charts/unihan.html

34 changes: 0 additions & 34 deletions kMandarin_overwrite.txt
@@ -62,37 +62,3 @@ U+295F5: zhēng # 𩗵
U+29B5D: wǒ # 𩭝
U+2A048: zhuāng # 𪁈
U+2A2A2: shí # 𪊢

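# Japanese kanji readings
U+5302: xiōng,yún # 匂, yún is the Japanese-kanji reading; xiōng is the modern Chinese reading;
U+4E3C: jǐng,dǎn # 丼, dǎn is the Japanese-kanji reading; jǐng is the modern Chinese reading;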
U+8FBB: shí # 辻
U+8FBC: rù # 込
U+51E7: jīn # 凧
U+6763: shān # 杣
U+67A0: zá # 枠
U+7551: tián # 畑
U+6803: lì # 栃
U+6802: méi # 栂
U+5CE0: kǎ # 峠
U+4FE3: yǔ # 俣
U+7C7E: rèn # 籾
U+7560: tián # 畠
U+96EB: xià # 雫
U+7B39: shì # 笹
U+5840: píng # 塀
U+6919: chāng # 椙
U+7872: yù # 硲
U+86EF: lǎo # 蛯
U+55B0: cān # 喰
U+643E: zhà # 搾
U+698A: shén # 榊
U+50CD: dòng # 働
U+7CC0: huā # 糀
U+9786: bǐng # 鞆
U+69C7: zhēn # 槇
U+6A2B: jīan # 樫
U+9D2B: tián # 鴫
U+567A: xīn # 噺
U+7C17: liáng # 簗
U+9EBF: mó # 麿
32 changes: 32 additions & 0 deletions kanji.txt
@@ -0,0 +1,32 @@
U+5302: yún # 匂 yún is the Japanese-kanji reading; xiōng is the modern Chinese reading;
U+4E3C: dǎn # 丼 dǎn is the Japanese-kanji reading; jǐng is the modern Chinese reading;
U+8FBB: shí # 辻
U+8FBC: rù # 込
U+51E7: jīn # 凧
U+6763: shān # 杣
U+67A0: zá # 枠
U+7551: tián # 畑
U+6803: lì # 栃
U+6802: méi # 栂
U+5CE0: kǎ # 峠
U+4FE3: yǔ # 俣
U+7C7E: rèn # 籾
U+7560: tián # 畠
U+96EB: xià # 雫
U+7B39: shì # 笹
U+5840: píng # 塀
U+6919: chāng # 椙
U+7872: yù # 硲
U+86EF: lǎo # 蛯
U+55B0: cān # 喰
U+643E: zhà # 搾
U+698A: shén # 榊
U+50CD: dòng # 働
U+7CC0: huā # 糀
U+9786: bǐng # 鞆
U+69C7: zhēn # 槇
U+6A2B: jiān # 樫
U+9D2B: tián # 鴫
U+567A: xīn # 噺
U+7C17: liáng # 簗
U+9EBF: mó # 麿
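
The entries above share one line shape: a `U+XXXX` code point, a colon, one or more comma-separated readings, and an optional `#` comment. As a rough illustration only, a line of that shape could be split as follows. `parse_line` is a hypothetical helper written for this note; it is not the repository's `parse_pinyins`, whose body is not shown in this diff.

```python
def parse_line(line):
    """Split one data line into (code point, list of readings).

    Hypothetical helper: assumes the "U+XXXX: a,b  # comment" shape seen
    in kanji.txt and overwrite.txt. Returns None for blank lines,
    comment-only lines, and lines that do not start with "U+".
    """
    # Everything after the first '#' is a human-readable comment.
    content = line.split('#', 1)[0].strip()
    if not content:
        return None
    code, _, readings = content.partition(':')
    code = code.strip()
    if not code.startswith('U+'):
        return None
    # Readings are comma-separated; drop empty pieces.
    return code, [p.strip() for p in readings.split(',') if p.strip()]


print(parse_line('U+5302: yún  # 匂'))
print(parse_line('U+47C1: xiāo,chāo  # 䟁'))
```

Keeping the code point as the string `U+5302` rather than an integer is an arbitrary choice for the sketch; either works as a dictionary key.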
10 changes: 6 additions & 4 deletions merge_unihan.py
@@ -63,6 +63,7 @@ def extend_pinyins(old_map, new_map, only_no_exists=False):
        else:
            old_map.setdefault(code, []).extend(pinyins)


if __name__ == '__main__':
    raw_pinyin_map = {}
    with open('kHanyuPinyin.txt') as fp:
@@ -87,13 +88,14 @@ def extend_pinyins(old_map, new_map, only_no_exists=False):
    extend_pinyins(raw_pinyin_map, adjust_pinyin_map)
    with open('kHanyuPinlu.txt') as fp:
        khanyupinyinlu = parse_pinyins(fp)
    # We only add pinyin data that is missing, rather than updating existing data,
    # because the kHanyuPinlu data contains some unwanted neutral-tone readings
    # and some tone marks placed on the wrong letter, e.g. ``ǒu`` written as ``oǔ``
    extend_pinyins(raw_pinyin_map, khanyupinyinlu, only_no_exists=True)
    extend_pinyins(adjust_pinyin_map, _map)
    extend_pinyins(raw_pinyin_map, adjust_pinyin_map)
    with open('GBK_PUA.txt') as fp:
        pua_pinyin_map = parse_pinyins(fp)
    extend_pinyins(raw_pinyin_map, pua_pinyin_map)
    with open('kanji.txt') as fp:
        _map = parse_pinyins(fp)
    extend_pinyins(raw_pinyin_map, _map, only_no_exists=True)

    with open('overwrite.txt') as fp:
        overwrite_pinyin_map = parse_pinyins(fp)
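
The hunk merges `kanji.txt` with `only_no_exists=True`, so the kanji readings act as a fallback rather than a replacement. The sketch below shows one plausible reading of that flag. Only the `else` branch is visible in this diff; the `only_no_exists` branch and the sample maps are assumptions made for illustration.

```python
def extend_pinyins(old_map, new_map, only_no_exists=False):
    """Merge new_map into old_map in place.

    Assumption: with only_no_exists=True, a code point that already has
    readings in old_map is left untouched. The append branch mirrors the
    line shown in the diff.
    """
    for code, pinyins in new_map.items():
        if only_no_exists:
            if code not in old_map:
                old_map[code] = list(pinyins)
        else:
            old_map.setdefault(code, []).extend(pinyins)


# Illustrative data: U+4E3C already has a reading, U+8FBB does not.
merged = {'U+4E3C': ['jǐng']}
kanji = {'U+4E3C': ['dǎn'], 'U+8FBB': ['shí']}
extend_pinyins(merged, kanji, only_no_exists=True)
print(merged)
```

Under that assumption an existing Chinese reading such as `jǐng` survives, and the kanji reading is used only where no earlier source supplied one. A later `overwrite.txt` entry can still replace either.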
5 changes: 5 additions & 0 deletions overwrite.txt
@@ -40,3 +40,8 @@ U+E864: luán # 
U+241FE: yíng # 𤇾
U+275C8: nú # 𧗈
U+47C1: xiāo,chāo # 䟁
U+9EBF: mí # 麿
U+7C17: zhù # 簗
U+8279: cǎo # 艹
U+88CF: lǐ # 裏
U+88E1: lǐ # 裡
10 changes: 5 additions & 5 deletions pinyin.txt
@@ -17554,7 +17554,7 @@ U+7C13: diāo # 簓
U+7C14: suō # 簔
U+7C15: lè # 簕
U+7C16: duàn # 簖
U+7C17: liang # 簗
U+7C17: zhù # 簗
U+7C18: xiāo # 簘
U+7C19: bó # 簙
U+7C1A: mì # 簚
@@ -19188,7 +19188,7 @@ U+8275: pīng # 艵
U+8276: yàn # 艶
U+8277: yàn # 艷
U+8278: cǎo # 艸
U+8279: cao # 艹
U+8279: cǎo # 艹
U+827A: yì # 艺
U+827B: lè,jí # 艻
U+827C: tīng,dǐng # 艼
@@ -20810,7 +20810,7 @@ U+88CB: shù # 裋
U+88CC: jiá,jiā,xié # 裌
U+88CD: kǔn # 裍
U+88CE: chéng,chěng # 裎
U+88CF: lǐ,li # 裏
U+88CF: lǐ # 裏
U+88D0: juān # 裐
U+88D1: shēn # 裑
U+88D2: póu,bāo # 裒
@@ -20828,7 +20828,7 @@ U+88DD: zhuāng # 裝
U+88DE: shuì # 裞
U+88DF: shā # 裟
U+88E0: qún # 裠
U+88E1: lǐ,li # 裡
U+88E1: lǐ # 裡
U+88E2: lián,shāo # 裢
U+88E3: liǎn # 裣
U+88E4: kù # 裤
@@ -26426,7 +26426,7 @@ U+9EBB: má,mā # 麻
U+9EBC: me # 麼
U+9EBD: mó,má,ma,me # 麽
U+9EBE: huī # 麾
U+9EBF: mo # 麿
U+9EBF: # 麿
U+9EC0: zōu # 黀
U+9EC1: nún # 黁
U+9EC2: fén # 黂
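
Several of the `pinyin.txt` changes above drop or replace readings written without a tone mark (`liang`, `cao`, `li`, `mo`). One way to find such candidates, offered as a sketch and not as the method used in this repository, is to decompose each reading with `unicodedata.normalize` and look for a tone diacritic. `has_tone_mark` is a hypothetical helper.

```python
import unicodedata

# Combining marks that NFD decomposition yields for pinyin tones 1-4:
# macron, acute, caron, grave.
TONE_MARKS = {'\u0304', '\u0301', '\u030C', '\u0300'}


def has_tone_mark(reading):
    """Return True if the reading carries a tone 1-4 diacritic."""
    decomposed = unicodedata.normalize('NFD', reading)
    return any(ch in TONE_MARKS for ch in decomposed)


for reading in ['lǐ', 'li', 'zhù', 'cao']:
    print(reading, has_tone_mark(reading))
```

A reading without a tone mark is not automatically wrong: neutral-tone readings such as `me` for U+9EBC are legitimate, so a check like this only narrows down what to review by hand.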