Merge pull request #12 from mozillazg/develop
v0.4.0
mozillazg authored Oct 17, 2016
2 parents 1b473fb + b36bd95 commit 0fec9a7
Showing 14 changed files with 16,857 additions and 99 deletions.
13 changes: 13 additions & 0 deletions CHANGELOG.md
@@ -1,6 +1,14 @@
# ChangeLog


## 0.4.0 (2016-10-17):

* Update PUA.txt; see [#7](https://github.com/mozillazg/pinyin-data/issues/7). Thanks [@Artoria2e5][@Artoria2e5]
* Rename PUA.txt to GBK_PUA.txt; see [#7](https://github.com/mozillazg/pinyin-data/issues/7)
* Add kMandarin_8105.txt (the most common reading of each of the 8,105 characters in the 《通用规范汉字表》, the Table of General Standard Chinese Characters) [#9][#9] [#11][#11]
* Update pinyin.txt with latest data


## 0.3.0 (2016-08-19):

* Fixed format of zdic.txt via [b8e4394](https://github.com/mozillazg/pinyin-data/commit/b8e439490d2c6e8c711652983db52fb69136919b).
@@ -20,3 +28,8 @@
## 0.1.0 (2016-03-11)

* Initial Release


[@Artoria2e5]: https://github.com/Artoria2e5
[#9]: https://github.com/mozillazg/pinyin-data/pull/9
[#11]: https://github.com/mozillazg/pinyin-data/pull/11
82 changes: 82 additions & 0 deletions GBK_PUA.txt
@@ -0,0 +1,82 @@
# GBK/GB 18030 PUA mapping
# See: https://zh.wikipedia.org/wiki/GB_18030#PUA
# U+E815: #  Unihan: U+2E81 ⺁
U+E816: zuǒ #  Unihan: U+20087 𠂇
# U+E817: #  Unihan: U+20089 𠂉
U+E818: gǔn #  Unihan: U+200CC 𠃌
# U+E819: #  Unihan: U+2E84 ⺄
U+E81A: zhòu,zhū #  Unihan: U+3473 㑳
U+E81B: zhòu #  Unihan: U+3447 㑇
# U+E81C: #  Unihan: U+2E88 ⺈
# U+E81D: #  Unihan: U+2E8B ⺋
# U+E81E: #  Unihan: U+9FB4 龴
U+E81F: wāi #  Unihan: U+359E 㖞
U+E820: hǎn #  Unihan: U+361A 㘚
U+E821: hǎn #  Unihan: U+360E 㘎
# U+E822: #  Unihan: U+2E8C ⺌
# U+E823: #  Unihan: U+2E97 ⺗
U+E824: zhòu,chǎo #  Unihan: U+396E 㥮
U+E825: zhòu #  Unihan: U+3918 㤘
# U+E826: #  Unihan: U+9FB5 龵
U+E827: gāng #  Unihan: U+39CF 㧏
U+E828: kuǎi #  Unihan: U+39DF 㧟
U+E829: sǒng #  Unihan: U+3A73 㩳
U+E82A: sǒng #  Unihan: U+39D0 㧐
# U+E82B: #  Unihan: U+9FB6 龶
# U+E82C: #  Unihan: U+9FB7 龷
U+E82D: gāng #  Unihan: U+3B4E 㭎
U+E82E: kuài #  Unihan: U+3C6E 㱮
U+E82F: tà #  Unihan: U+3CE0 㳠
# U+E830: #  Unihan: U+2EA7 ⺧
U+E831: pěng #  Unihan: U+215D7 𡗗
# U+E832: #  Unihan: U+9FB8 龸
# U+E833: #  Unihan: U+2EAA ⺪
U+E834: lōu #  Unihan: U+4056 䁖
U+E835: cǎn #  Unihan: U+415F 䅟
# U+E836: #  Unihan: U+2EAE ⺮
U+E837: chōu,chóu #  Unihan: U+4337 䌷
# U+E838: #  Unihan: U+2EB3 ⺳
# U+E839: #  Unihan: U+2EB6 ⺶
# U+E83A: #  Unihan: U+2EB7 ⺷
U+E83B: zāi #  Unihan: U+2298F 𢦏
U+E83C: bà,bēi #  Unihan: U+43B1 䎱
U+E83D: bà #  Unihan: U+43AC 䎬
# U+E83E: #  Unihan: U+2EBB ⺻
U+E83F: zhuān #  Unihan: U+43DD 䏝
U+E840: qióng #  Unihan: U+44D6 䓖
U+E841: kuì,huì #  Unihan: U+4661 䙡
U+E842: kuì #  Unihan: U+464C 䙌
# U+E843: #  Unihan: U+9FB9 龹
U+E844: xīn #  Unihan: U+4723 䜣
U+E845: yàn #  Unihan: U+4729 䜩
U+E846: jìng,qíng #  Unihan: U+477C 䝼
U+E847: qíng #  Unihan: U+478D 䞍
# U+E848: #  Unihan: U+2ECA ⻊
U+E849: shàn #  Unihan: U+4947 䥇
U+E84A: yé #  Unihan: U+497A 䥺
U+E84B: pō #  Unihan: U+497D 䥽
U+E84C: shàn #  Unihan: U+4982 䦂
U+E84D: zhuō #  Unihan: U+4983 䦃
U+E84E: shàn #  Unihan: U+4985 䦅
U+E84F: jué #  Unihan: U+4986 䦆
U+E850: wěn,chuài #  Unihan: U+499F 䦟
U+E851: zhèng #  Unihan: U+499B 䦛
U+E852: chuài #  Unihan: U+49B7 䦷
U+E853: zhèng #  Unihan: U+49B6 䦶
# U+E854: #  Unihan: U+9FBA 龺
U+E855: yíng #  Unihan: U+241FE 𤇾
U+E856: yú #  Unihan: U+4CA3 䲣
U+E857: yìn #  Unihan: U+4C9F 䲟
U+E858: chūn #  Unihan: U+4CA0 䲠
U+E859: qiū #  Unihan: U+4CA1 䲡
U+E85A: yú #  Unihan: U+4C77 䱷
U+E85B: téng #  Unihan: U+4CA2 䲢
U+E85C: shī #  Unihan: U+4D13 䴓
U+E85D: jiāo #  Unihan: U+4D14 䴔
U+E85E: liè #  Unihan: U+4D15 䴕
U+E85F: jīng #  Unihan: U+4D16 䴖
U+E860: jú #  Unihan: U+4D17 䴗
U+E861: tī #  Unihan: U+4D18 䴘
U+E862: pì #  Unihan: U+4D19 䴙
U+E863: yǎn #  Unihan: U+4DAE 䶮
# U+E864: #  Unihan: U+9FBB 龻
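
The listing above pairs each GBK/GB 18030 Private Use Area code point with a pinyin reading and the Unihan code point that now encodes the same character. As a minimal sketch of how such a file might be consumed, the Python below loads the active (uncommented) lines into a lookup table and uses it to replace PUA characters in a string with their Unihan equivalents. The names `load_pua_table` and `normalize_pua` are hypothetical and are not part of this repository; the regular expression assumes the line shape shown in the listing.

```python
import re

# Assumed line shape, based on the listing above:
#   U+E816: zuǒ  # <PUA char> Unihan: U+20087 <hanzi>
PUA_LINE = re.compile(
    r"^U\+(?P<pua>[0-9A-F]{4,6}):\s*(?P<pinyins>[^#]*?)\s*#"
    r".*?Unihan:\s*U\+(?P<uni>[0-9A-F]{4,6})"
)


def load_pua_table(lines):
    """Return {pua_code_point: (unihan_code_point, [pinyins])}.

    Lines starting with '#' (entries without a reading) are skipped.
    """
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = PUA_LINE.match(line)
        if match is None:
            continue
        pinyins = [p.strip() for p in match.group("pinyins").split(",") if p.strip()]
        table[int(match.group("pua"), 16)] = (int(match.group("uni"), 16), pinyins)
    return table


def normalize_pua(text, table):
    """Replace PUA characters in text with their Unihan equivalents."""
    return text.translate({pua: chr(uni) for pua, (uni, _) in table.items()})


sample = [
    "# U+E815: #  Unihan: U+2E81 ⺁",
    "U+E816: zuǒ #  Unihan: U+20087 𠂇",
    "U+E81A: zhòu,zhū #  Unihan: U+3473 㑳",
]
table = load_pua_table(sample)
```

With this table, `table[0xE81A]` holds both readings of U+3473, and `normalize_pua` leaves any character without a mapping untouched.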
7 changes: 6 additions & 1 deletion Makefile
@@ -1,7 +1,12 @@
.PHONY: help
help:
	@echo "merge_unihan merge Unihan data"
	@echo "merge_unihan merge Unihan data"
	@echo "pua generate PUA"

.PHONY: merge_unihan
merge_unihan:
	python merge_unihan.py

.PHONY: pua
pua:
	python tools/gen_gb_pua.py > GBK_PUA.txt
70 changes: 0 additions & 70 deletions PUA.txt

This file was deleted.

8 changes: 6 additions & 2 deletions README.md
@@ -7,7 +7,7 @@

Data format:

* Format: `{code point}: {pinyins} # {hanzi}` (example: `U+4E2D: zhōng,zhòng # 中`)
* Format: `{code point}: {pinyins} # {hanzi} {comments}` (example: `U+4E2D: zhōng,zhòng # 中`)
* Lines starting with `#` are comments
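
The line format described above is simple enough to parse with one regular expression. The following Python is a minimal sketch under that assumption; `Entry` and `parse_line` are hypothetical names introduced for illustration and do not come from this repository. Anything after the hanzi is treated as a free-form comment.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed line shape, taken from the format string above:
#   {code point}: {pinyins} # {hanzi} {comments}
LINE_RE = re.compile(
    r"^U\+(?P<code>[0-9A-Fa-f]{4,6}):\s*(?P<pinyins>[^#]*?)\s*(?:#\s*(?P<rest>.*))?$"
)


class Entry(NamedTuple):
    code_point: int
    pinyins: List[str]
    hanzi: str
    comment: str


def parse_line(line: str) -> Optional[Entry]:
    """Parse one data line; return None for blank, comment, or malformed lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = LINE_RE.match(line)
    if match is None:
        return None
    pinyins = [p.strip() for p in match.group("pinyins").split(",") if p.strip()]
    # The first token after '#' is taken as the hanzi, the remainder as comments.
    rest = (match.group("rest") or "").split(None, 1)
    hanzi = rest[0] if rest else ""
    comment = rest[1] if len(rest) > 1 else ""
    return Entry(int(match.group("code"), 16), pinyins, hanzi, comment)


entry = parse_line("U+4E2D: zhōng,zhòng # 中")
```

Here `entry.pinyins` is `["zhōng", "zhòng"]` and `entry.code_point` is `0x4E2D`; comment lines yield `None` so callers can filter them out.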


@@ -19,9 +19,10 @@
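* `kXHC1983.txt`: pinyin data from the [kXHC1983](http://www.unicode.org/reports/tr38/#kXHC1983) field of the [Unihan Database][unihan] (sourced from 《现代汉语词典》, the Contemporary Chinese Dictionary)
* `kHanyuPinlu.txt`: pinyin data from the [kHanyuPinlu](http://www.unicode.org/reports/tr38/#kHanyuPinlu) field of the [Unihan Database][unihan] (sourced from 《現代漢語頻率詞典》, the Modern Chinese Frequency Dictionary)
* `kMandarin.txt`: pinyin data from the [kMandarin](http://www.unicode.org/reports/tr38/#kMandarin) field of the [Unihan Database][unihan] (the single most common Mandarin reading; the zh-CN reading takes priority, and the zh-TW reading is used when zh-CN has none)
* `PUA.txt`: Han characters in the [Private Use Area](https://en.wikipedia.org/wiki/Private_Use_Areas) that have pinyin
* `GBK_PUA.txt`: Han characters in the [Private Use Area](https://en.wikipedia.org/wiki/Private_Use_Areas) that have pinyin; see [GB 18030 - Wikipedia, the free encyclopedia](https://zh.wikipedia.org/wiki/GB_18030#PUA)
* `nonCJKUI.txt`: characters that are not [CJK Unified Ideographs](https://en.wikipedia.org/wiki/CJK_Unified_Ideographs) but still have pinyin
* `overwrite.txt`: manually corrected pinyin data (**all of the pinyin data above is generated by programs; to change a reading, edit only this file**)
* `kMandarin_8105.txt`: the single most common reading of each of the 8,105 characters in the [《通用规范汉字表》](https://zh.wikipedia.org/wiki/通用规范汉字表) (**may be edited**)
* `pinyin.txt`: pinyin data produced by merging the files above
* `zdic.txt`: pinyin data from [汉典网](http://zdic.net) (zdic.net)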
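
The file list above says that `pinyin.txt` is produced by merging the other files and that `overwrite.txt` carries manual corrections. The sketch below illustrates one way such a merge could work, assuming that sources are applied in order and that a later source replaces the readings of an earlier one for the same code point. That precedence rule, and the names `merge_sources` and `format_line`, are assumptions made for illustration; they are not taken from `merge_unihan.py`.

```python
def merge_sources(sources):
    """Merge {code_point: [pinyins]} dicts in order; later dicts win on conflict.

    Assumption: manual corrections are passed last so they take precedence
    over program-generated data.
    """
    merged = {}
    for source in sources:
        merged.update(source)
    return merged


def format_line(code_point, pinyins):
    """Render one entry as '{code point}: {pinyins} # {hanzi}'."""
    return "U+{:X}: {} # {}".format(code_point, ",".join(pinyins), chr(code_point))


# Hypothetical inputs: one generated source and one manual correction.
generated = {0x4E2D: ["zhōng"], 0x6587: ["wén"]}
overwrite = {0x4E2D: ["zhōng", "zhòng"]}

merged = merge_sources([generated, overwrite])
lines = [format_line(cp, merged[cp]) for cp in sorted(merged)]
```

Passing the correction dict last keeps the rule easy to reason about: whatever appears in the final source is what ends up in the output, and untouched code points keep their generated readings.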

@@ -32,5 +33,8 @@
* [汉典 zdic.net](http://www.zdic.net/)
* [字海网,叶典网](http://zisea.com/)
* [Han characters in Unicode, GB2312, GBK, and GB18030](http://www.fmddlmyy.cn/text24.html)
* [GB 18030 - Wikipedia, the free encyclopedia](https://zh.wikipedia.org/wiki/GB_18030#PUA)
* [通用规范汉字表 (Table of General Standard Chinese Characters) - Wikipedia, the free encyclopedia](https://zh.wikipedia.org/wiki/%E9%80%9A%E7%94%A8%E8%A7%84%E8%8C%83%E6%B1%89%E5%AD%97%E8%A1%A8)
* [China’s 通用规范汉字表 (Tōngyòng Guīfàn Hànzìbiǎo)](https://blogs.adobe.com/CCJKType/2014/03/china-8105.html)

[unihan]: http://www.unicode.org/charts/unihan.html