BigramBased-PhraseGeneration

A bigram based approach to generate short phrases. Intention is to be used as a Chinese name generator.

Guide: run generic_singletoken_bigram.py, check commandline output for now.

Intuition

From raw data, obtain unigrams and bigrams.
Generating:
- using unigram frequencies, sample a unigram as the "leading unigram"
- using the leading unigram, collect a pool of bigrams that start with it.
- from the selected bigrams, with normalized probabilities, sample a final bigram.

(s denotes start of sentence, /s the end of sentence; can be interpreted as "" in the final results; haven't added filtering)

from 宋词三百首

['舟横', '暝尘', '溪深', '影横', '中年', '珠坠', '照', '处问', '歌长', '草']

['际烟', '山千', '暮雨', '片片', '数', '带脂', '练秋', '凄情', '事古', '是豪']

from 楚辞:

['之卑', '嗟尔', '来者', '君之', '迁', '怐愗', '以须', '殪兮', '余之', '佩之']

['哀众', '奉帝', '怀逝', '杂菜', '兮谗', '陆', '兮为', '酒不', '後下', '远之']

from 李世民诗选:

['靜戈', '聞弦', '驚禽', '鏡上', '無定', '佩蘭', '淨峰', '景洛', '疏派', '雲飄']

['撫躬', '結伴', '嗣雲', '瑣除', '樹', '姓玉', '銜宵', '條風', '豈必', '笥']

from 曹操诗选:

['沉吟', '贯日', '金阶', '彼洛', '人俱', '饮酒', '弗忧', '何困', '之', '高冈']

['东到', '不可', '千秋', '颈长', '尊', '卒来', '离', '以咏', '乔乃', '不相']

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
README.md		README.md
generic_singletoken_bigram.py		generic_singletoken_bigram.py
宋词三百首.json		宋词三百首.json