N-gram

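A project that compares N-gram models (unigram and bigram) with FMM and BMM (forward and backward maximum matching) for word segmentation. Documentation: CocoNLP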
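
As a rough illustration of the two dictionary-based baselines, here is a minimal sketch of forward and backward maximum matching. It is not taken from this repository: the function names `fmm` and `bmm`, the `max_len` parameter, and the toy vocabulary are all assumptions made for the example.

```python
def fmm(text, vocab, max_len=4):
    """Forward maximum matching: scan left to right, taking the longest known word."""
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first; a single character is always accepted.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


def bmm(text, vocab, max_len=4):
    """Backward maximum matching: scan right to left, taking the longest known word."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                words.append(piece)
                j -= size
                break
    return words[::-1]  # words were collected from the end, so reverse them


# Hypothetical toy vocabulary, only for demonstration.
toy_vocab = {"研究", "研究生", "生命", "起源"}
print(fmm("研究生命起源", toy_vocab))  # ['研究生', '命', '起源']
print(bmm("研究生命起源", toy_vocab))  # ['研究', '生命', '起源']
```

The two directions can disagree on ambiguous strings, which is one reason their scores differ.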

Usage

First, download the data file '199801.txt' and put it in the project directory. Then run:

python statistic.py

You will get output like the following. The last three lines report recall (召回率), precision (准确率), and F-measure (F值):

successfully to split corpus by train = 0.900000 test = 0.100000
the total number of words is:53260
The total number of bigram is : 403121.
successfully witten-Bell smoothing! smooth_value:1.3372788850370981e-05
the total number of punction is:47
召回率为:0.962036929819092
准确率为:0.9401303935308096
F值为:0.950957517059212
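
The log above mentions Witten-Bell smoothing. The exact formulation used by this project (and how its single `smooth_value` is derived) is not shown here, so the following is only a sketch of one common textbook variant for bigrams: seen bigrams get `c(h, w) / (c(h) + T(h))`, and the reserved mass `T(h) / (c(h) + T(h))` is shared equally among the word types never seen after `h`. The function name `witten_bell` and the `vocab_size` parameter are assumptions made for the example.

```python
from collections import Counter, defaultdict


def witten_bell(bigrams, vocab_size):
    """Return a function prob(h, w) giving a Witten-Bell smoothed bigram probability."""
    pair_counts = Counter(bigrams)
    history_total = Counter()      # c(h): how often h occurs as a history
    followers = defaultdict(set)   # distinct word types seen after h
    for (h, w), c in pair_counts.items():
        history_total[h] += c
        followers[h].add(w)

    def prob(h, w):
        n = history_total[h]
        t = len(followers.get(h, ()))
        if n == 0:
            # Unseen history: fall back to a uniform distribution (an assumption).
            return 1.0 / vocab_size
        c = pair_counts[(h, w)]
        if c > 0:
            return c / (n + t)
        z = vocab_size - t  # number of word types never seen after h
        if z <= 0:
            return 0.0
        return t / (z * (n + t))

    return prob


# Toy data: vocabulary {a, b, c, d}, so vocab_size is 4.
p = witten_bell([("a", "b"), ("a", "b"), ("a", "c"), ("b", "a")], vocab_size=4)
print(p("a", "b"))  # 0.4
print(p("a", "d"))  # 0.2
```

For any history that was seen, the probabilities over the whole vocabulary sum to 1.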

Result

Metric	FMM	BMM	Unigram	Bigram
Precision	91.54%	92.13%	93.20%	94.01%
Recall	94.66%	95.07%	96.14%	96.20%
F1	93.07%	93.58%	94.64%	95.10%
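
For reference, here is a minimal sketch of how these three metrics are usually computed for word segmentation: each word is turned into a character span, and a predicted word counts as correct only if the same span appears in the gold segmentation. The helper names `to_spans`, `f_measure`, and `evaluate` are assumptions made for the example, not functions from this repository.

```python
def to_spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    spans, start = set(), 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(gold_words, pred_words):
    """Return (precision, recall, F1) for one segmented sentence."""
    gold, pred = to_spans(gold_words), to_spans(pred_words)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall, f_measure(precision, recall)


# Toy example: only the last word is segmented correctly.
print(evaluate(["研究", "生命", "起源"], ["研究生", "命", "起源"]))
```

Applying `f_measure` to the precision and recall printed in the sample output gives roughly 0.951, which agrees with the reported F value.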

Aurelius84/N-gram
