Skip to content
/ ngrams Public

Python script that uses the multiprocessing module to calculate ngrams in a txt file.

Notifications You must be signed in to change notification settings

econpy/ngrams

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

28 Commits
 
 
 
 
 
 

Repository files navigation

The pyngram.py script uses the multiprocessing module to calculate ngrams in a txt file.

Example Use

Calculate all the 3-grams in the txt file HuckleberryFinn.txt:

python pyngram.py HuckleberryFinn.txt 3

which will create a tab seperated file called 3grams.tsv with the sorted counts of 3-grams in The Adventures of Huckleberry Finn by Mark Twain. Take a look at the output file:

head -15 3grams.tsv
3gram    count
out of the	60
was going to	47
said it was	35
by and by	35
a lot of	32
and then he	29
the old man	28
the middle of	27
kind of a	25
and it was	25
it was a	24
one of them	23
a couple of	23
there was a	22

Extending this example

Although the demo input txt file is small, the ngram counting script in this repo is capable of counting ngrams in much larger input files (which are currently first turned into one long list of lines/strings).

About

Python script that uses the multiprocessing module to calculate ngrams in a txt file.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages