Node/Typescript app which demonstrates some text parsing

sauntimo/name-counter

Name-counter

Node/Typescript app which demonstrates text search.

There are two input files included in the repo: one with a list of approximately 5,400 first names and one with the entire text of Oliver Twist. The app counts the occurrences of each name in the text of the story and writes a file with the results sorted from most to least frequent.
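As a rough illustration of the count-and-sort step described above, here is a minimal sketch. It is not the code in this repo: the function names `countOccurrences` and `rankNames` are hypothetical, and it uses plain `String.prototype.indexOf` on in-memory strings rather than the streamsearch package.

```typescript
// Hypothetical helper: counts non-overlapping, case-sensitive
// occurrences of `needle` in `haystack`.
function countOccurrences(haystack: string, needle: string): number {
  if (needle.length === 0) return 0;
  let count = 0;
  let index = haystack.indexOf(needle);
  while (index !== -1) {
    count += 1;
    // Resume the search just past the match that was found.
    index = haystack.indexOf(needle, index + needle.length);
  }
  return count;
}

// Hypothetical helper: pairs every name with its count and sorts
// the pairs from most to least frequent.
function rankNames(text: string, names: string[]): Array<[string, number]> {
  return names
    .map((name): [string, number] => [name, countOccurrences(text, name)])
    .sort((a, b) => b[1] - a[1]);
}
```

Calling `rankNames(storyText, nameList)` with the contents of the two input files would give an array of `[name, count]` pairs that could then be written to a results file.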

The text search is handled by the streamsearch npm package, which implements the Boyer–Moore–Horspool algorithm. The app takes around 20 seconds to run.
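For readers unfamiliar with Boyer–Moore–Horspool, the sketch below shows the general idea on plain strings. It is an illustrative assumption, not streamsearch's implementation (which works on streamed chunks), and `horspoolCount` is a hypothetical name.

```typescript
// Hypothetical helper: counts case-sensitive matches of `pattern` in `text`
// using the Boyer–Moore–Horspool bad-character shift. Overlapping matches
// are counted as well.
function horspoolCount(text: string, pattern: string): number {
  const m = pattern.length;
  const n = text.length;
  if (m === 0 || m > n) return 0;

  // Shift table: for every pattern character except the last, the distance
  // from its rightmost occurrence to the end of the pattern.
  const shift = new Map<number, number>();
  for (let i = 0; i < m - 1; i++) {
    shift.set(pattern.charCodeAt(i), m - 1 - i);
  }

  let count = 0;
  let pos = 0;
  while (pos <= n - m) {
    // Compare the current window against the pattern from right to left.
    let j = m - 1;
    while (j >= 0 && text.charCodeAt(pos + j) === pattern.charCodeAt(j)) {
      j--;
    }
    if (j < 0) count++;
    // Slide the window based on the text character under the pattern's last
    // position; characters absent from the table allow a full-length jump.
    pos += shift.get(text.charCodeAt(pos + m - 1)) ?? m;
  }
  return count;
}
```

The benefit over a naive scan is that most windows are rejected after a single comparison and skipped by up to the full pattern length, which matters when thousands of names are searched for in a whole novel.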

Note that the matching is case-sensitive, so a search for "Oliver" returns 830 results and a search for "OLIVER" returns 51. I considered lower-casing the whole text to combine both counts, but that produced a large number of false positives for two-letter names.
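The false-positive problem can be reproduced with a small sketch. The helper `countMatches` and the sample sentence are made up for illustration and are not taken from the repo or from Oliver Twist.

```typescript
// Hypothetical helper: counts substring matches, optionally ignoring case
// by lower-casing both the text and the name first.
function countMatches(text: string, name: string, ignoreCase: boolean): number {
  if (name.length === 0) return 0;
  const haystack = ignoreCase ? text.toLowerCase() : text;
  const needle = ignoreCase ? name.toLowerCase() : name;
  // Splitting on the needle yields one more piece than there are matches.
  return haystack.split(needle).length - 1;
}

// Made-up sample text containing the short names "Al" and "Ed".
const sample = "Al was always early, and Ed waited.";

const alExact = countMatches(sample, "Al", false);  // only the name itself
const alLoose = countMatches(sample, "Al", true);   // also matches "always"
const edExact = countMatches(sample, "Ed", false);  // only the name itself
const edLoose = countMatches(sample, "Ed", true);   // also matches "waited"
```

With case-sensitive matching each name is found once; after lower-casing, each is found twice, because the two-letter sequence also occurs inside ordinary words.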

Installation

  • clone the repo

    $ git clone git@github.com:sauntimo/name-counter.git name-counter
    
  • initialise

    $ cd name-counter && npm i -g 
    

Usage

$ namecounter

image

See Also

Similar functionality is also provided via an API and a simple frontend at name-counter-api.herokuapp.com, with code at github.com/sauntimo/name-counter-api.
