LanguageClassifier

RUBY 2.0.0p0 ONLY

Download LanguageClassifier, train it on text files whose language or category you know, then have it classify text files whose language or category you don't know.

LanguageClassifier uses a simplified version of Bayes' Law, the same approach used in many spam detection systems.

This script can classify text into any set of categories, not just languages. It takes plain text files and trains the system to recognize the words that are more likely to appear in each category. It can then use that data to classify text files from unknown categories.
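To make the idea concrete, here is a minimal sketch of a word-frequency classifier of this kind. It is not the code in lib; the class name TinyClassifier, its methods, and the choices to ignore category priors and use add-one smoothing are all assumptions made for illustration.

```ruby
# Hypothetical sketch: names and details are illustrative, not LanguageClassifier's own code.
class TinyClassifier
  def initialize
    # category => { word => number of times the word was seen in that category }
    @word_counts = Hash.new { |hash, key| hash[key] = Hash.new(0) }
    # category => total number of words seen in that category
    @totals = Hash.new(0)
  end

  # Record every word of a text whose category is known.
  def train(text, category)
    tokenize(text).each do |word|
      @word_counts[category][word] += 1
      @totals[category] += 1
    end
  end

  # Return the best-scoring category, or nil if nothing has been trained.
  def classify(text)
    words = tokenize(text)
    vocabulary_size = @word_counts.values.flat_map(&:keys).uniq.size
    scores = @totals.keys.map do |category|
      [category, score(words, category, vocabulary_size)]
    end
    best = scores.max_by { |_, value| value }
    best && best.first
  end

  private

  # Lowercase the text and split it into runs of letters.
  def tokenize(text)
    text.downcase.scan(/[[:alpha:]]+/)
  end

  # Sum of log word probabilities with add-one smoothing.
  # Category priors are ignored, which is one way to "simplify" Bayes' Law.
  def score(words, category, vocabulary_size)
    counts = @word_counts[category]
    denominator = @totals[category] + vocabulary_size
    words.reduce(0.0) do |sum, word|
      sum + Math.log((counts[word] + 1).to_f / denominator)
    end
  end
end

classifier = TinyClassifier.new
classifier.train("the cat sat on the mat", "english")
classifier.train("le chat est sur le tapis", "french")
puts classifier.classify("the cat") # prints "english"
```

Summing logarithms instead of multiplying raw probabilities avoids numeric underflow on long files, and the smoothing keeps a single unseen word from zeroing out a category.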

Installation & Use

First you will need bundler. To install it, run gem install bundler.

Then clone this repository with git clone git://github.com/FluffyJack/LanguageClassifier.git and change into it with cd LanguageClassifier. Finally, run bundle install.

Next, train the system. You will need several plain text files in different languages (at least one for each language you might need to classify). For each training file, run the following command from within the LanguageClassifier directory: bin/classify train -f FILE_PATH -c LANGUAGE_OR_CATEGORY. Alternatively, load the provided samples by running bin/classify seed.
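If you have many training files, running the train command by hand gets tedious. The sketch below builds one train command per file. It assumes a directory layout of TRAINING_ROOT/CATEGORY/file.txt, which is purely an assumption for this example; the helper name training_commands is hypothetical and not part of the project.

```ruby
require "shellwords"

# Hypothetical helper, not part of LanguageClassifier.
# Assumes training files live at <training_root>/<CATEGORY>/<name>.txt,
# so the parent directory name doubles as the category.
def training_commands(training_root)
  Dir.glob(File.join(training_root, "*", "*.txt")).sort.map do |path|
    category = File.basename(File.dirname(path))
    ["bin/classify", "train", "-f", path, "-c", category]
  end
end

if __FILE__ == $PROGRAM_NAME
  # Print the commands for review; the default directory name is an assumption.
  training_commands(ARGV.fetch(0, "training")).each do |command|
    puts Shellwords.join(command)
  end
end
```

The script only prints the commands so you can check them first. To execute them instead, replace the puts line with system(*command), and run the script from within the LanguageClassifier directory.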

Once you have trained the system as much as you can, classify a file's language with the command bin/classify classify -f FILE_PATH. The command prints the matching language or category.

Notes

Persistence: To ensure training is persisted after each command is run, I have set up the script to use madeleine.
Bayes' Law: This script doesn't apply Bayes' Law in full, but a simplified version that gets the job done.
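To illustrate the persistence note above, here is a sketch of the general idea: load saved state when a command starts, and write it back when the command finishes. It deliberately does not use madeleine. It substitutes Ruby's built-in Marshal, and the class name SnapshotStore and its methods are hypothetical.

```ruby
# Hypothetical stand-in for a persistence layer. This is NOT madeleine's API;
# it uses Ruby's built-in Marshal to show the load/save cycle in the simplest form.
class SnapshotStore
  def initialize(path)
    @path = path
  end

  # Return the saved state, or the block's value if no snapshot exists yet.
  def load
    return yield unless File.exist?(@path)
    File.open(@path, "rb") { |file| Marshal.load(file) }
  end

  # Write the whole state to disk, replacing any previous snapshot.
  def save(state)
    File.open(@path, "wb") { |file| Marshal.dump(state, file) }
  end

  # Remove the snapshot so the next load starts from scratch.
  def clear
    File.delete(@path) if File.exist?(@path)
  end
end
```

Under this sketch, a train command would call load with a block that builds empty state, update the word counts, then call save, while a clear command would call clear. Note that Marshal cannot dump a Hash created with a default block, so the stored state would need to be plain hashes.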

Commands

Training: bin/classify train -f FILE_PATH -c CATEGORY
Classification: bin/classify classify -f FILE_PATH
Seed: bin/classify seed
Clear All Data: bin/classify clear

Contributions

If you find this interesting, go ahead and contribute. Bear in mind, though, that this started as a project to send to a potential employer as an example of my work.

License

LanguageClassifier is released under the MIT License.
