Demonstrates how to use Spark to implement a Naive Bayes text classifier
This code example demonstrates how to use Spark to:
- create a machine learning pipeline
- train a Naive Bayes text classifier
- evaluate and explore the learned model
- make predictions
The example uses the data set from http://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html
The Spark example does not produce the same results as those published in the book.
See the JUnit test file src/test/java/com/santacruzintegration/spark/NaiveBayesStanfordExampleTest.java. The unit test simply runs the code. It does not contain any asserts or other invariant checks, i.e. all tests will always pass even though the results do not match the published results.
$ mvn clean test
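
For comparison with the published results, the book's worked example can be recomputed by hand without Spark. The sketch below is not part of this repository: the class name `BookNaiveBayes`, its methods, and the labels `c` / `not-c` are hypothetical. It assumes the four training documents and the single test document from the book's worked example, and a multinomial model with add-one smoothing. Numbers like these could serve as expected values if asserts were added to the unit test.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, NOT the repository's Spark code: a plain-Java
// multinomial Naive Bayes with add-one (Laplace) smoothing.
class BookNaiveBayes {
    private final Map<String, Integer> docsPerClass = new HashMap<>();
    private final Map<String, Map<String, Integer>> termCounts = new HashMap<>();
    private final Map<String, Integer> tokensPerClass = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Add one whitespace-tokenized training document with its label.
    void train(String label, String text) {
        totalDocs++;
        docsPerClass.merge(label, 1, Integer::sum);
        Map<String, Integer> counts =
                termCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String term : text.split("\\s+")) {
            counts.merge(term, 1, Integer::sum);
            tokensPerClass.merge(label, 1, Integer::sum);
            vocabulary.add(term);
        }
    }

    // P(label): fraction of training documents carrying this label.
    double prior(String label) {
        return docsPerClass.getOrDefault(label, 0) / (double) totalDocs;
    }

    // P(term | label) with add-one smoothing over the whole vocabulary.
    double conditional(String term, String label) {
        int count = termCounts
                .getOrDefault(label, new HashMap<>())
                .getOrDefault(term, 0);
        int tokens = tokensPerClass.getOrDefault(label, 0);
        return (count + 1.0) / (tokens + vocabulary.size());
    }

    // Unnormalized score: P(label) times the product of P(term | label).
    // Terms never seen in training are skipped.
    double score(String label, String text) {
        double s = prior(label);
        for (String term : text.split("\\s+")) {
            if (vocabulary.contains(term)) {
                s *= conditional(term, label);
            }
        }
        return s;
    }

    // Assumed training set: the four documents of the book's worked example.
    static BookNaiveBayes trainedOnBookExample() {
        BookNaiveBayes nb = new BookNaiveBayes();
        nb.train("c", "Chinese Beijing Chinese");
        nb.train("c", "Chinese Chinese Shanghai");
        nb.train("c", "Chinese Macao");
        nb.train("not-c", "Tokyo Japan Chinese");
        return nb;
    }

    public static void main(String[] args) {
        BookNaiveBayes nb = trainedOnBookExample();
        String testDoc = "Chinese Chinese Chinese Tokyo Japan";
        // The book reports roughly 0.0003 for c and 0.0001 for not-c.
        System.out.printf("score(c)     = %.6f%n", nb.score("c", testDoc));
        System.out.printf("score(not-c) = %.6f%n", nb.score("not-c", testDoc));
    }
}
```

The scores are unnormalized, so only their ratio and ordering are meaningful. Spark's model may report normalized probabilities instead, so a direct comparison would need the same normalization on both sides.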