Bag is a powerful yet user-friendly bag of words (BoW) implementation written in Go, leveraging a Naive Bayes classifier for efficient text analysis. It functions both as a library that can be seamlessly integrated into Go code and as an accessible command-line tool. This dual functionality lets users leverage bag-of-words capabilities directly from the command line, making it accessible from any programming language. The implementation also supports a file format for defining training sets, designed for ease of use and flexible integration in various environments.
The bag of words (BoW) model is a fundamental text representation technique in natural language processing (NLP). In this model, a text (such as a sentence or a document) is represented as an unordered collection of words, disregarding grammar and word order but keeping multiplicity. The key idea is to build a vocabulary of all the unique words in the text corpus and then represent each text as a vector of word frequencies or binary indicators. This vector records the presence, absence, or frequency of each vocabulary word within the text. The BoW model is widely used for text classification tasks, including sentiment analysis, due to its simplicity and effectiveness in capturing word occurrences.
```go
func ExampleNew() {
	var cfg Config
	// Initialize with default values
	exampleBag = New(cfg)
}
```
```go
func ExampleNewFromTrainingSet() {
	var t TrainingSet
	t.Samples = SamplesByLabel{
		"positive": {
			"I love this product, it is amazing!",
			"I am very happy with this.",
			"Very good",
		},
		"negative": {
			"This is the worst thing ever.",
			"I hate this so much.",
			"Not good",
		},
	}
	// Initialize from the training set
	exampleBag = NewFromTrainingSet(t)
}
```
```go
func ExampleBag_Train() {
	exampleBag.Train("I love this product, it is amazing!", "positive")
	exampleBag.Train("This is the worst thing ever.", "negative")
	exampleBag.Train("I am very happy with this.", "positive")
	exampleBag.Train("I hate this so much.", "negative")
	exampleBag.Train("Not good", "negative")
	exampleBag.Train("Very good", "positive")
}
```
```go
func ExampleBag_GetResults() {
	exampleResults = exampleBag.GetResults("I am very happy with this product.")
	fmt.Println("Collection of results", exampleResults)
}
```
```go
func ExampleResults_GetHighestProbability() {
	match := exampleResults.GetHighestProbability()
	fmt.Println("Highest probability", match)
}
```
```yaml
config:
  ngram-size: 1
samples:
  yes:
    - "yes"
    - "Yeah"
    - "Yep"
  no:
    - "No"
    - "Nope"
    - "Nah"
# Note: This training set is short for the sake of README filesize,
# please look in the examples directory for more complete examples
```
- Working implementation as Go library
- Training sets
- Support Character NGrams
- Text normalization added to inbound text processing
- CLI utility
- Generated model as MMAP file
Thanks goes to these wonderful people (emoji key):
Josh Montoya 💻 📖
Matt Stay 🎨
Chewxy
Jack Muir
This project follows the all-contributors specification. Contributions of any kind welcome!