crystal-stemmer - moved to https://github.com/johnjansen/text
A port of Ruby's Stemmify to Crystal
This is a Crystal shard for reducing words to their roots. For example, all of the following words are stemmed to "observ" (the stem itself need not be a real word, as in this case):
observance
observances
observancy
observant
observants
observation
observe
observed
observer
observers
observing
observingly
The algorithm used here is based on the Porter stemmer. You can read more about Martin Porter's stemmer at
http://tartarus.org/~martin/PorterStemmer/
Martin Porter explains:
The Porter stemming algorithm (or ‘Porter stemmer’) is a process for removing
the commoner morphological and inflexional endings from words in English. Its
main use is as part of a term normalisation process that is usually done when
setting up Information Retrieval systems.
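As a rough illustration of what "removing endings" means, here is a minimal sketch of step 1a of Porter's published algorithm, which handles plural endings only. The method name `porter_step_1a` is hypothetical and the code is not taken from this shard's source; the full algorithm applies several further steps after this one.

```crystal
# Step 1a of the original Porter algorithm (plural endings only).
# Illustrative sketch: the method name is made up for this example
# and the code is not taken from this shard's implementation.
def porter_step_1a(word : String) : String
  if word.ends_with?("sses")
    word[0...-2] # sses -> ss, e.g. caresses -> caress
  elsif word.ends_with?("ies")
    word[0...-2] # ies -> i, e.g. ponies -> poni
  elsif word.ends_with?("ss")
    word # ss -> ss, e.g. caress stays caress
  elsif word.ends_with?("s")
    word[0...-1] # s -> (nothing), e.g. cats -> cat
  else
    word
  end
end

%w(caresses ponies caress cats).each do |w|
  puts "#{w} -> #{porter_step_1a(w)}"
end
```

The order of the checks matters: the longest matching suffix is tested first, so "caress" is left alone instead of losing its final "s".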
Add this to your application's shard.yml:

    dependencies:
      stemmer:
        github: johnjansen/crystal-stemmer
Let's say you are building some sort of search tool. You want searches for "observations" and "observer" to both bring up the same items. When you are building your index, you can map all the words to their roots using the stem method.
Here's an example usage:
    require "stemmer"
    print("observations".stem) # ==> "observ"
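The search scenario described above could be sketched roughly as follows. This is only an illustration: the sample documents, the `index` layout, and the `search` helper are all hypothetical, and the only thing assumed from this shard is the `String#stem` method shown in the usage example.

```crystal
require "stemmer" # provides String#stem

# Hypothetical sample documents, keyed by an id.
docs = {
  1 => "an observer made observations",
  2 => "she observed the results",
}

# Inverted index: stem => ids of the documents containing a word with that stem.
index = Hash(String, Set(Int32)).new { |hash, key| hash[key] = Set(Int32).new }

docs.each do |id, text|
  text.downcase.split.each do |word|
    index[word.stem] << id
  end
end

# Hypothetical helper: stem the query the same way before looking it up.
def search(index, query : String)
  index.fetch(query.downcase.stem, Set(Int32).new)
end

p search(index, "observations")
p search(index, "observer")
```

Because "observations", "observer", and "observed" all reduce to "observ", both queries should return both documents. The key design point is that the query and the indexed words go through the same stemming step.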
The test suite is based on the sample input and output text from Martin Porter's website. It includes 23532 test words and their expected stems. To run the tests, just type

    crystal spec
- Fork it ( https://github.com/johnjansen/stemmer/fork )
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create a new Pull Request
- johnjansen John Jansen - creator, maintainer