The “Cologne phonetics” (Kölner Phonetik) algorithm encodes words in a way that makes it possible to search for words that sound similar. It is related to “Soundex” and “Metaphone”, but better suited to the German language.
This implementation closely follows the algorithm as described on its Wikipedia page. Support for umlauts (Ä, Ö, Ü) and ß has been added, as suggested there.
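To make the rules more concrete, here is a minimal Ruby sketch of the encoding as listed in the Wikipedia article. Each letter gets a digit depending on its neighbours, runs of equal digits are merged, and every 0 except a leading one is dropped. This is not the gem's implementation: the module name `ColognePhoneticsSketch` is hypothetical, and two edge-case choices are assumptions. First, ß is treated exactly like S. Second, characters without a rule are discarded before the neighbour rules are applied, so they never count as a neighbour. The gem may behave differently in those cases.

```ruby
# Minimal sketch of the Cologne phonetics rules from the Wikipedia article.
# ColognePhoneticsSketch is a hypothetical name; this is NOT the gem's code.
module ColognePhoneticsSketch
  VOWELS = 'AEIJOUYÄÖÜ'.chars.freeze # umlauts are coded like vowels (0)

  module_function

  # Digit(s) for the letter at index i, judged by its neighbours.
  # Returns '' for H, which has no code.
  def code_for(letters, i)
    letter = letters[i]
    prev   = i.zero? ? nil : letters[i - 1]
    nxt    = letters[i + 1] # nil at the end of the word

    case letter
    when *VOWELS then '0'
    when 'B' then '1'
    when 'P' then nxt == 'H' ? '3' : '1'
    when 'D', 'T' then %w[C S Z].include?(nxt) ? '8' : '2'
    when 'F', 'V', 'W' then '3'
    when 'G', 'K', 'Q' then '4'
    when 'C'
      if prev.nil? # C at the start of the word
        %w[A H K L O Q R U X].include?(nxt) ? '4' : '8'
      elsif %w[S Z].include?(prev)
        '8'
      else
        %w[A H K O Q U X].include?(nxt) ? '4' : '8'
      end
    when 'X' then %w[C K Q].include?(prev) ? '8' : '48'
    when 'L' then '5'
    when 'M', 'N' then '6'
    when 'R' then '7'
    when 'S', 'Z' then '8'
    else '' # H
    end
  end

  def encode_word(word)
    # Assumptions: upcase ASCII only (a full upcase would turn ß into SS),
    # treat ß exactly like S, and drop every character without a rule
    # (including spaces) before looking at neighbours.
    letters = word.upcase(:ascii).tr('äöüß', 'ÄÖÜS').gsub(/[^A-ZÄÖÜ]/, '').chars
    codes = letters.each_index.map { |i| code_for(letters, i) }.join
    collapsed = codes.squeeze # merge runs of the same digit
    return '' if collapsed.empty?

    collapsed[0] + collapsed[1..-1].delete('0') # keep only a leading 0
  end

  # Encodes each whitespace-separated word on its own.
  def encode(text)
    text.split(' ').map { |word| encode_word(word) }.join(' ')
  end
end
```

On the examples in this README the sketch returns "3412" for 'Wikipedia', "068 4586" for `encode('Heinz Classen')` and "068586" for `encode_word('Heinz Classen')`. The difference between the last two comes from the C in “Classen”: once the space is gone it follows a Z and is coded 8 instead of 4.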
Note that other accented characters are not handled. If your data may contain such characters, you need to preprocess it (for example by using I18n.transliterate).
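If you would rather not pull in the i18n gem, or want to keep the umlauts and ß that this gem already understands, one alternative is to strip diacritics with Ruby's built-in Unicode normalization. This is a swapped-in technique, not something the gem provides, and `strip_other_accents` is a hypothetical helper name. Depending on its configuration, I18n.transliterate may also turn ä, ö, ü and ß into plain ASCII; the sketch below leaves those characters alone. Characters that have no Unicode decomposition, such as ø or æ, pass through unchanged and will still be ignored by the encoder.

```ruby
# Hypothetical helper (not part of the gem): remove diacritics from every
# non-ASCII character except ä, ö, ü, ß and their uppercase forms, which the
# encoder already handles. Uses only Ruby's built-in Unicode normalization.
def strip_other_accents(text)
  # Compose first, so a decomposed "u" + U+0308 becomes a single "ü" and is kept.
  text.unicode_normalize(:nfc).gsub(/[^\p{ASCII}äöüÄÖÜß]/) do |char|
    # Decompose the character and drop combining marks (Unicode category Mn),
    # e.g. "é" -> "e" + U+0301 -> "e". Characters without a decomposition
    # (such as "ø" or "æ") come back unchanged.
    char.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
  end
end
```

For example, `strip_other_accents('Åè1%-')` returns "Ae1%-", so the A and e are encoded afterwards instead of being dropped, while `strip_other_accents('Müller')` is returned unchanged.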
I consider this gem to be stable and (more or less) finished.
Example usage:
ColognePhonetics.encode('Wikipedia') # => "3412"
# Only basic characters and äöüß are handled, everything else gets ignored:
ColognePhonetics.encode('Åè1%-') # => ""
# If a string contains words separated by spaces, each word is encoded separately:
ColognePhonetics.encode('Heinz Classen') # => "068 4586"
# Use `encode_word` if you want to ignore spaces (note that this usually gives
# different results than using `encode` and removing spaces afterwards; see
# the Wikipedia article for details):
ColognePhonetics.encode_word('Heinz Classen') # => "068586"
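One way to use the codes for the similar-sounding search mentioned at the top is to index names by their code and then look up the code of a query. The helpers below are a hypothetical sketch, not part of the gem. The encoder is passed in as any object that responds to `call`, which keeps the helpers independent of the gem and lets you choose between `encode` and `encode_word`.

```ruby
# Hypothetical helpers, not part of the gem: group names by phonetic code so
# that similar-sounding names can be found with a single Hash lookup.
# `encoder` is anything that responds to #call, e.g. ColognePhonetics.method(:encode).

# Returns a Hash of phonetic code => names that produce that code.
def build_phonetic_index(names, encoder)
  names.group_by { |name| encoder.call(name) }
end

# Returns the indexed names that share the query's code (or [] if none do).
def phonetic_lookup(index, query, encoder)
  index.fetch(encoder.call(query), [])
end
```

With the gem installed, `ColognePhonetics.method(:encode)` can serve as the encoder. By the Wikipedia rules, Meier, Maier, Mayer and Meyer all reduce to "67", so a lookup for any of them should return the others from the same bucket.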
You can set ColognePhonetics.debug = true to get warnings printed to $stderr about characters that cannot be encoded:
ColognePhonetics.debug = true
ColognePhonetics.encode('Olé')
# Cologne Phonetics: No rule for 'é' (prev: 'l', next: '')
# => "05"
Add this line to your application's Gemfile:
gem 'cologne_phonetics'
And then execute:
$ bundle
Or install it yourself as:
$ gem install cologne_phonetics
After checking out the repo, run bin/setup to install dependencies. You can also run bin/console for an interactive prompt that will allow you to experiment.
Bug reports and pull requests are welcome on GitHub at https://github.com/noniq/cologne_phonetics. Please make sure to include tests, and check that running bin/rubocop does not show any warnings.
The gem is available as open source under the terms of the MIT License.