Zizhao-Wang/smart-match

 
 

Introduction

The smart-match module contains functions for calculating the similarity of strings and sets.

Concept

  1. similarity: A value in the range [0, 1] that represents how similar two strings are. The larger the value, the more similar the strings.

  2. dissimilarity: A value in the range [0, 1] that represents how dissimilar two strings are. The larger the value, the more dissimilar the strings. For any pair of strings, similarity = 1 - dissimilarity.

  3. distance: How far apart two strings are. Note that not all methods support distance.

  4. score: The larger the score, the more similar the two strings are. Note that not all methods support score.
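The relationship between distance, dissimilarity, and similarity can be illustrated with a minimal sketch for the Levenshtein case. This is not the library's implementation. The function names are hypothetical, and normalizing by the length of the longer string is an assumption, chosen because it agrees with the sample output shown under Usage.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    # previous[j] is the distance between the prefix of a processed so far and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def dissimilarity(a: str, b: str) -> float:
    # Assumption: divide by the longer length so the result lies in [0, 1].
    longest = max(len(a), len(b))
    return levenshtein_distance(a, b) / longest if longest else 0.0


def similarity(a: str, b: str) -> float:
    # By definition, similarity = 1 - dissimilarity.
    return 1.0 - dissimilarity(a, b)


print(levenshtein_distance('hello', 'hero'))
print(round(dissimilarity('hello', 'hero'), 2))
print(round(similarity('hello', 'hero'), 2))
```

Under this assumption, 'hello' and 'hero' are two edits apart (one substitution and one deletion), and the longer string has five characters, so the dissimilarity is 2/5.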

We support three levels of string matching.

  1. char: Similarity computation based on characters in the strings.

  2. term: Similarity computation based on terms in the strings.

  3. gram: Similarity computation based on q-grams in the strings.
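To make the three levels concrete, the sketch below splits one string into characters, terms, and q-grams. The helper names are hypothetical and not part of smart-match. Splitting terms on whitespace and using q = 2 are assumptions made for illustration only.

```python
def char_tokens(text: str) -> list:
    # char level: every character is its own unit.
    return list(text)


def term_tokens(text: str) -> list:
    # term level. Assumption: terms are whitespace-separated words.
    return text.split()


def gram_tokens(text: str, q: int = 2) -> list:
    # gram level: sliding window of q consecutive characters.
    # Strings shorter than q produce no q-grams.
    return [text[i:i + q] for i in range(len(text) - q + 1)]


text = 'smart match'
print(char_tokens(text))
print(term_tokens(text))
print(gram_tokens(text, q=2))
```

A similarity method can then compare the resulting sequences or sets instead of the raw strings, which is why the same pair of strings may score differently at each level.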

Methods

We support the following methods.

Abbreviation   Full name                      similarity  dissimilarity  distance  score
LE (default)   Levenshtein
ED             EuclideanDistance
DL             Damerau Levenshtein
BD             Block Distance
cos            Cosine Similarity
TC             TanimotoCoefficient
dice           Dice Similarity
simon          SimonWhite
LCST           LongestCommonSubstring
LCSQ           LongestCommonSubSequence
OC             OverlapCoefficient
GOC            GeneralizedOverlapCoefficient
jac            Jaccard
gjac           GeneralizedJaccard
HD             HammingDistance
jaro           Jaro
JW             JaroWinkler
NW             NeedlemanWunch
SW             SmithWaterman
SWG            SmithWatermanGotoh
MK             MongeElkan
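Several of the set-based measures listed above differ only in how they normalize the size of the intersection. The sketch below uses the textbook definitions of the Jaccard, Dice, and overlap coefficients on plain Python sets. It is not taken from the smart-match source. The function names and the handling of empty sets are assumptions.

```python
def jaccard(a: set, b: set) -> float:
    # |A ∩ B| / |A ∪ B|
    union = a | b
    # Assumption: two empty sets are treated as identical.
    return len(a & b) / len(union) if union else 1.0


def dice(a: set, b: set) -> float:
    # 2 * |A ∩ B| / (|A| + |B|)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0


def overlap(a: set, b: set) -> float:
    # |A ∩ B| / min(|A|, |B|)
    if not a or not b:
        # Assumption: 1.0 if both sets are empty, 0.0 if only one is.
        return 1.0 if a == b else 0.0
    return len(a & b) / min(len(a), len(b))


left, right = set('hello'), set('hero')  # {'h','e','l','o'} and {'h','e','r','o'}
print(round(jaccard(left, right), 2))
print(round(dice(left, right), 2))
print(round(overlap(left, right), 2))
```

Here the two character sets share three of five distinct characters, so the Jaccard coefficient is 3/5, while the Dice and overlap coefficients both come to 3/4 because the two sets have the same size.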

Installation

pip install smart-match

Usage

import smart_match
print(smart_match.similarity('hello', 'hero'))
print(smart_match.dissimilarity('hello', 'hero'))
print(smart_match.distance('hello', 'hero'))

Output:

0.6
0.4
2

See the Wiki for more details.

License

smart-match is free software. See the LICENSE file for the full text.

