Substring search #122
Hi,
Thank you for this excellent library. So far, I have successfully used symspell in a categorization algorithm. It works well and is fast. I am looking for suggestions on how to extend my current algorithm with substring search:
I am using a list of keywords as the dictionary. Words that are misspelled or truncated are corrected to the matching keywords, which determine the category of a string. For example, 'salar for April' and 'Life Insuranse' are changed to 'salary for April' and 'Life Insurance', respectively, since 'salary' and 'insurance' are in the keyword list. However, some strings are not only misspelled but also missing spaces, or they contain too many mistakes. So 'salaryfor April', 'LifeInsurance' and 'salaryyyy' are not recognized and therefore cannot be categorized by the current solution. Using the whole vocabulary as a dictionary is not feasible. Instead, I want to implement substring search, which would let me find strings that contain certain substrings such as 'salar', 'insuran', 'accommod' and so on.
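To illustrate the kind of substring search meant here, below is a minimal sketch in plain Python. It does not use symspell at all; it swaps in a standard approximate substring match (edit distance where the match may start and end anywhere in the text). The `STEMS` mapping, the function names and the `max_dist` threshold are hypothetical and chosen only for this example.

```python
# Hypothetical stem -> keyword mapping, for illustration only.
STEMS = {"salar": "salary", "insuran": "insurance", "accommod": "accommodation"}


def min_substring_distance(pattern: str, text: str) -> int:
    """Smallest Levenshtein distance between pattern and any substring of text."""
    # Row 0 is all zeros, so a match may begin at any position in text.
    prev = [0] * (len(text) + 1)
    for i, pc in enumerate(pattern, start=1):
        # Column 0: matching i pattern characters against nothing costs i.
        curr = [i] + [0] * len(text)
        for j, tc in enumerate(text, start=1):
            cost = 0 if pc == tc else 1
            curr[j] = min(
                prev[j] + 1,         # pattern character has no counterpart in text
                curr[j - 1] + 1,     # extra character in text
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    # Taking the minimum over the last row lets the match end anywhere in text.
    return min(prev)


def find_keywords(text: str, stems=STEMS, max_dist: int = 1) -> list:
    """Return the keywords whose stem occurs in text within max_dist edits."""
    lowered = text.lower()
    return [
        keyword
        for stem, keyword in stems.items()
        if min_substring_distance(stem, lowered) <= max_dist
    ]


for sample in ("salaryfor April", "LifeInsuranse", "salaryyyy"):
    print(sample, "->", find_keywords(sample))
```

The cost is proportional to the stem length times the string length for each stem, so this is only reasonable for a short keyword list. With short stems, a `max_dist` above 1 is likely to produce false positives.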
Can symspell be utilized for substring search? Or maybe you have other suggestions on how to effectively implement this idea and combine it with symspell?
Thank you in advance