Suggestion: Split dictionary and core #62
Comments
I have created an abstraction layer for several Python libraries that do word splitting. All of these libraries consume some kind of word list. Some ship the word list with them; some (like this one) don't. I think that having a separate word list for each library is redundant and a violation of the DRY principle (it may be convenient for the devs not to depend on compatibility, but it is still harmful). I think we need a common spec for the dictionary files, because quite a lot of software uses them. Then these word lists could be installed and updated separately, and I wouldn't have to track which libraries require a word list and which bundle their own. I also wonder if there are any benchmark datasets for this task — not only a speed benchmark, but one measuring splitting quality, and maybe even classifying the errors the implementations cannot avoid.
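As a rough illustration of what such a common spec might cover: SymSpell's bundled English dictionary is a plain-text file with one "word count" pair per line, and a minimal loader for that format is only a few lines. This is a sketch under that assumption; the function name and file path are illustrative, not part of any existing library.

```python
def load_frequency_dictionary(path):
    """Load a plain-text frequency dictionary with one
    '<word> <count>' pair per line (the format SymSpell's
    bundled English dictionary uses)."""
    words = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) != 2:
                continue  # skip malformed or empty lines
            word, count = parts
            words[word] = int(count)
    return words
```

A shared, versioned word-list package exposing files in a format like this is what would let each splitting library drop its bundled copy.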
I tried SymSpell and it looks great. But one thing I noticed almost immediately is that it brings its dictionary file into my project (even if I don't use it). I understand this helps with a quick start, but I strongly believe that in a real application most users make their own. Even so, I think it would be better to split the NuGet package into SymSpell.Core and SymSpell.Dic.En, for example. To keep compatibility, SymSpell could be composed of these two packages (something like Microsoft.AspNetCore.App).