[Feature request] Fuzzy matching #279
Comments
It's definitely a consideration for the future. I tried implementing it in the past, but I quickly ran into issues balancing the fuzzy search results against the other sort/filter conditions already in place, and ultimately abandoned it to work on other features. While basic fuzzy search is quick to implement using one of the many libraries that exist for it, it becomes really hard to balance across all the aspects that TAC has to consider. Just to name a few issues:
This doesn't mean it can't be done, just that it's not straightforward and would likely require quite a bit of work to fit in nicely with the rest. Technically, each of the sources above could have its filter condition (at least partly) replaced by a fuzzy matcher, but that would still leave the displaying and sorting issues, which definitely need some custom logic.

See the image below for the bookshelf example: The matches are mostly sensible, but not in a useful order based on the match, and due to the way aliases are implemented, TAC wrongly assumes most of the results come from one (but no matching mapping for display is found).

Another option would be that the fuzzy matcher takes over completely and handles sorting / displaying on its own. However, that could lead to issues with the matched tags not necessarily being common enough to be understood by models. Such a mode could instead work well if normal word lists are used instead of tags, which could be a valid use case too depending on the model.

For the near future, I instead opted for a different method to require less typing, which is currently in "beta" on the https://github.com/DominikDoom/a1111-sd-webui-tagcomplete/tree/feature-sort-by-frequent-use branch (hopefully merged soon, after I find the time to iron out the last few kinks). While the filtering is still exact there, it favors frequently used tags over others, so a tag you like to use in your prompts will rise to the top automatically over time. This would at least handle the "I need to type out most of it to make it on top of the list" side. Not abbreviations or typos though. But once that's released, I'm definitely up for giving this a proper try. |
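To illustrate the frequency-based ordering described in this comment, here is a minimal sketch. It is not the code on the linked branch: `sortByUsage`, the `useCounts` map and the `name`/`count` fields are hypothetical names picked for the example, and the tie-break on post count is an assumption.

```javascript
// Hypothetical sketch: order already-filtered tags by how often the user
// has picked them before, falling back to post count for ties.
// `tags` is an array of { name, count }, `useCounts` is a Map(name -> uses).
function sortByUsage(tags, useCounts) {
  // Copy first so the caller's array is not reordered in place.
  return [...tags].sort((a, b) => {
    const usedA = useCounts.get(a.name) ?? 0;
    const usedB = useCounts.get(b.name) ?? 0;
    // More frequently used first; if equal, higher post count first.
    return usedB - usedA || b.count - a.count;
  });
}
```

With this ordering, a tag that was chosen a few times would move ahead of more popular tags sharing the same prefix, while the filter itself stays exact.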
Looks like it's more complicated than I thought. For the first stage of the implementation, I think you can just use the current sorting order (sort by post count), and require the first letter of the input to match one of the initial letters of the tags. Btw I'm not a native speaker and don't quite understand what this sentence means: "and due to the way aliases are implemented, TAC wrongly assumes most of the results come from one (but no matching mapping for display is found)", so I could be missing something important. Sort by frequency sounds like a good idea. I guess it could solve a large part of my issue here, so I'm looking forward to it. Thank you. |
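The first-stage idea from this comment could be sketched roughly as follows. This is only an illustration under stated assumptions: `fuzzyMatches` and `fuzzyFilter` are hypothetical names, tags are assumed to be `{ name, count }` objects, and words are assumed to be separated by spaces or underscores. It does not handle prefixes such as parentheses or Lora/wildcard syntax.

```javascript
// Hypothetical sketch: the first typed letter must be the initial of one of
// the tag's words; the remaining letters must appear after it, in order.
function fuzzyMatches(query, tag) {
  const q = query.toLowerCase();
  const t = tag.toLowerCase();
  if (q.length === 0) return true;
  for (let start = 0; start < t.length; start++) {
    // A word starts at index 0 or right after a space/underscore.
    const isWordStart = start === 0 || /[\s_]/.test(t[start - 1]);
    if (!isWordStart || t[start] !== q[0]) continue;
    // Greedy subsequence check for the rest of the query.
    let qi = 1;
    for (let ti = start + 1; ti < t.length && qi < q.length; ti++) {
      if (t[ti] === q[qi]) qi++;
    }
    if (qi === q.length) return true;
  }
  return false;
}

// Keep the existing ordering idea: most posts first.
function fuzzyFilter(query, tags) {
  return tags
    .filter((tag) => fuzzyMatches(query, tag.name))
    .sort((a, b) => b.count - a.count);
}
```

Under these assumptions, "detco" matches "detached_collar" and "bosh" matches "bookshelf", while "osh" does not match "bookshelf" because "o" is not a word initial.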
That would be a good workaround, yeah (and also more similar to how it currently behaves). But it doesn't work for every case out of the box, e.g. tags containing parentheses or special syntax for Loras, wildcards etc. that starts with a different letter. That wouldn't be too hard to check, though.
This is just a technical detail, nothing important. In different terms, it's that the current method assumes that an alias or translation was typed if whatever you entered couldn't be found in the result text. Without fuzzy matching, this would always be true, so the display logic could rely on that and add the arrow etc. With fuzzy matching, what you typed can still match a tag despite the letters being incomplete or even out of order, so this assumption doesn't work anymore. It's not hard to fix and was just meant as an explanation for why the results in the image look a bit broken. |
If I may weigh in, the limited addition of regex for models in #275 could probably be extended with relative ease to tags, along with the handy ^/$ positional anchors. |
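For illustration, a regex-based tag filter with the positional anchors mentioned here might look like the sketch below. It is not taken from #275; `regexFilter` is a hypothetical helper, and returning an empty list for an invalid pattern is just one possible choice while the user is still typing.

```javascript
// Hypothetical sketch: treat the typed text as a case-insensitive regex
// and keep the tag names it matches.
function regexFilter(pattern, tagNames) {
  let re;
  try {
    // No "g" flag, so test() stays stateless between calls.
    re = new RegExp(pattern, "i");
  } catch (e) {
    // Incomplete or invalid patterns (e.g. an unclosed group) match nothing.
    return [];
  }
  return tagNames.filter((name) => re.test(name));
}
```

With this sketch, `^book` would keep only names starting with "book", and `book$` only names ending with it.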
That would be a possibility, but has some of the same implementation challenges as the "magic" approach with sorting and display logic. So from a work standpoint, it wouldn't be that much easier, while also having the downside that users need to know their way around regex syntax. Some of the libraries I toyed around with actually already support an autocomplete-like mode that takes word position into account (or can be configured in such a way), so it could be the best of both worlds if done properly. Probably also faster than any custom way I could come up with. The pure regex way would be a quick and dirty solution though if some challenges with the library approach pop up that I can't solve in a timely manner. |
Hmm, I guess that Is that what you meant by sort and display logic? |
Yeah that's mostly it, the addResultsToList() function does some lookup/logic on its own to determine the relation between the filtered tag subset and what was typed. Especially for translations as these can be looked up and displayed even if nothing from the typed tag word matches it. This could probably be integrated with the result object in some way, e.g. that the filter conditions in the initial autocomplete have side effects to mark which one matched, but it would likely be easier to extend the logic in addResultsToList() a bit to handle a fuzzy match and not default to the alias processing in that case. |
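One way the display-side check described here could be extended is sketched below. This is not the actual `addResultsToList()` logic: `classifyMatch`, `isSubsequence` and the `text`/`aliases` fields are hypothetical, and the order of the checks (direct, then alias, then fuzzy) is an assumption.

```javascript
// True if the letters of `query` appear in `text` in order (gaps allowed).
function isSubsequence(query, text) {
  let qi = 0;
  for (let ti = 0; ti < text.length && qi < query.length; ti++) {
    if (text[ti] === query[qi]) qi++;
  }
  return qi === query.length;
}

// Hypothetical sketch: decide how a result relates to what was typed, so the
// display code no longer treats every non-substring match as an alias.
function classifyMatch(typed, result) {
  const q = typed.toLowerCase();
  const text = result.text.toLowerCase();
  if (text.includes(q)) return "direct";
  const aliases = result.aliases ?? [];
  if (aliases.some((alias) => alias.toLowerCase().includes(q))) return "alias";
  if (isSubsequence(q, text)) return "fuzzy";
  return "unknown";
}
```

The display code could then add the alias arrow only for the "alias" case and render "fuzzy" results like regular tag matches.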
You're probably right that it's the easier approach: I went with saving the pattern under result, and translations were definitely a hassle, though I think GPT refactored that part and added pattern extraction pretty well (I didn't implement translations in my test, though). |
If I end up using a library for it, most have a highlighter of their own to use. If not, it won't be hard to extend the current solution to work for all matches, even just highlighting the first time each typed letter shows up might be enough. |
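The simple fallback mentioned here, highlighting the first place each typed letter shows up in order, could be sketched like this. `matchIndices` and `highlight` are hypothetical helpers, the `<b>` wrapper is only a placeholder for whatever markup the list uses, and the sketch assumes tag text whose lowercase form has the same length as the original.

```javascript
// Hypothetical sketch: positions in `text` where the typed letters are
// found, scanning left to right. Returns [] if not all letters are found.
function matchIndices(query, text) {
  const q = query.toLowerCase();
  const t = text.toLowerCase();
  const indices = [];
  let qi = 0;
  for (let ti = 0; ti < t.length && qi < q.length; ti++) {
    if (t[ti] === q[qi]) {
      indices.push(ti);
      qi++;
    }
  }
  return qi === q.length ? indices : [];
}

// Wrap each matched character in <b>…</b>; other characters stay as-is.
function highlight(query, text) {
  const marked = new Set(matchIndices(query, text));
  let out = "";
  for (let i = 0; i < text.length; i++) {
    out += marked.has(i) ? `<b>${text[i]}</b>` : text[i];
  }
  return out;
}
```

If the output is inserted as HTML, the tag text would need escaping first, or the highlighted parts could be built as DOM nodes instead of a string.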
Is it possible to change the matching method into fuzzy matching, so you can type "detco" to match "detached collar" or "bosh" to match "bookshelf"?
I don't like to move my right hand between jkl and the arrow keys often when typing, which means that for some tags I need to type out most of it to make it on top of the list. E.g. "detached_co" for "detached_collar"
Many programs use this matching method, such as Visual Studio Code, PowerToys Run, Obsidian, etc. I think adding fuzzy matching would make the user experience much better.