Repeated results from the same site. #139
I just discovered the "Copycats removal" optic, which helps somewhat here, but it also removes the original result for the latest version of the documentation and shows a completely different set of results instead.
There is currently some soft deduplication based on the URL, title and body. Essentially, if a result's title is very similar to the title of a result higher in the list, the lower result gets deprioritized a bit. I think that if we had more results in the index matching your search terms, it would look a bit better, as I am fairly sure the older versions would be deprioritized based on their title and body similarity with the top result. Detecting which documentation page points to the latest version is a very interesting problem. Right now, the ranking would probably rely entirely on the harmonic centrality values to figure it out, but we might need to write some custom logic here. I don't know yet what the best way to implement it would be. Can you elaborate a bit on the optic problem? The "Copycats removal" optic doesn't seem to remove the results from kcl-lang.io for me.
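To illustrate the title-based soft deduplication described above, here is a minimal sketch. It assumes Jaccard similarity over lowercased word tokens as the similarity measure and a multiplicative penalty; the function names, the 0.7 threshold and the 0.5 penalty are hypothetical choices, not the search engine's actual implementation.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word tokens used for a rough title similarity measure."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity between two token sets (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def soft_dedup(results, threshold=0.7, penalty=0.5):
    """Deprioritize results whose title closely matches a higher-ranked title.

    `results` is a list of (title, score) pairs sorted by score, descending.
    A lower result is not removed; its score is multiplied by `penalty`
    (hypothetical values), and the list is re-sorted by the adjusted score.
    """
    seen = []       # token sets of every title already placed above
    adjusted = []
    for title, score in results:
        toks = tokens(title)
        if any(jaccard(toks, prev) >= threshold for prev in seen):
            score *= penalty
        seen.append(toks)
        adjusted.append((title, score))
    return sorted(adjusted, key=lambda r: r[1], reverse=True)


results = [
    ("Schema definition | KCL language docs v0.5", 10.0),
    ("Schema definition | KCL language docs v0.4", 9.0),
    ("Dict type | KCL language docs", 8.0),
]
for title, score in soft_dedup(results):
    print(f"{score:5.1f}  {title}")
```

With these titles, the v0.4 page shares 6 of 8 distinct tokens with the v0.5 page (similarity 0.75), so it drops below the unrelated "Dict type" page. Titles that differ by more than a version number (e.g. a changed page title between releases) would slip under the threshold, which is one reason title similarity alone may not be enough here.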
When searching for certain keywords that can be found in the documentation of a versioned library, application or framework, I often see the same result over and over again, where the only difference is the documentation version. For example, see the screenshot of a search for "kcl" "dict" "schema" below:

[Screenshot: search results for "kcl" "dict" "schema"]

The hard part might be detecting those as "the same results, but only the most recent (latest) version is relevant", and I lack the knowledge to suggest how to implement this. From a UX point of view, it would probably make sense to hide those duplicates behind a "Show more similar results" fold-out or something similar.
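One possible way to approach both the detection and the suggested fold-out is sketched below, under loose assumptions: versioned documentation pages differ only by a single path segment such as `v0.5` or `latest`, and an unversioned path or a `latest` segment is treated as the newest version. The regex, the function names and the grouping key are all hypothetical, not an existing feature.

```python
import re
from urllib.parse import urlsplit

# Hypothetical pattern for a version path segment, e.g. "v0.5", "0.4.3" or "12".
VERSION_SEGMENT = re.compile(r"^v?\d+(?:\.\d+)*$")


def split_version(url):
    """Return ((host, path without its version segment), version sort key).

    Unversioned paths and "latest" get key (1,), which sorts above any
    explicit version (0, (major, minor, ...)). Only the first matching
    segment is treated as the version.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    key = (1,)
    found = False
    rest = []
    for seg in segments:
        if not found and (seg == "latest" or VERSION_SEGMENT.match(seg)):
            found = True
            if seg != "latest":
                key = (0, tuple(int(x) for x in seg.lstrip("v").split(".")))
            continue
        rest.append(seg)
    return (parts.netloc, "/".join(rest)), key


def fold_versions(urls):
    """Group ranked URLs that differ only by version.

    Returns (shown_url, folded_urls) pairs in the order each group first
    appears in the ranking. shown_url is the newest version in the group;
    folded_urls would go behind a "Show more similar results" fold-out.
    """
    groups = {}
    for url in urls:
        group, key = split_version(url)
        groups.setdefault(group, []).append((key, url))
    folded = []
    for members in groups.values():  # dicts preserve insertion order
        members.sort(key=lambda m: m[0], reverse=True)
        folded.append((members[0][1], [u for _, u in members[1:]]))
    return folded


ranked = [
    "https://kcl-lang.io/docs/v0.4/reference/lang/tour",
    "https://kcl-lang.io/docs/reference/lang/tour",
    "https://kcl-lang.io/docs/v0.5/reference/lang/tour",
    "https://example.com/dict",
]
for shown, hidden in fold_versions(ranked):
    print(shown, "+", len(hidden), "similar")
```

This heuristic is deliberately naive: numeric path segments such as years in blog URLs would be mistaken for versions, and labels like "next" or "stable" are not handled, so a real implementation would likely need per-site signals rather than URL shape alone.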
Might be related to #51