prefix word phrase matching #62
Comments
Hi Amit; glad you have found this useful. I think your requirement could be implemented easily by modifying ConcatenateFilter to have a new mode, perhaps called "prefixShingles", in which it outputs the token on every word added. Without looking closer, this might be better as a separate Filter; not sure. If you implement this requirement, please contribute it back -- it would be a very welcome addition.
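For illustration, the "prefixShingles" idea above can be sketched in plain Python. This only models the token output, it is not code from ConcatenateFilter; the function name `prefix_shingles` and the single-space separator are assumptions made for the example.

```python
def prefix_shingles(words, separator=" "):
    """Yield one concatenated token per word added, always anchored at the first word."""
    current = []
    for word in words:
        current.append(word)
        # Emit the running concatenation each time a word is appended.
        yield separator.join(current)


tokens = list(prefix_shingles("Quick brown fox jumped".split()))
for token in tokens:
    print(token)
```

Because every token starts at the first word, shingles that begin mid-sentence (such as "brown fox") are never produced.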
Hi David, I would be happy to contribute and your suggestion looks doable. Let me try it out. Best,
Hi David, While working on the above, I tried a different approach for my needs. I added an EdgeNGramFilterFactory at the end of the analyzer chain like this -
With this, I am able to do not only word completions but also partial prefix matches, so it is working just as I wanted. However, there are two things I am having trouble with:
1). In the output, the data is not returned in the sort order I want. For ex -
2). If a sentence contains multiple tags, I want only the top 10 suggestions for each tag, but there is currently no option for that.
I would really appreciate any help you could provide here. BR,
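As a rough picture of why an edge n-gram filter at the end of the chain enables partial prefix matches, here is a minimal Python sketch of left-edge n-gram generation. It is not the Solr implementation; `edge_ngrams`, `min_gram` and `max_gram` are illustrative names chosen for this example.

```python
def edge_ngrams(token, min_gram=1, max_gram=10):
    """Return the left-anchored prefixes of token with lengths min_gram..max_gram."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


# A concatenated tag name is expanded into its left-edge prefixes.
grams = edge_ngrams("quick brown", min_gram=3, max_gram=7)
print(grams)
```

Every prefix of the indexed name becomes a matchable term, which is also why a short input can match a large number of documents.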
Hi Amit, RE (1) sort: I'm unclear what you want to achieve. First of all, sorting of what? The tags, or the matching Solr documents? The matching Solr documents are not sortable at present, though such a feature request is welcome. By "not sortable" I mean they come back in whatever order Lucene/Solr has the documents, which might appear to have an order but I wouldn't depend on it. However, I don't think it's all that useful, since in general you're going to want all the data, and so you can sort however you want client-side. I think the tags are sorted by start offset then end offset; does that sound fine? RE (2) limit per sentence: I guess that could be useful, though what would constitute the "top" suggestions? Anyway, feel free to file an issue if you think you might get to it.
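A minimal sketch of the two orderings mentioned above, done client-side in Python. The dictionaries are made-up sample data; the key names `startOffset`, `endOffset`, `ids`, `id` and `name` are assumptions for the example, not a statement of the tagger's exact response format.

```python
# Hypothetical tags, deliberately out of order.
tags = [
    {"startOffset": 6, "endOffset": 15, "ids": ["b"]},
    {"startOffset": 0, "endOffset": 15, "ids": ["c"]},
    {"startOffset": 0, "endOffset": 5, "ids": ["a"]},
]

# Hypothetical matching documents, in whatever order they were returned.
docs = [
    {"id": "c", "name": "Tag 100"},
    {"id": "b", "name": "Tag 99"},
    {"id": "a", "name": "Tag"},
]

# Tags ordered by start offset, then end offset.
sorted_tags = sorted(tags, key=lambda t: (t["startOffset"], t["endOffset"]))

# Documents ordered client-side, here shortest name first, then alphabetically.
sorted_docs = sorted(docs, key=lambda d: (len(d["name"]), d["name"]))

print([t["ids"] for t in sorted_tags])
print([d["name"] for d in sorted_docs])
```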
Hi David, Thanks for the reply. Yes, I meant sorting of the matching Solr documents. In my case, as I mentioned earlier, adding EdgeNGramFilterFactory at the end has allowed me to do tag searches based on a partial left substring match (a sort of autocomplete). This results in 100s of matching documents for 1 tag. So what I was looking for is a way to sort them and return only a few. For ex - let's say I have indexed my tags in the order "Tag 100", "Tag 99", "Tag 98" ... "Tag 1", "Tag". If I type "tag", it returns the results in the same order, but I would have liked the reverse and only 10 i.e. Hope this makes sense. BR,
I understand how the document order is what it is, but I'm less clear on how it is you're getting 100s of matching documents per tag. Are you not using
If I remove EdgeNGramFilterFactory, then NO_SUB and LONGEST_DOMINANT_RIGHT work as expected, but not while I have this filter. It returns all matches.
Or let me take my words back. It works here too. For ex -
Ok. Perhaps what's needed is some way to limit the number of tags at a specific location -- not for the sentence but for a given word. And then you'd need to provide some way to articulate to the tagger a sorting function of the tags. You might, for example, add a char length number to each document and want the shortest tag.
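To make the idea concrete, here is a small Python sketch of limiting matches per location with a caller-supplied sort key. It is a client-side illustration under assumed data shapes, not an existing tagger option; `limit_tags_per_location` and the `startOffset`, `endOffset` and `name` keys are hypothetical.

```python
from collections import defaultdict


def limit_tags_per_location(matches, sort_key, limit=10):
    """Group matches by (startOffset, endOffset) and keep the best `limit` per group."""
    by_location = defaultdict(list)
    for match in matches:
        by_location[(match["startOffset"], match["endOffset"])].append(match)
    return {
        location: sorted(group, key=sort_key)[:limit]
        for location, group in by_location.items()
    }


# Names indexed as "Tag 100", "Tag 99", ..., "Tag 1", "Tag", all matching the input "tag".
names = [f"Tag {i}" for i in range(100, 0, -1)] + ["Tag"]
matches = [{"startOffset": 0, "endOffset": 3, "name": name} for name in names]

# Prefer the shortest name (by char length), breaking ties alphabetically.
top = limit_tags_per_location(
    matches, sort_key=lambda m: (len(m["name"]), m["name"]), limit=3
)
print([m["name"] for m in top[(0, 3)]])
```

Grouping by offsets rather than by sentence keeps the limit tied to a given word, as described above.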
Exactly. Would it be possible?
Yeah; doesn't sound too hard, and it sounds generally useful.
Hi David,
First of all, I want to thank you for the contribution. Solr text tagger has really helped in building the solution we wanted.
However, as an extension, I am looking to use ShingleFilterFactory instead of ConcatenateFilter. The reason is that I also want to enable partial matches as suggestions.
But I want to enable only suggestions that match from the left edge, not from the middle.
For Ex - if the text is "Quick brown fox jumped"
Then the expected tokens should be -
"Quick"
"Quick brown"
"Quick brown fox"
"Quick brown fox jumped"
But using ShingleFilter produces extra tokens such as -
"brown fox"
"fox jumped"
etc
I would be really grateful if you could guide me on how to achieve this.
Best,
Amit