Revisit unfielded query mapping to author search #174

Open
aaccomazzi opened this issue Jul 1, 2020 · 4 comments

Comments

@aaccomazzi
Member

Currently a query of "Kepler 1362 b" gets rewritten by our modified edismax into something that includes author:"b, *". This is clearly undesirable, so we should reconsider the rewriting rules to avoid this chain of events.

@JCRPaquin
Contributor

Do we want strings in quotes to not be expanded at all? That is, we'd broadcast them as-given across all fields that an unfielded search would match. Even then I think the author name tokenizer would likely cause the same issue?

@aaccomazzi
Member Author

I'm not sure I understand the question, so let me restate it and see if we agree. A quoted unfielded search ("A B C") should IMHO generate an edismax query across fields in which the quotes are preserved for each field (something like author:"A B C" | title:"A B C" | abstract:"A B C" ...). So if I understand the question correctly, I'm saying that quoted strings should be broadcast as phrases across fields (but I'm still not sure what you mean by "expanded" above).
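To make the intended broadcast concrete, here is a minimal sketch in Python. It is not the actual edismax rewriting code: the function name `broadcast_phrase` and the default field list are assumptions for illustration, and the ` | ` separator just mirrors the notation used above.

```python
def broadcast_phrase(phrase, fields=("author", "title", "abstract")):
    """Build one quoted clause per field so the phrase is never split.

    Hypothetical helper: the field list and the ' | ' separator are
    illustrative, not the real parser configuration.
    """
    # Escape backslashes and embedded quotes so the phrase stays a single unit.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    return " | ".join(f'{field}:"{escaped}"' for field in fields)


print(broadcast_phrase("A B C"))
```

The point of the sketch is only that the quotes survive per field; what each field's analyzer then does with the phrase is a separate question.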

The real problem with this query is that it generates, among others, a search for author:"Kepler 1362 b", which is then further broken down into something that includes author:"b, *". The first step we should take is to review the expansion of author queries, at least for names that are not preformatted as "Last, First".

@aaccomazzi
Member Author

Note: the solution to this may simply be a fix to the author parsing strategy (see #194), so we should work on that first.

@JCRPaquin
Contributor

The working solution for #194 is to ensure that an expanded query never contains fewer terms than the query it began with: "Anna Kelbert" contains 2 terms and should never map to "Kelbert, *", which contains only 1, but it could map to both "Kelbert, Anna" and "Anna Kelbert, *".

Your example, "Kepler 1362 b", contains 3 terms and so could not be expanded to include a 1-term wildcard query like "b, *".
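A rough sketch of that term-count rule in Python, to show how the filter would behave on the examples above. This is not the #194 implementation: `count_terms` and `keep_expansions` are hypothetical names, and the sketch assumes that the wildcard `*` and punctuation do not count as terms.

```python
import re


def count_terms(query):
    """Count word-like terms; '*' and punctuation are not counted (assumption)."""
    return len(re.findall(r"\w+", query))


def keep_expansions(original, candidates):
    """Drop any candidate expansion with fewer terms than the original query."""
    minimum = count_terms(original)
    return [c for c in candidates if count_terms(c) >= minimum]


print(keep_expansions("Anna Kelbert",
                      ["Kelbert, *", "Kelbert, Anna", "Anna Kelbert, *"]))
print(keep_expansions("Kepler 1362 b", ["b, *", "Kepler 1362 b"]))
```

Under these assumptions "Kelbert, *" and "b, *" are both rejected because each contains a single term, while the two-term and three-term variants pass.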
