Allow searches on a library limited by words. #9

Open
bmschmidt opened this issue Jun 19, 2014 · 1 comment

Comments

@bmschmidt
Member

The code for this already exists in the API, but it's so slow as to be useless for the time being (at least under MySQL 5.5, which is what I'm using; it's possible it runs as intended on MySQL 5.6.5 or greater). I'm willing to take another stab at it if it seems like a valuable feature.

There are also some questions involving how the call should be made that aren't currently set, if anyone wants to comment.

The implementation is a little tricky, since some of the terms only make sense as an OR query: clearly OR(["cat","dog"]) has a wordcount of n(cat) + n(dog), but the meaning of AND(["cat","dog"]) is a little weird: there are no words that are both "cat" and "dog" at once.
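To make the OR case concrete, here is a minimal sketch over a toy in-memory corpus. The `books` data and the `or_wordcount` helper are hypothetical and only illustrate the arithmetic; they are not the API's actual code.

```python
from collections import Counter

# Hypothetical toy corpus: book id -> per-word counts.
books = {
    "b1": Counter({"cat": 3, "dog": 1}),
    "b2": Counter({"cat": 2}),
    "b3": Counter({"dog": 4, "fish": 1}),
}

def or_wordcount(words, corpus):
    """Sum the counts of every listed word across all books (OR semantics)."""
    # A Counter returns 0 for words a book does not contain.
    return sum(counts[w] for counts in corpus.values() for w in words)

# OR(["cat", "dog"]) is simply n(cat) + n(dog).
total = or_wordcount(["cat", "dog"], books)
n_cat = or_wordcount(["cat"], books)
n_dog = or_wordcount(["dog"], books)
```

There is no equally natural definition for AND here, since a single token is never both words at once.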

The API method for this is an additional key, "hasword". The scheme as currently laid out lets you insert an additional field into the search constraints: to search for counts of "cat" in books that have "dog", you would call {"word":["cat"],"hasword":["dog"]}; to search for either "cat" or "dog" in books that have both, you would call {"word":["cat","dog"],"hasword":["dog","cat"]}; and so forth.
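A rough sketch of how that scheme would evaluate, assuming a toy in-memory corpus rather than the real MySQL backend. The `corpus` data and `count_with_hasword` are hypothetical names for illustration; here "hasword" is read as "the book contains every listed word", as in the examples above.

```python
# Hypothetical toy corpus: book id -> per-word counts.
corpus = {
    "b1": {"cat": 3, "dog": 1},
    "b2": {"cat": 2},
    "b3": {"dog": 4, "fish": 1},
}

def count_with_hasword(query, books):
    """Count occurrences of query["word"] in books containing all of query["hasword"]."""
    required = query.get("hasword", [])
    total = 0
    for counts in books.values():
        # Keep the book only if every required word appears at least once.
        if all(counts.get(w, 0) > 0 for w in required):
            total += sum(counts.get(w, 0) for w in query["word"])
    return total

cat_in_dog_books = count_with_hasword({"word": ["cat"], "hasword": ["dog"]}, corpus)
either_in_both = count_with_hasword(
    {"word": ["cat", "dog"], "hasword": ["dog", "cat"]}, corpus
)
```

In this toy data only "b1" contains both words, so the second query counts just that book's "cat" and "dog" tokens.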

That's out of keeping with the current API behavior, which defaults to searching every command as an "and": so should "hasword":["dog","cat"] mean it has either word, and "hasword":{"$and":["dog","cat"]} mean it has both? That would be more cumbersome for most searches, but align more easily with the rest of the API syntax.
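The alternative reading could be sketched like this, assuming a bare list means "has either word" and a {"$and": [...]} wrapper means "has both". `matches_hasword` is a hypothetical helper, not an existing API function.

```python
def matches_hasword(counts, constraint):
    """Decide whether one book's word counts satisfy a "hasword" constraint."""
    if isinstance(constraint, list):
        # Bare list: the book has at least one of the words.
        return any(counts.get(w, 0) > 0 for w in constraint)
    if isinstance(constraint, dict) and set(constraint) == {"$and"}:
        # Explicit $and: the book has every listed word.
        return all(counts.get(w, 0) > 0 for w in constraint["$and"])
    raise ValueError(f"unsupported hasword constraint: {constraint!r}")

cat_only_book = {"cat": 2}
either = matches_hasword(cat_only_book, ["dog", "cat"])
both = matches_hasword(cat_only_book, {"$and": ["dog", "cat"]})
```

Under this reading, a book with only "cat" passes the bare-list form but fails the "$and" form.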

@bmschmidt
Member Author

Here's another problem: how should you specify a NOT search on this sort of data?

{"word":["evolution"],"hasword":{"$not":["natural selection","Darwin"]}} is a potentially quite useful search term, if you want to see where "evolution" is being used outside of a Darwinian context. So is that the right way to search for it? Will this support arbitrarily large queries ("evolution" in a non-Darwinian context, or in a social science context, say)? I think it should.
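One possible reading of "$not", sketched over toy data: a book qualifies only if it contains none of the excluded terms. The `library` data and `count_excluding` are hypothetical, and for simplicity a phrase such as "natural selection" is treated as a single key, which a real implementation would have to handle separately.

```python
# Hypothetical toy corpus: book id -> per-term counts.
library = {
    "origin": {"evolution": 5, "Darwin": 2, "natural selection": 7},
    "sociology": {"evolution": 3, "society": 4},
    "geology": {"evolution": 1, "strata": 6},
}

def count_excluding(word, excluded, books):
    """Count `word` only in books that contain none of the `excluded` terms."""
    return sum(
        counts.get(word, 0)
        for counts in books.values()
        if not any(counts.get(term, 0) > 0 for term in excluded)
    )

non_darwinian = count_excluding("evolution", ["natural selection", "Darwin"], library)
```

Here the "origin" book is dropped because it mentions the excluded terms, so only the other two books contribute.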
