[Feature] Geospatial filtering based on Lucene 6 #9
Comments
dmunch changed the title from "Geospatial filtering based on Lucene 6" to "[Feature] Geospatial filtering based on Lucene 6" on Dec 30, 2016.
For those following this feature, a first successful proof-of-concept implementation is available in the following branches: https://github.com/dmunch/clouseau/tree/dmunch-lucene6-latlonpoint
Motivation
Starting with version 6.0, Lucene features efficient geospatial indexing based on k-d trees (more precisely, block k-d trees, or BKD trees).
https://www.elastic.co/blog/lucene-points-6.0
https://www.elastic.co/blog/apache-lucene-numeric-filters
While Cloudant already offers a geospatial indexing module, its implementation has one major drawback: it is not possible to combine geospatial indexing with full-text indexing. You can query either by geometry or by other criteria, but not by both at the same time.
Furthermore, the geospatial module is not open-sourced (yet?), so it can only be used in the hosted Cloudant offering and not on premises. The abandoned branch 62936-geo-with-query indicates that there has been initial work to open-source the module, but its status is not clear.
Goal
The goal is a minimal viable implementation featuring only the following basic functions:
LatLonPoint
Implementation
After the initial work described in issue #4 on porting clouseau to Java 8 and Lucene 6.3.0, all the necessary conditions are now in place to implement efficient spatial search using Lucene's LatLonPoint and the corresponding functions:
newBoxQuery()
newDistanceQuery()
newPolygonQuery()
nearest()
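To make the semantics of these filters concrete, here is a minimal plain-Python sketch of what a box filter, a distance filter, and a nearest-neighbour lookup select. It is not Lucene's implementation (Lucene answers these queries from its point index rather than by scanning). The function names, the haversine formula, and the Earth radius constant are assumptions chosen for illustration only.

```python
import math

# Mean Earth radius in metres; an assumed constant for this sketch.
EARTH_RADIUS_M = 6_371_008.8


def in_box(lat, lon, min_lat, max_lat, min_lon, max_lon):
    """True if the point lies inside the box (dateline crossing is not handled)."""
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_distance(lat, lon, center_lat, center_lon, radius_m):
    """True if the point is at most radius_m metres away from the centre."""
    return haversine_m(lat, lon, center_lat, center_lon) <= radius_m


def nearest(points, center_lat, center_lon, n):
    """Return the n points closest to the centre, by brute-force sorting."""
    return sorted(points, key=lambda p: haversine_m(p[0], p[1], center_lat, center_lon))[:n]
```

A polygon filter follows the same pattern with a point-in-polygon test in place of the distance check.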
Two stages have to be distinguished: indexing and querying.
Indexing
Currently clouseau does not allow indexing of complex types like objects or arrays. A necessary first step would be to allow arrays to be sent from dreyfus to clouseau. With some luck, and thanks to scalang, this involves only a small amount of work in ClouseauTypeDecoder.
Once clouseau receives an array, a very first implementation could handle only the case of an array with exactly two elements and index it as a LatLonPoint, the first element being the latitude, the latter the longitude.
Note that an array might also be indexed as a multi-dimensional DoublePoint. Eventually this might become the default case, with LatLonPoint being a special case forced by an entry in the options map, or vice versa.
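The decision described above can be sketched as a small dispatch function. This is only an illustration in plain Python, not clouseau code: the function name, the returned tuples, and the "point_type" option key are all hypothetical. It takes the variant in which two-element arrays default to LatLonPoint and the options map can force either type.

```python
def choose_point_type(value, options=None):
    """Decide how an array field would be indexed.

    Returns a (type_name, coordinates) pair. The "point_type" option key and
    its values "latlon" / "double" are hypothetical names for this sketch.
    """
    options = options or {}
    if not isinstance(value, (list, tuple)) or not value:
        raise ValueError("expected a non-empty array of numbers")
    if not all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in value):
        raise ValueError("array elements must be numbers")
    coords = tuple(float(v) for v in value)

    forced = options.get("point_type")
    if forced not in (None, "latlon", "double"):
        raise ValueError("unknown point_type: %r" % (forced,))
    if forced == "double":
        return ("DoublePoint", coords)
    if len(coords) == 2:
        lat, lon = coords  # first element is the latitude, second the longitude
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            raise ValueError("latitude/longitude out of range")
        return ("LatLonPoint", coords)
    if forced == "latlon":
        raise ValueError("LatLonPoint needs exactly two elements")
    # Any other array is treated as a multi-dimensional point.
    return ("DoublePoint", coords)
```

For example, `choose_point_type([48.8566, 2.3522])` selects LatLonPoint, while a three-element array, or a two-element array with `{"point_type": "double"}`, selects DoublePoint.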
In a third step complex types might be enabled as well, which would allow GeoJSON to be sent from dreyfus to clouseau. This, however, would involve further work and is certainly out of scope for this minimal proof of concept.
Note that most of this functionality can almost certainly be implemented solely by modifications to clouseau.
Querying
In order to query fields indexed as LatLonPoint, another field has to be introduced in the querying endpoint of dreyfus. For the sake of simplicity I would call this field qdsl and let its syntax be inspired by the Elasticsearch Geo Queries syntax.
The reason to call it qdsl is that it might support the more generic Elasticsearch Query DSL at some point in the future.
As a simple mashup this could look like the following JSON.
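A hypothetical request body, assuming the minimal form of Elasticsearch's geo_distance filter, might be built as follows. The field names "category" and "location" and all values are made up for illustration. Only q (the existing full-text query) and qdsl (the proposed field) come from the proposal itself.

```python
import json

# Hypothetical search request: full-text query in "q", geo filter in "qdsl".
request_body = {
    "q": "category:restaurant",  # ordinary full-text query (made-up field)
    "qdsl": {
        # Shape borrowed from Elasticsearch's geo_distance filter.
        "geo_distance": {
            "distance": "2km",
            "location": {"lat": 48.8566, "lon": 2.3522},  # made-up LatLonPoint field
        }
    },
}

# Serialized form as it would be sent to the search endpoint.
print(json.dumps(request_body, indent=2))
```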
Dreyfus would recognize the qdsl field and pass it on to clouseau. Clouseau would do the heavy lifting: parse the JSON, construct the corresponding Lucene Query, and combine it, by means of a BooleanQuery with MUST clauses, with the query obtained from q.
Changes to dreyfus would be minimal, while additions to clouseau would still be moderate.
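The translation step on the clouseau side can be sketched as follows. This plain-Python sketch does not call Lucene. It only returns a description of which LatLonPoint factory would be invoked, with arguments in the order the Lucene 6 methods take them, and shows the conjunction with the q query as two MUST clauses. The helper names, the supported units, and the tuple representation are assumptions. The qdsl shapes follow the minimal forms of Elasticsearch's geo_distance and geo_bounding_box filters.

```python
import re

_UNITS_IN_METRES = {"m": 1.0, "km": 1000.0}  # only these two units in this sketch


def parse_distance(text):
    """Convert a distance string such as '2km' or '500m' to metres."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(km|m)\s*", text)
    if match is None:
        raise ValueError("unsupported distance: %r" % (text,))
    return float(match.group(1)) * _UNITS_IN_METRES[match.group(2)]


def geo_clause(qdsl):
    """Describe the LatLonPoint query that a single-filter qdsl object maps to."""
    if len(qdsl) != 1:
        raise ValueError("expected exactly one geo filter")
    (kind, body), = qdsl.items()
    if kind == "geo_distance":
        body = dict(body)
        radius_m = parse_distance(body.pop("distance"))
        (field, point), = body.items()
        # newDistanceQuery(field, latitude, longitude, radiusMeters)
        return ("LatLonPoint.newDistanceQuery", field, point["lat"], point["lon"], radius_m)
    if kind == "geo_bounding_box":
        (field, box), = body.items()
        top_left, bottom_right = box["top_left"], box["bottom_right"]
        # newBoxQuery(field, minLatitude, maxLatitude, minLongitude, maxLongitude)
        return ("LatLonPoint.newBoxQuery", field,
                bottom_right["lat"], top_left["lat"], top_left["lon"], bottom_right["lon"])
    raise ValueError("unsupported geo filter: %r" % (kind,))


def combine(q, qdsl=None):
    """Describe a BooleanQuery in which q and the geo filter are both MUST clauses."""
    clauses = [("MUST", ("parsed q", q))]
    if qdsl is not None:
        clauses.append(("MUST", geo_clause(qdsl)))
    return ("BooleanQuery", clauses)
```

Because both clauses are MUST, a document matches only if it satisfies the full-text query and lies inside the requested geometry.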
Conclusion
The code changes needed to bring Lucene geospatial filtering to clouseau are moderate and rather straightforward. Different roads can be taken on how to index point fields and query the geospatial index; one is outlined in the text above. Care was taken to choose an approach with a minimal expected implementation effort while still providing most of the additional value introduced with LatLonPoint in Lucene 6.
This proposal should currently be seen as a draft and is open for comments and suggestions.