Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Geospatial filtering based on Lucene 6 #9

Open
dmunch opened this issue Dec 30, 2016 · 1 comment
Open

[Feature] Geospatial filtering based on Lucene 6 #9

dmunch opened this issue Dec 30, 2016 · 1 comment

Comments

@dmunch
Copy link

dmunch commented Dec 30, 2016

Motivation

Starting from version 6.0 Lucene features effecient geospatial indexing with k-d index trees.

https://www.elastic.co/blog/lucene-points-6.0
https://www.elastic.co/blog/apache-lucene-numeric-filters

While Cloudant already offers a geospatial indexing module its implementation has one major drawback: It is not possible to combine geospatial indexing with fulltext indexing, i.e. either you query by geometry or you query by another criteria. Querying for both criteria at the same time is not possible.

Furthermore, the geospatial module is not open-sourced (yet?), i.e. it can only be used in the hosted Cloudant offer but not on premisses. While the abandoned branch 62936-geo-with-query indicates that there as been initial work to open-source the module its status is not clear.

Goal

The goal is to implement a minimal viable implementation featuring only the following basic functions:

  • Index fields of a document as LatLonPoint
  • Query field by distance (center and radius)
  • Query field by box
  • Query field by polygon
  • Combine one of the beforementioned queries with a standard Lucene query string

Implementation

After initial work as described in issue #4 in porting clouseau to Java 8 and Lucene 6.3.0 we now have all the necessary conditions to implement efficient spatial search using Lucene's LatLonPoint
and the corresponding functions

  • newBoxQuery()
  • newDistanceQuery()
  • newPolygonQuery()
  • nearest()

Two stages have to be distinguished: indexing and querying.

Indexing

Currently clouseau does not allow indexing of complex types like objects or arrays. A necessary step would be to allow arrays to be sent from dreyfus to clouseau. With some luck and thanks to scalang this involves little or tiny work in ClouseauTypeDecoder.

Once clouseau receives an array a very first implementation could only handle the case of an array with only two elements and index it as LatLonPoint, the first element being the latitude, the later the longitude.

Note that an array might also be indexed as a multi-dimensional DoublePoint. Eventually
this might become the default case with LatLonPoint being a special case and being forced by
an entry in the options map or vice versa.

In a third step complex types might be enabled as well, which would allow for GeoJSON to be send from dreyfus to clouseau. This however would involve further work and is certainly out of scope for this minimal proof of concept.

Note that most certainly most of this functionality can be implemented solely by modifications to clouseau.

Querying

In order to be able to query fields indexed as LatLonPoint another field has to be introduced in the querying endpoint of dreyfus. For the sake of simplicity I would call this field qdsl and let its syntax be inspired from the ElasticSearch Geo Queries syntax.

This reason to call it qdsl is that it might support the more generic Elasticsearch Query DSL at some point in the future.

As a simple mashup this could look like the following JSON.

{
  "q": "index:my query",
  "sort": "foo",
  "limit": 3,
  "qdsl": {
    "geo_distance" : {
        "distance" : "200km",
        "fieldname" : {
            "lat" : 40,
            "lon" : -70
        }
    }
  }
} 

Dreyfus would recognize the qdsl field and pass it on to clouseau. Clouseau would do the heavy lifting, parse the JSON, construct the corresponding Lucene Query and combine it by means of a BooleanQuery MUST with the query obtained from q.

Changes to dreyfus would be minimal while additions to clouseau are still moderate.

Conclusion

The necessary codechanges involved in bringing Lucene geospatial filtering to clouseau are moderate and rather straight forward. Different roads can be taken on how to index point fields and query the geospatial index, one is outlined in text above. Care was token to chose an approach with an expected minimal implementation effort while still providing most of the additional value introduced with LatLonPoint in Lucene 6.

This proposition should currently be seen as a draft and is open for comments and suggestions.

@dmunch dmunch changed the title Geospatial filtering based on Lucene 6 [Feature] Geospatial filtering based on Lucene 6 Dec 30, 2016
@dmunch
Copy link
Author

dmunch commented Jan 12, 2017

For those following this feature, there is a first successful proof-of-concept implementation available in the following branches:

https://github.com/dmunch/clouseau/tree/dmunch-lucene6-latlonpoint
https://github.com/dmunch/dreyfus/tree/dmunch-lucene6-latlonpoint

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant