diff --git a/README.MD b/README.MD index ec61c99..b00c919 100644 --- a/README.MD +++ b/README.MD @@ -2,21 +2,20 @@ [![Github Workflow Status](https://img.shields.io/github/actions/workflow/status/sheinbergon/dremio-udf-gis/release-ci.yml?branch=23.1.x&logo=githubactions&style=for-the-badge)](https://github.com/sheinbergon/dremio-udf-gis/actions?query=workflow%3Arelease-actions) [![GitHub release (latest by date)](https://img.shields.io/github/v/release/sheinbergon/dremio-udf-gis?logo=github&color=%2340E0D0&style=for-the-badge)](https://github.com/sheinbergon/dremio-udf-gis/releases/latest) [![Maven Central](https://img.shields.io/maven-central/v/org.sheinbergon/dremio-udf-gis?logo=apachemaven&color=Crimson&style=for-the-badge)](https://search.maven.org/search?q=g:org.sheinbergon%20a:dremio-udf-gis*) -[![Snyk Vulnerabilities for GitHub Repo](https://img.shields.io/snyk/vulnerabilities/github/sheinbergon/dremio-udf-gis?logo=snyk&color=432f95&style=for-the-badge)](https://app.snyk.io/org/sheinbergon/project/94183993-505b-439c-9078-6276fa4c1626) [![Coveralls](https://img.shields.io/coveralls/github/sheinbergon/dremio-udf-gis?logo=coveralls&style=for-the-badge)](https://coveralls.io/github/sheinbergon/dremio-udf-gis) [![Liberapay](https://img.shields.io/liberapay/patrons/sheinbergon?logo=liberapay&style=for-the-badge)](https://liberapay.com/sheinbergon/donate) # Dremio Geo-Spatial Extensions ### What you get + - Widespread OGC implementation for SQL (adheres to PostGIS standards) - - Supported input formats: `WKT`, `WKB (HEX or BINARY)` - - Supported output formats: `WKT`, `WKB`, `GeoJSON` -- Easily installable Maven-Central/Github artifacts shaded jar artifact -- Dremio CE version compatibility (new versions will be released with each community edition) + - Supported input formats: `WKT`, `WKB (HEX or BINARY)` + - Supported output formats: `WKT`, `WKB`, `GeoJSON` +- Easily installable Maven-Central/Github artifacts shaded jar artifact +- Dremio CE version compatibility 
(new versions will be released with each community edition) - Up-2-date Proj4J & JTS geometry based implementation - ### Sponsorship Enjoying my work? A show of support would be much obliged :grin: @@ -28,6 +27,7 @@ Enjoying my work? A show of support would be much obliged :grin: ### Installation + - Take the shaded jar for the desired version and place inside your Dremio installation (`$DREMIO_HOME/jars/3rdparty`) - Restart your Dremio server(s) - Rejoice! (and see the [WIKI](https://github.com/sheinbergon/dremio-udf-gis/wiki) for detailed usage instructions) @@ -36,29 +36,30 @@ Enjoying my work? A show of support would be much obliged :grin: | Library Version | Dremio Version | Status | |-----------------|----------------|------------| -| 0.2.x | 20.1.x | Legacy | -| 0.3.x | 21.1.x | Legacy | -| 0.4.x | 21.2.x | Legacy | -| 0.5.x | 22.0.x | Maintained | -| 0.6.x | 22.1.x | Maintained | -| 0.7.x | 23.0.x | Maintained | -| 0.8.x | 23.1.x | Maintained | -| 0.9.x | 24.0.x | Maintained | - +| 0.2.x | 20.1.0 | Legacy | +| 0.3.x | 21.1.1 | Legacy | +| 0.4.x | 21.2.0 | Legacy | +| 0.5.x | 22.0.0 | Legacy | +| 0.6.x | 22.1.1 | Legacy | +| 0.7.x | 23.0.1 | Legacy | +| 0.8.x | 23.1.0 | Legacy | +| 0.9.x | 24.0.0 | Maintained | ### Usage Notes + As opposed to PostGIS, Dremio is only a query engine based on existing/projected data sources/lakes. That means that `Geometry` is not a natively supported data type, and you can only access it if it's being properly projected from the data sources (For example, PostGIS Geometry is read as an `EWKB` HEX encoded string). In order to successfully use the provided GIS functions, you must first make sure the geometry is in `WKB (BINARY)` format. 
-If it's not, you need to decode it: +If it's not, you need to decode it: + - if the input is in `WKT` format, use `ST_GeomFromText` - if the input is a HEX encoded`WKB`, use Dremio's `FROM_HEX` This library uses Dremios' Arrow buffers (`ArrowBuf`) to maintain geometry data in binary (`WKB`) format (for performance and efficiency) when interchanging it between GIS functions, which is of course undecipherable for the naked eye. When running queries from the UI, -`WKB` output will always be base64 encoded. +`WKB` output will always be base64 encoded. In order to resolve Data back to human-readable format (`WKT`), use `ST_AsText`/`ST_AsGeoJson` @@ -73,11 +74,14 @@ SELECT ST_AsText( ``` ### Roadmap + - Frequent version/dependency updates - Add more OGC/PostGIS matching functionality - Add Geography type support ### Noteworthy Mentions -Work in this repository was originally based on the following sources: + +Work in this repository was originally based on the following sources: + - [Apache Drill GIS Functionality](https://github.com/apache/drill/tree/master/contrib/udfs/src/main/java/org/apache/drill/exec/udfs/gis) - [Christy Haragan's initial port](https://github.com/christyharagan/dremio-gis) diff --git a/pom.xml b/pom.xml index f058f89..73c04cf 100644 --- a/pom.xml +++ b/pom.xml @@ -21,7 +21,7 @@ 0.7.0 9.0.0-20221123064031-c39b8a6253-dremio - 0.9.2-SNAPSHOT + 0.9.4-SNAPSHOT dremio-udf-gis GIS UDF extensions for Dremio https://github.com/sheinbergon/dremio-udf-gis diff --git a/src/main/java/org/sheinbergon/dremio/udf/gis/STGeomFromGeoJson.java b/src/main/java/org/sheinbergon/dremio/udf/gis/STGeomFromGeoJson.java new file mode 100644 index 0000000..d938ee4 --- /dev/null +++ b/src/main/java/org/sheinbergon/dremio/udf/gis/STGeomFromGeoJson.java @@ -0,0 +1,55 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. 
See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + *

+ * http://www.apache.org/licenses/LICENSE-2.0 + *

+ * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.sheinbergon.dremio.udf.gis; + +import com.dremio.exec.expr.SimpleFunction; +import com.dremio.exec.expr.annotations.FunctionTemplate; +import com.dremio.exec.expr.annotations.Output; +import com.dremio.exec.expr.annotations.Param; + +import javax.inject.Inject; + +@FunctionTemplate( + name = "ST_GeomFromGeoJSON", + scope = FunctionTemplate.FunctionScope.SIMPLE, + nulls = FunctionTemplate.NullHandling.INTERNAL) +public class STGeomFromGeoJson implements SimpleFunction { + + @Param + org.apache.arrow.vector.holders.NullableVarCharHolder jsonInput; + + @Output + org.apache.arrow.vector.holders.NullableVarBinaryHolder binaryOutput; + + @Inject + org.apache.arrow.memory.ArrowBuf buffer; + + public void setup() { + } + + public void eval() { + if (org.sheinbergon.dremio.udf.gis.util.GeometryHelpers.isHolderSet(jsonInput)) { + org.locationtech.jts.geom.Geometry geom = org.sheinbergon.dremio.udf.gis.util.GeometryHelpers.toGeometryFromGeoJson(jsonInput); + byte[] bytes = org.sheinbergon.dremio.udf.gis.util.GeometryHelpers.toEWKB(geom); + buffer = buffer.reallocIfNeeded(bytes.length); + org.sheinbergon.dremio.udf.gis.util.GeometryHelpers.populate(bytes, buffer, binaryOutput); + } else { + org.sheinbergon.dremio.udf.gis.util.GeometryHelpers.markHolderNotSet(binaryOutput); + } + } +} \ No newline at end of file diff --git a/src/main/java/org/sheinbergon/dremio/udf/gis/util/GeometryHelpers.java b/src/main/java/org/sheinbergon/dremio/udf/gis/util/GeometryHelpers.java index 46050d9..6aff1c0 100644 --- a/src/main/java/org/sheinbergon/dremio/udf/gis/util/GeometryHelpers.java +++ 
b/src/main/java/org/sheinbergon/dremio/udf/gis/util/GeometryHelpers.java @@ -27,6 +27,7 @@ import org.locationtech.jts.algorithm.Angle; import org.locationtech.jts.geom.*; import org.locationtech.jts.io.*; +import org.locationtech.jts.io.geojson.GeoJsonReader; import org.locationtech.jts.io.geojson.GeoJsonWriter; import org.locationtech.jts.operation.buffer.BufferOp; import org.locationtech.jts.operation.valid.IsValidOp; @@ -126,6 +127,17 @@ public static Geometry toGeometry(final @Nonnull NullableVarCharHolder holder) { } } + @Nonnull + public static Geometry toGeometryFromGeoJson(final @Nonnull NullableVarCharHolder holder) { + try { + String json = toUTF8String(holder); + GeoJsonReader reader = new GeoJsonReader(); + return reader.read(json); + } catch (ParseException x) { + throw new RuntimeException(x); + } + } + @Nonnull public static Geometry toGeometryFromEWKT(final @Nonnull NullableVarCharHolder holder) { try { diff --git a/src/test/kotlin/org/sheinbergon/dremio/udf/gis/STGeomFromGeoJsonTests.kt b/src/test/kotlin/org/sheinbergon/dremio/udf/gis/STGeomFromGeoJsonTests.kt new file mode 100644 index 0000000..e0ddd6e --- /dev/null +++ b/src/test/kotlin/org/sheinbergon/dremio/udf/gis/STGeomFromGeoJsonTests.kt @@ -0,0 +1,37 @@ +package org.sheinbergon.dremio.udf.gis + +import org.apache.arrow.vector.holders.NullableVarBinaryHolder +import org.apache.arrow.vector.holders.NullableVarCharHolder +import org.sheinbergon.dremio.udf.gis.spec.GeometryInputFunSpec +import org.sheinbergon.dremio.udf.gis.util.allocateBuffer + +internal class STGeomFromGeoJsonTests : GeometryInputFunSpec.NullableVarChar() { + + init { + testGeometryInput( + "Calling ST_GeomFromGeoJSON on a POINT", + """ + {"type":"Point","coordinates":[0.5,0.5],"crs":{"type":"name","properties":{"name":"EPSG:4326"}}} + """.trimIndent(), + byteArrayOf(1, 1, 0, 0, 32, -26, 16, 0, 0, 0, 0, 0, 0, 0, 0, -32, 63, 0, 0, 0, 0, 0, 0, -32, 63) + ) + + testInvalidGeometryInput( + "Calling ST_GeomFromGeoJSON on rubbish 
text", + "42ifon2 fA!@", + ) + + testNullGeometryInput( + "Calling ST_GeomFromGeoJSON on null input" + ) + } + + override val function = STGeomFromGeoJson().apply { + jsonInput = NullableVarCharHolder() + binaryOutput = NullableVarBinaryHolder() + buffer = allocateBuffer() + } + + override val STGeomFromGeoJson.input: NullableVarCharHolder get() = function.jsonInput + override val STGeomFromGeoJson.output: NullableVarBinaryHolder get() = function.binaryOutput +}
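
Review note: the expected byte array in `STGeomFromGeoJsonTests` is easier to check once it is decoded field by field. The sketch below is a minimal stand-alone decoder, assuming the PostGIS EWKB layout (byte-order byte, 32-bit type word with the SRID flag `0x20000000`, optional 32-bit SRID, then the coordinates). `EwkbPointDecoder`, `Point`, and `decode` are hypothetical names used only for illustration; they are not part of this repository, and the library itself goes through `GeometryHelpers.toEWKB`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not part of dremio-udf-gis.
class EwkbPointDecoder {

    // Assumed PostGIS EWKB flag: set when a 4-byte SRID follows the type word.
    static final int SRID_FLAG = 0x20000000;
    // Mask that strips the high flag bits (Z, M, SRID) from the type word.
    static final int TYPE_MASK = 0x0FFFFFFF;
    // WKB geometry type code for a point.
    static final int WKB_POINT = 1;

    // Plain holder for the decoded fields.
    static final class Point {
        final int geometryType;
        final int srid;
        final double x;
        final double y;

        Point(int geometryType, int srid, double x, double y) {
            this.geometryType = geometryType;
            this.srid = srid;
            this.x = x;
            this.y = y;
        }
    }

    static Point decode(byte[] ewkb) {
        ByteBuffer buf = ByteBuffer.wrap(ewkb);
        // Byte 0 selects the byte order: 1 = little-endian, 0 = big-endian.
        buf.order(buf.get() == 1 ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN);
        int typeWord = buf.getInt();
        int geometryType = typeWord & TYPE_MASK;
        if (geometryType != WKB_POINT) {
            throw new IllegalArgumentException("Only points are handled in this sketch");
        }
        // The SRID is present only when the flag bit is set; 0 means "unknown" here.
        int srid = (typeWord & SRID_FLAG) != 0 ? buf.getInt() : 0;
        double x = buf.getDouble();
        double y = buf.getDouble();
        return new Point(geometryType, srid, x, y);
    }

    public static void main(String[] args) {
        // Same bytes as the expected output in STGeomFromGeoJsonTests.
        byte[] expected = {
            1,                              // little-endian
            1, 0, 0, 32,                    // type word 0x20000001: point + SRID flag
            -26, 16, 0, 0,                  // SRID 0x000010E6
            0, 0, 0, 0, 0, 0, -32, 63,      // x
            0, 0, 0, 0, 0, 0, -32, 63       // y
        };
        Point p = decode(expected);
        System.out.println("type=" + p.geometryType + " srid=" + p.srid
            + " x=" + p.x + " y=" + p.y);
    }
}
```

Read this way, the fixture encodes the same point as the GeoJSON input in the test: SRID 4326 taken from the `crs` member, and coordinates `[0.5, 0.5]`.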