[Feature Request] GeoParquet support #2129

Open
2 of 8 tasks
hongbo-miao opened this issue Oct 1, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

@hongbo-miao

hongbo-miao commented Oct 1, 2023

Feature request

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Overview

We are hoping to save geo data such as polygons in Delta Lake.

Motivation

Currently, Apache Sedona can help Spark read geo data from files such as GeoParquet, Shapefile, and CSV (WKT and WKB formats).

GeoParquet has just released its formal 1.0.0 version.

It would be great to support GeoParquet, which would make it easy to save geo data such as polygons and potentially query it later with Spark through Apache Sedona. Thanks! 😃

Further details

The GeoParquet and Apache Sedona projects have also mentioned Delta Lake. Making this happen may need collaboration between the different parties.

Willingness to contribute

The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?

  • Yes. I can contribute this feature independently.
  • Yes. I would be willing to contribute this feature with guidance from the Delta Lake community.
  • No. I cannot contribute this feature at this time.
hongbo-miao added the enhancement (New feature or request) label on Oct 1, 2023
hongbo-miao changed the title from "[Feature Request] Geo data support" to "[Feature Request] GeoParquet support" on Oct 1, 2023
@kylebarron

👋 I'm a contributor to the GeoParquet spec and interested in exploring an integration with Delta Lake. GeoParquet 1.1 will include native (non-binary) geometry support, based on GeoArrow, as well as bounding box columns to support spatial filtering for WKB-encoded geometries.
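For a sense of what a bounding box column holds: per row, it stores the (xmin, ymin, xmax, ymax) of that row's geometry. Below is a minimal pure-Python sketch of deriving those values from WKB, assuming 2D (XY) ISO WKB and covering only Point, LineString and Polygon. `wkb_bbox` is a hypothetical helper, not an API of GeoParquet, Apache Sedona or Delta Lake; in practice a geometry library would normally do this.

```python
import struct


def wkb_bbox(wkb: bytes) -> tuple:
    """Return (xmin, ymin, xmax, ymax) for a 2D ISO WKB Point, LineString or Polygon.

    Hypothetical helper for illustration: no Z/M, no EWKB SRID flag, no Multi*/collections.
    """
    order = "<" if wkb[0] == 1 else ">"  # byte 0: 1 = little-endian, 0 = big-endian
    (geom_type,) = struct.unpack_from(order + "I", wkb, 1)
    offset = 5  # past the byte-order flag and the uint32 geometry type

    def read_points(offset: int, n: int):
        # n XY pairs of float64, returned with the offset just past them
        coords = struct.unpack_from(order + f"{2 * n}d", wkb, offset)
        return list(zip(coords[0::2], coords[1::2])), offset + 16 * n

    if geom_type == 1:  # Point: a single XY pair
        pts, _ = read_points(offset, 1)
    elif geom_type == 2:  # LineString: uint32 point count, then the points
        (n,) = struct.unpack_from(order + "I", wkb, offset)
        pts, _ = read_points(offset + 4, n)
    elif geom_type == 3:  # Polygon: uint32 ring count, then each ring's count + points
        (n_rings,) = struct.unpack_from(order + "I", wkb, offset)
        offset += 4
        pts = []
        for _ in range(n_rings):
            (n,) = struct.unpack_from(order + "I", wkb, offset)
            ring, offset = read_points(offset + 4, n)
            pts.extend(ring)
    else:
        raise ValueError(f"unsupported WKB geometry type {geom_type}")

    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (min(xs), min(ys), max(xs), max(ys))


# A 2 x 1 rectangle as little-endian WKB: byte order 1, type 3 (Polygon), 1 ring of 5 points.
square = struct.pack("<BII", 1, 3, 1) + struct.pack("<I", 5) + struct.pack(
    "<10d", 0, 0, 2, 0, 2, 1, 0, 1, 0, 0
)
print(wkb_bbox(square))  # (0.0, 0.0, 2.0, 1.0)
```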

I don't know the Delta Lake spec well, but it seems to me this should be complementary, as long as there's some way to associate metadata with a column and store min/max column statistics. Would someone be able to point me to the right place for that? I could potentially make a Rust/Python implementation.
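Why min/max statistics matter here: if per-file minimum and maximum values were recorded for each field of a bounding box column, a reader could skip any file whose boxes cannot overlap a query window. The sketch below assumes such stats exist for a struct column named `bbox` with `xmin`/`ymin`/`xmax`/`ymax` fields, shaped loosely like the `minValues`/`maxValues` maps in Delta's per-file statistics. `FileStats`, `may_intersect`, `prune` and `box` are hypothetical names that illustrate the pruning logic, not an existing Delta Lake or Sedona API.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Per-file stats, loosely shaped like Delta's minValues/maxValues (hypothetical type)."""
    path: str
    min_values: dict  # e.g. {"bbox": {"xmin": ..., "ymin": ..., "xmax": ..., "ymax": ...}}
    max_values: dict


def may_intersect(stats: FileStats, window: tuple) -> bool:
    """Return False only if no row's bbox in this file can overlap the query window."""
    qxmin, qymin, qxmax, qymax = window
    lo, hi = stats.min_values["bbox"], stats.max_values["bbox"]
    # A row overlaps the window iff xmin <= qxmax, xmax >= qxmin, ymin <= qymax, ymax >= qymin.
    # The file is skippable when even its most favorable row fails one of these tests.
    return not (
        lo["xmin"] > qxmax
        or hi["xmax"] < qxmin
        or lo["ymin"] > qymax
        or hi["ymax"] < qymin
    )


def prune(files: list, window: tuple) -> list:
    """Paths of files that might contain rows overlapping the window."""
    return [f.path for f in files if may_intersect(f, window)]


def box(xmin, ymin, xmax, ymax) -> dict:
    return {"bbox": {"xmin": xmin, "ymin": ymin, "xmax": xmax, "ymax": ymax}}


files = [
    FileStats("part-0.parquet", min_values=box(0, 0, 1, 1), max_values=box(9, 9, 10, 10)),
    FileStats("part-1.parquet", min_values=box(20, 20, 21, 21), max_values=box(29, 29, 30, 30)),
]
print(prune(files, (5, 5, 6, 6)))  # ['part-0.parquet']
```

The test only uses the minimum of `xmin`/`ymin` and the maximum of `xmax`/`ymax`, which is why a bounding box column (rather than the opaque WKB blob) is what makes this kind of skipping possible.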

@ymoisan

ymoisan commented May 28, 2024

What is the best data type for the geo column to partition/z-order/cluster on?

@mmgeorge

mmgeorge commented Jun 21, 2024

In GeoParquet 1.1, I believe one would want to z-order on the bounding-box column, which can be [<xmin>, <ymin>, <xmax>, <ymax>] or [<xmin>, <ymin>, <zmin>, <xmax>, <ymax>, <zmax>] depending on whether the geometry is 2D or 3D.
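To make the z-ordering idea concrete: a Z-order (Morton) key interleaves the bits of quantized coordinates, so rows whose boxes lie close together tend to get close keys and land in the same files, which tightens each file's min/max ranges. This is a minimal sketch, assuming 2D boxes in lon/lat degrees and keying on the box center (a choice made here for illustration). `interleave16`, `quantize` and `zorder_key` are hypothetical helpers, and Delta Lake's own Z-ordering uses its own implementation rather than this code.

```python
def interleave16(a: int, b: int) -> int:
    """Morton-interleave the low 16 bits of a (even bit positions) and b (odd positions)."""
    code = 0
    for i in range(16):
        code |= ((a >> i) & 1) << (2 * i)
        code |= ((b >> i) & 1) << (2 * i + 1)
    return code


def quantize(v: float, lo: float, hi: float) -> int:
    """Scale v from [lo, hi] onto the integer grid 0..65535, clamping out-of-range values."""
    t = (v - lo) / (hi - lo)
    return min(65535, max(0, int(t * 65535)))


def zorder_key(bbox: tuple, extent: tuple = (-180.0, -90.0, 180.0, 90.0)) -> int:
    """Z-order key of a 2D bbox's center; extent is an assumed lon/lat world extent."""
    xmin, ymin, xmax, ymax = bbox
    exmin, eymin, exmax, eymax = extent
    cx = quantize((xmin + xmax) / 2, exmin, exmax)
    cy = quantize((ymin + ymax) / 2, eymin, eymax)
    return interleave16(cx, cy)


rows = [(170, 80, 171, 81), (-171, -81, -170, -80), (0, 0, 1, 1)]
for bbox in sorted(rows, key=zorder_key):
    print(bbox, zorder_key(bbox))
```

Sorting or clustering rows by such a key before writing files is what turns spatial locality into narrow per-file bounding box statistics; keying on all four bbox values instead of the center is an equally plausible design.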

Would also love to see support for this. Since Delta tables don't support this metadata yet, it seems like we will need to do something custom on our end.
