CRS spec definition for version 0.1 #25

Merged
merged 23 commits into from
Mar 6, 2022
Changes from 3 commits
Binary file modified examples/geoparquet/example.parquet
Binary file not shown.
33 changes: 17 additions & 16 deletions examples/geoparquet/example.py
@@ -5,24 +5,25 @@

.. code-block:: python

>>> import json, pprint, pyarrow.parquet
>>> import json, pprint, pyarrow.parquet as pq
>>> pprint.pprint(json.loads(pq.read_schema("example.parquet").metadata[b"geo"]))
{'columns': {'geometry': {'crs': 'GEOGCRS["WGS 84",ENSEMBLE["World Geodetic '
'System 1984 ensemble",MEMBER["World Geodetic '
'System 1984 (Transit)"],MEMBER["World '
'Geodetic System 1984 (G730)"],MEMBER["World '
'Geodetic System 1984 (G873)"],MEMBER["World '
'Geodetic System 1984 (G1150)"],MEMBER["World '
'Geodetic System 1984 (G1674)"],MEMBER["World '
'Geodetic System 1984 '
'(G1762)"],ELLIPSOID["WGS '
'84",6378137,298.257223563],ENSEMBLEACCURACY[2.0]],CS[ellipsoidal,2],AXIS["geodetic '
'latitude (Lat)",north],AXIS["geodetic '
'longitude '
'(Lon)",east],UNIT["degree",0.0174532925199433],USAGE[SCOPE["Horizontal '
'component of 3D '
'system."],AREA["World."],BBOX[-90,-180,90,180]],ID["EPSG",4326]]',
'encoding': 'WKB'}},
'System 1984 ensemble",MEMBER["World Geodetic '
'System 1984 (Transit)"],MEMBER["World '
'Geodetic System 1984 (G730)"],MEMBER["World '
'Geodetic System 1984 (G873)"],MEMBER["World '
'Geodetic System 1984 (G1150)"],MEMBER["World '
'Geodetic System 1984 (G1674)"],MEMBER["World '
'Geodetic System 1984 (G1762)"],MEMBER["World '
'Geodetic System 1984 '
'(G2139)"],ELLIPSOID["WGS '
'84",6378137,298.257223563],ENSEMBLEACCURACY[2.0]],CS[ellipsoidal,2],AXIS["geodetic '
'latitude (Lat)",north],AXIS["geodetic '
'longitude '
'(Lon)",east],UNIT["degree",0.0174532925199433],USAGE[SCOPE["Horizontal '
'component of 3D '
'system."],AREA["World."],BBOX[-90,-180,90,180]],ID["EPSG",4326]]',
'encoding': 'WKB'}},
'primary_column': 'geometry',
'version': '0.1.0'}
"""
45 changes: 33 additions & 12 deletions format-specs/geoparquet.md
@@ -50,26 +50,43 @@ Each geometry column in the dataset must be included in the columns field above

| Field Name | Type | Description |
| ---------- | ----------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------- |
| crs | string | **REQUIRED** [WKT2](http://docs.opengeospatial.org/is/12-063r5/12-063r5.html) string representing the Coordinate Reference System (CRS) of the geometry. |
| crs | string | **REQUIRED** string representing the Coordinate Reference System (CRS) of the geometry. |
| encoding | string | **REQUIRED** Name of the geometry encoding format. Currently only 'WKB' is supported. |
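
The two required per-column fields in the table above can be checked mechanically. The following is a minimal sketch, not part of the spec or of this PR: the function name `check_geo_metadata` and the error messages are hypothetical, and it only assumes the `geo` metadata layout shown in `example.py` (a JSON object with a `columns` mapping).

```python
import json
from typing import List

# Field names and the single supported encoding are taken from the table above.
REQUIRED_COLUMN_FIELDS = ("crs", "encoding")
SUPPORTED_ENCODINGS = {"WKB"}


def check_geo_metadata(geo_json: str) -> List[str]:
    """Return a list of problems found in a 'geo' metadata JSON string.

    Hypothetical helper: an empty list means every column carries a string
    'crs' and a supported 'encoding'. It does not validate the CRS itself.
    """
    problems = []
    geo = json.loads(geo_json)
    for name, meta in geo.get("columns", {}).items():
        # Both fields are REQUIRED and typed as string in the spec table.
        for field in REQUIRED_COLUMN_FIELDS:
            if not isinstance(meta.get(field), str):
                problems.append(f"{name}: missing or non-string '{field}'")
        encoding = meta.get("encoding")
        if isinstance(encoding, str) and encoding not in SUPPORTED_ENCODINGS:
            problems.append(f"{name}: unsupported encoding {encoding!r}")
    return problems
```

With pyarrow, the input string would typically come from `pq.read_schema("example.parquet").metadata[b"geo"]`, as in the example file.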

#### crs

It is strongly recommended to use [EPSG:4326 (lat, long)](https://spatialreference.org/ref/epsg/4326/) for all data, so in most cases the value of the crs should be:
The Coordinate Reference System (CRS) is a mandatory parameter for all the geometries defined in geoparquet format.

The CRS needs to be provided in [WKT](https://en.wikipedia.org/wiki/Well-known_text_representation_of_coordinate_reference_systems) version 2, also known as **WKT2**. WKT2 has several revisions, this specification supports the revisions from [2015](http://docs.opengeospatial.org/is/12-063r5/12-063r5.html) and [2019](https://docs.opengeospatial.org/is/18-010r7/18-010r7.html): WKT2_2015, WKT2_2015_SIMPLIFIED, WKT2_2019, WKT2_2019_SIMPLIFIED.
Member

Making this sound a bit more spec-like.

Suggested change
The CRS needs to be provided in [WKT](https://en.wikipedia.org/wiki/Well-known_text_representation_of_coordinate_reference_systems) version 2, also known as **WKT2**. WKT2 has several revisions, this specification supports the revisions from [2015](http://docs.opengeospatial.org/is/12-063r5/12-063r5.html) and [2019](https://docs.opengeospatial.org/is/18-010r7/18-010r7.html): WKT2_2015, WKT2_2015_SIMPLIFIED, WKT2_2019, WKT2_2019_SIMPLIFIED.
The CRS must be provided in [WKT](https://en.wikipedia.org/wiki/Well-known_text_representation_of_coordinate_reference_systems) version 2, also known as **WKT2**. WKT2 has several revisions, this specification supports the revisions from [2015](http://docs.opengeospatial.org/is/12-063r5/12-063r5.html) and [2019](https://docs.opengeospatial.org/is/18-010r7/18-010r7.html): WKT2_2015, WKT2_2015_SIMPLIFIED, WKT2_2019, WKT2_2019_SIMPLIFIED.

Member

Added #26 for general discussion on officially adopting this kind of language.

Member

Do we need/want to support both 2015 and 2019? What are the advantages of doing so? In general I prefer just one way to do things, but am not an expert on these.

Collaborator

Given that this is a new effort, I think there is something to be said for directly going with only WKT2:2019.

I suppose the main disadvantage is that this new format is not yet as widely supported in libraries, making it more difficult to write compliant geoparquet files.

Collaborator Author

We already restricted the options to WKT2 by dropping WKT1. The differences between WKT2 revisions are small, and proj4 handles them well. So I decided to allow all the WKT2 “flavors”, which makes it easier to write geoparquet files.

With this decision I also want to avoid people creating non-compliant geoparquet files that contain unsupported WKT2 strings. Someone might use a WKT2 2015 string without realizing the mistake. If that happens at a large service with a huge number of datasets already published, the only way to fix it is another revision of the geoparquet spec that adds support for it.

Member

I lean towards going just WKT2:2019 to start, and then see if there are implementations that aren't able to handle it, and then we can loosen it.

The main thing I want to be sure of is that EPSG:4326 has just one representation. So if the 2015 and 2019 representations of EPSG:4326 differ, then I think it'd be really good to just do 2019. My thinking is that we shouldn't make implementations check for two different strings that mean the same thing (as you would have to if you don't have proj there).

Ideally we'd have some sort of validator that checks on if it's a 2019 string so we'd catch if a large service starts to use it wrong, so we wouldn't need a spec revision.

Collaborator

The main thing I want to be sure of is that EPSG:4326 has just one representation

Note that because of the dynamic nature of "EPSG:4326" (an ensemble datum that gets updated over time), its exact representation actually depends on the version of the EPSG database you are using. See e.g. the comment at #24 (comment), where our example parquet file already changed because of this.

Ideally we'd have some sort of validator that checks on if it's a 2019 string so we'd catch if a large service starts to use it wrong, so we wouldn't need a spec revision.

Just checking, and it seems that at least PROJ has functionality to check the dialect of a WKT string (although I don't directly see this exposed in pyproj)

Collaborator Author

Ideally we'd have some sort of validator that checks on if it's a 2019 string so we'd catch if a large service starts to use it wrong, so we wouldn't need a spec revision.

The issue is that we cannot guarantee that every tool writing geoparquet files has a proper validator in place.

Anyway, it looks like I'm the only one supporting this idea 😄 and it's 0.1, so let's move on with WKT2:2019 and we can revisit it in the future.

Really appreciate your feedback here.

Member

Well, we don't need every tool writing geoparquet to have a proper validator in place. We just need one that the implementor creating a tool can use to check their output to ensure it aligns, and/or that data users can use to then give feedback to the implementor.

And more than happy to revisit - as with other discussions I think it's easier to come back and make things looser than it is to try to tighten later.
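
The kind of validator discussed in this thread could start from a keyword check. The sketch below is only a rough heuristic under stated assumptions, not PROJ's dialect detection and not a WKT parser: the function name `guess_wkt_dialect`, its return labels, and the keyword sets are all hypothetical and deliberately non-exhaustive. It assumes that WKT1 and WKT2 use different root keywords (`GEOGCS` versus `GEOGCRS`, as in the two strings in this diff) and that `GEOGCRS`, `USAGE` and `ENSEMBLE` appear only from the 2019 revision on.

```python
import re

# Root keywords per dialect (non-exhaustive assumption, see the text above).
WKT1_ROOTS = {"GEOGCS", "PROJCS", "GEOCCS", "VERT_CS", "COMPD_CS", "LOCAL_CS"}
WKT2_ROOTS = {
    "GEOGCRS", "GEOGRAPHICCRS", "GEODCRS", "GEODETICCRS",
    "PROJCRS", "PROJECTEDCRS", "VERTCRS", "VERTICALCRS",
    "COMPOUNDCRS", "ENGCRS", "ENGINEERINGCRS", "BOUNDCRS",
}
# Keywords assumed to exist only in the 2019 revision.
WKT2_2019_ONLY = {"GEOGCRS", "GEOGRAPHICCRS", "USAGE", "ENSEMBLE"}


def guess_wkt_dialect(wkt: str) -> str:
    """Guess 'WKT1', 'WKT2', 'WKT2_2019' or 'unknown' from keywords alone."""
    # Blank out quoted names so words inside them are not taken for keywords.
    stripped = re.sub(r'"[^"]*"', '""', wkt)
    # A keyword is an identifier directly followed by an opening bracket.
    keywords = [k.upper() for k in
                re.findall(r"([A-Za-z_][A-Za-z0-9_]*)\s*[\[(]", stripped)]
    if not keywords:
        return "unknown"
    root = keywords[0]
    if root in WKT1_ROOTS:
        return "WKT1"
    if root in WKT2_ROOTS:
        # Without 2019-only keywords the string may be valid in both
        # revisions, so only the generic label is returned.
        if any(k in WKT2_2019_ONLY for k in keywords):
            return "WKT2_2019"
        return "WKT2"
    return "unknown"
```

A real validator would need to parse the string properly, for example through PROJ. This check can only flag the obvious case of a WKT1 string ending up in the `crs` field.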



As the most common CRS for datasets is latitude/longitude, for the widest interoperability we recommend [EPSG:4326](https://spatialreference.org/ref/epsg/wgs-84) for all data, so in most cases the value of the crs should be:
Member

I feel like we should mention here that our axis order overrides this?

Collaborator Author

@alasarr, Mar 4, 2022

We already have a specific section for coordinate order where we specify that. Did you see it? Or do you just want to emphasize here too?

Member

Yeah, I saw it - I think we can refer to that section. I just think we need to call it out here. Acknowledge it's a bit confusing. We say here 'use the crs for latitude/longitude', and then below we say 'but do it as longitude/latitude'.

I suppose alternatively we don't actually mention lat/long or long/lat here - we just say that we recommend 4326 for widest interoperability.

Collaborator

I suppose alternatively we don't actually mention lat/long or long/lat here - we just say that we recommend 4326 for widest interoperability.

+1

Collaborator Author

I see your point now 👍


```
GEOGCS["WGS 84",
DATUM["WGS_1984",
SPHEROID["WGS 84",6378137,298.257223563,
AUTHORITY["EPSG","7030"]],
AUTHORITY["EPSG","6326"]],
PRIMEM["Greenwich",0,
AUTHORITY["EPSG","8901"]],
UNIT["degree",0.01745329251994328,
AUTHORITY["EPSG","9122"]],
AUTHORITY["EPSG","4326"]]
GEOGCRS["WGS 84",
ENSEMBLE["World Geodetic System 1984 ensemble",
MEMBER["World Geodetic System 1984 (Transit)"],
MEMBER["World Geodetic System 1984 (G730)"],
MEMBER["World Geodetic System 1984 (G873)"],
MEMBER["World Geodetic System 1984 (G1150)"],
MEMBER["World Geodetic System 1984 (G1674)"],
MEMBER["World Geodetic System 1984 (G1762)"],
MEMBER["World Geodetic System 1984 (G2139)"],
ELLIPSOID["WGS 84",6378137,298.257223563],
ENSEMBLEACCURACY[2.0]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north],
AXIS["geodetic longitude (Lon)",east],
UNIT["degree",0.0174532925199433],
USAGE[
SCOPE["Horizontal component of 3D system."],
AREA["World."],
BBOX[-90,-180,90,180]],
ID["EPSG",4326]]
```

Due to the large number of CRSes available and the difficulty of implementing all of them, we strongly encourage implementing [EPSG:4326](https://spatialreference.org/ref/epsg/wgs-84) first.

Data that is better served in particular projections can choose to use an alternate coordinate reference system.

#### encoding
@@ -78,6 +95,10 @@ This is the binary format that the geometry is encoded in. The string 'WKB' to r
[Well Known Binary](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry#Well-known_binary) is the only current option, but future versions
of the spec may support alternative encodings. This should be the ["standard"](https://libgeos.org/specifications/wkb/#standard-wkb) WKB representation.

#### Coordinate order

The axis order in WKB stored in a geoparquet follows the de facto standard for axis order in WKB and is therefore always (x,y{,z}{,m}) where x is easting or longitude, y is northing or latitude, z is optional elevation, and m is optional measure. This ordering explicitly overrides the axis order as specified in the CRS.
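
To make the (x, y) rule concrete, here is a minimal sketch of a 2D point in standard little-endian WKB, with longitude written first even though EPSG:4326 declares latitude first. The helper names `encode_wkb_point` and `decode_wkb_point` are hypothetical and not part of the spec or of any library. A real writer would normally rely on an existing geometry library rather than `struct`.

```python
import struct
from typing import Tuple

WKB_POINT = 1  # geometry type code for a 2D point in standard WKB


def encode_wkb_point(lon: float, lat: float) -> bytes:
    """Encode a 2D point as little-endian WKB with x=lon first, y=lat second."""
    # Layout: 1 byte-order flag (1 = little endian), uint32 type, two float64.
    return struct.pack("<BIdd", 1, WKB_POINT, lon, lat)


def decode_wkb_point(wkb: bytes) -> Tuple[float, float]:
    """Decode a 2D WKB point and return (x, y), i.e. (lon, lat)."""
    # The first byte selects the byte order for the rest of the payload.
    fmt = "<" if wkb[0] == 1 else ">"
    geom_type, x, y = struct.unpack(fmt + "Idd", wkb[1:21])
    if geom_type != WKB_POINT:
        raise ValueError(f"not a 2D point: type code {geom_type}")
    return x, y
```

A reader that applied the CRS axis order instead would swap the two values, which is the mistake the override in the paragraph above is meant to rule out.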

### Additional information

## TODO