Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CRS spec definition for version 0.1 #25

Merged
merged 23 commits into from
Mar 6, 2022
Merged

CRS spec definition for version 0.1 #25

merged 23 commits into from
Mar 6, 2022

Conversation

alasarr
Copy link
Collaborator

@alasarr alasarr commented Mar 3, 2022

This PR aims to cover the definition of CRS for version 0.1.

I've covered all the topics discussed at #3

@alasarr
Copy link
Collaborator Author

alasarr commented Mar 3, 2022

I've just made a change to the recommended CRS. I've decided to move from OGC:84 to EPSG:4326 (as we decided at #3 (comment)). The reasons are:

  • We decided OGC:84 over EPSG:4326 because of the axis order, but we later decided to force a specific axis order of long/lat.
  • The first dataset I was about to use (at our example.py) was in EPSG:4326 and I needed to transform it to OGC:84.
  • 4326 is far more used than OGC:84.
  • The experience working with OGC:84 is harder than 4326. It's not very hard, but I played a little bit with geopandas and pyproj to get the WKT2 string, apply transformations, etc... and it was easier for me with 4326 because I'm more familiar with this, I think lots of users will have the same feeling.

Sorry to re-open this, but I think it's important. If you don't agree with this change we can rollback to OGC:84: 2ddcb90

@alasarr alasarr added this to the 0.1 milestone Mar 3, 2022
@cholmes
Copy link
Member

cholmes commented Mar 3, 2022

I was actually looking around at OGC 84 as well and reached a similar conclusion, so I'm +1 on keeping EPSG:4326. And agree that our decision to force axis order also helps. I think we should perhaps 'cite' geopackage, refer to the section of the spec, and say that we are following them.

Perhaps it would be good to write up a small page in the spec repo that helps explain the state of things. It looks like geojson uses OGC:84, but it also looks like doing that is easier if you're not including a WKT2. I do worry a bit that we're just propagating the issue with using 4326 and doing lon/lat. Perhaps we should ask the geopackage group if they still feel good about their decision.

But yes, for 0.1 let's just go with 4326 and clarify that we're overriding it, and we can also revisit it.

Copy link
Member

@cholmes cholmes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. Made a bunch of suggested commits and comments.

format-specs/geoparquet.md Outdated Show resolved Hide resolved
format-specs/geoparquet.md Outdated Show resolved Hide resolved
It is strongly recommended to use [EPSG:4326 (lat, long)](https://spatialreference.org/ref/epsg/4326/) for all data, so in most cases the value of the crs should be:
The Coordinate Reference System (CRS) is a mandatory parameter for all the geometries defined in geoparquet format.

The CRS needs to be provided in [WKT](https://en.wikipedia.org/wiki/Well-known_text_representation_of_coordinate_reference_systems) version 2, also known as **WKT2**. WKT2 has several revisions, this specification supports the revisions from [2015](http://docs.opengeospatial.org/is/12-063r5/12-063r5.html) and [2019](https://docs.opengeospatial.org/is/18-010r7/18-010r7.html): WKT2_2015, WKT2_2015_SIMPLIFIED, WKT2_2019, WKT_2019_SIMPLIFIED.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making this sound a bit more spec-like.

Suggested change
The CRS needs to be provided in [WKT](https://en.wikipedia.org/wiki/Well-known_text_representation_of_coordinate_reference_systems) version 2, also known as **WKT2**. WKT2 has several revisions, this specification supports the revisions from [2015](http://docs.opengeospatial.org/is/12-063r5/12-063r5.html) and [2019](https://docs.opengeospatial.org/is/18-010r7/18-010r7.html): WKT2_2015, WKT2_2015_SIMPLIFIED, WKT2_2019, WKT_2019_SIMPLIFIED.
The CRS must be provided in [WKT](https://en.wikipedia.org/wiki/Well-known_text_representation_of_coordinate_reference_systems) version 2, also known as **WKT2**. WKT2 has several revisions, this specification supports the revisions from [2015](http://docs.opengeospatial.org/is/12-063r5/12-063r5.html) and [2019](https://docs.opengeospatial.org/is/18-010r7/18-010r7.html): WKT2_2015, WKT2_2015_SIMPLIFIED, WKT2_2019, WKT_2019_SIMPLIFIED.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added #26 for general discussion on officially adopting this kind of language.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need/want to support both 2015 and 2019? What are the advantages of doing so? In general I prefer just one way to do things, but am not an expert on these.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this is a new effort, I think there is something to be said for directly going with only WKT2:2019.

I suppose the main disadvantage is that this new format is not yet as widely supported in libraries, making it more difficult to write compliant geoparquet files.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did a restriction with WKT2 by removing WKT1. The difference between WKT2 revisions is not really big, and it is well managed by proj4. Thus, I decided to go with all the WKT2 “flavors”, so it’s easier to write files in geoparquet.

I also want to avoid with this decision that people start creating geoparquet files non compliant with non supported WKT2 strings. It might happen that someone use a WKT2 2015 string without realizing of the mistake. If it happens with a large service and a huge amount of datasets out there, the only way to fix it is with another revision of geoparquet spec to add it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I lean towards going just WKT2:2019 to start, and then see if there are implementations that aren't able to handle it, and then we can loosen it.

The main thing I want to be sure of is that EPSG:4326 has just one representation. So If 2015 vs 2019 for EPSG:4326 are different then I think it'd be really good to just do 2019. My thinking is that we shouldn't make implementations have to check for two different strings that mean the same thing (like if you have proj there).

Ideally we'd have some sort of validator that checks on if it's a 2019 string so we'd catch if a large service starts to use it wrong, so we wouldn't need a spec revision.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main thing I want to be sure of is that EPSG:4326 has just one representation

Note that because of the dynamic nature of "EPSG:4326" (ensemble datum that gets updated over time), the exact representation of it actually depends on the version of the EPSG database you are using. See eg the comment at #24 (comment), where our example parquet file already changed because of this

Ideally we'd have some sort of validator that checks on if it's a 2019 string so we'd catch if a large service starts to use it wrong, so we wouldn't need a spec revision.

Just checking, and it seems that at least PROJ has functionality to check the dialect of a WKT string (although I don't directly see this exposed in pyproj)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we'd have some sort of validator that checks on if it's a 2019 string so we'd catch if a large service starts to use it wrong, so we wouldn't need a spec revision.

The issue is that we cannot guarantee every tool writing geoparquet file has a proper validator in place.

Anyways, it looks like I'm the only one supporting this idea 😄 and it's 0.1, so let's move on with WKT:2019 and we can revisit it in the future.

Really appreciate your feedback here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, we don't need every tool writing geoparquet to have a proper validator in place. We just need one that the implementor creating a tool can use to check their output to ensure it aligns, and/or that data users can use to then give feedback to the implementor.

And more than happy to revisit - as with other discussions I think it's easier to come back and make things looser than it is to try to tighten later.

The CRS needs to be provided in [WKT](https://en.wikipedia.org/wiki/Well-known_text_representation_of_coordinate_reference_systems) version 2, also known as **WKT2**. WKT2 has several revisions, this specification supports the revisions from [2015](http://docs.opengeospatial.org/is/12-063r5/12-063r5.html) and [2019](https://docs.opengeospatial.org/is/18-010r7/18-010r7.html): WKT2_2015, WKT2_2015_SIMPLIFIED, WKT2_2019, WKT_2019_SIMPLIFIED.


As the most common CRS for datasets is latitude/longitude, for the widest interoperability we recommend [EPSG:4326](https://spatialreference.org/ref/epsg/wgs-84) for all data, so in most cases the value of the crs should be:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we should mention here that our axis order overrides this?

Copy link
Collaborator Author

@alasarr alasarr Mar 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have a specific section for coordinate order where we specify that. Did you see it? Or do you just want to emphasize here too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I saw it - I think we can refer to that section. I just think we need to call it out here. Acknowledge it's a bit confusing. We say here 'use the crs for latitude/longitude', and then below we say 'but do it as longitude/latitude'.

I suppose alternatively we don't actually mention lat/long or or long/lat here - we just that we recommend 4326 for widest interoperability.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose alternatively we don't actually mention lat/long or or long/lat here - we just that we recommend 4326 for widest interoperability.

+1

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point now 👍

format-specs/geoparquet.md Show resolved Hide resolved
format-specs/geoparquet.md Outdated Show resolved Hide resolved
format-specs/geoparquet.md Outdated Show resolved Hide resolved
format-specs/geoparquet.md Outdated Show resolved Hide resolved
format-specs/geoparquet.md Outdated Show resolved Hide resolved
format-specs/geoparquet.md Outdated Show resolved Hide resolved
Copy link
Collaborator

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Added a few more minor comments.

format-specs/geoparquet.md Outdated Show resolved Hide resolved
format-specs/geoparquet.md Outdated Show resolved Hide resolved
format-specs/geoparquet.md Outdated Show resolved Hide resolved
format-specs/geoparquet.md Outdated Show resolved Hide resolved
format-specs/geoparquet.md Outdated Show resolved Hide resolved
format-specs/geoparquet.md Outdated Show resolved Hide resolved
@jorisvandenbossche
Copy link
Collaborator

it looks like geojson uses OGC:84, but it also looks like doing that is easier if you're not including a WKT2

Sidenote: I don't think the use of WKT2 matters that much. The full WKT2:2019 string for EPSG:4326 or OGC:CRS84 is almost identical (except for the switch axis order, and the different ID)

Comparison of WKT2 for both
>>> import pyproj
>>> crs_epsg4326 = pyproj.CRS("EPSG:4326")
>>> crs_ogc84 = pyproj.CRS("OGC:CRS84")

>>> print(crs_ogc84.to_wkt(pretty=True))
GEOGCRS["WGS 84 (CRS84)",
    ENSEMBLE["World Geodetic System 1984 ensemble",
        MEMBER["World Geodetic System 1984 (Transit)"],
        MEMBER["World Geodetic System 1984 (G730)"],
        MEMBER["World Geodetic System 1984 (G873)"],
        MEMBER["World Geodetic System 1984 (G1150)"],
        MEMBER["World Geodetic System 1984 (G1674)"],
        MEMBER["World Geodetic System 1984 (G1762)"],
        MEMBER["World Geodetic System 1984 (G2139)"],
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]],
        ENSEMBLEACCURACY[2.0]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["Not known."],
        AREA["World."],
        BBOX[-90,-180,90,180]],
    ID["OGC","CRS84"]]

>>> print(crs_epsg4326.to_wkt(pretty=True))
GEOGCRS["WGS 84",
    ENSEMBLE["World Geodetic System 1984 ensemble",
        MEMBER["World Geodetic System 1984 (Transit)"],
        MEMBER["World Geodetic System 1984 (G730)"],
        MEMBER["World Geodetic System 1984 (G873)"],
        MEMBER["World Geodetic System 1984 (G1150)"],
        MEMBER["World Geodetic System 1984 (G1674)"],
        MEMBER["World Geodetic System 1984 (G1762)"],
        MEMBER["World Geodetic System 1984 (G2139)"],
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]],
        ENSEMBLEACCURACY[2.0]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["Horizontal component of 3D system."],
        AREA["World."],
        BBOX[-90,-180,90,180]],
    ID["EPSG",4326]]

Now, given that we explicitly overrule the axis order of the CRS, it might indeed not matter that much whether we use OGC:CRS84 or EPSG:4326. And since EPSG:4326 is much more common, that seems a good reason to go with that.


Another question is maybe rather if we actually want to strongly recommend an inaccurate ensemble CRS (EPSG:4326 and OGC:CRS84 both use the same ensemble datum, so that question applies to both). For example QGIS now warns about this, see for example https://twitter.com/nyalldawson/status/1390118738251317254 for some context.
Now, if we want to discuss this, let's open a new issue for that (it's a hairy discussion with no easy answer, but just wanted to mention it here, being reminded about it seeing the ensemble datum in the WKT2 output). I think for this PR / for 0.1, the current recommendation is good enough (in the end, it's only a recommendation, and changing a recommendation is not a breaking change to the spec).

format-specs/geoparquet.md Outdated Show resolved Hide resolved
format-specs/geoparquet.md Outdated Show resolved Hide resolved
@cholmes
Copy link
Member

cholmes commented Mar 4, 2022

Now, if we want to discuss this, let's open a new issue for that (it's a hairy discussion with no easy answer, but just wanted to mention it here, being reminded about it seeing the ensemble datum in the WKT2 output). I think for this PR / for 0.1, the current recommendation is good enough (in the end, it's only a recommendation, and changing a recommendation is not a breaking change to the spec).

Yeah, it does feel weird to explain to users to use 4326 but ignore what it actually says. Fully agree on opening an issue and just shipping what we have in 0.1. I think it'd be good to get a wider audience to sound in on what we should do - I'm definitely going back and forth on it.

Copy link
Member

@cholmes cholmes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have two little commitable suggestions, but other than that this looks good, and I may be away from the computer for the rest of the day so giving my approval so this can merge.

@jorisvandenbossche
Copy link
Collaborator

(small note: I pushed a commit to revert the changes to the example, to avoid conflicts with #24)

format-specs/geoparquet.md Outdated Show resolved Hide resolved
@jorisvandenbossche
Copy link
Collaborator

Thanks @alasarr !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants