-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CRS spec definition for version 0.1 #25
Conversation
I've just made a change to the recommended CRS. I've decided to move from OGC:84 to EPSG:4326 (as we decided at #3 (comment)). The reasons are:
Sorry to re-open this, but I think it's important. If you don't agree with this change we can rollback to OGC:84: 2ddcb90 |
I was actually looking around at OGC 84 as well and reached a similar conclusion, so I'm +1 on keeping EPSG:4326. And agree that our decision to force axis order also helps. I think we should perhaps 'cite' geopackage, refer to the section of the spec, and say that we are following them. Perhaps it would be good to write up a small page in the spec repo that helps explain the state of things. It looks like geojson uses OGC:84, but it also looks like doing that is easier if you're not including a WKT2. I do worry a bit that we're just propagating the issue with using 4326 and doing lon/lat. Perhaps we should ask the geopackage group if they still feel good about their decision. But yes, for 0.1 let's just go with 4326 and clarify that we're overriding it, and we can also revisit it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good. Made a bunch of suggested commits and comments.
format-specs/geoparquet.md
Outdated
It is strongly recommended to use [EPSG:4326 (lat, long)](https://spatialreference.org/ref/epsg/4326/) for all data, so in most cases the value of the crs should be: | ||
The Coordinate Reference System (CRS) is a mandatory parameter for all the geometries defined in geoparquet format. | ||
|
||
The CRS needs to be provided in [WKT](https://en.wikipedia.org/wiki/Well-known_text_representation_of_coordinate_reference_systems) version 2, also known as **WKT2**. WKT2 has several revisions, this specification supports the revisions from [2015](http://docs.opengeospatial.org/is/12-063r5/12-063r5.html) and [2019](https://docs.opengeospatial.org/is/18-010r7/18-010r7.html): WKT2_2015, WKT2_2015_SIMPLIFIED, WKT2_2019, WKT_2019_SIMPLIFIED. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making this sound a bit more spec-like.
The CRS needs to be provided in [WKT](https://en.wikipedia.org/wiki/Well-known_text_representation_of_coordinate_reference_systems) version 2, also known as **WKT2**. WKT2 has several revisions, this specification supports the revisions from [2015](http://docs.opengeospatial.org/is/12-063r5/12-063r5.html) and [2019](https://docs.opengeospatial.org/is/18-010r7/18-010r7.html): WKT2_2015, WKT2_2015_SIMPLIFIED, WKT2_2019, WKT_2019_SIMPLIFIED. | |
The CRS must be provided in [WKT](https://en.wikipedia.org/wiki/Well-known_text_representation_of_coordinate_reference_systems) version 2, also known as **WKT2**. WKT2 has several revisions, this specification supports the revisions from [2015](http://docs.opengeospatial.org/is/12-063r5/12-063r5.html) and [2019](https://docs.opengeospatial.org/is/18-010r7/18-010r7.html): WKT2_2015, WKT2_2015_SIMPLIFIED, WKT2_2019, WKT_2019_SIMPLIFIED. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added #26 for general discussion on officially adopting this kind of language.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need/want to support both 2015 and 2019? What are the advantages of doing so? In general I prefer just one way to do things, but am not an expert on these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that this is a new effort, I think there is something to be said for directly going with only WKT2:2019.
I suppose the main disadvantage is that this new format is not yet as widely supported in libraries, making it more difficult to write compliant geoparquet files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We did a restriction with WKT2 by removing WKT1. The difference between WKT2 revisions is not really big, and it is well managed by proj4. Thus, I decided to go with all the WKT2 “flavors”, so it’s easier to write files in geoparquet.
I also want to avoid with this decision that people start creating geoparquet files non compliant with non supported WKT2 strings. It might happen that someone use a WKT2 2015 string without realizing of the mistake. If it happens with a large service and a huge amount of datasets out there, the only way to fix it is with another revision of geoparquet spec to add it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I lean towards going just WKT2:2019 to start, and then see if there are implementations that aren't able to handle it, and then we can loosen it.
The main thing I want to be sure of is that EPSG:4326 has just one representation. So If 2015 vs 2019 for EPSG:4326 are different then I think it'd be really good to just do 2019. My thinking is that we shouldn't make implementations have to check for two different strings that mean the same thing (like if you have proj there).
Ideally we'd have some sort of validator that checks on if it's a 2019 string so we'd catch if a large service starts to use it wrong, so we wouldn't need a spec revision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main thing I want to be sure of is that EPSG:4326 has just one representation
Note that because of the dynamic nature of "EPSG:4326" (ensemble datum that gets updated over time), the exact representation of it actually depends on the version of the EPSG database you are using. See eg the comment at #24 (comment), where our example parquet file already changed because of this
Ideally we'd have some sort of validator that checks on if it's a 2019 string so we'd catch if a large service starts to use it wrong, so we wouldn't need a spec revision.
Just checking, and it seems that at least PROJ has functionality to check the dialect of a WKT string (although I don't directly see this exposed in pyproj)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we'd have some sort of validator that checks on if it's a 2019 string so we'd catch if a large service starts to use it wrong, so we wouldn't need a spec revision.
The issue is that we cannot guarantee every tool writing geoparquet file has a proper validator in place.
Anyways, it looks like I'm the only one supporting this idea 😄 and it's 0.1, so let's move on with WKT:2019 and we can revisit it in the future.
Really appreciate your feedback here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, we don't need every tool writing geoparquet to have a proper validator in place. We just need one that the implementor creating a tool can use to check their output to ensure it aligns, and/or that data users can use to then give feedback to the implementor.
And more than happy to revisit - as with other discussions I think it's easier to come back and make things looser than it is to try to tighten later.
format-specs/geoparquet.md
Outdated
The CRS needs to be provided in [WKT](https://en.wikipedia.org/wiki/Well-known_text_representation_of_coordinate_reference_systems) version 2, also known as **WKT2**. WKT2 has several revisions, this specification supports the revisions from [2015](http://docs.opengeospatial.org/is/12-063r5/12-063r5.html) and [2019](https://docs.opengeospatial.org/is/18-010r7/18-010r7.html): WKT2_2015, WKT2_2015_SIMPLIFIED, WKT2_2019, WKT_2019_SIMPLIFIED. | ||
|
||
|
||
As the most common CRS for datasets is latitude/longitude, for the widest interoperability we recommend [EPSG:4326](https://spatialreference.org/ref/epsg/wgs-84) for all data, so in most cases the value of the crs should be: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like we should mention here that our axis order overrides this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have a specific section for coordinate order where we specify that. Did you see it? Or do you just want to emphasize here too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I saw it - I think we can refer to that section. I just think we need to call it out here. Acknowledge it's a bit confusing. We say here 'use the crs for latitude/longitude', and then below we say 'but do it as longitude/latitude'.
I suppose alternatively we don't actually mention lat/long or or long/lat here - we just that we recommend 4326 for widest interoperability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose alternatively we don't actually mention lat/long or or long/lat here - we just that we recommend 4326 for widest interoperability.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point now 👍
Co-authored-by: Chris Holmes <[email protected]>
Co-authored-by: Chris Holmes <[email protected]>
Co-authored-by: Chris Holmes <[email protected]>
Co-authored-by: Chris Holmes <[email protected]>
Co-authored-by: Chris Holmes <[email protected]>
Co-authored-by: Chris Holmes <[email protected]>
Co-authored-by: Chris Holmes <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Added a few more minor comments.
Sidenote: I don't think the use of WKT2 matters that much. The full WKT2:2019 string for EPSG:4326 or OGC:CRS84 is almost identical (except for the switch axis order, and the different ID) Comparison of WKT2 for both
Now, given that we explicitly overrule the axis order of the CRS, it might indeed not matter that much whether we use OGC:CRS84 or EPSG:4326. And since EPSG:4326 is much more common, that seems a good reason to go with that. Another question is maybe rather if we actually want to strongly recommend an inaccurate ensemble CRS (EPSG:4326 and OGC:CRS84 both use the same ensemble datum, so that question applies to both). For example QGIS now warns about this, see for example https://twitter.com/nyalldawson/status/1390118738251317254 for some context. |
Co-authored-by: Joris Van den Bossche <[email protected]>
Co-authored-by: Joris Van den Bossche <[email protected]>
Co-authored-by: Joris Van den Bossche <[email protected]>
Co-authored-by: Joris Van den Bossche <[email protected]>
Co-authored-by: Joris Van den Bossche <[email protected]>
Yeah, it does feel weird to explain to users to use 4326 but ignore what it actually says. Fully agree on opening an issue and just shipping what we have in 0.1. I think it'd be good to get a wider audience to sound in on what we should do - I'm definitely going back and forth on it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have two little commitable suggestions, but other than that this looks good, and I may be away from the computer for the rest of the day so giving my approval so this can merge.
Co-authored-by: Chris Holmes <[email protected]>
Co-authored-by: Chris Holmes <[email protected]>
(small note: I pushed a commit to revert the changes to the example, to avoid conflicts with #24) |
Co-authored-by: Joris Van den Bossche <[email protected]>
Thanks @alasarr ! |
This PR aims to cover the definition of CRS for version 0.1.
I've covered all the topics discussed at #3