diff --git a/.gitignore b/.gitignore index 659c4a8..c31c9c9 100644 --- a/.gitignore +++ b/.gitignore @@ -1,2 +1,8 @@ /scripts/data/ /scripts/__pycache__/ +/format-specs/relaton/ +/format-specs/iev/ +.DS_Store +format-specs/document.err.html +format-specs/document.presentation.xml +format-specs/document.html diff --git a/format-specs/Gemfile b/format-specs/Gemfile new file mode 100644 index 0000000..8d658e5 --- /dev/null +++ b/format-specs/Gemfile @@ -0,0 +1,4 @@ +source "https://rubygems.org" + +gem "metanorma-cli" +gem "relaton-cli" diff --git a/format-specs/README.adoc b/format-specs/README.adoc new file mode 100644 index 0000000..4b84362 --- /dev/null +++ b/format-specs/README.adoc @@ -0,0 +1,26 @@ += Standard template in Metanorma + +== Content + +This repository contains the content for an OGC standard. + +* `document.adoc` - the main standard document with references to all sections +* remaining ``adoc``s - each section of the standard document is in a separate document: follow directions in each document to populate +* `figures` - figures go here +* `images` - Image files for graphics go here. Image files for figures go in the `figures` directory. Only place in here images not used in figures (e.g., as parts of tables, as logos, etc.) +* `requirements` - directory for requirements and requirement classes to be referenced in `clause_7_normative_text.adoc` +* `code` - sample code to accompany the standard, if desired +* `abstract_tests` - the Abstract Test Suite comprising one test for every requirement, optional +* `UML` - UML diagrams, if applicable + +More information about the document template is https://github.com/opengeospatial/templates/tree/master/standard#readme[here]. + +An authoring guide is available at https://www.metanorma.org/author/ogc/authoring-guide/[metanorma.org]. 
+ +== Building + +Run `docker run -v "$(pwd)":/metanorma -v ${HOME}/.fontist/fonts/:/config/fonts metanorma/metanorma metanorma compile --agree-to-terms -t ogc -x html document.adoc`. + +== Auto built document + +A daily built document is available at https://docs.ogc.org/DRAFTS/[OGC Document DRAFTS]. \ No newline at end of file diff --git a/format-specs/abstract_tests/ATS_class_core.adoc b/format-specs/abstract_tests/ATS_class_core.adoc new file mode 100644 index 0000000..b6f77e1 --- /dev/null +++ b/format-specs/abstract_tests/ATS_class_core.adoc @@ -0,0 +1,48 @@ +[[ats_core]] +[conformance_class] +==== +[%metadata] +identifier:: /conf/core +subject:: <> +classification:: Target Type:Apache Parquet file +conformance-test:: /conf/core/geometry-columns +conformance-test:: /conf/core/nesting +conformance-test:: /conf/core/repetition +conformance-test:: /conf/core/metadata +conformance-test:: /conf/core/crs +conformance-test:: /conf/core/epoch +conformance-test:: /conf/core/orientation +conformance-test:: /conf/core/bbox +==== + +==== Geometry columns + +include::./TEST001.adoc[] + +==== Nesting + +include::./TEST002.adoc[] + +==== Repetition + +include::./TEST003.adoc[] + +==== Metadata + +include::./TEST004.adoc[] + +==== CRS + +include::./TEST005.adoc[] + +==== Epoch + +include::./TEST006.adoc[] + +==== Orientation + +include::./TEST007.adoc[] + +==== Bounding Box + +include::./TEST008.adoc[] \ No newline at end of file diff --git a/format-specs/abstract_tests/README.adoc b/format-specs/abstract_tests/README.adoc new file mode 100644 index 0000000..09aa402 --- /dev/null +++ b/format-specs/abstract_tests/README.adoc @@ -0,0 +1,5 @@ +This folder contains the Abstract Test Suite. + +The test is expressed according to this pattern: + +NOTE: for each test, there should be a corresponding requirement in the "requirements" folder. 
diff --git a/format-specs/abstract_tests/TEST001.adoc b/format-specs/abstract_tests/TEST001.adoc new file mode 100644 index 0000000..a03826d --- /dev/null +++ b/format-specs/abstract_tests/TEST001.adoc @@ -0,0 +1,15 @@ + +[abstract_test] +==== +[%metadata] +identifier:: /conf/core/geometry-columns +target:: /req/core/geometry-columns +test-purpose:: Validate that geometry columns are stored using the BYTE_ARRAY parquet type. +test-method:: ++ +-- +1. Verify that geometry columns are stored using the BYTE_ARRAY parquet type. + +2. Verify that geometries are encoded as WKB. +-- +==== \ No newline at end of file diff --git a/format-specs/abstract_tests/TEST002.adoc b/format-specs/abstract_tests/TEST002.adoc new file mode 100644 index 0000000..e271eba --- /dev/null +++ b/format-specs/abstract_tests/TEST002.adoc @@ -0,0 +1,16 @@ + +[abstract_test] +==== +[%metadata] +identifier:: /conf/core/nesting +target:: /req/core/nesting +test-purpose:: Validate that geometries are not contained in complex or nested types such as structs, lists, arrays, or map types. +test-method:: ++ +-- +1. Verify that geometry columns are at the root of the schema. + +2. Verify that no geometry is a group field or nested in a group. + +-- +==== \ No newline at end of file diff --git a/format-specs/abstract_tests/TEST003.adoc b/format-specs/abstract_tests/TEST003.adoc new file mode 100644 index 0000000..49c7d42 --- /dev/null +++ b/format-specs/abstract_tests/TEST003.adoc @@ -0,0 +1,16 @@ + +[abstract_test] +==== +[%metadata] +identifier:: /conf/core/repetition +target:: /req/core/repetition +test-purpose:: Validate the cardinality of geometry columns. +test-method:: ++ +-- +1. Verify that the cardinality for all geometry columns is “required” (exactly one) or “optional” (zero or one). + +2. Verify that no geometry column is repeated. 
+ +-- +==== \ No newline at end of file diff --git a/format-specs/abstract_tests/TEST004.adoc b/format-specs/abstract_tests/TEST004.adoc new file mode 100644 index 0000000..5d8a4c6 --- /dev/null +++ b/format-specs/abstract_tests/TEST004.adoc @@ -0,0 +1,19 @@ + +[abstract_test] +==== +[%metadata] +identifier:: /conf/core/metadata +target:: /req/core/metadata +test-purpose:: Validate the metadata keys contained in the GeoParquet file. +test-method:: ++ +-- + +1. Verify that the GeoParquet file includes a geo key in the Parquet metadata (see FileMetaData::key_value_metadata). + +2. Verify that the value of this key is a JSON-encoded UTF-8 string representing the file and column metadata that validates against the GeoParquet metadata schema. + +3. Verify that each geometry column in the dataset is included in the columns field (specified in <>) with the content specified in <>, keyed by the column name + +-- +==== \ No newline at end of file diff --git a/format-specs/abstract_tests/TEST005.adoc b/format-specs/abstract_tests/TEST005.adoc new file mode 100644 index 0000000..dd510f8 --- /dev/null +++ b/format-specs/abstract_tests/TEST005.adoc @@ -0,0 +1,17 @@ + +[abstract_test] +==== +[%metadata] +identifier:: /conf/core/crs +target:: /req/core/crs +test-purpose:: Validate that the CRS is correctly specified. +test-method:: ++ +-- + +1. If CRS is provided, verify that the CRS is provided in https://proj.org/specifications/projjson.html[PROJJSON] format. + +2. If CRS is not provided, verify that all coordinates in the geometries use longitude, latitude based on the WGS84 datum, and the default value is https://www.opengis.net/def/crs/OGC/1.3/CRS84[OGC:CRS84] for CRS-aware implementations. 
+ +-- +==== \ No newline at end of file diff --git a/format-specs/abstract_tests/TEST006.adoc b/format-specs/abstract_tests/TEST006.adoc new file mode 100644 index 0000000..3cde708 --- /dev/null +++ b/format-specs/abstract_tests/TEST006.adoc @@ -0,0 +1,15 @@ + +[abstract_test] +==== +[%metadata] +identifier:: /conf/core/epoch +target:: /req/core/epoch +test-purpose:: If the crs field defines a dynamic CRS, validate that the coordinates are qualified with the epoch at which they are valid. +test-method:: ++ +-- + +1. If the crs field defines a dynamic CRS, verify that the coordinates are qualified with the epoch at which they are valid. + +-- +==== \ No newline at end of file diff --git a/format-specs/abstract_tests/TEST007.adoc b/format-specs/abstract_tests/TEST007.adoc new file mode 100644 index 0000000..b209780 --- /dev/null +++ b/format-specs/abstract_tests/TEST007.adoc @@ -0,0 +1,17 @@ + +[abstract_test] +==== +[%metadata] +identifier:: /conf/core/orientation +target:: /req/core/orientation +test-purpose:: Validate the winding order of polygons. +test-method:: ++ +-- + +1. Verify that all vertices of exterior polygon rings are ordered in the counterclockwise direction + +2. Verify that all interior rings are ordered in the clockwise direction. + +-- +==== \ No newline at end of file diff --git a/format-specs/abstract_tests/TEST008.adoc b/format-specs/abstract_tests/TEST008.adoc new file mode 100644 index 0000000..f30d6af --- /dev/null +++ b/format-specs/abstract_tests/TEST008.adoc @@ -0,0 +1,14 @@ + +[abstract_test] +==== +[%metadata] +identifier:: /conf/core/bbox +target:: /req/core/bbox +test-purpose:: Validate that the bounding boxes are constructed correctly. +test-method:: ++ +-- +1. Verify that the bbox, if specified, is encoded with an array representing the range of values for each dimension in the geometry coordinates. 
+ +-- +==== \ No newline at end of file diff --git a/format-specs/code/README.adoc b/format-specs/code/README.adoc new file mode 100644 index 0000000..14fca89 --- /dev/null +++ b/format-specs/code/README.adoc @@ -0,0 +1 @@ +Sample code may be stored in this folder, organized as you see fit diff --git a/format-specs/document.adoc b/format-specs/document.adoc new file mode 100644 index 0000000..19ebbd9 --- /dev/null +++ b/format-specs/document.adoc @@ -0,0 +1,60 @@ += GeoParquet Specification +:doctype: standard +:encoding: utf-8 +:lang: en +:status: draft +:committee: technical +:draft: 3.0 +:external-id: http://www.opengis.net/doc/IS/geoparquet/1.0 +:docnumber: 24-013 +:received-date: 2029-03-30 +:issued-date: 2029-03-30 +:published-date: 2029-03-30 +:fullname: Chris Holmes +:fullname_2: Tim Schaub +:fullname_3: Joris Van den Bossche +:fullname_4: Kyle Barron +:fullname_5: Javier de la Torre +:docsubtype: Interface +:keywords: ogcdoc, OGC document, geoparquet, parquet, columnar, cloud +:submitting-organizations: Planet; CARTO +:mn-document-class: ogc +:mn-output-extensions: xml,html,doc,pdf +:local-cache-only: +:data-uri-image: +:pdf-uri: ./document.pdf +:xml-uri: ./document.xml +:doc-uri: ./document.doc +:edition: 1.0.0 + +//// +Make sure to complete each included document +//// +include::sections/clause_0_front_material.adoc[] + +include::sections/clause_1_scope.adoc[] + +include::sections/clause_2_conformance.adoc[] + +include::sections/clause_3_references.adoc[] + +include::sections/clause_4_terms_and_definitions.adoc[] + +include::sections/clause_5_conventions.adoc[] + +include::sections/clause_6_normative_text.adoc[] + + +//// +add or remove annexes after "A" as necessary +//// + +include::sections/annex-a.adoc[] + +//// +Revision History should be the last annex before the Bibliography +Bibliography should be the last annex +//// +include::sections/annex-history.adoc[] + +include::sections/annex-bibliography.adoc[] diff --git 
a/format-specs/figures/README.adoc b/format-specs/figures/README.adoc new file mode 100644 index 0000000..67909ae --- /dev/null +++ b/format-specs/figures/README.adoc @@ -0,0 +1,5 @@ +Figures go here. + +Each figure is a separate file with the naming convention: + +"FIGn.xxx" where "n" is a number with leading zeroes appropriate for the total number of figures and "xxx" is the appropriate extension for the file type. \ No newline at end of file diff --git a/format-specs/images/README.adoc b/format-specs/images/README.adoc new file mode 100644 index 0000000..12e6fb0 --- /dev/null +++ b/format-specs/images/README.adoc @@ -0,0 +1,5 @@ +Image files for graphics go here. Image files for figures go in the "figures" directory. Only place in here images not used in figures (e.g., as parts of tables, as logos, etc.) + +Each graphic is a separate file with the naming convention: + +"GRPn.xxx" where "n" is a sequential number with leading zeroes appropriate for the total number of graphics and "xxx" is the appropriate extension for the file type. diff --git a/format-specs/notes.txt b/format-specs/notes.txt new file mode 100644 index 0000000..f9538c3 --- /dev/null +++ b/format-specs/notes.txt @@ -0,0 +1,3 @@ +Confirm the target type of the Abstract Test suite. Presumably it is the Parquet file. + +Confirm the editors, submitters and contributors. \ No newline at end of file diff --git a/format-specs/recommendations/recommendation001.adoc b/format-specs/recommendations/recommendation001.adoc new file mode 100644 index 0000000..ff2ee00 --- /dev/null +++ b/format-specs/recommendations/recommendation001.adoc @@ -0,0 +1,6 @@ +[recommendation] +==== +[%metadata] +identifier:: /rec/core/encoding +part:: The geometry encoding SHOULD be the https://portal.ogc.org/files/?artifact_id=18241[OpenGIS® Implementation Specification for Geographic information — Simple feature access — Part 1: Common architecture] WKB representation (using codes for 3D geometry types in the [1001,1007] range). 
+==== \ No newline at end of file diff --git a/format-specs/recommendations/recommendation002.adoc b/format-specs/recommendations/recommendation002.adoc new file mode 100644 index 0000000..7c61198 --- /dev/null +++ b/format-specs/recommendations/recommendation002.adoc @@ -0,0 +1,6 @@ +[recommendation] +==== +[%metadata] +identifier:: /rec/core/orientation-spherical-edges +part:: If edges is “spherical”, the orientation SHOULD always be set to counterclockwise +==== \ No newline at end of file diff --git a/format-specs/recommendations/recommendation003.adoc b/format-specs/recommendations/recommendation003.adoc new file mode 100644 index 0000000..5481762 --- /dev/null +++ b/format-specs/recommendations/recommendation003.adoc @@ -0,0 +1,6 @@ +[recommendation] +==== +[%metadata] +identifier:: /rec/core/feature-identifiers +part:: If you are using GeoParquet to serialize geospatial data with feature identifiers, you SHOULD create your own https://github.com/apache/parquet-format#metadata[file key/value metadata] to indicate the column that represents this identifier. +==== \ No newline at end of file diff --git a/format-specs/requirements/README.adoc b/format-specs/requirements/README.adoc new file mode 100644 index 0000000..ab7871f --- /dev/null +++ b/format-specs/requirements/README.adoc @@ -0,0 +1,15 @@ +This folder contains requirements description. + +Each file is a single requirement. The naming convention for these files is: + +"REQn.adoc" where "n" corresponds to the requirement number. Numbers should have preceding zeros appropriate for the total number of requirements in the project (e.g., the first requirement could be REQ001 if less than 1000 requirements are anticipated). + +The requirement files are integrated into the main document as links. + +The requirement is expressed according to this pattern: + +NOTE: for each requirement, there should be a corresponding Abstract Test in the "abstract_tests" folder. 
+ +NOTE: sample code may reference one or more requirements and should state which requirements are included in the code by adding the following line to the Extended Description: + +"#REQS: reqnum1,reqnum2,...reqnumn" diff --git a/format-specs/requirements/requirement001.adoc b/format-specs/requirements/requirement001.adoc new file mode 100644 index 0000000..80adcc7 --- /dev/null +++ b/format-specs/requirements/requirement001.adoc @@ -0,0 +1,7 @@ +[requirement] +==== +[%metadata] +identifier:: /req/core/geometry-columns +part:: Geometry columns SHALL be stored using the BYTE_ARRAY parquet type. +part:: Geometries SHALL be encoded as https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry#Well-known_binary[Well Known Binary (WKB)]. +==== \ No newline at end of file diff --git a/format-specs/requirements/requirement002.adoc b/format-specs/requirements/requirement002.adoc new file mode 100644 index 0000000..f946801 --- /dev/null +++ b/format-specs/requirements/requirement002.adoc @@ -0,0 +1,7 @@ +[requirement] +==== +[%metadata] +identifier:: /req/core/nesting +part:: Geometry columns SHALL be at the root of the schema. +part:: A geometry SHALL NOT be a group field or nested in a group. +==== \ No newline at end of file diff --git a/format-specs/requirements/requirement003.adoc b/format-specs/requirements/requirement003.adoc new file mode 100644 index 0000000..a549a97 --- /dev/null +++ b/format-specs/requirements/requirement003.adoc @@ -0,0 +1,7 @@ +[requirement] +==== +[%metadata] +identifier:: /req/core/repetition +part:: The repetition for all geometry columns SHALL be “required” (exactly one) or “optional” (zero or one). +part:: A geometry column SHALL NOT be repeated. 
+==== \ No newline at end of file diff --git a/format-specs/requirements/requirement004.adoc b/format-specs/requirements/requirement004.adoc new file mode 100644 index 0000000..29c1848 --- /dev/null +++ b/format-specs/requirements/requirement004.adoc @@ -0,0 +1,9 @@ +[requirement] +==== +[%metadata] +identifier:: /req/core/metadata +part:: A GeoParquet file SHALL include a geo key in the Parquet metadata (see FileMetaData::key_value_metadata). +part:: The value of this key SHALL be a JSON-encoded UTF-8 string representing the file and column metadata that validates against the GeoParquet metadata schema. +part:: Each geometry column in the dataset SHALL be included in the columns field (specified in <>) with the following content (specified in <>), keyed by the column name +==== + diff --git a/format-specs/requirements/requirement005.adoc b/format-specs/requirements/requirement005.adoc new file mode 100644 index 0000000..579cc9c --- /dev/null +++ b/format-specs/requirements/requirement005.adoc @@ -0,0 +1,7 @@ +[requirement] +==== +[%metadata] +identifier:: /req/core/crs +part:: If CRS is provided, the CRS SHALL be provided in https://proj.org/specifications/projjson.html[PROJJSON] format. +part:: If CRS is not provided, all coordinates in the geometries MUST use longitude, latitude based on the WGS84 datum, and the default value is https://www.opengis.net/def/crs/OGC/1.3/CRS84[OGC:CRS84] for CRS-aware implementations. +==== \ No newline at end of file diff --git a/format-specs/requirements/requirement006.adoc b/format-specs/requirements/requirement006.adoc new file mode 100644 index 0000000..3556e21 --- /dev/null +++ b/format-specs/requirements/requirement006.adoc @@ -0,0 +1,6 @@ +[requirement] +==== +[%metadata] +identifier:: /req/core/epoch +part:: If the crs field defines a dynamic CRS, the coordinates SHALL always be qualified with the epoch at which they are valid. 
+==== \ No newline at end of file diff --git a/format-specs/requirements/requirement007.adoc b/format-specs/requirements/requirement007.adoc new file mode 100644 index 0000000..b1b943d --- /dev/null +++ b/format-specs/requirements/requirement007.adoc @@ -0,0 +1,7 @@ +[requirement] +==== +[%metadata] +identifier:: /req/core/orientation +part:: All vertices of exterior polygon rings SHALL be ordered in the counterclockwise direction +part:: All interior rings SHALL be ordered in the clockwise direction. +==== \ No newline at end of file diff --git a/format-specs/requirements/requirement008.adoc b/format-specs/requirements/requirement008.adoc new file mode 100644 index 0000000..a088cff --- /dev/null +++ b/format-specs/requirements/requirement008.adoc @@ -0,0 +1,6 @@ +[requirement] +==== +[%metadata] +identifier:: /req/core/bbox +part:: The bbox, if specified, SHALL be encoded with an array representing the range of values for each dimension in the geometry coordinates. +==== \ No newline at end of file diff --git a/format-specs/requirements/requirements_class.adoc b/format-specs/requirements/requirements_class.adoc new file mode 100644 index 0000000..c10edd2 --- /dev/null +++ b/format-specs/requirements/requirements_class.adoc @@ -0,0 +1,15 @@ +[[rc_table-core]] +[requirements_class] +.Requirements Class Core +==== +[%metadata] +identifier:: /req/core +requirement:: /req/core/geometry-columns +requirement:: /req/core/nesting +requirement:: /req/core/repetition +requirement:: /req/core/metadata +requirement:: /req/core/crs +requirement:: /req/core/epoch +requirement:: /req/core/orientation +requirement:: /req/core/bbox +==== \ No newline at end of file diff --git a/format-specs/sections/annex-a.adoc b/format-specs/sections/annex-a.adoc new file mode 100644 index 0000000..56fc099 --- /dev/null +++ b/format-specs/sections/annex-a.adoc @@ -0,0 +1,6 @@ +[appendix] +== Conformance Class Abstract Test Suite (Normative) + +=== Conformance Class "Core" + 
+include::../abstract_tests/ATS_class_core.adoc[] \ No newline at end of file diff --git a/format-specs/sections/annex-bibliography.adoc b/format-specs/sections/annex-bibliography.adoc new file mode 100644 index 0000000..e6fa872 --- /dev/null +++ b/format-specs/sections/annex-bibliography.adoc @@ -0,0 +1,6 @@ +[bibliography] +[[Bibliography]] +== Bibliography + + +* [[[ISO13249-3,ISO/IEC 13249-3:2016]]], \ No newline at end of file diff --git a/format-specs/sections/annex-history.adoc b/format-specs/sections/annex-history.adoc new file mode 100644 index 0000000..196dc0c --- /dev/null +++ b/format-specs/sections/annex-history.adoc @@ -0,0 +1,8 @@ +[appendix] +== Revision History + +[width="90%",options="header"] +|=== +|Date |Release |Editor | Primary clauses modified |Description +|2022-04-18 |0.1 |GeoParquet SWG |all |initial version +|=== diff --git a/format-specs/sections/clause_0_front_material.adoc b/format-specs/sections/clause_0_front_material.adoc new file mode 100644 index 0000000..4fef5ed --- /dev/null +++ b/format-specs/sections/clause_0_front_material.adoc @@ -0,0 +1,48 @@ +.Preface + +This is version 1.0.0 of the GeoParquet specification. See the https://geoparquet.org/releases/v1.0.0/schema.json[JSON Schema] to validate metadata for this version. + +//// +*OGC Declaration* +//// + +Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. The Open Geospatial Consortium shall not be held responsible for identifying any or all such patent rights. + +Recipients of this document are requested to submit, with their comments, notification of any relevant patent claims or other intellectual property rights of which they may be aware that might be infringed by any implementation of the standard set forth in this document, and to provide supporting documentation. + + + +[abstract] +== Abstract + +Apache Parquet provides a standardized open-source columnar storage format. 
The GeoParquet specification defines how geospatial data should be stored in parquet format, including the representation of geometries and the required additional metadata. + + +// Security Considerations - Since this standard does not specify security considerations, metanorma will automatically include text stating that "No security considerations have been made for this Standard.". + +== Submitters + +All questions regarding this submission should be directed to the editor or the submitters: + +|=== +|*Name* |*Affiliation* +| Chris Holmes| Planet +| Tim Schaub| Planet +| Javier de la Torre| CARTO +| | +| | +|=== + +== Contributors + +//This clause is optional. + +Additional contributors to this Standard include the following: + +|=== +|*Name* |*Affiliation* |*OGC Member* +| | | Yes/No +| | | Yes/No +| | | Yes/No +| | | Yes/No +|=== diff --git a/format-specs/sections/clause_1_scope.adoc b/format-specs/sections/clause_1_scope.adoc new file mode 100644 index 0000000..29883ba --- /dev/null +++ b/format-specs/sections/clause_1_scope.adoc @@ -0,0 +1,3 @@ +== Scope + +The GeoParquet specification defines how geospatial data should be stored in parquet format, including the representation of geometries and the required additional metadata. diff --git a/format-specs/sections/clause_2_conformance.adoc b/format-specs/sections/clause_2_conformance.adoc new file mode 100644 index 0000000..dd8711a --- /dev/null +++ b/format-specs/sections/clause_2_conformance.adoc @@ -0,0 +1,5 @@ +== Conformance + +Conformance with this standard shall be checked using all the relevant tests specified in Annex A (normative) of this document. The framework, concepts, and methodology for testing, and the criteria to be achieved to claim conformance are specified in the OGC Compliance Testing Policies and Procedures and the OGC Compliance Testing web site. + +In order to conform to this OGC® interface standard, a software implementation shall choose to implement the Core conformance class. 
diff --git a/format-specs/sections/clause_3_references.adoc b/format-specs/sections/clause_3_references.adoc new file mode 100644 index 0000000..d95ea5e --- /dev/null +++ b/format-specs/sections/clause_3_references.adoc @@ -0,0 +1,7 @@ +[bibliography] +== References + +* [[[apache_parquet,Apache Parquet]]], Apache Software Foundation: Apache Parquet, https://parquet.apache.org/, last accessed 2024/04/10. +* [[[projjson,PROJJSON]]], PROJ contributors: PROJJSON Specification for PROJ 9.4, https://proj.org/specifications/projjson.html, last accessed 2024/04/10. +* [[[OGC06-103r3,OGC 06-103r3]]], +* [[[ISO19111_2019,ISO 19111:2019]]], \ No newline at end of file diff --git a/format-specs/sections/clause_4_terms_and_definitions.adoc b/format-specs/sections/clause_4_terms_and_definitions.adoc new file mode 100644 index 0000000..f084a3e --- /dev/null +++ b/format-specs/sections/clause_4_terms_and_definitions.adoc @@ -0,0 +1,27 @@ +== Terms and definitions + + +NOTE: The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119. + +=== epoch + point in time + +Note 1 to entry: In this document an epoch is expressed in the Gregorian calendar as a decimal year. + +[.source] +<> + +[example] +2017-03-25 in the Gregorian calendar is epoch 2017,23. + +=== coordinate reference system + +coordinate system that is related to an object by a datum + +NOTE: Geodetic and vertical datums are referred to as reference frames. + +NOTE: For geodetic and vertical reference frames, the object will be the Earth. In planetary applications, geodetic and vertical reference frames may be applied to other celestial bodies. 
+ +[.source] +<> + diff --git a/format-specs/sections/clause_5_conventions.adoc b/format-specs/sections/clause_5_conventions.adoc new file mode 100644 index 0000000..cdcfe48 --- /dev/null +++ b/format-specs/sections/clause_5_conventions.adoc @@ -0,0 +1,11 @@ +== Conventions + +This section provides details and examples for any conventions used in the document. Examples of conventions are symbols, abbreviations, use of XML schema, or special notes regarding how to read the document. + +=== Identifiers +The normative provisions in this standard are denoted by the URI + +`http://www.opengis.net/spec/geoparquet/1.0` + +All requirements and conformance tests that appear in this document are denoted by partial URIs which are relative to this base. + diff --git a/format-specs/sections/clause_6_normative_text.adoc b/format-specs/sections/clause_6_normative_text.adoc new file mode 100644 index 0000000..84b6a04 --- /dev/null +++ b/format-specs/sections/clause_6_normative_text.adoc @@ -0,0 +1,216 @@ +== Core Requirements Class + +include::/requirements/requirements_class.adoc[] + +=== Geometry columns + +Geometry columns MUST be stored using the BYTE_ARRAY parquet type. They MUST be encoded as https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry#Well-known_binary[WKB]. See the https://geoparquet.org/releases/v1.0.0/#encoding[encoding] section below for more details. + +include::/requirements/requirement001.adoc[] + +=== Nesting + +Geometry columns MUST be at the root of the schema. A geometry MUST NOT be a group field or nested in a group. In practice, this means that when writing to GeoParquet from another format, geometries cannot be contained in complex or nested types such as structs, lists, arrays, or map types. + +include::/requirements/requirement002.adoc[] + +=== Repetition + +The repetition for all geometry columns MUST be "required" (exactly one) or "optional" (zero or one). A geometry column MUST NOT be repeated. 
A GeoParquet file MAY have multiple geometry columns with different names, but those geometry columns cannot be repeated. + +include::/requirements/requirement003.adoc[] + +=== Metadata + +GeoParquet files include additional metadata at two levels: + +[arabic] +. File metadata indicating things like the version of this specification used +. Column metadata with additional metadata for each geometry column + +A GeoParquet file MUST include a geo key in the Parquet metadata (see https://github.com/apache/parquet-format#metadata[FileMetaData::key_value_metadata]). The value of this key MUST be a JSON-encoded UTF-8 string representing the file and column metadata that validates against the https://geoparquet.org/releases/v1.0.0/schema.json[GeoParquet metadata schema]. The file and column metadata fields are described in <>. + +include::/requirements/requirement004.adoc[] + +=== File metadata + +[[tbl_file_and_column_metadata_fields]] +.file and column metadata fields +[cols=",,",options="header",] +|=== +|*Field Name* |*Type* |*Description* +|version |string |*REQUIRED.* The version identifier for the GeoParquet specification. +|primary_column |string |*REQUIRED.* The name of the "primary" geometry column. In cases where a GeoParquet file contains multiple geometry columns, the primary geometry may be used by default in geospatial operations. +|columns |object |*REQUIRED.* Metadata about geometry columns. Each key is the name of a geometry column in the table. +|=== + +At this level, additional implementation-specific fields (e.g. library name) MAY be present, and readers should be robust in ignoring those. + +=== Column metadata + +Each geometry column in the dataset MUST be included in the columns field above with the following content, keyed by the column name: + +[[tbl_column_metadata]] +.Column metadata +[cols=",,",options="header",] +|=== +|*Field Name* |*Type* |*Description* +|encoding |string |*REQUIRED.* Name of the geometry encoding format. 
Currently only "WKB" is supported. +|geometry_types |[string] |*REQUIRED.* The geometry types of all geometries, or an empty array if they are not known. +|crs |object/null |https://proj.org/specifications/projjson.html[PROJJSON] object representing the Coordinate Reference System (CRS) of the geometry. If the field is not provided, the default CRS is https://www.opengis.net/def/crs/OGC/1.3/CRS84[OGC:CRS84], which means the data in this column must be stored in longitude, latitude based on the WGS84 datum. +|orientation |string |Winding order of exterior ring of polygons. If present must be "counterclockwise"; interior rings are wound in opposite order. If absent, no assertions are made regarding the winding order. +|edges |string |Name of the coordinate system for the edges. Must be one of "planar" or "spherical". The default value is "planar". +|bbox |[number] |Bounding Box of the geometries in the file, formatted according to https://tools.ietf.org/html/rfc7946#section-5[RFC 7946, section 5]. +|epoch |number |Coordinate epoch in case of a dynamic CRS, expressed as a decimal year. +|=== + +=== CRS + +The Coordinate Reference System (CRS) is an optional parameter for each geometry column defined in GeoParquet format. + +The CRS MUST be provided in https://proj.org/specifications/projjson.html[PROJJSON] format, which is a JSON encoding of https://docs.opengeospatial.org/is/18-010r7/18-010r7.html[WKT2:2019 / ISO-19162:2019], which itself implements the model of http://docs.opengeospatial.org/as/18-005r4/18-005r4.html[OGC Topic 2: Referencing by coordinates abstract specification / ISO-19111:2019]. Apart from the difference of encodings, the semantics are intended to match WKT2:2019, and a CRS in one encoding can generally be represented in the other. 
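
Editorial illustration (not part of the diff): the file and column metadata fields tabulated above can be sketched as a small Python check. This is a minimal sketch under stated assumptions, not a reference validator. The helper name `check_geo_metadata`, the sample values, and the choice to accept only 4- or 6-element `bbox` arrays (for 2D or 3D data) are assumptions made for illustration; authoritative validation is against the GeoParquet metadata JSON Schema.

```python
import json


def check_geo_metadata(geo):
    """Return a list of problems found in a decoded "geo" metadata object.

    Hypothetical helper for illustration only; it covers a subset of the
    rules in the file and column metadata tables, not the full JSON Schema.
    """
    problems = []
    # File-level fields marked REQUIRED in the file metadata table.
    for field in ("version", "primary_column", "columns"):
        if field not in geo:
            problems.append(f"missing file field: {field}")
    for name, col in geo.get("columns", {}).items():
        if col.get("encoding") != "WKB":
            problems.append(f"{name}: encoding must be 'WKB'")
        if not isinstance(col.get("geometry_types"), list):
            problems.append(f"{name}: geometry_types must be an array")
        if "orientation" in col and col["orientation"] != "counterclockwise":
            problems.append(f"{name}: orientation must be 'counterclockwise'")
        # edges defaults to "planar" when absent.
        if col.get("edges", "planar") not in ("planar", "spherical"):
            problems.append(f"{name}: edges must be 'planar' or 'spherical'")
        # crs may be absent (OGC:CRS84 default), null, or a PROJJSON object.
        if "crs" in col and col["crs"] is not None and not isinstance(col["crs"], dict):
            problems.append(f"{name}: crs must be a PROJJSON object or null")
        if "bbox" in col:
            bbox = col["bbox"]
            ok = (
                isinstance(bbox, list)
                and len(bbox) in (4, 6)  # assumption: 2D or 3D only
                and all(isinstance(v, (int, float)) for v in bbox)
            )
            if not ok:
                problems.append(f"{name}: bbox must be an array of 4 or 6 numbers")
    return problems


# Illustrative metadata; the column name and values are made up.
geo = {
    "version": "1.0.0",
    "primary_column": "geometry",
    "columns": {
        "geometry": {
            "encoding": "WKB",
            "geometry_types": ["Polygon", "MultiPolygon"],
            # "crs" omitted, so the OGC:CRS84 default applies.
            "edges": "planar",
            "bbox": [-180.0, -90.0, 180.0, 90.0],
        }
    },
}

# The value stored under the "geo" key is a JSON-encoded UTF-8 string.
encoded = json.dumps(geo)
print(check_geo_metadata(json.loads(encoded)))  # prints []
```

A writer could run such a check before attaching the encoded string to the Parquet key/value metadata; a reader could run it after decoding the `geo` key.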
+ +If CRS is not provided, all coordinates in the geometries MUST use longitude, latitude based on the WGS84 datum, and the default value is https://www.opengis.net/def/crs/OGC/1.3/CRS84[OGC:CRS84] for CRS-aware implementations. + +https://www.opengis.net/def/crs/OGC/1.3/CRS84[OGC:CRS84] is equivalent to the well-known https://epsg.org/crs_4326/WGS-84.html[EPSG:4326] but changes the axis from latitude-longitude to longitude-latitude. + +Due to the large number of CRSes available and the difficulty of implementing all of them, we expect that a number of implementations will start without support for the optional crs field. We recommend that users store their data in longitude, latitude (OGC:CRS84, or omitting the crs field) so that it works with the widest number of tools. Data that are more appropriately represented in particular projections may use an alternate coordinate reference system. We expect many tools will support alternate CRSes, but encourage users to check that their chosen tool supports their chosen CRS. + +See below for additional details about representing or identifying OGC:CRS84. + +The value of this key may be explicitly set to null to indicate that there is no CRS assigned to this column (CRS is undefined or unknown). + +include::/requirements/requirement005.adoc[] + +=== Epoch + +In a dynamic CRS, coordinates of a point on the surface of the Earth may change with time. To be unambiguous, the coordinates must always be qualified with the epoch at which they are valid. + +The optional epoch field allows this epoch to be specified when the crs field defines a dynamic CRS. The coordinate epoch is expressed as a decimal year (e.g. 2021.47). Currently, this specification only supports an epoch per column (and not per geometry). + +include::/requirements/requirement006.adoc[] + +=== Encoding + +This is the binary format that the geometry is encoded in.
The string "WKB", signifying Well Known Binary, is the only current option, but future versions of the Standard may support alternative encodings. This SHOULD be the https://portal.ogc.org/files/?artifact_id=18241["OpenGIS® Implementation Specification for Geographic information - Simple feature access - Part 1: Common architecture"] WKB representation (using codes for 3D geometry types in the [1001,1007] range). This encoding is also consistent with the one defined in the https://www.iso.org/standard/60343.html[ISO/IEC 13249-3:2016 (Information technology - Database languages - SQL multimedia and application packages - Part 3: Spatial)] standard. + +include::/recommendations/recommendation001.adoc[] + +Note that the current version of the spec only allows for a subset of WKB: 2D or 3D geometries of the standard geometry types (the Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection geometry types). This means that M values or non-linear geometry types are not yet supported. + +=== Coordinate axis order + +The axis order of the coordinates in WKB stored in a GeoParquet file follows the de facto standard for axis order in WKB and is therefore always (x, y) where x is easting or longitude and y is northing or latitude. This ordering explicitly overrides the axis order as specified in the CRS. This follows the precedent of https://geopackage.org/[GeoPackage], see the https://www.geopackage.org/spec130/#gpb_spec[note in the GeoPackage Standard]. + +=== geometry_types + +This field captures the geometry types of the geometries in the column, when known. Accepted geometry types are: "Point", "LineString", "Polygon", "MultiPoint", "MultiLineString", "MultiPolygon", "GeometryCollection". + +In addition, the following rules are used: + +* In case of 3D geometries, a " Z" suffix gets added (e.g. ["Point Z"]). +* A list of multiple values indicates that multiple geometry types are present (e.g. ["Polygon", "MultiPolygon"]).
+* An empty array explicitly signals that the geometry types are not known. +* The geometry types in the list must be unique (e.g. ["Point", "Point"] is not valid). + +It is expected that this field is strictly correct. For example, if a column contains both polygons and multipolygons, it is not sufficient to specify ["MultiPolygon"]; it is expected to specify ["Polygon", "MultiPolygon"]. Likewise, if a column contains 3D points, it is not sufficient to specify ["Point"]; it is expected to list ["Point Z"]. + +=== Orientation + +This attribute indicates the winding order of polygons. The only available value is "counterclockwise". When this value is set, all vertices of exterior polygon rings MUST be ordered in the counterclockwise direction and all interior rings MUST be ordered in the clockwise direction. + +include::/requirements/requirement007.adoc[] + +If no value is set, no assertions are made about winding order or consistency of such between exterior and interior rings or between individual geometries within a dataset. Readers are responsible for verifying and if necessary re-ordering vertices as required for their analytical representation. + +Writers are encouraged but not required to set orientation="counterclockwise" for portability of the data within the broader ecosystem. + +It is RECOMMENDED to always set the orientation (to counterclockwise) if edges is "spherical" (see below). + +include::/recommendations/recommendation002.adoc[] + +=== Edges + +This attribute indicates how to interpret the edges of the geometries: whether the line between two points is a straight cartesian line or the shortest line on the sphere (geodesic line). Available values are: + +* "planar": use a flat cartesian coordinate system. +* "spherical": use a spherical coordinate system and radius derived from the spheroid defined by the coordinate reference system. + +If no value is set, the default value to assume is "planar".
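Two of the column-level rules above, the geometry_types constraints and the "counterclockwise" orientation, lend themselves to short checks. The Python sketch below is illustrative only: the function names are hypothetical, rings are assumed to be closed lists of (x, y) tuples (first vertex repeated at the end), and the orientation test uses the planar shoelace formula, so it applies to "planar" edges and not to "spherical" ones.

```python
ACCEPTED_TYPES = {
    "Point", "LineString", "Polygon", "MultiPoint",
    "MultiLineString", "MultiPolygon", "GeometryCollection",
}


def valid_geometry_types(types):
    """Check a geometry_types list: accepted names, optional " Z" suffix, no repeats."""
    if len(set(types)) != len(types):
        return False  # entries must be unique
    for entry in types:
        base = entry[:-2] if entry.endswith(" Z") else entry
        if base not in ACCEPTED_TYPES:
            return False
    return True  # an empty list is valid and means "not known"


def signed_area(ring):
    """Shoelace formula for a closed planar ring; positive means counterclockwise."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def polygon_is_counterclockwise(exterior, interiors=()):
    """True if the exterior ring is counterclockwise and every interior ring is clockwise."""
    return signed_area(exterior) > 0 and all(signed_area(r) < 0 for r in interiors)


# Example polygon: a counterclockwise square with one clockwise hole.
exterior = [(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)]
hole = [(1, 1), (1, 2), (2, 2), (2, 1), (1, 1)]
print(valid_geometry_types(["Polygon", "MultiPolygon"]))
print(polygon_is_counterclockwise(exterior, [hole]))
```

A writer could run such checks before setting orientation to "counterclockwise", since the value asserts a property of every polygon in the column.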
+ +Note that if edges is "spherical" then it is RECOMMENDED that orientation is always ensured to be "counterclockwise". If it is not set, it is not clear how polygons should be interpreted within spherical coordinate systems, which can lead to major analytical errors if interpreted incorrectly. In this case, software will typically interpret the rings of a polygon such that it encloses at most half of the sphere (i.e. the smallest polygon of both ways it could be interpreted). But the specification itself does not make any guarantee about this. + +=== Bounding box + +Bounding boxes are used to help define the spatial extent of each geometry column. Implementations of this schema may choose to use those bounding boxes to filter partitions (files) of a partitioned dataset. + +The bounding box (bbox), if specified, MUST be encoded with an array representing the range of values for each dimension in the geometry coordinates. For geometries in a geographic coordinate reference system, longitude and latitude values are listed for the most southwesterly coordinate followed by values for the most northeasterly coordinate. This follows the GeoJSON specification (https://tools.ietf.org/html/rfc7946#section-5[RFC 7946, section 5]), which also describes how to represent the bbox for a set of geometries that cross the antimeridian. + +include::/requirements/requirement008.adoc[] + +For non-geographic coordinate reference systems, the items in the bbox are minimum values for each dimension followed by maximum values for each dimension. For example, given geometries that have coordinates with two dimensions, the bbox would have the form `[<xmin>, <ymin>, <xmax>, <ymax>]`. For three dimensions, the bbox would have the form `[<xmin>, <ymin>, <zmin>, <xmax>, <ymax>, <zmax>]`. + +The bbox values are in the same coordinate reference system as the geometry.
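The "minimum values for each dimension followed by maximum values for each dimension" layout can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `compute_bbox` is a hypothetical helper, the input is a flat collection of coordinate tuples that all have the same number of dimensions, and geometries that cross the antimeridian (covered by RFC 7946, section 5) are not handled.

```python
def compute_bbox(coords):
    """Return all per-dimension minimums followed by all per-dimension maximums.

    coords: non-empty iterable of (x, y) or (x, y, z) tuples in the CRS of the
    geometry column. Antimeridian-crossing data is NOT handled in this sketch.
    """
    coords = list(coords)
    if not coords:
        raise ValueError("cannot compute a bbox for an empty coordinate list")
    dims = len(coords[0])
    mins = [min(c[i] for c in coords) for i in range(dims)]
    maxs = [max(c[i] for c in coords) for i in range(dims)]
    return mins + maxs


# Two dimensions: minimums for x and y, then maximums for x and y.
print(compute_bbox([(1, 5), (3, 2)]))  # → [1, 2, 3, 5]

# Three dimensions: minimums for x, y, z, then maximums for x, y, z.
print(compute_bbox([(0, 0, 10), (2, -1, 4)]))  # → [0, -1, 4, 2, 0, 10]
```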
+ +=== Additional information + +==== Feature identifiers + +If you are using GeoParquet to serialize geospatial data with feature identifiers, it is RECOMMENDED that you create your own https://github.com/apache/parquet-format#metadata[[.underline]#file key/value metadata#] to indicate the column that represents this identifier. As an example, GDAL writes additional metadata using the gdal:schema key including information about feature identifiers and other information outside the scope of the GeoParquet specification. + +include::/recommendations/recommendation003.adoc[] + +==== OGC:CRS84 details + +The PROJJSON object for OGC:CRS84 is: + +[%unnumbered%] +[source,json] +---- +{ + "$schema": "https://proj.org/schemas/v0.5/projjson.schema.json", + "type": "GeographicCRS", + "name": "WGS 84 longitude-latitude", + "datum": { + "type": "GeodeticReferenceFrame", + "name": "World Geodetic System 1984", + "ellipsoid": { + "name": "WGS 84", + "semi_major_axis": 6378137, + "inverse_flattening": 298.257223563 + } + }, + "coordinate_system": { + "subtype": "ellipsoidal", + "axis": [ + { + "name": "Geodetic longitude", + "abbreviation": "Lon", + "direction": "east", + "unit": "degree" + }, + { + "name": "Geodetic latitude", + "abbreviation": "Lat", + "direction": "north", + "unit": "degree" + } + ] + }, + "id": { + "authority": "OGC", + "code": "CRS84" + } +} +---- + +For implementations that operate entirely with longitude, latitude coordinates and are not CRS-aware or do not have easy access to CRS-aware libraries that can fully parse PROJJSON, it may be possible to infer that coordinates conform to the OGC:CRS84 CRS based on elements of the crs field. For simplicity, Javascript object dot notation is used to refer to nested elements. 
+ +The CRS is likely equivalent to OGC:CRS84 for a GeoParquet file if the id element is present and matches one of the following: + +* id.authority = "OGC" and id.code = "CRS84" +* id.authority = "EPSG" and id.code = 4326 (due to longitude, latitude ordering in this specification) + +It is reasonable for implementations to require that one of the above id elements is present and skip further tests to determine if the CRS is functionally equivalent with OGC:CRS84. + +Note: EPSG:4326 and OGC:CRS84 are equivalent with respect to this specification because this specification specifically overrides the coordinate axis order in the crs to be longitude-latitude.
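For implementations without a PROJJSON parser, the id-based test described above could look like the following Python sketch. The function name `is_likely_crs84` is hypothetical, and the input is assumed to be the already-decoded metadata object of one geometry column. A missing crs field is treated as OGC:CRS84 (the default), and an explicit null as an unknown CRS.

```python
def is_likely_crs84(column_meta):
    """Heuristic check based only on the id element of the crs field.

    column_meta: dict decoded from the metadata of a single geometry column.
    """
    if "crs" not in column_meta:
        return True  # crs not provided: the default is OGC:CRS84
    crs = column_meta["crs"]
    if crs is None:
        return False  # explicit null: CRS is undefined or unknown
    ident = crs.get("id")
    if not isinstance(ident, dict):
        return False  # no id element, so no further tests are attempted here
    authority = ident.get("authority")
    code = ident.get("code")
    return (authority == "OGC" and code == "CRS84") or (
        authority == "EPSG" and code == 4326
    )


print(is_likely_crs84({"encoding": "WKB", "geometry_types": []}))
print(is_likely_crs84({"crs": {"id": {"authority": "EPSG", "code": 4326}}}))
print(is_likely_crs84({"crs": None}))
```

Returning False here only means that equivalence could not be inferred from the id element; a CRS-aware implementation may still determine that the CRS is functionally equivalent to OGC:CRS84.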