Updates
* Move Use Cases to the Top
* Added section on Collection JSON
* Added note on accessing fields
Tom Augspurger committed Oct 15, 2023
1 parent 9f3cfff commit b0d9a6a
1 changed file (`spec/stac-geoparquet-spec.md`) with 16 additions and 7 deletions.

This document specifies how to map a set of [STAC Items](https://github.com/radi…
[GeoParquet](https://geoparquet.org). It is directly inspired by the [STAC GeoParquet](https://github.com/stac-utils/stac-geoparquet)
library, but aims to provide guidance for anyone putting STAC data into GeoParquet.

## Use cases

* Provide a STAC GeoParquet file that mirrors a static Collection, so the whole dataset can be queried without reading every individual GeoJSON file.
* Serve as an output format for STAC API responses that is more efficient than paging through thousands of pages of GeoJSON.
* Provide efficient access to specific fields of a STAC Item, thanks to Parquet's columnar format.

## Guidelines

Each row in the Parquet Dataset represents a single STAC Item. Most of the fields in a STAC Item should be mapped to a column in GeoParquet. We embrace Parquet structures where possible, mapping
… most of the fields should be the same in STAC and in GeoParquet.

* Any field in `properties` of the STAC Item should be moved up to become a top-level column in the GeoParquet file.
* STAC GeoParquet does not support properties whose names collide with a top-level key.
* Datetime columns should be stored as a [native timestamp][timestamp], not as a string.
* The Collection JSON should be included in the Parquet metadata. See [Collection JSON](#collection-json) below.
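
The first three rules can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the stac-geoparquet library's code: the helper `flatten_item` and the `DATETIME_KEYS` tuple are hypothetical, and the sketch only prepares one row dict. Writing that row to Parquet (where the parsed `datetime` objects can then become a native timestamp column) is left to a library such as pyarrow.

```python
from datetime import datetime
from typing import Any

# Hypothetical list of datetime fields from STAC common metadata; extend as needed.
DATETIME_KEYS = ("datetime", "start_datetime", "end_datetime", "created", "updated")


def flatten_item(item: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: turn one STAC Item (a GeoJSON dict) into one row."""
    # Copy every top-level key except "properties".
    row = {key: value for key, value in item.items() if key != "properties"}
    # Move each property up to the top level, rejecting name collisions.
    for key, value in item.get("properties", {}).items():
        if key in row:
            raise ValueError(f"property {key!r} collides with a top-level key")
        row[key] = value
    # Parse ISO 8601 strings so a Parquet writer can store native timestamps.
    for key in DATETIME_KEYS:
        value = row.get(key)
        if isinstance(value, str):
            # fromisoformat() rejects a trailing "Z" before Python 3.11,
            # so rewrite it as an explicit UTC offset first.
            row[key] = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return row


item = {
    "type": "Feature",
    "id": "example-item",
    "properties": {"datetime": "2023-10-15T00:00:00Z", "eo:cloud_cover": 12.5},
}
row = flatten_item(item)
print(row["datetime"].isoformat())  # 2023-10-15T00:00:00+00:00
```

Raising on a collision, rather than silently overwriting or renaming, mirrors the rule that such properties are simply not supported.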

### Link Struct

…

To take advantage of Parquet's columnar nature and compression, the assets shoul…

See [Asset Object][asset] for more.

## Collection JSON

To make a stac-geoparquet file a fully self-contained representation, you can
include the Collection JSON in the Parquet metadata. If present in the [Parquet
file metadata][parquet-metadata], the key must be `stac:collection` and the
value must be a JSON string with the Collection JSON.
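
A minimal sketch of the `stac:collection` rule, assuming the Parquet writer accepts file-level key-value metadata as a bytes-to-bytes mapping (as pyarrow's schema metadata does). Only the `stac:collection` key and the JSON-string value come from this section; the helpers `collection_metadata` and `read_collection` are hypothetical.

```python
import json
from typing import Any, Mapping, Optional

# Key required by this section for the embedded Collection JSON.
STAC_COLLECTION_KEY = b"stac:collection"


def collection_metadata(collection: dict[str, Any]) -> dict[bytes, bytes]:
    """Hypothetical helper: build the key-value entry that embeds a Collection."""
    return {STAC_COLLECTION_KEY: json.dumps(collection).encode("utf-8")}


def read_collection(metadata: Optional[Mapping[bytes, bytes]]) -> Optional[dict]:
    """Hypothetical helper: return the embedded Collection, or None if absent."""
    if not metadata or STAC_COLLECTION_KEY not in metadata:
        return None
    # json.loads accepts UTF-8 encoded bytes directly.
    return json.loads(metadata[STAC_COLLECTION_KEY])
```

With pyarrow, for example, the entry could be merged into the existing schema metadata via `table.replace_schema_metadata({**(table.schema.metadata or {}), **collection_metadata(collection)})` before calling `pyarrow.parquet.write_table`, and read back with `read_collection(pyarrow.parquet.read_metadata(path).metadata)`. Merging rather than replacing matters because GeoParquet keeps its own `geo` entry in the same metadata.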

## Mapping to other geospatial data formats

The principles here can likely be used to map into other geospatial data formats (GeoPackage, FlatGeobuf, etc.), but we embrace Parquet's nested structs for some of the mappings, so other formats will need to do something different. The obvious approach is to dump JSON into those fields, but that is outside the scope of this document; we recommend creating a general document for that.
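
As a rough illustration of the "dump JSON into those fields" approach (which this document leaves out of scope), a hypothetical helper could serialize every nested value of an already-flattened row to a JSON string before the row is handed to a format without nested types. The name `stringify_nested` and its `exclude` parameter are illustrative only; no GeoPackage or FlatGeobuf driver is involved.

```python
import json
from typing import Any


def stringify_nested(
    row: dict[str, Any], exclude: tuple[str, ...] = ("geometry",)
) -> dict[str, Any]:
    """Hypothetical helper: JSON-encode nested values for flat-only formats.

    Dicts and lists (e.g. "links", "assets") become JSON strings. Scalars and
    keys in `exclude` (by default the geometry, which the target format is
    assumed to store natively) pass through unchanged.
    """
    return {
        key: (
            json.dumps(value)
            if key not in exclude and isinstance(value, (dict, list))
            else value
        )
        for key, value in row.items()
    }
```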

[media-type]: https://github.com/radiantearth/stac-spec/blob/master/item-spec/item-spec.md#asset-media-type
[asset]: https://github.com/radiantearth/stac-spec/blob/master/item-spec/item-spec.md#asset-object
[asset-roles]: https://github.com/radiantearth/stac-spec/blob/master/item-spec/item-spec.md#asset-roles
[link]: https://github.com/radiantearth/stac-spec/blob/master/item-spec/item-spec.md#link-object
[common-media-types]: https://github.com/radiantearth/stac-spec/blob/master/best-practices.md#common-media-types-in-stac
[timestamp]: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#timestamp
[parquet-metadata]: https://github.com/apache/parquet-format#metadata
