From d7096e4cdeec302f6c632a09e0d35cb8198f066d Mon Sep 17 00:00:00 2001 From: Szehon Ho Date: Tue, 20 Aug 2024 00:52:47 +0200 Subject: [PATCH] [Draft] Spec: Support geo type --- format/spec.md | 335 +++++++++++++++++++++++++++---------------------- 1 file changed, 183 insertions(+), 152 deletions(-) diff --git a/format/spec.md b/format/spec.md index c322f8174fe2..d62f1cc018b6 100644 --- a/format/spec.md +++ b/format/spec.md @@ -190,6 +190,7 @@ Supported primitive types are defined in the table below. Primitive types added | | **`uuid`** | Universally unique identifiers | Should use 16-byte fixed | | | **`fixed(L)`** | Fixed-length byte array of length L | | | | **`binary`** | Arbitrary-length byte array | | +| [v3](#version-3) | **`geometry(C, T, E)`** | An object of the simple feature geometry model as defined by Appendix G; This may be any of the Geometry subclasses defined therein; coordinate reference system C [4], coordinate reference system type T [5], edges E [6] | C, T, E are fixed. Encoded as WKB, see Appendix G. | Notes: @@ -198,6 +199,9 @@ Notes: - Timestamp values _with time zone_ represent a point in time: values are stored as UTC and do not retain a source time zone (`2017-11-16 17:10:34 PST` is stored/retrieved as `2017-11-17 01:10:34 UTC` and these values are considered identical). - Timestamp values _without time zone_ represent a date and time of day regardless of zone: the time value is independent of zone adjustments (`2017-11-16 17:10:34` is always retrieved as `2017-11-16 17:10:34`). 3. Character strings must be stored as UTF-8 encoded byte arrays. +4. Coordinate Reference System, i.e. mapping of how coordinates refer to precise locations on earth. Defaults to "OGC:CRS84". Fixed and cannot be changed by schema evolution. +5. Coordinate Reference System Type, value specifying type of Coordinate Reference System field, if that is different from default. Defaults to empty string. Fixed and cannot be changed by schema evolution. +6. 
Edges, the interpretation of edges within a geometry object, i.e. whether the edge between two points represents a straight Cartesian line or the shortest line on the sphere. Note that this applies only to polygons. Fixed and cannot be changed by schema evolution. For details on how to serialize a schema to JSON, see Appendix C. @@ -323,16 +327,17 @@ Partition field IDs must be reused if an existing partition spec contains an equ #### Partition Transforms -| Transform name | Description | Source types | Result type | -|-------------------|--------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|-------------| -| **`identity`** | Source value, unmodified | Any | Source type | -| **`bucket[N]`** | Hash of value, mod `N` (see below) | `int`, `long`, `decimal`, `date`, `time`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns`, `string`, `uuid`, `fixed`, `binary` | `int` | -| **`truncate[W]`** | Value truncated to width `W` (see below) | `int`, `long`, `decimal`, `string`, `binary` | Source type | -| **`year`** | Extract a date or timestamp year, as years from 1970 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` | -| **`month`** | Extract a date or timestamp month, as months from 1970-01-01 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` | -| **`day`** | Extract a date or timestamp day, as days from 1970-01-01 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` | -| **`hour`** | Extract a timestamp hour, as hours from 1970-01-01 00:00:00 | `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` | -| **`void`** | Always produces `null` | Any | Source type or `int` | +| Transform name | Description | Source types | Result type |
+|-------------------|--------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|----------------------| +| **`identity`** | Source value, unmodified | Any | Source type | +| **`bucket[N]`** | Hash of value, mod `N` (see below) | `int`, `long`, `decimal`, `date`, `time`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns`, `string`, `uuid`, `fixed`, `binary` | `int` | +| **`truncate[W]`** | Value truncated to width `W` (see below) | `int`, `long`, `decimal`, `string`, `binary` | Source type | +| **`year`** | Extract a date or timestamp year, as years from 1970 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` | +| **`month`** | Extract a date or timestamp month, as months from 1970-01-01 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` | +| **`day`** | Extract a date or timestamp day, as days from 1970-01-01 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` | +| **`hour`** | Extract a timestamp hour, as hours from 1970-01-01 00:00:00 | `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` | +| **`void`** | Always produces `null` | Any | Source type or `int` | +| **`xz2[R]`** | XZ-Ordering, resolution `R` (see below) | `geometry` | `long` | All transforms must return `null` for a `null` input value. @@ -373,6 +378,13 @@ Notes: 3. Strings are truncated to a valid UTF-8 string with no more than `L` code points. 4. In contrast to strings, binary values do not have an assumed encoding and are truncated to `L` bytes. +#### XZ2 Transform Details + +XZ2 is based on the paper [XZ-Ordering: A Space-Filling Curve for Objects with Spatial Extensions](https://www.dbs.ifi.lmu.de/Publikationen/Boehm/Ordering_99.pdf). + +Notes: +1. Resolution must be a positive integer. Defaults to TODO +2. 
Currently supported only for geometry types with default coordinate reference system and edges=`planar`. #### Partition Evolution @@ -444,28 +456,28 @@ The schema of a manifest file is a struct called `manifest_entry` with the follo `data_file` is a struct with the following fields: -| v1 | v2 | Field id, name | Type | Description | -| ---------- | ---------- |-----------------------------------|------------------------------|-------------| -| | _required_ | **`134 content`** | `int` with meaning: `0: DATA`, `1: POSITION DELETES`, `2: EQUALITY DELETES` | Type of content stored by the data file: data, equality deletes, or position deletes (all v1 files are data files) | -| _required_ | _required_ | **`100 file_path`** | `string` | Full URI for the file with FS scheme | -| _required_ | _required_ | **`101 file_format`** | `string` | String file format name, avro, orc or parquet | -| _required_ | _required_ | **`102 partition`** | `struct<...>` | Partition data tuple, schema based on the partition spec output using partition field ids for the struct field ids | -| _required_ | _required_ | **`103 record_count`** | `long` | Number of records in this file | -| _required_ | _required_ | **`104 file_size_in_bytes`** | `long` | Total file size in bytes | -| _required_ | | ~~**`105 block_size_in_bytes`**~~ | `long` | **Deprecated. Always write a default in v1. Do not write in v2.** | -| _optional_ | | ~~**`106 file_ordinal`**~~ | `int` | **Deprecated. Do not write.** | -| _optional_ | | ~~**`107 sort_columns`**~~ | `list<112: int>` | **Deprecated. Do not write.** | -| _optional_ | _optional_ | **`108 column_sizes`** | `map<117: int, 118: long>` | Map from column id to the total size on disk of all regions that store the column. Does not include bytes necessary to read other columns, like footers. 
Leave null for row-oriented formats (Avro) | -| _optional_ | _optional_ | **`109 value_counts`** | `map<119: int, 120: long>` | Map from column id to number of values in the column (including null and NaN values) | -| _optional_ | _optional_ | **`110 null_value_counts`** | `map<121: int, 122: long>` | Map from column id to number of null values in the column | -| _optional_ | _optional_ | **`137 nan_value_counts`** | `map<138: int, 139: long>` | Map from column id to number of NaN values in the column | -| _optional_ | _optional_ | **`111 distinct_counts`** | `map<123: int, 124: long>` | Map from column id to number of distinct values in the column; distinct counts must be derived using values in the file by counting or using sketches, but not using methods like merging existing distinct counts | -| _optional_ | _optional_ | **`125 lower_bounds`** | `map<126: int, 127: binary>` | Map from column id to lower bound in the column serialized as binary [1]. Each value must be less than or equal to all non-null, non-NaN values in the column for the file [2] | -| _optional_ | _optional_ | **`128 upper_bounds`** | `map<129: int, 130: binary>` | Map from column id to upper bound in the column serialized as binary [1]. Each value must be greater than or equal to all non-null, non-Nan values in the column for the file [2] | -| _optional_ | _optional_ | **`131 key_metadata`** | `binary` | Implementation-specific key metadata for encryption | -| _optional_ | _optional_ | **`132 split_offsets`** | `list<133: long>` | Split offsets for the data file. For example, all row group offsets in a Parquet file. Must be sorted ascending | -| | _optional_ | **`135 equality_ids`** | `list<136: int>` | Field ids used to determine row equality in equality delete files. Required when `content=2` and should be null otherwise. 
Fields with ids listed in this column must be present in the delete file | -| _optional_ | _optional_ | **`140 sort_order_id`** | `int` | ID representing sort order for this file [3]. | +| v1 | v2 | Field id, name | Type | Description | +| ---------- | ---------- |-----------------------------------|------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| | _required_ | **`134 content`** | `int` with meaning: `0: DATA`, `1: POSITION DELETES`, `2: EQUALITY DELETES` | Type of content stored by the data file: data, equality deletes, or position deletes (all v1 files are data files) | +| _required_ | _required_ | **`100 file_path`** | `string` | Full URI for the file with FS scheme | +| _required_ | _required_ | **`101 file_format`** | `string` | String file format name, avro, orc or parquet | +| _required_ | _required_ | **`102 partition`** | `struct<...>` | Partition data tuple, schema based on the partition spec output using partition field ids for the struct field ids | +| _required_ | _required_ | **`103 record_count`** | `long` | Number of records in this file | +| _required_ | _required_ | **`104 file_size_in_bytes`** | `long` | Total file size in bytes | +| _required_ | | ~~**`105 block_size_in_bytes`**~~ | `long` | **Deprecated. Always write a default in v1. Do not write in v2.** | +| _optional_ | | ~~**`106 file_ordinal`**~~ | `int` | **Deprecated. Do not write.** | +| _optional_ | | ~~**`107 sort_columns`**~~ | `list<112: int>` | **Deprecated. Do not write.** | +| _optional_ | _optional_ | **`108 column_sizes`** | `map<117: int, 118: long>` | Map from column id to the total size on disk of all regions that store the column. Does not include bytes necessary to read other columns, like footers. 
Leave null for row-oriented formats (Avro) | +| _optional_ | _optional_ | **`109 value_counts`** | `map<119: int, 120: long>` | Map from column id to number of values in the column (including null and NaN values) | +| _optional_ | _optional_ | **`110 null_value_counts`** | `map<121: int, 122: long>` | Map from column id to number of null values in the column | +| _optional_ | _optional_ | **`137 nan_value_counts`** | `map<138: int, 139: long>` | Map from column id to number of NaN values in the column | +| _optional_ | _optional_ | **`111 distinct_counts`** | `map<123: int, 124: long>` | Map from column id to number of distinct values in the column; distinct counts must be derived using values in the file by counting or using sketches, but not using methods like merging existing distinct counts | +| _optional_ | _optional_ | **`125 lower_bounds`** | `map<126: int, 127: binary>` | Map from column id to lower bound in the column serialized as binary [1]. Each value must be less than or equal to all non-null, non-NaN values in the column for the file [2] For Geometry type, this is a Point composed of the min value of each dimension in all Points in the Geometry. | +| _optional_ | _optional_ | **`128 upper_bounds`** | `map<129: int, 130: binary>` | Map from column id to upper bound in the column serialized as binary [1]. Each value must be greater than or equal to all non-null, non-Nan values in the column for the file [2] For Geometry type, this is a Point composed of the max value of each dimension in all Points in the Geometry. | +| _optional_ | _optional_ | **`131 key_metadata`** | `binary` | Implementation-specific key metadata for encryption | +| _optional_ | _optional_ | **`132 split_offsets`** | `list<133: long>` | Split offsets for the data file. For example, all row group offsets in a Parquet file. Must be sorted ascending | +| | _optional_ | **`135 equality_ids`** | `list<136: int>` | Field ids used to determine row equality in equality delete files. 
Required when `content=2` and should be null otherwise. Fields with ids listed in this column must be present in the delete file | +| _optional_ | _optional_ | **`140 sort_order_id`** | `int` | ID representing sort order for this file [3]. | Notes: @@ -576,12 +588,12 @@ Manifest list files store `manifest_file`, a struct with the following fields: `field_summary` is a struct with the following fields: -| v1 | v2 | Field id, name | Type | Description | -| ---------- | ---------- |-------------------------|---------------|-------------| -| _required_ | _required_ | **`509 contains_null`** | `boolean` | Whether the manifest contains at least one partition with a null value for the field | -| _optional_ | _optional_ | **`518 contains_nan`** | `boolean` | Whether the manifest contains at least one partition with a NaN value for the field | -| _optional_ | _optional_ | **`510 lower_bound`** | `bytes` [1] | Lower bound for the non-null, non-NaN values in the partition field, or null if all values are null or NaN [2] | -| _optional_ | _optional_ | **`511 upper_bound`** | `bytes` [1] | Upper bound for the non-null, non-NaN values in the partition field, or null if all values are null or NaN [2] | +| v1 | v2 | Field id, name | Type | Description | +| ---------- | ---------- |-------------------------|---------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| _required_ | _required_ | **`509 contains_null`** | `boolean` | Whether the manifest contains at least one partition with a null value for the field | +| _optional_ | _optional_ | **`518 contains_nan`** | `boolean` | Whether the manifest contains at least one partition with a NaN value for the field | +| _optional_ | _optional_ | **`510 lower_bound`** | `bytes` [1] | Lower bound for the non-null, 
non-NaN values in the partition field, or null if all values are null or NaN [2]. For Geometry type, this is a Point composed of the min value of each dimension among all non-null, non-NaN Geometry values in the partition field. | +| _optional_ | _optional_ | **`511 upper_bound`** | `bytes` [1] | Upper bound for the non-null, non-NaN values in the partition field, or null if all values are null or NaN [2]. For Geometry type, this is a Point composed of the max value of each dimension among all non-null, non-NaN Geometry values in the partition field. | | Notes: @@ -942,9 +954,9 @@ Maps with non-string keys must use an array representation with the `map` logica |**`decimal(P,S)`**|`{ "type": "fixed",`
  `"size": minBytesRequired(P),`
  `"logicalType": "decimal",`
  `"precision": P,`
  `"scale": S }`|Stored as fixed using the minimum number of bytes for the given precision.| |**`date`**|`{ "type": "int",`
  `"logicalType": "date" }`|Stores days from 1970-01-01.| |**`time`**|`{ "type": "long",`
  `"logicalType": "time-micros" }`|Stores microseconds from midnight.| -|**`timestamp`** | `{ "type": "long",`
  `"logicalType": "timestamp-micros",`
  `"adjust-to-utc": false }` | Stores microseconds from 1970-01-01 00:00:00.000000. [1] | -|**`timestamptz`** | `{ "type": "long",`
  `"logicalType": "timestamp-micros",`
  `"adjust-to-utc": true }` | Stores microseconds from 1970-01-01 00:00:00.000000 UTC. [1] | -|**`timestamp_ns`** | `{ "type": "long",`
  `"logicalType": "timestamp-nanos",`
  `"adjust-to-utc": false }` | Stores nanoseconds from 1970-01-01 00:00:00.000000000. [1], [2] | +|**`timestamp`** | `{ "type": "long",`
  `"logicalType": "timestamp-micros",`
  `"adjust-to-utc": false }` | Stores microseconds from 1970-01-01 00:00:00.000000. [1] | +|**`timestamptz`** | `{ "type": "long",`
  `"logicalType": "timestamp-micros",`
  `"adjust-to-utc": true }` | Stores microseconds from 1970-01-01 00:00:00.000000 UTC. [1] | +|**`timestamp_ns`** | `{ "type": "long",`
  `"logicalType": "timestamp-nanos",`
  `"adjust-to-utc": false }` | Stores nanoseconds from 1970-01-01 00:00:00.000000000. [1], [2] | |**`timestamptz_ns`** | `{ "type": "long",`
  `"logicalType": "timestamp-nanos",`
  `"adjust-to-utc": true }` | Stores nanoseconds from 1970-01-01 00:00:00.000000000 UTC. [1], [2] | |**`string`**|`string`|| |**`uuid`**|`{ "type": "fixed",`
  `"size": 16,`
  `"logicalType": "uuid" }`|| @@ -953,13 +965,13 @@ Maps with non-string keys must use an array representation with the `map` logica |**`struct`**|`record`|| |**`list`**|`array`|| |**`map`**|`array` of key-value records, or `map` when keys are strings (optional).|Array storage must use logical type name `map` and must store elements that are 2-field records. The first field is a non-null key and the second field is the value.| +|**`geometry`**|`bytes`| WKB format, see Appendix G | Notes: 1. Avro type annotation `adjust-to-utc` is an Iceberg convention; default value is `false` if not present. 2. Avro logical type `timestamp-nanos` is an Iceberg convention; the Avro specification does not define this type. - **Field IDs** Iceberg struct, list, and map types identify nested types by ID. When writing data to Avro files, these IDs must be stored in the Avro schema to support ID-based column pruning. @@ -985,59 +997,66 @@ Values should be stored in Parquet using the types and logical type annotations Lists must use the [3-level representation](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#lists). -| Type | Parquet physical type | Logical type | Notes | -|--------------------|--------------------------------------------------------------------|---------------------------------------------|----------------------------------------------------------------| -| **`boolean`** | `boolean` | | | -| **`int`** | `int` | | | -| **`long`** | `long` | | | -| **`float`** | `float` | | | -| **`double`** | `double` | | | -| **`decimal(P,S)`** | `P <= 9`: `int32`,
`P <= 18`: `int64`,
`fixed` otherwise | `DECIMAL(P,S)` | Fixed must use the minimum number of bytes that can store `P`. | -| **`date`** | `int32` | `DATE` | Stores days from 1970-01-01. | -| **`time`** | `int64` | `TIME_MICROS` with `adjustToUtc=false` | Stores microseconds from midnight. | -| **`timestamp`** | `int64` | `TIMESTAMP_MICROS` with `adjustToUtc=false` | Stores microseconds from 1970-01-01 00:00:00.000000. | -| **`timestamptz`** | `int64` | `TIMESTAMP_MICROS` with `adjustToUtc=true` | Stores microseconds from 1970-01-01 00:00:00.000000 UTC. | -| **`timestamp_ns`** | `int64` | `TIMESTAMP_NANOS` with `adjustToUtc=false` | Stores nanoseconds from 1970-01-01 00:00:00.000000000. | -| **`timestamptz_ns`** | `int64` | `TIMESTAMP_NANOS` with `adjustToUtc=true` | Stores nanoseconds from 1970-01-01 00:00:00.000000000 UTC. | -| **`string`** | `binary` | `UTF8` | Encoding must be UTF-8. | -| **`uuid`** | `fixed_len_byte_array[16]` | `UUID` | | -| **`fixed(L)`** | `fixed_len_byte_array[L]` | | | -| **`binary`** | `binary` | | | -| **`struct`** | `group` | | | -| **`list`** | `3-level list` | `LIST` | See Parquet docs for 3-level representation. | -| **`map`** | `3-level map` | `MAP` | See Parquet docs for 3-level representation. | +| Type | Parquet physical type | Logical type | Notes | +|--------------------|--------------------------------------------------------------------|---------------------------------------------|---------------------------------------------------------------------------------------------------------| +| **`boolean`** | `boolean` | | | +| **`int`** | `int` | | | +| **`long`** | `long` | | | +| **`float`** | `float` | | | +| **`double`** | `double` | | | +| **`decimal(P,S)`** | `P <= 9`: `int32`,
`P <= 18`: `int64`,
`fixed` otherwise | `DECIMAL(P,S)` | Fixed must use the minimum number of bytes that can store `P`. | +| **`date`** | `int32` | `DATE` | Stores days from 1970-01-01. | +| **`time`** | `int64` | `TIME_MICROS` with `adjustToUtc=false` | Stores microseconds from midnight. | +| **`timestamp`** | `int64` | `TIMESTAMP_MICROS` with `adjustToUtc=false` | Stores microseconds from 1970-01-01 00:00:00.000000. | +| **`timestamptz`** | `int64` | `TIMESTAMP_MICROS` with `adjustToUtc=true` | Stores microseconds from 1970-01-01 00:00:00.000000 UTC. | +| **`timestamp_ns`** | `int64` | `TIMESTAMP_NANOS` with `adjustToUtc=false` | Stores nanoseconds from 1970-01-01 00:00:00.000000000. | +| **`timestamptz_ns`** | `int64` | `TIMESTAMP_NANOS` with `adjustToUtc=true` | Stores nanoseconds from 1970-01-01 00:00:00.000000000 UTC. | +| **`string`** | `binary` | `UTF8` | Encoding must be UTF-8. | +| **`uuid`** | `fixed_len_byte_array[16]` | `UUID` | | +| **`fixed(L)`** | `fixed_len_byte_array[L]` | | | +| **`binary`** | `binary` | | | +| **`struct`** | `group` | | | +| **`list`** | `3-level list` | `LIST` | See Parquet docs for 3-level representation. | +| **`map`** | `3-level map` | `MAP` | See Parquet docs for 3-level representation. | +| **`geometry`** | `binary` | `GEOMETRY` | WKB format, see Appendix G. Logical type annotation optional for supported Parquet format versions [1]. | +Notes: + +1. [https://github.com/apache/parquet-format/pull/240](https://github.com/apache/parquet-format/pull/240)) ### ORC **Data Type Mappings** -| Type | ORC type | ORC type attributes | Notes | -|--------------------|---------------------|------------------------------------------------------|-----------------------------------------------------------------------------------------| -| **`boolean`** | `boolean` | | | -| **`int`** | `int` | | ORC `tinyint` and `smallint` would also map to **`int`**. 
| -| **`long`** | `long` | | | -| **`float`** | `float` | | | -| **`double`** | `double` | | | -| **`decimal(P,S)`** | `decimal` | | | -| **`date`** | `date` | | | -| **`time`** | `long` | `iceberg.long-type`=`TIME` | Stores microseconds from midnight. | -| **`timestamp`** | `timestamp` | `iceberg.timestamp-unit`=`MICROS` | Stores microseconds from 2015-01-01 00:00:00.000000. [1], [2] | -| **`timestamptz`** | `timestamp_instant` | `iceberg.timestamp-unit`=`MICROS` | Stores microseconds from 2015-01-01 00:00:00.000000 UTC. [1], [2] | -| **`timestamp_ns`** | `timestamp` | `iceberg.timestamp-unit`=`NANOS` | Stores nanoseconds from 2015-01-01 00:00:00.000000000. [1] | -| **`timestamptz_ns`** | `timestamp_instant` | `iceberg.timestamp-unit`=`NANOS` | Stores nanoseconds from 2015-01-01 00:00:00.000000000 UTC. [1] | -| **`string`** | `string` | | ORC `varchar` and `char` would also map to **`string`**. | -| **`uuid`** | `binary` | `iceberg.binary-type`=`UUID` | | -| **`fixed(L)`** | `binary` | `iceberg.binary-type`=`FIXED` & `iceberg.length`=`L` | The length would not be checked by the ORC reader and should be checked by the adapter. | -| **`binary`** | `binary` | | | -| **`struct`** | `struct` | | | -| **`list`** | `array` | | | -| **`map`** | `map` | | | + +| Type | ORC type | ORC type attributes | Notes | +|--------------------|---------------------|------------------------------------------------------|------------------------------------------------------------------------------------------| +| **`boolean`** | `boolean` | | | +| **`int`** | `int` | | ORC `tinyint` and `smallint` would also map to **`int`**. | +| **`long`** | `long` | | | +| **`float`** | `float` | | | +| **`double`** | `double` | | | +| **`decimal(P,S)`** | `decimal` | | | +| **`date`** | `date` | | | +| **`time`** | `long` | `iceberg.long-type`=`TIME` | Stores microseconds from midnight. 
| +| **`timestamp`** | `timestamp` | `iceberg.timestamp-unit`=`MICROS` | Stores microseconds from 2015-01-01 00:00:00.000000. [1], [2] | +| **`timestamptz`** | `timestamp_instant` | `iceberg.timestamp-unit`=`MICROS` | Stores microseconds from 2015-01-01 00:00:00.000000 UTC. [1], [2] | +| **`timestamp_ns`** | `timestamp` | `iceberg.timestamp-unit`=`NANOS` | Stores nanoseconds from 2015-01-01 00:00:00.000000000. [1] | +| **`timestamptz_ns`** | `timestamp_instant` | `iceberg.timestamp-unit`=`NANOS` | Stores nanoseconds from 2015-01-01 00:00:00.000000000 UTC. [1] | +| **`string`** | `string` | | ORC `varchar` and `char` would also map to **`string`**. | +| **`uuid`** | `binary` | `iceberg.binary-type`=`UUID` | | +| **`fixed(L)`** | `binary` | `iceberg.binary-type`=`FIXED` & `iceberg.length`=`L` | The length would not be checked by the ORC reader and should be checked by the adapter. | +| **`binary`** | `binary` | | | +| **`struct`** | `struct` | | | +| **`list`** | `array` | | | +| **`map`** | `map` | | | +| **`geometry`** | `binary` | | WKB format, see Appendix G; Optional geometry type for supported ORC format versions [3] | Notes: 1. ORC's [TimestampColumnVector](https://orc.apache.org/api/hive-storage-api/org/apache/hadoop/hive/ql/exec/vector/TimestampColumnVector.html) consists of a time field (milliseconds since epoch) and a nanos field (nanoseconds within the second). Hence the milliseconds within the second are reported twice; once in the time field and again in the nanos field. The read adapter should only use milliseconds within the second from one of these fields. The write adapter should also report milliseconds within the second twice; once in the time field and again in the nanos field. ORC writer is expected to correctly consider millis information from one of the fields. More details at https://issues.apache.org/jira/browse/ORC-546 2. ORC `timestamp` and `timestamp_instant` values store nanosecond precision. 
Iceberg ORC writers for Iceberg types `timestamp` and `timestamptz` **must** truncate nanoseconds to microseconds. `iceberg.timestamp-unit` is assumed to be `MICROS` if not present. +3. [https://github.com/apache/orc-format/pull/18](https://github.com/apache/orc-format/pull/18) One of the interesting challenges with this is how to map Iceberg’s schema evolution (id based) onto ORC’s (name based). In theory, we could use Iceberg’s column ids as the column and field names, but that would be inconvenient. @@ -1054,21 +1073,23 @@ Iceberg would build the desired reader schema with their schema evolution rules The 32-bit hash implementation is 32-bit Murmur3 hash, x86 variant, seeded with 0. -| Primitive type | Hash specification | Test value | -|--------------------|-------------------------------------------|--------------------------------------------| -| **`int`** | `hashLong(long(v))` [1] | `34` → `2017239379` | -| **`long`** | `hashBytes(littleEndianBytes(v))` | `34L` → `2017239379` | -| **`decimal(P,S)`** | `hashBytes(minBigEndian(unscaled(v)))`[2] | `14.20` → `-500754589` | -| **`date`** | `hashInt(daysFromUnixEpoch(v))` | `2017-11-16` → `-653330422` | -| **`time`** | `hashLong(microsecsFromMidnight(v))` | `22:31:08` → `-662762989` | -| **`timestamp`** | `hashLong(microsecsFromUnixEpoch(v))` | `2017-11-16T22:31:08` → `-2047944441`
`2017-11-16T22:31:08.000001` → `-1207196810` | -| **`timestamptz`** | `hashLong(microsecsFromUnixEpoch(v))` | `2017-11-16T14:31:08-08:00` → `-2047944441`
`2017-11-16T14:31:08.000001-08:00` → `-1207196810` | -| **`timestamp_ns`** | `hashLong(nanosecsFromUnixEpoch(v))` | `2017-11-16T22:31:08` → `-737750069`
`2017-11-16T22:31:08.000001` → `-976603392`
`2017-11-16T22:31:08.000000001` → `-160215926` | -| **`timestamptz_ns`** | `hashLong(nanosecsFromUnixEpoch(v))` | `2017-11-16T14:31:08-08:00` → `-737750069`
`2017-11-16T14:31:08.000001-08:00` → `-976603392`
`2017-11-16T14:31:08.000000001-08:00` → `-160215926` | -| **`string`** | `hashBytes(utf8Bytes(v))` | `iceberg` → `1210000089` | -| **`uuid`** | `hashBytes(uuidBytes(v))` [3] | `f79c3e09-677c-4bbd-a479-3f349cb785e7` → `1488055340` | -| **`fixed(L)`** | `hashBytes(v)` | `00 01 02 03` → `-188683207` | -| **`binary`** | `hashBytes(v)` | `00 01 02 03` → `-188683207` | +| Primitive type | Hash specification | Test value | +|----------------------|-------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------| +| **`int`** | `hashLong(long(v))` [1] | `34` → `2017239379` | +| **`long`** | `hashBytes(littleEndianBytes(v))` | `34L` → `2017239379` | +| **`decimal(P,S)`** | `hashBytes(minBigEndian(unscaled(v)))`[2] | `14.20` → `-500754589` | +| **`date`** | `hashInt(daysFromUnixEpoch(v))` | `2017-11-16` → `-653330422` | +| **`time`** | `hashLong(microsecsFromMidnight(v))` | `22:31:08` → `-662762989` | +| **`timestamp`** | `hashLong(microsecsFromUnixEpoch(v))` | `2017-11-16T22:31:08` → `-2047944441`
`2017-11-16T22:31:08.000001` → `-1207196810` | +| **`timestamptz`** | `hashLong(microsecsFromUnixEpoch(v))` | `2017-11-16T14:31:08-08:00` → `-2047944441`
`2017-11-16T14:31:08.000001-08:00` → `-1207196810` | +| **`timestamp_ns`** | `hashLong(nanosecsFromUnixEpoch(v))` | `2017-11-16T22:31:08` → `-737750069`
`2017-11-16T22:31:08.000001` → `-976603392`
`2017-11-16T22:31:08.000000001` → `-160215926` | +| **`timestamptz_ns`** | `hashLong(nanosecsFromUnixEpoch(v))` | `2017-11-16T14:31:08-08:00` → `-737750069`
`2017-11-16T14:31:08.000001-08:00` → `-976603392`
`2017-11-16T14:31:08.000000001-08:00` → `-160215926` | +| **`string`** | `hashBytes(utf8Bytes(v))` | `iceberg` → `1210000089` | +| **`uuid`** | `hashBytes(uuidBytes(v))` [3] | `f79c3e09-677c-4bbd-a479-3f349cb785e7` → `1488055340` | +| **`fixed(L)`** | `hashBytes(v)` | `00 01 02 03` → `-188683207` | +| **`binary`** | `hashBytes(v)` | `00 01 02 03` → `-188683207` | +| **`geometry`** | `hashBytes(wkb(v))` [4] | `(1.0, 1.0)` → `-246548298` | + The types below are not currently valid for bucketing, and so are not hashed. However, if that changes and a hash value is needed, the following table shall apply: @@ -1085,7 +1106,8 @@ Notes: Hash results are not dependent on decimal scale, which is part of the type, not the data value. 3. UUIDs are encoded using big endian. The test UUID for the example above is: `f79c3e09-677c-4bbd-a479-3f349cb785e7`. This UUID encoded as a byte array is: `F7 9C 3E 09 67 7C 4B BD A4 79 3F 34 9C B7 85 E7` -4. `doubleToLongBits` must give the IEEE 754 compliant bit representation of the double value. All `NaN` bit patterns must be canonicalized to `0x7ff8000000000000L`. Negative zero (`-0.0`) must be canonicalized to positive zero (`0.0`). Float hash values are the result of hashing the float cast to double to ensure that schema evolution does not change hash values if float types are promoted. +4. WKB format, see Appendix G. +5. `doubleToLongBits` must give the IEEE 754 compliant bit representation of the double value. All `NaN` bit patterns must be canonicalized to `0x7ff8000000000000L`. Negative zero (`-0.0`) must be canonicalized to positive zero (`0.0`). Float hash values are the result of hashing the float cast to double to ensure that schema evolution does not change hash values if float types are promoted. 
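The geometry hash row composes the existing `hashBytes` (32-bit Murmur3, x86 variant, seed 0) with the value's WKB serialization. A minimal Python sketch, assuming a little-endian WKB encoding of a 2-dimensional point; the `point_wkb` helper is illustrative only and not part of the spec:

```python
import struct


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """32-bit Murmur3 hash, x86 variant, seeded with 0, as specified above."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed
    n = len(data) & ~3
    for i in range(0, n, 4):  # body: full 4-byte little-endian blocks
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
        h = ((h << 13) | (h >> 19)) & 0xFFFFFFFF
        h = (h * 5 + 0xE6546B64) & 0xFFFFFFFF
    k = 0
    tail = data[n:]  # 0-3 trailing bytes
    if len(tail) == 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if tail:
        k ^= tail[0]
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
    h ^= len(data)  # finalization mix
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h - (1 << 32) if h >= (1 << 31) else h  # as signed int32


def point_wkb(x: float, y: float) -> bytes:
    """Hypothetical helper: little-endian WKB for a 2D point
    (byte-order flag 1, geometry type 1 = Point, then x and y doubles)."""
    return struct.pack("<BIdd", 1, 1, x, y)


# Documented test values from the table: long 34 and string "iceberg".
assert murmur3_32(struct.pack("<q", 34)) == 2017239379
assert murmur3_32("iceberg".encode("utf-8")) == 1210000089

# geometry: hashBytes(wkb(v))
geo_hash = murmur3_32(point_wkb(1.0, 1.0))
```

The same `murmur3_32` function serves every row of the table; only the byte serialization of the value changes per type.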
## Appendix C: JSON serialization @@ -1101,27 +1123,28 @@ Schemas are serialized as a JSON object with the same fields as a struct in the Types are serialized according to this table: -|Type|JSON representation|Example| -|--- |--- |--- | -|**`boolean`**|`JSON string: "boolean"`|`"boolean"`| -|**`int`**|`JSON string: "int"`|`"int"`| -|**`long`**|`JSON string: "long"`|`"long"`| -|**`float`**|`JSON string: "float"`|`"float"`| -|**`double`**|`JSON string: "double"`|`"double"`| -|**`date`**|`JSON string: "date"`|`"date"`| -|**`time`**|`JSON string: "time"`|`"time"`| -|**`timestamp, microseconds, without zone`**|`JSON string: "timestamp"`|`"timestamp"`| -|**`timestamp, microseconds, with zone`**|`JSON string: "timestamptz"`|`"timestamptz"`| -|**`timestamp, nanoseconds, without zone`**|`JSON string: "timestamp_ns"`|`"timestamp_ns"`| -|**`timestamp, nanoseconds, with zone`**|`JSON string: "timestamptz_ns"`|`"timestamptz_ns"`| -|**`string`**|`JSON string: "string"`|`"string"`| -|**`uuid`**|`JSON string: "uuid"`|`"uuid"`| -|**`fixed(L)`**|`JSON string: "fixed[]"`|`"fixed[16]"`| -|**`binary`**|`JSON string: "binary"`|`"binary"`| -|**`decimal(P, S)`**|`JSON string: "decimal(

,)"`|`"decimal(9,2)"`,
`"decimal(9, 2)"`| -|**`struct`**|`JSON object: {`
  `"type": "struct",`
  `"fields": [ {`
    `"id": ,`
    `"name": ,`
    `"required": ,`
    `"type": ,`
    `"doc": ,`
    `"initial-default": ,`
    `"write-default": `
    `}, ...`
  `] }`|`{`
  `"type": "struct",`
  `"fields": [ {`
    `"id": 1,`
    `"name": "id",`
    `"required": true,`
    `"type": "uuid",`
    `"initial-default": "0db3e2a8-9d1d-42b9-aa7b-74ebe558dceb",`
    `"write-default": "ec5911be-b0a7-458c-8438-c9a3e53cffae"`
  `}, {`
    `"id": 2,`
    `"name": "data",`
    `"required": false,`
    `"type": {`
      `"type": "list",`
      `...`
    `}`
  `} ]`
`}`| -|**`list`**|`JSON object: {`
  `"type": "list",`
  `"element-id": ,`
  `"element-required": `
  `"element": `
`}`|`{`
  `"type": "list",`
  `"element-id": 3,`
  `"element-required": true,`
  `"element": "string"`
`}`| -|**`map`**|`JSON object: {`
  `"type": "map",`
  `"key-id": ,`
  `"key": ,`
  `"value-id": ,`
  `"value-required": `
  `"value": `
`}`|`{`
  `"type": "map",`
  `"key-id": 4,`
  `"key": "string",`
  `"value-id": 5,`
  `"value-required": false,`
  `"value": "double"`
`}`| +| Type | JSON representation | Example | +|---------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| **`boolean`** | `JSON string: "boolean"` | `"boolean"` | +| **`int`** | `JSON string: "int"` | `"int"` | +| **`long`** | `JSON string: "long"` | `"long"` | +| **`float`** | `JSON string: "float"` | `"float"` | +| **`double`** | `JSON string: "double"` | `"double"` | +| **`date`** | `JSON string: "date"` | `"date"` | +| **`time`** | `JSON string: "time"` | `"time"` | +| **`timestamp, microseconds, without zone`** | `JSON string: "timestamp"` | `"timestamp"` | +| **`timestamp, microseconds, with zone`** | 
`JSON string: "timestamptz"` | `"timestamptz"` | +| **`timestamp, nanoseconds, without zone`** | `JSON string: "timestamp_ns"` | `"timestamp_ns"` | +| **`timestamp, nanoseconds, with zone`** | `JSON string: "timestamptz_ns"` | `"timestamptz_ns"` | +| **`string`** | `JSON string: "string"` | `"string"` | +| **`uuid`** | `JSON string: "uuid"` | `"uuid"` | +| **`fixed(L)`** | `JSON string: "fixed[]"` | `"fixed[16]"` | +| **`binary`** | `JSON string: "binary"` | `"binary"` | +| **`decimal(P, S)`** | `JSON string: "decimal(

,)"` | `"decimal(9,2)"`,
`"decimal(9, 2)"` | +| **`struct`** | `JSON object: {`
  `"type": "struct",`
  `"fields": [ {`
    `"id": ,`
    `"name": ,`
    `"required": ,`
    `"type": ,`
    `"doc": ,`
    `"initial-default": ,`
    `"write-default": `
    `}, ...`
  `] }` | `{`
  `"type": "struct",`
  `"fields": [ {`
    `"id": 1,`
    `"name": "id",`
    `"required": true,`
    `"type": "uuid",`
    `"initial-default": "0db3e2a8-9d1d-42b9-aa7b-74ebe558dceb",`
    `"write-default": "ec5911be-b0a7-458c-8438-c9a3e53cffae"`
  `}, {`
    `"id": 2,`
    `"name": "data",`
    `"required": false,`
    `"type": {`
      `"type": "list",`
      `...`
    `}`
  `} ]`
`}` | +| **`list`** | `JSON object: {`
  `"type": "list",`
  `"element-id": ,`
  `"element-required": `
  `"element": `
`}` | `{`
  `"type": "list",`
  `"element-id": 3,`
  `"element-required": true,`
  `"element": "string"`
`}` | +| **`map`** | `JSON object: {`
  `"type": "map",`
  `"key-id": ,`
  `"key": ,`
  `"value-id": ,`
  `"value-required": `
  `"value": `
`}` | `{`
  `"type": "map",`
  `"key-id": 4,`
  `"key": "string",`
  `"value-id": 5,`
  `"value-required": false,`
  `"value": "double"`
`}` | +| **`geometry(C, T, E)`** | `JSON object: {`
  `"type": "geometry",`
  `"crs": ,`
  `"crs-type": `
  `"edges": `
`}` | `{`
  `"type": "geometry",`
  `"crs": {`
    `"$schema": "https://proj.org/schemas/v0.5/projjson.schema.json",`
    `"type": "GeographicCRS",`
    `"name": "WGS 84 longitude-latitude",`
    `"datum": {`
       `"type": "GeodeticReferenceFrame",`
       `"ellipsoid": {`
        `"name": "WGS 84",`
        `"semi_major_axis": 6378137,`
        `"inverse_flattening": 298.257223563`
      `}`
    `},`
    `"coordinate_system": {`
      `"subtype": "ellipsoidal",`
      `"axis": [`
      `{`
        `"name": "Geodetic longitude",`
        `"abbreviation": "Lon",`
        `"direction": "east",`
        `"unit": "degree"`
      `},`
      `{`
        `"name": "Geodetic latitude",`
        `"abbreviation": "Lat",`
        `"direction": "north",`
        `"unit": "degree"`
      `}`
      `]`
    `},`
    `"id": {`
      `"authority": "OGC",`
      `"code": "CRS84"`
    `}`
  `},`
  `"crs-type": "PROJJSON",`
  `"edges": "planar"`
`}`|

Note that default values are serialized using the JSON single-value serialization in [Appendix D](#appendix-d-single-value-serialization).

@@ -1147,15 +1170,16 @@ Each partition field in `fields` is stored as a JSON object with the following p

Supported partition transforms are listed below.

-|Transform or Field|JSON representation|Example|
-|--- |--- |--- |
-|**`identity`**|`JSON string: "identity"`|`"identity"`|
-|**`bucket[N]`**|`JSON string: "bucket[]"`|`"bucket[16]"`|
-|**`truncate[W]`**|`JSON string: "truncate[]"`|`"truncate[20]"`|
-|**`year`**|`JSON string: "year"`|`"year"`|
-|**`month`**|`JSON string: "month"`|`"month"`|
-|**`day`**|`JSON string: "day"`|`"day"`|
-|**`hour`**|`JSON string: "hour"`|`"hour"`|
+| Transform or Field |JSON representation|Example|
+|--------------------|--- |--- |
+| **`identity`** |`JSON string: "identity"`|`"identity"`|
+| **`bucket[N]`** |`JSON string: "bucket[]"`|`"bucket[16]"`|
+| **`truncate[W]`** |`JSON string: "truncate[]"`|`"truncate[20]"`|
+| **`year`** |`JSON string: "year"`|`"year"`|
+| **`month`** |`JSON string: "month"`|`"month"`|
+| **`day`** |`JSON string: "day"`|`"day"`|
+| **`hour`** |`JSON string: "hour"`|`"hour"`|
+| **`xz2(R)`** |`JSON string: "xz2[]"`|`"xz2[10]"`|

In some cases partition specs are stored using only the field list instead of the object format that includes the spec ID, like the deprecated `partition-spec` field in table metadata. The object format should be used unless otherwise noted in this spec.

@@ -1249,27 +1273,28 @@ Example

This serialization scheme is for storing single values as individual binary values in the lower and upper bounds maps of manifest files.
-| Type | Binary serialization | -|------------------------------|--------------------------------------------------------------------------------------------------------------| -| **`boolean`** | `0x00` for false, non-zero byte for true | -| **`int`** | Stored as 4-byte little-endian | -| **`long`** | Stored as 8-byte little-endian | -| **`float`** | Stored as 4-byte little-endian | -| **`double`** | Stored as 8-byte little-endian | -| **`date`** | Stores days from the 1970-01-01 in an 4-byte little-endian int | -| **`time`** | Stores microseconds from midnight in an 8-byte little-endian long | -| **`timestamp`** | Stores microseconds from 1970-01-01 00:00:00.000000 in an 8-byte little-endian long | -| **`timestamptz`** | Stores microseconds from 1970-01-01 00:00:00.000000 UTC in an 8-byte little-endian long | -| **`timestamp_ns`** | Stores nanoseconds from 1970-01-01 00:00:00.000000000 in an 8-byte little-endian long | -| **`timestamptz_ns`** | Stores nanoseconds from 1970-01-01 00:00:00.000000000 UTC in an 8-byte little-endian long | -| **`string`** | UTF-8 bytes (without length) | -| **`uuid`** | 16-byte big-endian value, see example in Appendix B | -| **`fixed(L)`** | Binary value | -| **`binary`** | Binary value (without length) | -| **`decimal(P, S)`** | Stores unscaled value as two’s-complement big-endian binary, using the minimum number of bytes for the value | -| **`struct`** | Not supported | -| **`list`** | Not supported | -| **`map`** | Not supported | +| Type | Binary serialization | +|----------------------|--------------------------------------------------------------------------------------------------------------| +| **`boolean`** | `0x00` for false, non-zero byte for true | +| **`int`** | Stored as 4-byte little-endian | +| **`long`** | Stored as 8-byte little-endian | +| **`float`** | Stored as 4-byte little-endian | +| **`double`** | Stored as 8-byte little-endian | +| **`date`** | Stores days from the 1970-01-01 in an 4-byte little-endian int 
|
+| **`time`**           | Stores microseconds from midnight in an 8-byte little-endian long                                            |
+| **`timestamp`**      | Stores microseconds from 1970-01-01 00:00:00.000000 in an 8-byte little-endian long                          |
+| **`timestamptz`**    | Stores microseconds from 1970-01-01 00:00:00.000000 UTC in an 8-byte little-endian long                      |
+| **`timestamp_ns`**   | Stores nanoseconds from 1970-01-01 00:00:00.000000000 in an 8-byte little-endian long                        |
+| **`timestamptz_ns`** | Stores nanoseconds from 1970-01-01 00:00:00.000000000 UTC in an 8-byte little-endian long                    |
+| **`string`**         | UTF-8 bytes (without length)                                                                                 |
+| **`uuid`**           | 16-byte big-endian value, see example in Appendix B                                                          |
+| **`fixed(L)`**       | Binary value                                                                                                 |
+| **`binary`**         | Binary value (without length)                                                                                |
+| **`decimal(P, S)`**  | Stores unscaled value as two’s-complement big-endian binary, using the minimum number of bytes for the value |
+| **`struct`**         | Not supported                                                                                                |
+| **`list`**           | Not supported                                                                                                |
+| **`map`**            | Not supported                                                                                                |
+| **`geometry`**       | WKB format, see Appendix G                                                                                   |

### JSON single-value serialization

@@ -1296,7 +1321,7 @@ This serialization scheme is for storing single values as individual binary valu
| **`struct`** | **`JSON object by field ID`** | `{"1": 1, "2": "bar"}` | Stores struct fields using the field ID as the JSON field name; field values are stored using this JSON single-value format |
| **`list`** | **`JSON array of values`** | `[1, 2, 3]` | Stores a JSON array of values that are serialized using this JSON single-value format |
| **`map`** | **`JSON object of key and value arrays`** | `{ "keys": ["a", "b"], "values": [1, 2] }` | Stores arrays of keys and values; individual keys and values are serialized using this JSON single-value format |
-
+| **`geometry`** | **`JSON string`** | `00000000013FF00000000000003FF0000000000000` | Stores WKB as a hexadecimal string, see Appendix G |

## Appendix E: Format version changes

@@ -1407,3 +1432,9 @@ Iceberg supports two types of histories for tables.
A history of previous "current snapshots" stored in "snapshot-log" table metadata, and parent-child lineage stored in "snapshots". These two histories might indicate different snapshot IDs for a specific timestamp. The discrepancies can be caused by a variety of table operations (e.g. updating the `current-snapshot-id` can be used to set the snapshot of a table to any arbitrary snapshot, which might have a lineage derived from a table branch or no lineage at all).

When processing point in time queries, implementations should use "snapshot-log" metadata to look up the table state at the given point in time. This ensures time-travel queries reflect the state of the table at the provided timestamp. For example, a SQL query like `SELECT * FROM prod.db.table TIMESTAMP AS OF '1986-10-26 01:21:00Z';` would find the snapshot of the Iceberg table just prior to '1986-10-26 01:21:00 UTC' in the snapshot logs and use the metadata from that snapshot to perform the scan of the table. If no snapshot exists prior to the timestamp given or "snapshot-log" is not populated (it is an optional field), then systems should raise an informative error message about the missing metadata.

+
+## Appendix G: Geospatial Notes
+
+The Geometry class hierarchy and its WKB serialization are defined by the [OpenGIS Implementation Specification for Geographic information – Simple feature access – Part 1: Common architecture, Version 1.2.1](https://portal.ogc.org/files/?artifact_id=25355) from the [Open Geospatial Consortium](https://www.ogc.org/standard/sfa/).
+
+Future versions of that specification may also be used if the WKB representation remains wire-compatible.
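As a non-normative illustration of the WKB layout, the hex value used in the Appendix D JSON example, `00000000013FF00000000000003FF0000000000000`, is the big-endian (XDR) WKB encoding of `POINT (1 1)`: a 1-byte byte-order flag (`00` = big-endian), a 4-byte geometry type (`00000001` = Point), then the x and y coordinates as IEEE 754 doubles. A sketch for the 2D Point case only:

```python
import struct

def wkb_point(x: float, y: float, big_endian: bool = True) -> bytes:
    """Encode a 2D WKB Point: byte-order flag, uint32 geometry type (1 = Point), x, y."""
    fmt = ">BI2d" if big_endian else "<BI2d"
    order_flag = 0 if big_endian else 1  # 0 = XDR (big-endian), 1 = NDR (little-endian)
    return struct.pack(fmt, order_flag, 1, x, y)

print(wkb_point(1.0, 1.0).hex().upper())
# 00000000013FF00000000000003FF0000000000000
```

Other geometry subclasses (LineString, Polygon, etc.) use the same flag-and-type header with different type codes and coordinate payloads, per the OGC specification above.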