After experimenting with the MAP type for Athena, it appears that the structure is not quite right.
Here is the schema output from parquet-tools for the MAP data generated by Kinesis Firehose:
Note the MAP_KEY_VALUE annotation on repeated group map.
However, when generating the map data type with this schema:
The library produces output whose schema, as reported by parquet-tools, is:
optional group my_data (MAP) {
  repeated group map {
    optional binary key (STRING);
    optional binary value (STRING);
  }
}
Note that repeated group map omits the MAP_KEY_VALUE annotation in this schema.
As a result, the AWS Glue crawler sees the two schemas differently.
For the Kinesis Firehose generated data, the schema parsed by Glue appears as follows:
However, for data generated by this library, Glue presents the following:
I am unsure whether I am using the MAP part of this library incorrectly, however, as it is an undocumented feature. The structure of this schema is based on Parquet files generated by a Kinesis Firehose pipeline.
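To check a file for the difference described above, the schema text printed by parquet-tools can be scanned for a MAP group whose repeated child lacks the MAP_KEY_VALUE annotation. The following is a minimal Python sketch, not part of this library. The function name find_unannotated_map_groups and the regular expression are my own assumptions, based only on the parquet-tools output format shown in this issue.

```python
import re

# Matches the opening line of a group in parquet-tools schema output, e.g.
# "optional group my_data (MAP) {" or "repeated group map (MAP_KEY_VALUE) {".
# Assumption: one field per line, as in the output quoted in this issue.
GROUP_RE = re.compile(
    r"^(optional|required|repeated)\s+group\s+(\w+)\s*(?:\((\w+)\))?\s*\{$"
)


def find_unannotated_map_groups(schema_text):
    """Return names of MAP groups whose repeated child lacks MAP_KEY_VALUE."""
    missing = []
    stack = []  # (name, annotation) for each currently open group
    for raw in schema_text.splitlines():
        line = raw.strip()
        if line == "}":
            if stack:
                stack.pop()
            continue
        match = GROUP_RE.match(line)
        if not match:
            continue  # primitive field or a line this sketch does not model
        repetition, name, annotation = match.groups()
        parent = stack[-1] if stack else None
        if (
            parent is not None
            and parent[1] == "MAP"
            and repetition == "repeated"
            and annotation != "MAP_KEY_VALUE"
        ):
            missing.append(parent[0])
        stack.append((name, annotation))
    return missing


# Schema reported by parquet-tools for the file written by the library.
LIBRARY_SCHEMA = """
optional group my_data (MAP) {
  repeated group map {
    optional binary key (STRING);
    optional binary value (STRING);
  }
}
"""

print(find_unannotated_map_groups(LIBRARY_SCHEMA))  # ['my_data']
```

Running the same function on the Firehose schema output should return an empty list if the repeated group there carries MAP_KEY_VALUE as described, which gives a quick way to confirm that the annotation is the only structural difference between the two files.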