What is the bug?
If you accidentally create a table over Parquet data where a declared column type does not match the type stored in the file, there is no useful error message at either table creation or query time.
How can one reproduce the bug?
Steps to reproduce the behavior:
Create a Parquet file whose data has a type that does not match the table definition. In my case there was a time field that I expected to be a string, but it was actually an int.
Create a table connected to that file:
CREATE EXTERNAL TABLE IF NOT EXISTS sample_s3.default.elb_logs_parquet (
type string,
time string,
elb string
)
USING parquet
LOCATION 's3://[sample-location]/data/truncated-elb-parquet/';
At this point the statement should fail, but it reports success, so we continue by trying to query the table:
SELECT * FROM sample_s3.default.elb_logs_parquet;
This fails with the error: Fail to write result, cause: null. If you dig deeper into the EMR logs, you find the underlying error:
org.apache.spark.sql.execution.QueryExecutionException: Parquet column cannot be converted in file s3://[location]/data/truncated-elb-parquet/logs.parquet. Column: [time], Expected: string, Found: INT64
What is the expected behavior?
Table validation should fail on creation, but barring that, there should at least be a helpful error returned by the API if this happens.
What is your host/environment?
AWS EMR
Do you have any screenshots?
Do you have any additional context?
N/A
Update: This turns out to be much more annoying with a large table, because you have to debug the columns one at a time as each produces an error such as Expected: int, Found: INT64.
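As a possible workaround until validation exists, the declared column types can be compared against the file's physical types in one pass instead of one failed query at a time. The following is only a minimal sketch: `EXPECTED_PHYSICAL` and `find_mismatches` are hypothetical names, the type mapping is a simplified assumption covering a handful of types, and the physical types are passed in as a plain dictionary (they would have to be read from the file separately, for example with a Parquet library). In the example, only the INT64 type of time comes from the error above; the BINARY types for the other two columns are assumed.

```python
# Simplified, assumed mapping from Spark SQL column types to the Parquet
# physical type each is normally stored as. Extend it for your own tables.
EXPECTED_PHYSICAL = {
    "string": {"BINARY"},
    "int": {"INT32"},
    "bigint": {"INT64"},
    "double": {"DOUBLE"},
    "boolean": {"BOOLEAN"},
}


def find_mismatches(declared, physical):
    """Return (column, declared_type, found_type) for every column that disagrees.

    declared: column name -> type from the CREATE TABLE statement
    physical: column name -> Parquet physical type found in the file
    Columns absent from the file, or with a declared type not in the
    mapping, are skipped rather than reported.
    """
    mismatches = []
    for column, declared_type in declared.items():
        found = physical.get(column)
        allowed = EXPECTED_PHYSICAL.get(declared_type.lower())
        if found is None or allowed is None:
            continue
        if found.upper() not in allowed:
            mismatches.append((column, declared_type, found))
    return mismatches


# Table definition from this issue.
declared = {"type": "string", "time": "string", "elb": "string"}
# Physical types: INT64 for "time" is from the log; the others are assumed.
physical = {"type": "BINARY", "time": "INT64", "elb": "BINARY"}

for column, expected, found in find_mismatches(declared, physical):
    print(f"Column: [{column}], Expected: {expected}, Found: {found}")
```

Under this mapping, a column declared as int but stored as INT64 would also be reported, which suggests declaring it as bigint instead.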