
[BUG] Missing errors when connecting to a Parquet table with incorrect types #101

Closed
Swiddis opened this issue Oct 25, 2023 · 1 comment
Labels: 0.3, bug (Something isn't working)


Swiddis (Contributor) commented Oct 25, 2023

What is the bug?
If you accidentally create a table backed by Parquet data whose declared column types don't match the types stored in the file, no useful error message is returned at either table creation or query time.

How can one reproduce the bug?
Steps to reproduce the behavior:

  1. Create a Parquet file in which a column's stored type differs from the type you will declare. In my case, the time field was stored as an integer (INT64), but I expected it to be a string.
  2. Create an external table pointing at that file:
CREATE EXTERNAL TABLE IF NOT EXISTS sample_s3.default.elb_logs_parquet (
  type string,
  time string,
  elb string
)
USING parquet
LOCATION 's3://[sample-location]/data/truncated-elb-parquet/';
  3. At this point the CREATE TABLE statement should fail, but it succeeds, so we continue by querying the table:
SELECT * FROM sample_s3.default.elb_logs_parquet;
  4. The query fails with the error "Fail to write result, cause: null". Digging into the EMR logs reveals the underlying error:
org.apache.spark.sql.execution.QueryExecutionException: Parquet column cannot be converted in file s3://[location]/data/truncated-elb-parquet/logs.parquet. Column: [time], Expected: string, Found: INT64
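Because the API only surfaces "Fail to write result, cause: null", the column-level detail has to be pulled out of the EMR logs by hand. A minimal stdlib Python sketch for collecting every such conversion error from a log dump follows. The regular expression is modeled on the single log line quoted above, so it assumes other Spark versions word the message the same way, and `collect_mismatches` is a hypothetical helper name, not part of any existing tool.

```python
import re

# Modeled on the executor log line quoted in this issue (assumption: other
# Spark versions may phrase the message differently).
MISMATCH = re.compile(
    r"Parquet column cannot be converted in file (?P<file>\S+)\. "
    r"Column: \[(?P<column>[^\]]+)\], Expected: (?P<expected>[^,]+), Found: (?P<found>\S+)"
)


def collect_mismatches(log_lines):
    """Map each offending column name to (expected, found) for every match in the log."""
    mismatches = {}
    for line in log_lines:
        match = MISMATCH.search(line)
        if match:
            mismatches[match["column"]] = (match["expected"], match["found"])
    return mismatches
```

Passing an open log file (any iterable of lines works) returns one entry for each distinct column that appears in a conversion error, which helps when several columns in a large table are affected.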

What is the expected behavior?
Table creation should fail validation. Failing that, the API should at least return a helpful error when the mismatched table is queried.
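One way creation-time validation could work: read the physical column types from the Parquet file's footer, compare them with the declared column types, and report every mismatch in a single error instead of one per query. The sketch below covers only the comparison step, in stdlib Python. The `COMPATIBLE` table is a deliberately strict, simplified assumption, not Spark's actual conversion rules (real readers also consider logical type annotations), `find_mismatches` is a hypothetical name, and the physical types shown for `type` and `elb` are illustrative.

```python
# Simplified, hypothetical mapping: Parquet physical type -> declared SQL
# types assumed able to read it. Not Spark's real rules.
COMPATIBLE = {
    "BOOLEAN": {"boolean"},
    "INT32": {"int"},
    "INT64": {"bigint"},
    "FLOAT": {"float"},
    "DOUBLE": {"double"},
    "BINARY": {"string", "binary"},
}


def find_mismatches(declared, physical):
    """Return one error message per declared column the file's type cannot satisfy."""
    errors = []
    for column, sql_type in declared.items():
        # Columns absent from the file are reported too (a strict design choice).
        found = physical.get(column, "missing")
        if sql_type.lower() not in COMPATIBLE.get(found, set()):
            errors.append(f"Column: [{column}], Expected: {sql_type}, Found: {found}")
    return errors


if __name__ == "__main__":
    declared = {"type": "string", "time": "string", "elb": "string"}
    physical = {"type": "BINARY", "time": "INT64", "elb": "BINARY"}
    print(find_mismatches(declared, physical))
    # ['Column: [time], Expected: string, Found: INT64']
```

Collecting all errors before failing means a wide table with several mismatched columns can be fixed in one pass.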

What is your host/environment?

  • AWS EMR

Do you have any screenshots?
[screenshot]

Do you have any additional context?
N/A

Swiddis added the bug and untriaged labels on Oct 25, 2023.
Swiddis (Contributor, Author) commented Oct 25, 2023

Update: This becomes much more painful with a large table, since each column that fails with an error such as "Expected: int, Found: INT64" has to be debugged individually.
