Just leaving this comment here as a sub-optimal solution. The problem lies in the dataset: there is a trailing delimiter, in this case the "|" character, at the end of each line of every .tbl file. This causes a mismatch between the schema defined in lib/tpcds.rs and what is actually read from the file. An ugly workaround would be to add a blank column to every schema definition and drop it after reading the CSV file. This also exposes another problem: DataFusion does not currently support Latin-1 encoding, which is the encoding used in TPC-DS 3.0.1rc. I would love to see this resolved one day, and I don't mind creating a PR for this codebase.
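As an alternative to touching every schema, the files could be cleaned before they are read. The following is only a minimal sketch of that idea, not what the repository does: it uses the standard library alone, the function names are hypothetical, and it assumes the data contains no quoted fields and that each record ends with exactly one extra "|". It also transcodes Latin-1 to UTF-8, relying on the fact that every Latin-1 byte maps to the Unicode code point with the same value.

```rust
/// Remove one trailing field delimiter from a record, if present.
/// Hypothetical helper: assumes the generator emits exactly one extra delimiter per line.
fn strip_trailing_delimiter(line: &str, delimiter: char) -> &str {
    line.strip_suffix(delimiter).unwrap_or(line)
}

/// Decode Latin-1 (ISO-8859-1) bytes into a UTF-8 `String`.
/// Each Latin-1 byte corresponds to the Unicode code point of the same value,
/// so a plain `u8 -> char` cast is sufficient.
fn latin1_to_string(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

/// Clean a whole table file held in memory: transcode it to UTF-8 and drop
/// the trailing delimiter from every line. Line endings are normalized to '\n'.
fn clean_table(raw: &[u8], delimiter: char) -> String {
    let text = latin1_to_string(raw);
    let mut out = String::with_capacity(text.len());
    for line in text.lines() {
        out.push_str(strip_trailing_delimiter(line, delimiter));
        out.push('\n');
    }
    out
}
```

The cleaned text could then be written to a temporary file and passed to the existing conversion step with the schema unchanged. For a dataset at scale factor 1000, a streaming variant that processes one line at a time would be preferable to holding a whole file in memory.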
After I execute
The data are generated in folder /tmp/tpcds/sf1000/. Then I execute
I got the error below:
ArrowError(CsvError("incorrect number of fields for line 1, expected 31 got more than 31"))
I think the code that causes the error might be
df.write_parquet(&output_filename, Some(props)).await?;
in lib.rs.
After I deleted the first number in call_center.dat/part-1.dat, the error became
ArrowError(CsvError("incorrect number of fields for line 2, expected 31 got 32"))
However, processing the TPC-H data works fine. The generators for TPC-H and TPC-DS were obtained as described in your repo.
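The "expected 31 got 32" message is consistent with a record that has 31 real columns followed by one trailing "|": splitting on the delimiter then yields an extra, empty field. A small diagnostic along these lines could confirm that on a sample of the generated file. It is a sketch with hypothetical function names, and it assumes plain "|"-separated text without quoting.

```rust
/// Count fields in one record by splitting on the delimiter.
/// A trailing delimiter produces one extra (empty) field.
fn count_fields(line: &str, delimiter: char) -> usize {
    line.split(delimiter).count()
}

/// Return (1-based line number, field count) for every line whose
/// field count differs from the expected number of columns.
fn find_mismatches(text: &str, delimiter: char, expected: usize) -> Vec<(usize, usize)> {
    text.lines()
        .enumerate()
        .filter_map(|(i, line)| {
            let n = count_fields(line, delimiter);
            if n != expected {
                Some((i + 1, n))
            } else {
                None
            }
        })
        .collect()
}
```

Calling `find_mismatches(&contents, '|', 31)` on the text of call_center.dat/part-1.dat would list the lines whose field count is not 31. If every line reports 32, the trailing delimiter is the likely cause.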