Not getting all columns while streaming parquet file from S3 or locally downloaded file reading #60
Comments
Update: the same file is completely readable using the Python pandas module.
Have you found any pattern in which rows have 'column A' data missing? Are the values strings or numbers (e.g. BigInt)?
Regards,
Dustin
On Sunday, January 24 2021, 6:24 am, Rajan Kasodariya wrote:
The Parquet file in S3 was produced by Spark/Python.
@entitycs I cannot share the Parquet file, but there was one pattern across all the Parquet files I tried to read: the field started coming back missing after 70k lines of data had been read.
@rajan596 When you get a chance, can you try reverting to this commit and reading past the 70K mark again? Without being able to see the data, my hunch is that either the numbers in the field grow and a compression algorithm not in the lite version takes a different path, or there is an issue in lib/shred.js, whose iteration methods differ between HEAD and the above commit.
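To check whether the field still disappears past the 70K mark on a given commit, a small diagnostic along these lines might help. This is only a sketch: `findMissingColumn` is a hypothetical helper, not part of parquetjs-lite, and it assumes nothing beyond a cursor whose `next()` resolves to a row object or to `null` at the end of the data, as in the README's reader example. It treats both `undefined` and `null` as missing, so a column that is legitimately null in the file would also be counted.

```javascript
// Hypothetical helper (not part of parquetjs-lite): walks any cursor-like
// object exposing an async next() that resolves to a row object, or null
// at end of data, and reports where a given column is absent.
async function findMissingColumn(cursor, column) {
  let index = 0;          // zero-based position of the current row
  let missing = 0;        // rows where the column is undefined or null
  let firstMissing = -1;  // index of the first such row, -1 if none
  let row;
  while ((row = await cursor.next())) {
    if (row[column] === undefined || row[column] === null) {
      if (firstMissing === -1) firstMissing = index;
      missing += 1;
    }
    index += 1;
  }
  return { total: index, missing, firstMissing };
}

// Usage sketch (inside an async function), assuming the reader API from
// the parquetjs README; 'data.parquet' is a placeholder path:
//   const parquet = require('parquetjs-lite');
//   const reader = await parquet.ParquetReader.openFile('data.parquet');
//   console.log(await findMissingColumn(reader.getCursor(), 'A'));
//   await reader.close();
```

Running it against the same file on HEAD and on the older commit, then comparing `firstMissing`, would show whether the cutoff moves or disappears.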
Hi,
I am using the standard code mentioned below, but I am not getting the expected columns from the cursor. Can anyone tell me where the issue is in the library?
The expected columns in the S3 Parquet file are A, B, and C, but column A is missing from most of the records.
When I download the same Parquet file locally and convert it to CSV, column A has a value for every row in the dataset.
Can anyone help me work out where this is going wrong?
Version: "parquetjs-lite": "0.8.0",
Node.js version: v8.0.0