
Add parquet-mr test #56

Open: wants to merge 2 commits into master


ZJONSSON
Contributor

Here is a very basic example of how we can use dockerized parquet-tools (from parquet-mr) to test on Travis whether files created by parquetjs can be read by parquet-mr (and therefore by Spark etc.).

The basic test succeeds but more advanced tests fail. I will add a failing branch that we can use as a guide for fixing any errors.


@ZJONSSON ZJONSSON requested review from kessler and asmuth February 28, 2018 01:53
@@ -9,3 +9,11 @@ deploy:
tags: true
npm_api_key:
secure: HK/tFvgj/TtYTJ3s2Bszc1/yJWvbSkLcfY3ki3GEuudMpfzcq134/2fbdZLb+B7Ukg31rdRVFCrSg8k6a1KhztkRr9SnMts5WO2ZGulmzNQ+XsBwdd0Bf7KYamAtqft5qBnSvh+ypBloQJQqq5qazb31971Fwvg5pdkYTQgCQxyIfZlH8nUbOxcYyl4w6Mvz5zsQp2c4OKOdq0FgeU3OqJ05i5lWL/CZWRO9L7+f0Uih5Jr9CuRzBUcVVxIopn1uOX1czug+OudIuUMLxbJwJt69ZpWdTbywLg6wVvA58ozbyialuEx8S1UaehsqHFj29JJWcOw+6TCi5+512DrBZMguiyTkjq5I5kmRcPNPY8dcqJUZUD6eDpKYQemFeg+6vKIvT3spK53VXNoEOIqAAiNTpmfY6JQ17S31gy1TqZldMtWr1HXf95LGlLC+czgMHPi1m6YiUgdDx5N7MFXumdOxiyHNdoitQFyyyS57RS7BG8/5ZMeKIXEfhQ9KU/D5L3KpgNCBmwVR72vF3nb89aVETrvNIbZEgc/cTdYWquezfPibGoGjWVJ4c38nd30s6rmoMBwoDwznaDg87ameoHUKSCSMx3uVXRZ5uR2C4SmTqVbWNKLXszL4iIW54EaLf3M+AYjoAb+EupaPMuEonJukdzkalp03RekYVeIY23U=


@ZJONSSON ZJONSSON force-pushed the parquet-mr branch 2 times, most recently from 0094133 to 8764ac4 Compare February 28, 2018 02:26
@ZJONSSON
Contributor Author

Here is a failing branch: https://github.com/ZJONSSON/parquetjs/tree/parquet-mr-fail
Problems with the RLE encoding


* Bit-packing should work for any length of data, not just multiples of 8 (the last packed group is padded if it holds fewer than 8 values)

* Improve run estimation: only start a new run if we are at a position where mod 8 === 0, otherwise use bit-packing
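
The two commit notes above can be illustrated with a minimal sketch. This is not the code from this PR: `bitPack` and `planRuns` are hypothetical helper names, the LSB-first bit order is taken from the Parquet RLE/bit-packing hybrid format, and the "mod 8" rule is interpreted here as "only switch to an RLE run when the pending bit-packed values fill whole groups of 8". The minimum run length of 8 is likewise an assumption.

```javascript
// Hypothetical helper: bit-pack values LSB-first. The last group is
// zero-padded to 8 values, so any input length is accepted.
function bitPack(values, bitWidth) {
  const groups = Math.ceil(values.length / 8);
  // 8 values * bitWidth bits = bitWidth bytes per group
  const out = new Uint8Array(groups * bitWidth);
  for (let i = 0; i < values.length; i++) {
    for (let b = 0; b < bitWidth; b++) {
      if ((values[i] >> b) & 1) {
        const bit = i * bitWidth + b;
        out[bit >> 3] |= 1 << (bit & 7);
      }
    }
  }
  return out;
}

// Hypothetical helper: split values into RLE and bit-packed segments.
// An RLE run is only considered when the pending bit-packed segment
// holds a multiple of 8 values; everything else stays bit-packed.
function planRuns(values, minRun = 8) {
  const runs = [];
  let start = 0; // start of the pending bit-packed segment
  let i = 0;     // always start + 8k, i.e. on a group boundary
  while (i < values.length) {
    // Count how often values[i] repeats from position i onwards.
    let n = 1;
    while (i + n < values.length && values[i + n] === values[i]) n++;
    if (n >= minRun) {
      if (i > start) {
        runs.push({ type: 'bitpacked', values: values.slice(start, i) });
      }
      runs.push({ type: 'rle', value: values[i], count: n });
      i += n;
      start = i;
    } else {
      i += 8; // skip a whole group so we stay aligned
    }
  }
  if (start < values.length) {
    runs.push({ type: 'bitpacked', values: values.slice(start) });
  }
  return runs;
}
```

With this sketch, a repeated value that begins in the middle of a group of 8 is left inside the bit-packed segment rather than split off as its own run, and a trailing segment shorter than 8 values is still packed into one full, padded group.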
@ZJONSSON
Contributor Author

ZJONSSON commented Mar 1, 2018

This PR has been rebased on #57 to include fixes for RLE in dlevels and rlevels. More tests have been added to verify that the results are correct as seen from parquet-mr.

@justinsoliz

I seem to be running into this issue as well. Are there any outstanding items on this PR that I might be able to help with to get it merged in?

@ZJONSSON
Contributor Author

Do your problems go away when you use this branch? The only outstanding thing here is a code review afaik.

@justinsoliz

NPM install per this comment does the trick for me:
#29 (comment)
