Read and Write Apache Parquet #6699

Open
simonaubertbd opened this issue Jan 9, 2024 · 5 comments
Labels: meal (This will take a day or two), wish

Comments

@simonaubertbd

What's your use case?
Apache Parquet (https://parquet.apache.org/) is becoming increasingly popular, and I think it is now something of a standard in the data community. It is no longer restricted to Hadoop users: Qlik supports it, Alteryx will support it in its next release, and even LibreOffice is working on it.
Why?
- open-source format
- fast

What's your proposed solution?
To have Orange Data Mining support Apache Parquet files for read and write.

Are there any alternative solutions?
Converting Parquet files to another format beforehand, but that seems like an unnecessary extra step.

@markotoplak
Member

markotoplak commented Jan 9, 2024

Makes sense indeed. Orange lacks a robust and fast file format.

When I need fast reading, I resort to pickled tables, but a robust format like Parquet would be a big improvement.

janezd added the meal label (This will take a day or two) on Jan 12, 2024
@simonaubertbd
Author

Hello @markotoplak, do you plan to add it in a future release? Another point is that it would help the corporate and research worlds communicate with each other.

Best regards,

Simon

@markotoplak
Member

@simonaubertbd, first one on our list is HDF5 support, then we can also consider Parquet.

But if anyone implements Parquet support, we'll gladly merge it.

@simonaubertbd
Author

@zhuyubei Whoa, pretty impressive, congrats. Is there a pull request for that? O_o

@zhuyubei

> @zhuyubei Whoa, pretty impressive, congrats. Is there a pull request for that? O_o

However, my code redesigns the whole "Table" class to store data as a pandas DataFrame instead of NumPy arrays, so it might not work with the current version.
Let me try to figure out how to implement a Parquet reader in the current Orange3 version.

Projects
None yet
Development

No branches or pull requests

4 participants