[Website] Correct statement about compression in FAQ #541

Merged on Sep 14, 2024 (2 commits)
10 changes: 6 additions & 4 deletions faq.md
@@ -180,10 +180,12 @@ This efficiency comes at the cost of relatively expensive reading into memory,
as Parquet data cannot be directly operated on but must be decoded in
large chunks.

-Conversely, Arrow is an in-memory format meant for direct and efficient use
-for computational purposes. Arrow data is not compressed (or only lightly so,
-when using dictionary encoding) but laid out in natural format for the CPU,
-so that data can be accessed at arbitrary places at full speed.
+Conversely, Arrow is an in-memory format meant primarily for direct and
+efficient use for computational purposes. Arrow data is typically not
+compressed but laid out in natural format for the CPU, so that data can be
+accessed at arbitrary places at full speed. (However, Arrow does provide a
+limited set of options for increasing space efficiency, including
+dictionary encoding, run-end encoding, and buffer compression.)

Therefore, Arrow and Parquet complement each other
and are commonly used together in applications. Storing your data on disk
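
For readers unfamiliar with two of the options named in the new FAQ text, here is a conceptual sketch of dictionary encoding and run-end encoding in plain Python. It is not Arrow's implementation and does not use any Arrow library; the helper names `dict_encode_sketch` and `run_end_encode_sketch` and the sample column are hypothetical, chosen only for illustration.

```python
from itertools import groupby


def dict_encode_sketch(values):
    """Store each distinct value once and replace the column with integer indices."""
    dictionary = []   # distinct values, in order of first appearance
    positions = {}    # value -> its index in `dictionary`
    indices = []      # one small integer per original element
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


def run_end_encode_sketch(values):
    """Collapse consecutive repeats into (run_ends, run_values).

    run_ends[i] is the exclusive logical index at which run i ends.
    """
    run_ends, run_values = [], []
    end = 0
    for value, group in groupby(values):
        end += sum(1 for _ in group)
        run_ends.append(end)
        run_values.append(value)
    return run_ends, run_values


# Hypothetical sample column with few distinct values and long runs.
column = ["NY", "NY", "NY", "SF", "SF", "NY"]

dictionary, indices = dict_encode_sketch(column)
run_ends, run_values = run_end_encode_sketch(column)

print(dictionary, indices)
print(run_ends, run_values)
```

Both encodings save space when values repeat, while element access stays cheap: a dictionary-encoded element is one index lookup away, and a run-end-encoded element can be located by searching the sorted `run_ends` list. This is consistent with the FAQ's point that these are limited space-efficiency options rather than general-purpose compression.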