This package exports a `compressors` object intended to be passed into hyparquet.
Apache Parquet is a popular columnar storage format that is widely used in data engineering, data science, and machine learning applications for efficiently storing and processing large datasets. It supports several compression formats, but most parquet files use snappy compression.
By default, the hyparquet library only supports uncompressed and snappy-compressed files. The `hyparquet-compressors` package extends support to all legal parquet compression formats.
The `hyparquet-compressors` package works in both Node.js and the browser. It uses JS and WASM packages, with no system dependencies.
```js
import { parquetRead } from 'hyparquet'
import { compressors } from 'hyparquet-compressors'

// file: the parquet data, e.g. an ArrayBuffer or a hyparquet AsyncBuffer
await parquetRead({ file, compressors, onComplete: console.log })
```
See the hyparquet repo for further info.
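A `compressors` object maps parquet codec names to decompression functions. The sketch below assumes hyparquet calls each entry as `(input, outputLength) => Uint8Array`; confirm this against the hyparquet type definitions. It builds a Node-only object on the built-in `node:zlib`. The object and the helper `checkLength` are hypothetical illustrations of that shape and do not reproduce what `hyparquet-compressors` exports.

```js
import { gzipSync, gunzipSync, brotliCompressSync, brotliDecompressSync } from 'node:zlib'

// Assumed shape: codec name -> (input: Uint8Array, outputLength: number) => Uint8Array
const nodeCompressors = {
  GZIP: (input, outputLength) => checkLength(gunzipSync(input), outputLength),
  BROTLI: (input, outputLength) => checkLength(brotliDecompressSync(input), outputLength),
}

// Parquet page headers record the uncompressed size, so a mismatch signals corrupt data
function checkLength(buffer, outputLength) {
  if (buffer.length !== outputLength) {
    throw new Error(`expected ${outputLength} bytes, got ${buffer.length}`)
  }
  // View the Buffer's bytes as a plain Uint8Array without copying
  return new Uint8Array(buffer.buffer, buffer.byteOffset, buffer.length)
}

// Round trip: compress a small "page" with zlib, then decompress it through the object
const page = new TextEncoder().encode('columnar data')
const restored = nodeCompressors.GZIP(gzipSync(page), page.length)
console.log(new TextDecoder().decode(restored)) // columnar data
```

Because the lookup is keyed by codec name, an object like this covers only the codecs it lists. Any other codec would still need a real implementation such as the ones this package provides.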
Parquet compression types supported with `hyparquet-compressors`:
- Uncompressed
- Snappy
- Gzip
- LZO
- Brotli
- LZ4
- ZSTD
- LZ4_RAW
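In a parquet file, each of these names corresponds to an integer value of the `CompressionCodec` enum in the parquet-format Thrift definition (`parquet.thrift`). That integer is what a column chunk's metadata records. The sketch below assumes those enum values. `CompressionCodec` as a JS array and the `codecName` helper are hypothetical names chosen for illustration.

```js
// Values of the CompressionCodec enum in parquet-format's parquet.thrift, indexed by id
const CompressionCodec = [
  'UNCOMPRESSED', // 0
  'SNAPPY', // 1
  'GZIP', // 2
  'LZO', // 3
  'BROTLI', // 4
  'LZ4', // 5 (deprecated in favor of LZ4_RAW)
  'ZSTD', // 6
  'LZ4_RAW', // 7
]

// Map the integer stored in column chunk metadata to a codec name
function codecName(id) {
  const name = CompressionCodec[id]
  if (name === undefined) throw new Error(`unknown compression codec ${id}`)
  return name
}

console.log(codecName(6)) // ZSTD
```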
Snappy decompression uses hysnappy, which is fast and uses minimal wasm.
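hysnappy does this work in WebAssembly. The plain-JavaScript sketch below does not reproduce hysnappy's code. It only illustrates the raw snappy block format that parquet's SNAPPY codec uses, following the public snappy format description. A block starts with a varint holding the uncompressed length, followed by literal runs and back-reference copies. The function name `snappyDecompress` is hypothetical.

```js
// Decode one raw snappy block into a Uint8Array
function snappyDecompress(input) {
  let pos = 0

  // Preamble: uncompressed length as a little-endian base-128 varint
  let length = 0
  let shift = 0
  let byte
  do {
    if (pos >= input.length) throw new Error('truncated snappy preamble')
    byte = input[pos++]
    length += (byte & 0x7f) * 2 ** shift
    shift += 7
  } while (byte & 0x80)

  const output = new Uint8Array(length)
  let out = 0
  while (pos < input.length) {
    const tag = input[pos++]
    const type = tag & 3
    if (type === 0) {
      // Literal: (length - 1) is in the upper 6 bits, or in 1-4 extra bytes when >= 60
      let len = tag >>> 2
      if (len >= 60) {
        const extraBytes = len - 59
        len = 0
        for (let i = 0; i < extraBytes; i++) len += input[pos++] * 256 ** i
      }
      len += 1
      output.set(input.subarray(pos, pos + len), out)
      pos += len
      out += len
      continue
    }
    // Copy: repeat `len` bytes starting `offset` bytes back in the output
    let len
    let offset
    if (type === 1) {
      len = ((tag >>> 2) & 7) + 4
      offset = ((tag >>> 5) << 8) | input[pos]
      pos += 1
    } else if (type === 2) {
      len = (tag >>> 2) + 1
      offset = input[pos] | (input[pos + 1] << 8)
      pos += 2
    } else {
      len = (tag >>> 2) + 1
      offset = (input[pos] | (input[pos + 1] << 8) | (input[pos + 2] << 16) | (input[pos + 3] << 24)) >>> 0
      pos += 4
    }
    if (offset === 0 || offset > out) throw new Error('invalid snappy copy offset')
    // Copy byte by byte so overlapping copies (offset < len) repeat recent output
    for (let i = 0; i < len; i++, out++) output[out] = output[out - offset]
  }
  if (out !== length) throw new Error('snappy output length mismatch')
  return output
}

// Length 12, literal "abc" (tag 0x08), then a 9-byte copy from 3 bytes back (tag 0x15)
const snappyBlock = Uint8Array.from([0x0c, 0x08, 0x61, 0x62, 0x63, 0x15, 0x03])
console.log(new TextDecoder().decode(snappyDecompress(snappyBlock))) // abcabcabcabc
```

The copy loop is the central detail. When the offset is shorter than the copy length, the copy reads bytes it has just written. This is how snappy encodes runs compactly.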
Gzip uses a new implementation adapted from fflate, with modifications to handle the repeated back-to-back gzip streams that sometimes occur in parquet files (which fflate did not support).
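"Back-to-back gzip streams" means the following. The gzip format (RFC 1952) allows a file to be several complete members concatenated together, and a decoder that stops after the first member silently drops the rest. The sketch below illustrates this with Node's built-in `node:zlib`, whose gunzip decodes every member. It does not use the fflate-derived code in this package.

```js
import { gzipSync, gunzipSync } from 'node:zlib'

// Two independent gzip members written one after the other
const first = gzipSync(Buffer.from('hello '))
const second = gzipSync(Buffer.from('parquet'))
const concatenated = Buffer.concat([first, second])

// Every member begins with its own gzip magic bytes 0x1f 0x8b
console.log(second[0] === 0x1f && second[1] === 0x8b) // true

// gunzipSync keeps decoding after the first member's trailer
const text = gunzipSync(concatenated).toString('utf8')
console.log(text) // hello parquet
```

A decoder that handled only a single member would return just `hello ` for this input.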
Brotli support is a minimal port of brotli.js that pre-compresses the brotli dictionary with gzip to minimize the distribution bundle size.
LZ4 uses a new implementation that includes support for the legacy Hadoop LZ4 frame format used by some older parquet files.
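The sketch below shows how the two LZ4 variants relate. It assumes the Hadoop framing as Apache Arrow's reader describes it: each frame is a 4-byte big-endian decompressed size, a 4-byte big-endian compressed size, and then one raw LZ4 block, with frames repeated until the input ends. It is not this package's implementation, and the function names `lz4DecompressBlock` and `hadoopLz4Decompress` are hypothetical. `lz4DecompressBlock` alone corresponds to the LZ4_RAW codec.

```js
// Decode one raw LZ4 block into exactly outputLength bytes
function lz4DecompressBlock(input, outputLength) {
  const output = new Uint8Array(outputLength)
  let pos = 0
  let out = 0
  while (pos < input.length) {
    const token = input[pos++]
    // Literal length: high nibble, extended by extra bytes while they equal 255
    let literalLength = token >>> 4
    if (literalLength === 15) {
      let b
      do { b = input[pos++]; literalLength += b } while (b === 255)
    }
    output.set(input.subarray(pos, pos + literalLength), out)
    pos += literalLength
    out += literalLength
    // The final sequence carries literals only, with no match part
    if (pos >= input.length) break
    const offset = input[pos] | (input[pos + 1] << 8)
    pos += 2
    // Match length: low nibble (extended like literals) plus the minimum match of 4
    let matchLength = token & 15
    if (matchLength === 15) {
      let b
      do { b = input[pos++]; matchLength += b } while (b === 255)
    }
    matchLength += 4
    if (offset === 0 || offset > out) throw new Error('invalid lz4 match offset')
    // Byte-by-byte copy so overlapping matches repeat recent output
    for (let i = 0; i < matchLength; i++, out++) output[out] = output[out - offset]
  }
  if (out !== outputLength) throw new Error('lz4 output length mismatch')
  return output
}

// Assumed Hadoop framing: repeated [u32 BE raw size][u32 BE compressed size][LZ4 block]
function hadoopLz4Decompress(input, outputLength) {
  const view = new DataView(input.buffer, input.byteOffset, input.byteLength)
  const output = new Uint8Array(outputLength)
  let pos = 0
  let out = 0
  while (pos < input.length) {
    if (pos + 8 > input.length) throw new Error('truncated hadoop lz4 frame header')
    const rawSize = view.getUint32(pos) // DataView reads big-endian by default
    const compressedSize = view.getUint32(pos + 4)
    pos += 8
    output.set(lz4DecompressBlock(input.subarray(pos, pos + compressedSize), rawSize), out)
    pos += compressedSize
    out += rawSize
  }
  if (out !== outputLength) throw new Error('hadoop lz4 output length mismatch')
  return output
}

// Block for "abcabcabcabc": literals "abc" + 4-byte match at offset 3, then literals "bcabc"
const lz4Block = [0x30, 0x61, 0x62, 0x63, 0x03, 0x00, 0x50, 0x62, 0x63, 0x61, 0x62, 0x63]
const framed = Uint8Array.from([0, 0, 0, 12, 0, 0, 0, 12, ...lz4Block])
console.log(new TextDecoder().decode(hadoopLz4Decompress(framed, 12))) // abcabcabcabc
```

The Hadoop variant adds only the 8-byte frame headers around each block. This is why some readers try Hadoop framing first and fall back to treating the whole input as one raw block.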
ZSTD uses fzstd for Zstandard decompression.
Bundle size:

| File | Size |
|---|---|
| hyparquet-compressors.min.js | 116.1 kB |
| hyparquet-compressors.min.js.gz | 75.2 kB |
References:

- https://parquet.apache.org/docs/file-format/data-pages/compression/
- https://en.wikipedia.org/wiki/Brotli
- https://en.wikipedia.org/wiki/Gzip
- https://en.wikipedia.org/wiki/LZ4_(compression_algorithm)
- https://en.wikipedia.org/wiki/Snappy_(compression)
- https://en.wikipedia.org/wiki/Zstd
- https://github.com/101arrowz/fflate
- https://github.com/101arrowz/fzstd
- https://github.com/foliojs/brotli.js
- https://github.com/hyparam/hysnappy