Root project is vortex-array #67

Merged
merged 5 commits into from
Mar 5, 2024
58 changes: 29 additions & 29 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -5,7 +5,7 @@ members = [
"codecz-sys",
"fastlanez-sys",
"pyvortex",
"vortex",
"vortex-array",
"vortex-alloc",
"vortex-alp",
"vortex-dict",
26 changes: 20 additions & 6 deletions README.md
@@ -1,14 +1,23 @@
# Vortex

[![Build Status](https://github.com/fulcrum-so/vortex/actions/workflows/rust.yml/badge.svg)](https://github.com/fulcrum-so/vortex/actions)
[![Crates.io](https://img.shields.io/crates/v/vortex-array.svg)](https://crates.io/crates/vortex-array)
[![Documentation](https://docs.rs/vortex-rs/badge.svg)](https://docs.rs/vortex-array)
[![Rust](https://img.shields.io/badge/rust-1.76.0%2B-blue.svg?maxAge=3600)](https://github.com/fulcrum-so/vortex)

An in-memory format for 1-dimensional array data.

Vortex is a maximally [Apache Arrow](https://arrow.apache.org/) compatible data format that aims to separate logical and physical representation of data, and allow pluggable physical layout.
Vortex is a maximally [Apache Arrow](https://arrow.apache.org/) compatible data format that aims to separate logical and
physical representation of data, and allow pluggable physical layout.

Array operations are separately defined in terms of their semantics, dealing only with logical types and physical layout that defines exact ways in which values are transformed.
Array operations are separately defined in terms of their semantics, dealing only with logical types and physical layout
that defines exact ways in which values are transformed.

# Logical Types

Vortex type system only conveys semantic meaning of the array data without prescribing physical layout. When operating over arrays you can focus on semantics of the operation. Separately you can provide low level implementation dependent on particular physical operation.
Vortex type system only conveys semantic meaning of the array data without prescribing physical layout. When operating
over arrays you can focus on semantics of the operation. Separately you can provide low level implementation dependent
on particular physical operation.

```
Null: all null array
@@ -27,10 +36,15 @@ Struct: Named tuple of types

# Physical Encodings

Vortex calls array implementations encodings, they encode the physical layout of the data. Encodings are recurisvely nested, i.e. encodings contain other encodings. For every array you have their value data type and the its encoding that defines how operations will be performed. By default necessary encodings to zero copy convert to and from Apache Arrow are included in the package.
Vortex calls array implementations encodings, they encode the physical layout of the data. Encodings are recurisvely
nested, i.e. encodings contain other encodings. For every array you have their value data type and the its encoding that
defines how operations will be performed. By default necessary encodings to zero copy convert to and from Apache Arrow
are included in the package.

When performing operations they're disptached on the encodings to provide specialized implementation.
When performing operations they're dispatched on the encodings to provide specialized implementation.
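The dispatch described in the README text above can be illustrated with a minimal sketch. It is not the Vortex API: `ArraySketch`, `PlainArray`, `ConstantArray`, and `total` are hypothetical names chosen for illustration. The trait supplies a generic `sum` that works for any layout, and the constant layout overrides it with a specialized implementation.

```rust
/// Hypothetical stand-in for an array encoding; not the Vortex trait.
trait ArraySketch {
    fn len(&self) -> usize;
    fn scalar_at(&self, index: usize) -> i64;

    /// Generic fallback: correct for any layout, but visits every element.
    fn sum(&self) -> i64 {
        (0..self.len()).map(|i| self.scalar_at(i)).sum()
    }
}

/// Values stored one by one.
struct PlainArray {
    values: Vec<i64>,
}

/// A single value repeated `len` times.
struct ConstantArray {
    value: i64,
    len: usize,
}

impl ArraySketch for PlainArray {
    fn len(&self) -> usize {
        self.values.len()
    }

    fn scalar_at(&self, index: usize) -> i64 {
        self.values[index]
    }
}

impl ArraySketch for ConstantArray {
    fn len(&self) -> usize {
        self.len
    }

    fn scalar_at(&self, _index: usize) -> i64 {
        self.value
    }

    /// Specialized implementation: constant time instead of a full scan.
    fn sum(&self) -> i64 {
        self.value * self.len as i64
    }
}

/// Callers only see the logical operation; the layout picks the implementation.
fn total(array: &dyn ArraySketch) -> i64 {
    array.sum()
}
```

Both layouts return the same logical result from `total`, which is the property that lets the physical representation change without touching the callers.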

## Compression

The advantage of separating physical layout from the semantic of the data is compression. Vortex can compress data without requiring changes to the logical operations. To support efficient data access we focus on lightweight compression algorithms only falling back to general purpose compressors for binary data.
The advantage of separating physical layout from the semantic of the data is compression. Vortex can compress data
without requiring changes to the logical operations. To support efficient data access we focus on lightweight
compression algorithms only falling back to general purpose compressors for binary data.
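As a rough illustration of what a lightweight compression scheme with efficient data access can look like, here is a run-length encoding sketch. It is an assumption made for illustration, not code from this PR: `RunLengthArray` and its methods are hypothetical names, and the README does not name run-length encoding specifically. The point is that a single value can be read by binary search over run boundaries, without decompressing the array.

```rust
/// Hypothetical run-length encoded layout for i64 values.
struct RunLengthArray {
    /// One entry per run.
    values: Vec<i64>,
    /// Exclusive end index of each run, strictly increasing.
    run_ends: Vec<usize>,
}

impl RunLengthArray {
    /// Collapse consecutive equal values into runs.
    fn encode(data: &[i64]) -> Self {
        let mut values: Vec<i64> = Vec::new();
        let mut run_ends: Vec<usize> = Vec::new();
        for (i, &v) in data.iter().enumerate() {
            if values.last() == Some(&v) {
                // Same value as the current run: extend its end.
                *run_ends.last_mut().unwrap() = i + 1;
            } else {
                // A new run starts here.
                values.push(v);
                run_ends.push(i + 1);
            }
        }
        Self { values, run_ends }
    }

    /// Logical length of the decoded array.
    fn len(&self) -> usize {
        self.run_ends.last().copied().unwrap_or(0)
    }

    /// Read one logical element without decoding the whole array.
    fn scalar_at(&self, index: usize) -> Option<i64> {
        if index >= self.len() {
            return None;
        }
        // Index of the first run whose end lies beyond `index`.
        let run = self.run_ends.partition_point(|&end| end <= index);
        Some(self.values[run])
    }
}
```

Random access costs a binary search over the runs rather than a decompression pass, which is the kind of trade-off that separates lightweight encodings from general-purpose compressors.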
2 changes: 1 addition & 1 deletion bench-vortex/Cargo.toml
@@ -16,7 +16,7 @@ workspace = true

[dependencies]
arrow-array = "50.0.0"
vortex = { path = "../vortex" }
vortex-array = { path = "../vortex-array" }
vortex-alp = { path = "../vortex-alp" }
vortex-dict = { path = "../vortex-dict" }
vortex-fastlanes = { path = "../vortex-fastlanes" }
8 changes: 2 additions & 6 deletions bench-vortex/src/lib.rs
@@ -1,4 +1,3 @@
use itertools::Itertools;
use vortex::array::bool::BoolEncoding;
use vortex::array::chunked::ChunkedEncoding;
use vortex::array::constant::ConstantEncoding;
@@ -18,7 +17,7 @@ use vortex_roaring::{RoaringBoolEncoding, RoaringIntEncoding};
use vortex_zigzag::ZigZagEncoding;

pub fn enumerate_arrays() -> Vec<&'static dyn Encoding> {
let encodings: Vec<&dyn Encoding> = vec![
vec![
// TODO(ngates): fix https://github.com/fulcrum-so/vortex/issues/35
// Builtins
&BoolEncoding,
@@ -41,9 +40,7 @@ pub fn enumerate_arrays() -> Vec<&'static dyn Encoding> {
&RoaringBoolEncoding,
&RoaringIntEncoding,
&ZigZagEncoding,
];
println!("{}", encodings.iter().map(|e| e.id()).format(", "));
encodings
]
}

#[cfg(test)]
@@ -96,7 +93,6 @@ mod test {

#[test]
fn compression_ratio() {
enumerate_arrays();
setup_logger();

let file = File::open(download_taxi_data()).unwrap();
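The `enumerate_arrays` change in this file drops the `println!` side effect and returns the `vec!` directly. A self-contained sketch of that pattern follows. The trait and structs here are simplified stand-ins, not the real Vortex `Encoding` types, and `encoding_ids` is a hypothetical helper showing how a caller could still produce the comma-separated listing with only the standard library.

```rust
/// Simplified stand-in for the encoding trait; the real trait has more methods.
trait Encoding {
    fn id(&self) -> &'static str;
}

struct BoolEncoding;
struct ConstantEncoding;

impl Encoding for BoolEncoding {
    fn id(&self) -> &'static str {
        "sketch.bool"
    }
}

impl Encoding for ConstantEncoding {
    fn id(&self) -> &'static str {
        "sketch.constant"
    }
}

/// Pure function: builds the list and leaves printing to the caller.
/// References to unit structs are promoted to `'static`, so no allocation
/// beyond the `Vec` itself is needed.
fn enumerate_encodings() -> Vec<&'static dyn Encoding> {
    vec![&BoolEncoding, &ConstantEncoding]
}

/// Hypothetical helper: join the IDs without pulling in `itertools`.
fn encoding_ids(encodings: &[&dyn Encoding]) -> String {
    encodings
        .iter()
        .map(|e| e.id())
        .collect::<Vec<_>>()
        .join(", ")
}
```

Keeping the enumeration free of output means tests such as `compression_ratio` no longer need to call it just for its log line.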
2 changes: 1 addition & 1 deletion codecz-sys/Cargo.toml
@@ -16,7 +16,7 @@ workspace = true

[dependencies]
safe-transmute = "0.11.2"
vortex-alloc = { version = "0.1.0", path = "../vortex-alloc" }
vortex-alloc = { path = "../vortex-alloc" }

[build-dependencies]
bindgen = "0.69.1"
4 changes: 2 additions & 2 deletions codecz/Cargo.toml
@@ -19,11 +19,11 @@ enum-display = "0.1.3"
paste = "1.0.14"
safe-transmute = "0.11.2"
thiserror = "1.0.56"
codecz-sys = { version = "0.1.0", path = "../codecz-sys" }
codecz-sys = { path = "../codecz-sys" }
half = "2.3.1"
arrow-buffer = "50.0.0"
itertools = "0.12.1"
vortex-alloc = { version = "0.1.0", path = "../vortex-alloc" }
vortex-alloc = { path = "../vortex-alloc" }

[dependencies.num-traits]
version = "0.2"
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -17,7 +17,7 @@ packages = ["dummy"] # Required for workspace project
[tool.rye]
managed = true
dev-dependencies = [
"pytest==7.4.0",
"pytest>=7.4.0",
"pytest-benchmark>=4.0.0",
"ruff>=0.1.11",
"pip>=23.3.2",
2 changes: 1 addition & 1 deletion pyvortex/Cargo.toml
@@ -20,7 +20,7 @@ crate-type = ["rlib", "cdylib"]

[dependencies]
arrow = { version = "50.0.0", features = ["ffi"] }
vortex = { path = "../vortex" }
vortex-array = { path = "../vortex-array" }
vortex-alp = { path = "../vortex-alp" }
vortex-dict = { path = "../vortex-dict" }
vortex-fastlanes = { path = "../vortex-fastlanes" }
1 change: 0 additions & 1 deletion pyvortex/test/test_array.py
@@ -1,6 +1,5 @@
import pyarrow as pa
import pytest

import vortex


1 change: 0 additions & 1 deletion pyvortex/test/test_compress.py
@@ -1,6 +1,5 @@
import numpy as np
import pyarrow as pa

import vortex


3 changes: 1 addition & 2 deletions pyvortex/test/test_serde.py
@@ -1,7 +1,6 @@
import pyarrow as pa
from pyarrow import fs

import vortex
from pyarrow import fs

local = fs.LocalFileSystem()

4 changes: 1 addition & 3 deletions requirements-dev.lock
@@ -5,6 +5,7 @@
# pre: false
# features: []
# all-features: false
# with-sources: false

-e file:pyvortex
-e file:.
@@ -36,7 +37,6 @@ pathspec==0.12.1
platformdirs==4.2.0
pluggy==1.4.0
py-cpuinfo==9.0.0
py-spy==0.3.14
pyarrow==15.0.0
pygments==2.17.2
pymdown-extensions==10.7
@@ -50,8 +50,6 @@ regex==2023.12.25
requests==2.31.0
ruff==0.2.2
six==1.16.0
snakeviz==2.2.0
tornado==6.4
urllib3==2.2.1
verspec==0.1.0
watchdog==4.0.0
1 change: 1 addition & 0 deletions requirements.lock
@@ -5,6 +5,7 @@
# pre: false
# features: []
# all-features: false
# with-sources: false

-e file:pyvortex
-e file:.
4 changes: 2 additions & 2 deletions vortex-alp/Cargo.toml
@@ -13,10 +13,10 @@ rust-version = { workspace = true }

[dependencies]
arrow = { version = "50.0.0" }
vortex = { "path" = "../vortex" }
vortex-array = { path = "../vortex-array" }
linkme = "0.3.22"
itertools = "0.12.1"
codecz = { version = "0.1.0", path = "../codecz" }
codecz = { path = "../codecz" }
log = { version = "0.4.20", features = [] }

[lints]
4 changes: 2 additions & 2 deletions vortex/Cargo.toml → vortex-array/Cargo.toml
@@ -1,5 +1,5 @@
[package]
name = "vortex"
name = "vortex-array"
version = { workspace = true }
description = "Vortex in memory columnar data format"
homepage = { workspace = true }
@@ -37,5 +37,5 @@ polars-ops = { version = "0.37.0", features = ["search_sorted"] }
rand = { version = "0.8.5", features = [] }
rayon = "1.8.1"
roaring = "0.10.3"
vortex-alloc = { version = "0.1.0", path = "../vortex-alloc" }
vortex-alloc = { path = "../vortex-alloc" }
thiserror = "1.0.57"
File renamed without changes. (6 files)
@@ -54,8 +54,8 @@ impl<'a, T: NativePType> StatsCompute for NullableValues<'a, T> {

if first_non_null.is_none() {
return Ok(StatsSet::from(HashMap::from([
(Stat::Min, NullableScalar::None(T::PTYPE.into()).boxed()),
(Stat::Max, NullableScalar::None(T::PTYPE.into()).boxed()),
(Stat::Min, NullableScalar::none(T::PTYPE.into()).boxed()),
(Stat::Max, NullableScalar::none(T::PTYPE.into()).boxed()),
(Stat::IsConstant, true.into()),
(Stat::IsSorted, true.into()),
(Stat::IsStrictSorted, true.into()),
@@ -205,7 +205,7 @@ mod test {
bit_width_freq,
vec![
0u64, 1, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0
0, 0, 0, 0, 0, 0,
]
);
assert_eq!(run_count, 5);
@@ -228,4 +228,13 @@ mod test {
assert_eq!(min, Some(1));
assert_eq!(max, Some(2));
}

#[test]
fn all_null() {
let arr = PrimitiveArray::from_iter(vec![Option::<i32>::None, None, None]);
let min: Option<i32> = arr.stats().get_or_compute_as(&Stat::Min);
let max: Option<i32> = arr.stats().get_or_compute_as(&Stat::Max);
assert_eq!(min, None);
assert_eq!(max, None);
}
}
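The new `all_null` test pins down the behaviour of min/max statistics when every value is null. A standalone sketch of that rule, independent of the Vortex types, is shown below. `min_max` is a hypothetical function, not the `StatsCompute` implementation; it only demonstrates the intended semantics: nulls are skipped, and an array with no non-null values yields a null min and max.

```rust
/// Hypothetical helper: min and max over the non-null values of a column.
/// Returns `(None, None)` when there is no non-null value at all.
fn min_max(values: &[Option<i32>]) -> (Option<i32>, Option<i32>) {
    // Iterate over the non-null values only.
    let mut non_null = values.iter().flatten().copied();
    match non_null.next() {
        // Empty or all-null input: both statistics are null.
        None => (None, None),
        Some(first) => {
            // Seed both bounds with the first non-null value, then fold the rest.
            let (min, max) =
                non_null.fold((first, first), |(lo, hi), v| (lo.min(v), hi.max(v)));
            (Some(min), Some(max))
        }
    }
}
```

Seeding from the first non-null value avoids sentinel values such as `i32::MAX`, which would otherwise leak into the result for all-null input.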
File renamed without changes. (42 files)
2 changes: 1 addition & 1 deletion vortex-dict/Cargo.toml
@@ -13,7 +13,7 @@ rust-version = { workspace = true }

[dependencies]
ahash = "0.8.7"
vortex = { "path" = "../vortex" }
vortex-array = { path = "../vortex-array" }
half = { version = "2.3.1", features = ["std", "num-traits"] }
hashbrown = "0.14.3"
linkme = "0.3.22"