Commit

Some readme
robert3005 committed Mar 1, 2024
1 parent 8dc5dc4 commit cfa0710
Showing 1 changed file with 35 additions and 1 deletion.
36 changes: 35 additions & 1 deletion README.md
@@ -1,2 +1,36 @@
# Vortex
An in-memory specification for 1-dimensional array data.

An in-memory format for 1-dimensional array data.

Vortex is a maximally [Apache Arrow](https://arrow.apache.org/) compatible data format that aims to separate the logical and physical representations of data and to allow pluggable physical layouts.

Array operations are defined in two separate parts: their semantics, which deal only with logical types, and the physical layout, which defines exactly how values are transformed.

# Logical Types

The Vortex type system conveys only the semantic meaning of array data, without prescribing a physical layout. When operating over arrays, you can focus on the semantics of the operation. Separately, you can provide a low-level implementation that depends on the particular physical layout.

```
Null: all null array
Bool: Single bit value
Integer: Fixed width signed/unsigned number. Supports 8, 16, 32, 64 bit widths
Float: Fixed width floating point number. Supports 16, 32, 64 bit float types
Decimal: Fixed width decimal with specified precision (total number of digits) and scale (number of digits after decimal point)
Instant: An instantaneous point on the time-line. Number of seconds/milliseconds/microseconds/nanoseconds from epoch
LocalDate: A date without a time-zone
LocalTime: A time without a time-zone
ZonedDateTime: A date and time including ISO-8601 timezone
List: Sequence of items of same type
Map: Key, value mapping
Struct: Named tuple of types
```
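
The list above can be read as a small algebra of types. The following is a minimal sketch of how a few of them might be modelled, assuming plain Python dataclasses. The class names mirror the list, but `TimeUnit`, the field names, and `row_type` are hypothetical and chosen only for illustration; this is not the Vortex API.

```python
from dataclasses import dataclass
from enum import Enum


class TimeUnit(Enum):
    """Hypothetical resolution of an Instant, counted from the epoch."""
    SECONDS = "s"
    MILLISECONDS = "ms"
    MICROSECONDS = "us"
    NANOSECONDS = "ns"


@dataclass(frozen=True)
class Integer:
    width: int           # 8, 16, 32 or 64 bits
    signed: bool = True

    def __post_init__(self):
        # Only the widths named in the type list are accepted.
        if self.width not in (8, 16, 32, 64):
            raise ValueError(f"unsupported integer width: {self.width}")


@dataclass(frozen=True)
class Decimal:
    precision: int  # total number of digits
    scale: int      # number of digits after the decimal point


@dataclass(frozen=True)
class Instant:
    unit: TimeUnit


@dataclass(frozen=True)
class List:
    item: object  # logical type of every element


@dataclass(frozen=True)
class Struct:
    fields: tuple  # ((name, logical type), ...)


# A struct with an unsigned 64-bit id, a price, and a list of timestamps.
# Nothing here says how the values are laid out in memory.
row_type = Struct((
    ("id", Integer(64, signed=False)),
    ("price", Decimal(precision=10, scale=2)),
    ("events", List(Instant(TimeUnit.MICROSECONDS))),
))
```

Note that the types carry only semantic information (width, precision, time unit); none of them mentions buffers, offsets, or any other physical detail.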

# Physical Encodings

Vortex calls array implementations encodings because they encode the physical layout of the data. Encodings are recursively nested, i.e. encodings can contain other encodings. Every array has a value data type and an encoding that defines how operations on it are performed. By default, the package includes the encodings necessary for zero-copy conversion to and from Apache Arrow.

Operations are dispatched on the encoding of the array, so that each encoding can provide a specialized implementation.
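
To make nesting and dispatch concrete, here is a minimal sketch that assumes three hypothetical encodings (`PlainArray`, `ConstantArray`, `ChunkedArray`) and one logical operation (`array_sum`). It uses Python's `functools.singledispatch` as a stand-in for whatever dispatch mechanism Vortex actually uses; none of these names come from the Vortex code base.

```python
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class PlainArray:
    """Hypothetical encoding: every value is stored explicitly."""
    values: list

    def __len__(self):
        return len(self.values)


@dataclass
class ConstantArray:
    """Hypothetical encoding: one value repeated `length` times."""
    value: int
    length: int

    def __len__(self):
        return self.length


@dataclass
class ChunkedArray:
    """Hypothetical encoding that nests other encodings."""
    chunks: list

    def __len__(self):
        return sum(len(chunk) for chunk in self.chunks)


@singledispatch
def array_sum(array):
    """Logical operation: the sum of all values in the array."""
    raise NotImplementedError(f"no array_sum for {type(array).__name__}")


@array_sum.register(PlainArray)
def _(array):
    return sum(array.values)


@array_sum.register(ConstantArray)
def _(array):
    # Specialized: constant time, the values are never materialized.
    return array.value * array.length


@array_sum.register(ChunkedArray)
def _(array):
    # Recurse into the children; each child dispatches on its own encoding.
    return sum(array_sum(chunk) for chunk in array.chunks)


nested = ChunkedArray([
    PlainArray([1, 2, 3]),
    ConstantArray(value=5, length=4),
    ChunkedArray([ConstantArray(value=1, length=2)]),
])
print(len(nested), array_sum(nested))  # prints "9 28"
```

The caller only ever writes `array_sum(array)`. Which implementation runs is decided by the encoding, and a nested encoding simply delegates to its children.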

## Compression

The advantage of separating the physical layout from the semantics of the data is compression: Vortex can compress data without requiring changes to the logical operations. To support efficient data access, we focus on lightweight compression algorithms, falling back to general-purpose compressors only for binary data.
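
As an illustration of what a lightweight scheme can look like, the sketch below uses run-length encoding. This is an assumption made for the example: the text above does not say which algorithms Vortex ships, and the function names `rle_encode` and `rle_scalar_at` are hypothetical. The point is that a value can be read by index directly from the compressed form, without decompressing the whole array.

```python
from bisect import bisect_right
from itertools import groupby


def rle_encode(values):
    """Compress `values` into (run_values, run_ends).

    run_ends[i] is the exclusive end index of run i in the original array.
    """
    run_values, run_ends, end = [], [], 0
    for value, group in groupby(values):
        end += sum(1 for _ in group)
        run_values.append(value)
        run_ends.append(end)
    return run_values, run_ends


def rle_scalar_at(run_values, run_ends, index):
    """Return the logical value at `index` using a binary search over run ends."""
    length = run_ends[-1] if run_ends else 0
    if not 0 <= index < length:
        raise IndexError(index)
    # The first run whose end is greater than `index` contains the value.
    return run_values[bisect_right(run_ends, index)]


values = [7, 7, 7, 2, 2, 9]
run_values, run_ends = rle_encode(values)
print(run_values, run_ends)  # prints "[7, 2, 9] [3, 5, 6]"

# The logical result is identical to indexing the uncompressed data.
assert all(
    rle_scalar_at(run_values, run_ends, i) == values[i]
    for i in range(len(values))
)
```

Random access costs a binary search over the runs instead of a full decode, which is the kind of trade-off that makes a lightweight encoding preferable to a general-purpose compressor for data that is read selectively.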
