This is the implementation of the Engine contract of Open Data Fabric using the Apache Arrow DataFusion data processing framework. It is currently used in the kamu-cli data management tool.
This engine is experimental and has limited functionality because it is batch-oriented, but it is extremely fast and has a low footprint. There are ongoing efforts to add stream processing functionality.
We recommend using this engine only for basic filter/map operations that do not require temporal processing. If you need temporal JOINs, aggregations, windowing, or watermark semantics, take a look at the Apache Flink ODF Engine.
Also note that this engine does not automatically handle retractions and corrections. If you perform map/filter operations on a stream that can contain retractions and corrections, make sure to manually propagate the op column. If the output does not contain an op column, all emitted records will be considered appends.
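To illustrate what propagating op means for a map/filter step, here is a minimal, self-contained Rust sketch that does not use DataFusion itself. It assumes the ODF convention of encoding op as an integer (0 = append, 1 = retract, 2 = correct-from, 3 = correct-to); the record types and the canadian_cities function are hypothetical. The point is that every output row copies the op of the input row it came from, rather than dropping it and silently turning retractions into appends.

```rust
/// Operation type of a record. Assumption: numeric codes follow the
/// ODF convention (0 = append, 1 = retract, 2 = correct-from, 3 = correct-to).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum Op {
    Append = 0,
    Retract = 1,
    CorrectFrom = 2,
    CorrectTo = 3,
}

/// Hypothetical input row.
#[derive(Debug, Clone, PartialEq)]
pub struct InputRecord {
    pub op: Op,
    pub country: String,
    pub city: String,
    pub population: u64,
}

/// Hypothetical output row: a derived column plus the propagated op.
#[derive(Debug, Clone, PartialEq)]
pub struct OutputRecord {
    pub op: Op,
    pub city: String,
    pub population_k: u64,
}

/// Hypothetical filter + map step that keeps `op` intact.
///
/// The filter uses `country`, a column assumed not to change in corrections,
/// so a correct-from / correct-to pair is either kept or dropped together.
pub fn canadian_cities(input: &[InputRecord]) -> Vec<OutputRecord> {
    input
        .iter()
        .filter(|r| r.country == "CA")
        .map(|r| OutputRecord {
            // Copy the operation type through; omitting it would make every
            // emitted row an append, including retractions.
            op: r.op,
            city: r.city.clone(),
            population_k: r.population / 1000,
        })
        .collect()
}
```

In the engine's SQL queries, the analogous step would presumably be to keep op in the SELECT list alongside any derived columns.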
More information and engine comparisons are available here.
This is a Rust-based project. You can follow steps similar to those in the kamu-cli development guide.