This is the implementation of the Engine contract of Open Data Fabric using the Apache Spark data processing framework. It is currently in use in the kamu-cli data management tool.
- The Spark engine currently provides the richest SQL dialect for map/filter-style transformations (see the sketch after this list)
- Integrates GeoSpark to provide geo-spatial SQL functions
- It is used by kamu-cli for ingesting data into Parquet
- It is used by kamu-cli along with Apache Livy to provide SQL query functionality in Jupyter notebooks
- Takes a long time to start up, which hurts the user experience
- Does not support temporal table joins
- You might be better off using the Flink-based engine for joining and aggregating event streams
- TODO
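
The following is a minimal, self-contained Spark sketch of the kind of map/filter transformation and Parquet output mentioned above. The dataset, column names, output path, and local session setup are purely illustrative assumptions, not part of the ODF engine contract; in practice the queries are defined in dataset metadata and the engine is invoked by kamu-cli.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: illustrates a map/filter-style SQL transformation and a
// Parquet write, not the engine's actual entry point.
object MapFilterSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration; the real engine receives its session
    // configuration from the ODF execution environment.
    val spark = SparkSession.builder()
      .appName("map-filter-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical input slice of a dataset
    val input = Seq(
      ("2020-01-01T00:00:00Z", "vancouver", 12.5),
      ("2020-01-01T00:00:00Z", "seattle", -3.0),
      ("2020-01-02T00:00:00Z", "vancouver", 15.0)
    ).toDF("event_time", "city", "temperature_c")
    input.createOrReplaceTempView("weather")

    // Map/filter-style transformation: project columns, derive a new one,
    // and filter rows. With GeoSpark's SQL functions registered in the
    // session, queries like this can also use geo-spatial predicates
    // such as ST_Contains.
    val result = spark.sql(
      """
        |SELECT
        |  event_time,
        |  city,
        |  temperature_c,
        |  temperature_c * 9 / 5 + 32 AS temperature_f
        |FROM weather
        |WHERE temperature_c > 0
        |""".stripMargin)

    // Ingested and derived data ends up stored as Parquet
    result.write.mode("overwrite").parquet("/tmp/weather_derived")

    spark.stop()
  }
}
```
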
See the Developer Guide.