The Spark engine image consists of the following main parts:
- Spark application - the engine itself
- ODF adapter - RPC server used to communicate with the engine
- Apache Livy - HTTP gateway for Spark used by Jupyter Notebook integration
This repo includes submodules. When cloning, use:

```bash
git clone --recurse-submodules https://github.com/kamu-data/kamu-engine-spark.git
```

If you forgot to do so, you can pull them later with:

```bash
git submodule update --init
```
You will need:
- `docker`
- `rust` toolchain (see the `kamu-cli` developer guide)
- Java/Scala toolchain
To install Java & Scala we recommend using SDKMAN!. Install the tool itself, then switch to the following versions of the components:

```bash
sdk use java 17.0.10-oracle
sdk use maven 3.9.6
sdk use sbt 1.9.8
sdk use scala 2.12.18
```
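If you don't have SDKMAN! yet, the setup looks roughly like this. Note that `sdk use` only switches between versions that are already installed, so the pinned versions have to be installed first (a sketch based on SDKMAN!'s standard install flow):

```bash
# Install SDKMAN! itself (see https://sdkman.io for details)
curl -s "https://get.sdkman.io" | bash
source "$HOME/.sdkman/bin/sdkman-init.sh"

# Install the pinned versions before switching to them with `sdk use`
sdk install java 17.0.10-oracle
sdk install maven 3.9.6
sdk install sbt 1.9.8
sdk install scala 2.12.18
```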
Once installed, you can configure IntelliJ IDEA to use this runtime and toolchain, and open the directory as an SBT project.
The ODF adapter is a Rust application - to build it, follow the same approach as for `kamu-cli`.
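For illustration, assuming the adapter is a standard Cargo project (the directory name below is hypothetical - use wherever the adapter crate actually lives in this repo), the build is the usual:

```bash
cd adapter        # hypothetical path to the adapter crate
cargo build --release
```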
Livy, unfortunately, has to be built from a fork. Follow the instructions in the fork and then place the final artifact under `/image/apache-livy-{version}-bin.zip`. Luckily, this only needs to be done once, and having to rebuild it is very rare.
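Once the fork's build finishes, the last step is just copying the archive into place (the source path below is hypothetical - the real one comes from the fork's build output):

```bash
# Hypothetical build-output path; keep the {version} placeholder in sync
cp path/to/apache-livy-{version}-bin.zip image/
```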
When developing the engine you can run `sbt` in the root directory and then use:
- `compile` - to compile the code
- `assembly` - to build a distribution
- `test` - to run all tests
- `testOnly dev.kamu.engine.spark.ingest.WatermarkTest` - to run a specific test suite
- `testOnly dev.kamu.engine.spark.ingest.WatermarkTest -- -z "returns max"` - to run a specific test case
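The same commands can also be run non-interactively from the shell; note that a command containing spaces has to be quoted as a single argument in batch mode:

```bash
sbt compile
sbt assembly
sbt test
# Quote the whole command so sbt sees it as one, including the -z filter:
sbt 'testOnly dev.kamu.engine.spark.ingest.WatermarkTest -- -z "returns max"'
```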
IMPORTANT: Integration tests (those named `Engine*`) run the engine inside a Docker image and mount your local assembly into it. Make sure you have both built a new image (or pulled an existing one) and run `sbt assembly` before re-running such tests, otherwise your changes will not take effect.
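The resulting edit-test loop looks roughly like this (the suite name below is a hypothetical `Engine*` example; `make image-multi-arch` is the image-building target described in the release steps):

```bash
sbt assembly              # rebuild the jar that gets mounted into the container
make image-multi-arch     # rebuild the engine image (or pull an existing one)
sbt 'testOnly *EngineAggregationTest'   # hypothetical Engine* suite name
```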
To release a new version:
- Commit your changes
- Increment the version number in the `Makefile`
- Build the image using `make image-multi-arch`
  - This will build the adapter and the engine assembly
  - See the Docker manual about building multi-arch images
  - Note: on Linux it seems to be enough to create a `buildx` runner with `docker buildx create --use --name multi-arch-builder` - the runner seems to come equipped with QEMU, allowing us to cross-build
- Push the image to the registry using `make image-push`
- Tag your last commit with `vX.Y.Z`
- Push the changes to the git repo
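Put together, a release roughly amounts to the following (the `vX.Y.Z` tag is a placeholder, and the commit message is illustrative):

```bash
# After bumping the version in the Makefile:
git commit -am "Bump engine version"

# One-time setup on Linux for multi-arch builds:
docker buildx create --use --name multi-arch-builder

make image-multi-arch    # builds the adapter and the engine assembly
make image-push          # pushes the image to the registry

git tag vX.Y.Z
git push && git push --tags
```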