Data Caterer is a metadata-driven data generation tool that helps create production-like data across batch and event data systems. It can also run data validations to ensure your systems have ingested the data as expected. Setup and customisation are done through the Java API, the Scala API, or YAML files, all of which run via Docker.
This repo contains example Java and Scala API usage for Data Caterer.
See the detailed documentation for more information.
- Create a new Java class similar to DocumentationJavaPlanRun.java
- It needs to extend io.github.datacatering.datacaterer.javaapi.api.PlanRun
- Create a new Scala class similar to DocumentationPlanRun.scala
- It needs to extend io.github.datacatering.datacaterer.api.PlanRun
Requires:
- Docker
./run.sh
# check results in docker/sample/report/index.html
Create your own Docker image via:
./gradlew clean build
docker build -t <my_image_name>:<my_image_tag> .
docker run -e PLAN_CLASS=io.github.datacatering.plan.DocumentationPlanRun -v ${PWD}/docs/run:/opt/app/data <my_image_name>:<my_image_tag>
#check results under docs/run folder
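The image build and run steps above can be parameterised so that the image name, tag, and plan class are set in one place. The following is a minimal sketch, not part of this repo: the variable names (`IMAGE_NAME`, `IMAGE_TAG`, `OUTPUT_DIR`), their default values, and both helper functions are assumptions made for illustration. Only the `./gradlew`, `docker build`, and `docker run` invocations mirror the commands listed above.

```shell
#!/usr/bin/env bash
# Hypothetical defaults (assumptions); override them via environment variables.
IMAGE_NAME="${IMAGE_NAME:-my-data-caterer}"
IMAGE_TAG="${IMAGE_TAG:-local}"
PLAN_CLASS="${PLAN_CLASS:-io.github.datacatering.plan.DocumentationPlanRun}"
OUTPUT_DIR="${OUTPUT_DIR:-${PWD}/docs/run}"

# Print the docker run command that would be executed (dry run).
docker_run_command() {
  printf 'docker run -e PLAN_CLASS=%s -v %s:/opt/app/data %s:%s\n' \
    "$PLAN_CLASS" "$OUTPUT_DIR" "$IMAGE_NAME" "$IMAGE_TAG"
}

# Build the project and image, then run the plan class in a container.
# Each step only runs if the previous one succeeded.
build_and_run() {
  ./gradlew clean build &&
    docker build -t "${IMAGE_NAME}:${IMAGE_TAG}" . &&
    docker run -e PLAN_CLASS="$PLAN_CLASS" \
      -v "${OUTPUT_DIR}:/opt/app/data" "${IMAGE_NAME}:${IMAGE_TAG}"
}
```

Calling `docker_run_command` shows the resolved command without starting anything, which is a quick way to confirm the plan class and volume mount before calling `build_and_run` from the repo root.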
Run with your own class from either the Java or Scala API:
./gradlew clean build
cd docker
PLAN_CLASS=io.github.datacatering.plan.DocumentationPlanRun DATA_SOURCE=postgres docker-compose up -d datacaterer
See the docs for details.
A Docker Compose sample can be found under the docker folder.
cd docker
docker-compose up -d datacaterer
Check the results here.
Change to another data source by setting DATA_SOURCE to one of:
- postgres
- mysql
- cassandra
- solace
- kafka
- http
DATA_SOURCE=cassandra docker-compose up -d datacaterer
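Since DATA_SOURCE is a free-form environment variable, a small wrapper can catch typos before the compose service starts. This is a sketch under assumptions: the function names `is_supported_source` and `start_with_source` are hypothetical and not part of this repo, and the supported list is simply copied from the names above.

```shell
#!/usr/bin/env bash
# Data source names copied from the list above.
SUPPORTED_SOURCES="postgres mysql cassandra solace kafka http"

# Return 0 if $1 is one of the supported names, 1 otherwise.
is_supported_source() {
  local candidate="$1" name
  for name in $SUPPORTED_SOURCES; do
    if [ "$name" = "$candidate" ]; then
      return 0
    fi
  done
  return 1
}

# Hypothetical helper: start Data Caterer with the given data source.
# Run it from the docker folder, where the compose file lives.
start_with_source() {
  local source="$1"
  if ! is_supported_source "$source"; then
    echo "Unsupported data source: $source" >&2
    return 1
  fi
  DATA_SOURCE="$source" docker-compose up -d datacaterer
}
```

For example, `start_with_source cassandra` runs the same command shown above, while `start_with_source oracle` prints an error and returns without calling docker-compose.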
Run in Kubernetes via Helm:
helm install data-caterer ./data-caterer-example/helm/data-caterer
Base benchmark tests can be run via:
bash benchmark/run_benchmark.sh
Results can be found under benchmark/results.