ZeroTune is a novel zero-shot learned approach for determining cost-effective parallelism degrees in distributed stream processing systems.
Please cite our papers if you find this work useful or use it as a baseline in your paper.
@inproceedings {agnihotri24icde,
author = {Agnihotri, Pratyush and Koldehofe, Boris and Stiegele, Paul and Heinrich, Roman and Binnig, Carsten and Luthra, Manisha},
title = {ZeroTune: Learned Zero-Shot Cost Model for Parallelism Tuning in Stream Processing},
year = {2024},
booktitle={40th IEEE International Conference on Data Engineering (ICDE)},
pages = {1--14},
numpages = {14}
}
@inproceedings {agnihotri23aidm,
author = {Agnihotri, Pratyush and Koldehofe, Boris and Binnig, Carsten and Luthra, Manisha},
title = {Zero-Shot Cost Models for Parallel Stream Processing},
year = {2023},
isbn = {9798400701931},
url = {https://doi.org/10.1145/3593078.3593934},
doi = {10.1145/3593078.3593934},
booktitle = {Proceedings of the Sixth International Workshop on Exploiting Artificial Intelligence Techniques for Data Management (aiDM@SIGMOD)},
articleno = {5},
numpages = {5},
series = {aiDM '23}
}
@inproceedings {agnihotri22debs,
author = {Agnihotri, Pratyush and Koldehofe, Boris and Binnig, Carsten and Luthra, Manisha},
title = {PANDA: performance prediction for parallel and dynamic stream processing},
year = {2022},
isbn = {9781450393089},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3524860.3543281},
doi = {10.1145/3524860.3543281},
booktitle = {Proceedings of the 16th ACM International Conference on Distributed and Event-Based Systems},
pages = {180--181},
numpages = {2},
series = {DEBS '22}
}
This repository accompanies our paper submission "ZeroTune" and showcases the capabilities of the zero-shot model.
- zerotune-management: Contains the main setup instructions. It consists of a collection of scripts that set up both local and remote clusters. These clusters are the foundation for the parallel query plan generator and the environment for training and testing the zero-shot model.
- zerotune-plan-generator: An Apache Flink client application that generates synthetic and benchmark parallel query plans. These plans provide the training and testing data for our zero-shot learning model.
- zerotune-learning: The zero-shot model, which specializes in providing accurate cost predictions for distributed parallel stream processing.
- Flink-Observation: A modified fork of Apache Flink that adds custom logging of observed workload characteristics and stores them in a MongoDB database.
- zerotune-VM_image: A VM image that includes all the code and dependencies required to generate training data and train the model.