-
Notifications
You must be signed in to change notification settings - Fork 7
Home
- Switch to 中文页面
- This project aims at building a scalable transactional stream processing engine on modern hardware. It allows ACID transactions to be run directly on streaming data. It shares similar project vision with Flink StreamingLedger from Data Artisans , but MorphStream emphsizes more on improving system performance leveraging modern multicore processors.
- MorphStream is built based on our previous work of TStream (ICDE'20) but with significant changes: the codebase are exclusive.
- The code is still under active development and more features will be introduced. We welcome your contributions, if you are interested to contribute to the project, please fork and submit a PR. If you have questions, feel free to log an issue or write an email to me: shuhao_zhang AT hust.edu.cn
- [Under Review] Jianjun Zhao, Yancan Mao, Zhonghao Yang, Haikun Liu and Shuhao Zhang. Scalable Window-based Transactional Stream Processing with Non-deterministic State Access.
- [VLDBJ] Zhang, S., Soto, J. & Markl, V. A survey on transactional stream processing. The VLDB Journal 33, 451–479 (2024)
- [ICDE] Jianjun Zhao*, Haikun Liu, Shuhao Zhang, Zhuohui Duan, Xiaofei Liao, Hai Jin, Yu Zhang. Fast Parallel Recovery for Transactional Stream Processing on Multicores, ICDE, 2024
- [ICDE] Siqi Xiang*, Zhonghao Yang*, Shuhao Zhang, Jianjun Zhao, Yancan Mao. MorphStream: Scalable Processing of Transactions over Streams, ICDE (Demo), 2024
- [SIGMOD] Yancan Mao and Jianjun Zhao and Shuhao Zhang and Haikun Liu and Volker Markl. MorphStream: Adaptive Scheduling for Scalable Transactional Stream Processing on Multicores, SIGMOD, 2023
- [ICDE] Shuhao Zhang, Yingjun Wu, Feng Zhang, Bingsheng He. Towards Concurrent Stateful Stream Processing on Multicore Processors, ICDE, 2020
- [SIGMOD] Shuhao Zhang, Jiong He, Chi Zhou (Amelie), Bingsheng He. BriskStream: Scaling Stream Processing on Multicore Architectures, SIGMOD, 2019 (code: https://github.com/Xtra-Computing/briskstream)
- [ICDE] Shuhao Zhang, Bingsheng He, Daniel Dahlmeier, Amelie Chi Zhou, Thomas Heinze. Revisiting the design of data stream processing systems on multi-core processors, ICDE, 2017 (code: https://github.com/ShuhaoZhangTony/ProfilingStudy)
-
affinity
contains libraries used for pinning threads to corresponding CPUs in Multicore architecture. -
appliication
contains all experimental workloads implemented based on MorphStream's programming APIs. -
common
contains common utility variable/functions for the entire project. -
scripts
contain all experiment scripts to reproduce our experiments. -
state-engine
contains state management modules in MorphStream. -
stream-engine
contains transactional events scheduling modules in MorphStream.
-
System specification: We run all our experiment in
Component Details Server Dual-socket Intel Xeon Gold 6248R server with 384 GB DRAM OS Ubuntu Cores per Socket 24 cores of 3.00GHz L3 Cache 35.75MB NUMA Single socket used to isolate the impact of NUMA Core Pinning Each thread pinned to one core, using 1 to 24 cores to evaluate scalability OS Kernel Linux 4.15.0-118-generic JDK Version JDK 1.8.0_301 JVM Configuration -Xmx and -Xms set to 300 GB Garbage Collector G1GC, configured to not clear temporal objects such as processed TPGs and multi-versions of states -
Third-party Lib: Make sure to install and configure environments all required dependencies as shown below.
- wget:
sudo apt install wget -y
- make/cmake/gcc/g++:
sudo apt install make cmake gcc g++ -y
- JDK 1.8.x
- Maven 3.8.6:
sudo apt install maven -y
-
Intel Vtune: Configure Vtune path to
/opt/intel/oneapi/vtune/latest/bin64/vtune
, set/proc/sys/kernel/perf_event_paranoid
to 0.
- wget:
-
Environment Configurations: All environment configurations are extracted to
global.sh
, in fact, users only need to setproject_Dir
before running all our experiments. -
We categorize all our experiment scripts by 4 subsections in our paper, you can reproduce the results of each subsection by enter the associated folder and run scripts. Inside every folder, we can run each experiment easily with the following command. More detailed instructions can be found in
scripts/README.md
.cd $EXPERIMENT_FOLDER_PATH # The experiment must be done in the associated folder. bash ${EXPERIMENT_SCRIPT}.sh # e.g., EXPERIMENT_SCRIPT=PerformanceComparison
-
Results: All raw data results can be found in
${project_Dir}/result/data/
. We demonstrate how to draw all figures in our experiments inscripts/draw/README.md
.