Skip to content
Jianjun Zhao edited this page Nov 11, 2024 · 14 revisions

Welcome to MorphStream Wiki

Project Overview

Java CI with Maven

  • This project aims at building a scalable transactional stream processing engine on modern hardware. It allows ACID transactions to be run directly on streaming data. It shares similar project vision with Flink StreamingLedger from Data Artisans , but MorphStream emphsizes more on improving system performance leveraging modern multicore processors.
  • MorphStream is built based on our previous work of TStream (ICDE'20) but with significant changes: the codebase are exclusive.
  • The code is still under active development and more features will be introduced. We welcome your contributions, if you are interested to contribute to the project, please fork and submit a PR. If you have questions, feel free to log an issue or write an email to me: shuhao_zhang AT hust.edu.cn

Publications

  • [Under Review] Jianjun Zhao, Yancan Mao, Zhonghao Yang, Haikun Liu and Shuhao Zhang. Scalable Window-based Transactional Stream Processing with Non-deterministic State Access.
  • [VLDBJ] Zhang, S., Soto, J. & Markl, V. A survey on transactional stream processing. The VLDB Journal 33, 451–479 (2024)
  • [ICDE] Jianjun Zhao*, Haikun Liu, Shuhao Zhang, Zhuohui Duan, Xiaofei Liao, Hai Jin, Yu Zhang. Fast Parallel Recovery for Transactional Stream Processing on Multicores, ICDE, 2024
  • [ICDE] Siqi Xiang*, Zhonghao Yang*, Shuhao Zhang, Jianjun Zhao, Yancan Mao. MorphStream: Scalable Processing of Transactions over Streams, ICDE (Demo), 2024
  • [SIGMOD] Yancan Mao and Jianjun Zhao and Shuhao Zhang and Haikun Liu and Volker Markl. MorphStream: Adaptive Scheduling for Scalable Transactional Stream Processing on Multicores, SIGMOD, 2023
  • [ICDE] Shuhao Zhang, Yingjun Wu, Feng Zhang, Bingsheng He. Towards Concurrent Stateful Stream Processing on Multicore Processors, ICDE, 2020
  • [SIGMOD] Shuhao Zhang, Jiong He, Chi Zhou (Amelie), Bingsheng He. BriskStream: Scaling Stream Processing on Multicore Architectures, SIGMOD, 2019 (code: https://github.com/Xtra-Computing/briskstream)
  • [ICDE] Shuhao Zhang, Bingsheng He, Daniel Dahlmeier, Amelie Chi Zhou, Thomas Heinze. Revisiting the design of data stream processing systems on multi-core processors, ICDE, 2017 (code: https://github.com/ShuhaoZhangTony/ProfilingStudy)

Project Structrue

  • affinity contains libraries used for pinning threads to corresponding CPUs in Multicore architecture.
  • appliication contains all experimental workloads implemented based on MorphStream's programming APIs.
  • common contains common utility variable/functions for the entire project.
  • scripts contain all experiment scripts to reproduce our experiments.
  • state-engine contains state management modules in MorphStream.
  • stream-engine contains transactional events scheduling modules in MorphStream.

How to Run MorphStream

  1. System specification: We run all our experiment in

    Component Details
    Server Dual-socket Intel Xeon Gold 6248R server with 384 GB DRAM
    OS Ubuntu
    Cores per Socket 24 cores of 3.00GHz
    L3 Cache 35.75MB
    NUMA Single socket used to isolate the impact of NUMA
    Core Pinning Each thread pinned to one core, using 1 to 24 cores to evaluate scalability
    OS Kernel Linux 4.15.0-118-generic
    JDK Version JDK 1.8.0_301
    JVM Configuration -Xmx and -Xms set to 300 GB
    Garbage Collector G1GC, configured to not clear temporal objects such as processed TPGs and multi-versions of states
  2. Third-party Lib: Make sure to install and configure environments all required dependencies as shown below.

    • wget: sudo apt install wget -y
    • make/cmake/gcc/g++: sudo apt install make cmake gcc g++ -y
    • JDK 1.8.x
    • Maven 3.8.6: sudo apt install maven -y
    • Intel Vtune: Configure Vtune path to /opt/intel/oneapi/vtune/latest/bin64/vtune, set /proc/sys/kernel/perf_event_paranoid to 0.
  3. Environment Configurations: All environment configurations are extracted to global.sh, in fact, users only need to set project_Dir before running all our experiments.

  4. We categorize all our experiment scripts by 4 subsections in our paper, you can reproduce the results of each subsection by enter the associated folder and run scripts. Inside every folder, we can run each experiment easily with the following command. More detailed instructions can be found in scripts/README.md.

    cd $EXPERIMENT_FOLDER_PATH # The experiment must be done in the associated folder.
    bash ${EXPERIMENT_SCRIPT}.sh # e.g., EXPERIMENT_SCRIPT=PerformanceComparison
    
  5. Results: All raw data results can be found in ${project_Dir}/result/data/. We demonstrate how to draw all figures in our experiments in scripts/draw/README.md.

Clone this wiki locally