HorsePower is a framework for optimizing database queries on modern hardware. At its core is HorseIR, an array-based intermediate representation (IR) for database queries. Once a query is expressed in HorseIR, compiler optimizations can be applied to its database operations. In addition, the array programming model exposes fine-grained parallelism, which offers a promising route to performance speedups.
Figure 1. The workflow of the HorsePower framework.
We started this project from scratch in summer 2017. The workflow of the HorsePower framework is shown in Figure 1. One candidate source language is our Horse language, an extension of standard SQL designed for data analytics. At the current stage, we accept execution plans from standard SQL database queries as well as MATLAB code. A front end parses the source and transforms it to HorseIR. Static analyses and code optimizations are then performed before target code is generated, and multiple back-ends are supported. Alternatively, an interpreter allows programs to be run directly.
HorsePower focuses on the following areas:
- Design and implementation of an array-based intermediate representation (IR)
- Static analysis for an array-based IR (i.e., HorseIR)
- Query optimization using compiler optimizations
- Fine-grained primitive functions and highly tuned libraries
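To make the array-based style concrete, here is a minimal sketch in Python of how a simple aggregate query could be broken into whole-column primitives. This is not HorseIR syntax, and the helper names (`compare_lt`, `compress`, `mul`, `total`) and the sample columns are hypothetical, chosen only for illustration. The second function shows the kind of fused loop that a compiler optimization could, in principle, derive from such a chain of primitives.

```python
# Illustrative only: these helpers are hypothetical, not HorsePower's API.

def compare_lt(column, bound):
    """Element-wise comparison: one boolean per row."""
    return [value < bound for value in column]


def compress(mask, column):
    """Keep the elements of `column` whose mask entry is True."""
    return [value for keep, value in zip(mask, column) if keep]


def mul(left, right):
    """Element-wise multiplication of two equally long columns."""
    return [a * b for a, b in zip(left, right)]


def total(column):
    """Reduce a column to a single sum."""
    return sum(column)


def fused_revenue(quantity, price, discount, bound):
    """The same query as one loop, with no intermediate arrays."""
    result = 0
    for q, p, d in zip(quantity, price, discount):
        if q < bound:
            result += p * d
    return result


# Hypothetical columns of a small table (integers keep the arithmetic exact).
quantity = [10, 30, 5, 24]
price = [100, 200, 50, 80]
discount = [10, 0, 50, 20]

# SELECT SUM(price * discount) FROM t WHERE quantity < 24
mask = compare_lt(quantity, 24)
revenue = total(mul(compress(mask, price), compress(mask, discount)))
```

Each primitive operates on a whole column at a time, so every element-wise step is a natural candidate for data-parallel execution, while the fused form avoids materializing the intermediate columns.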
Download the repository
git clone git@github.com:Sable/HorsePower.git
Set up environment variables
cd HorsePower && source ./setup_env.sh
Install with the following command (takes about 13 minutes)
(cd ${HORSE_LIB_FOLDER} && sh deploy_linux.sh)
After installation, the following new folders are created:
- include
- lib
- pcre2
Note: gcc 8.1.0 or higher is recommended, and the additional library `uuid-dev` may be required during installation.
Default data path for TPC-H
${HORSE_BASE}/data/tpch
To generate datasets with different scale factors, run
cd data/tpch
./run.sh deploy ## Read instructions and update Makefile
./run.sh gendb 1 ## Generate database and save to data/tpch/db1
For a given scale factor, for example 1, the database path is
${HORSE_BASE}/data/tpch/db1
It contains a `tbl` file for each table
${HORSE_BASE}/data/tpch/db1/*.tbl
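As a quick sanity check on the generated data, the following Python sketch loads one `tbl` file into per-column lists. It assumes the files follow the usual TPC-H `dbgen` layout (one row per line, fields separated by `|`, with a trailing `|` at the end of each row) and that `HORSE_BASE` is set by the environment setup above. The helper names `tbl_path` and `read_tbl_columns` are hypothetical and not part of HorsePower.

```python
import os


def tbl_path(base, scale_factor, table):
    """Build <base>/data/tpch/db<scale_factor>/<table>.tbl."""
    return os.path.join(base, "data", "tpch", f"db{scale_factor}", f"{table}.tbl")


def read_tbl_columns(path):
    """Read a pipe-delimited file into a list of columns (lists of strings)."""
    columns = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue
            fields = line.split("|")
            # Assumption: each row ends with a trailing '|', which leaves
            # one empty field at the end after splitting.
            if fields[-1] == "":
                fields.pop()
            if not columns:
                columns = [[] for _ in fields]
            for column, field in zip(columns, fields):
                column.append(field)
    return columns


if __name__ == "__main__":
    base = os.environ.get("HORSE_BASE")
    if base:
        path = tbl_path(base, 1, "region")
        if os.path.exists(path):
            columns = read_tbl_columns(path)
            rows = len(columns[0]) if columns else 0
            print(f"{path}: {len(columns)} columns, {rows} rows")
```

Values are kept as strings here; converting each column to its proper type (integer, decimal, date) is left out to keep the sketch short.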
We recommend using the latest version, as this project is still under active development.
To see the available run options, type
(cd ${HORSE_SRC_CODE} && ./run.sh) # show usage
Name | Notes |
---|---|
Platform | Cross-platform |
Tools | C/C++, Flex & Bison |
Parallelism | OpenMP/Pthread/CUDA/OpenCL |
Conventions | docs/conventions |
IR design
- Official IR design notes
- IR grammar in a yacc file (v3)
- IR types and rules: see also specific type rules
- IR online reference
Database TPC-H
Implementation
- Hanfeng Chen, Joseph Vinish D’silva, Hongji Chen, Bettina Kemme, and Laurie Hendren. HorseIR: Bringing Array Programming Languages together with Database Query Processing. Proceedings of the 14th Symposium on Dynamic Languages (DLS '18), pp. 37-49, November 2018.
Copyright © 2017-2020, Hanfeng Chen, Laurie Hendren and McGill University.
- PCRE2: PCRE2 Licence