The title of our paper submitted to DLS18 is
HorseIR: Bringing Array Programming Languages together with Database Query Processing
This repository is created for showing the reproducibility of our experiments in this paper. We provide the details of scripts and original data used in the experiments. There are mainly two systems: HorseIR and RDBMS MonetDB. We supply step-by-step instructions to configure and deploy both systems in the experiments.
In this repository, you will find:
- the software you need to install (Sec. 2);
- how to configure databases (Sec. 3);
- how to run experiments (Sec. 4);
- the results used in the paper (Sec. 5);
- the contacts if you have any further questions (Sec. 6).
The logo was based on an anonymous horse and Laurie's dragon.
Let DIR_EXP
be the directory where the experiments take place.
In our local tests, we used
DIR_EXP=/mnt/local
Please install the following systems before you go to next sections.
- TPC-H: version v2.17.0
- GCC-7: version v7.2.0
- MonetDB: version Jul2017-SP2
- HorseIR
# TPC-H (found in ./exp/query/)
unzip tpc-h-tool.zip
cd tpch_2_17_0/dbgen
make
# Generate data with scale factor 1.
./dbgen -s 1
In our local tests, we selected 8 queries which were saved in
<DIR_EXP>/query/chf/
Moreover, 5 databases with 5 different scale factors (SF 1/2/4/8/16) were created as follows.
# SF 1
mkdir <DIR_EXP>/tpch-tbl/db1
./dbgen -s 1
mv *.tbl <DIR_EXP>/tpch-tbl/db1
# ... repeat for SF 2, 4, 8, 16
Then, after you check the directory tpch-tbl
, you can see 5 folders. In each
folder, it contains different sizes of data.
ls <DIR_EXP>/tpch-tbl
db1 db16 db2 db4 db8
Reference: TPC - Current specifications
Note: about 5 hours spent on compiling GCC from source code.
# Download GCC-7
wget https://ftp.gnu.org/gnu/gcc/gcc-7.2.0/gcc-7.2.0.tar.gz
tar -xf gcc-7.2.0.tar.gz
cd gcc-7.2.0
./contrib/download_prerequisites
./configure --prefix=<DIR_EXP>/gcc
make
make install
# set alias
ln -s <DIR_EXP>/gcc/bin/gcc <DIR_BIN>/gcc-7
The directory DIR_BIN
is the directory which has been included in PATH
.
# Download MonetDB 2017-SP2
wget https://www.monetdb.org/downloads/sources/Jul2017-SP2/MonetDB-11.27.9.tar.bz2
tar -xf MonetDB-11.27.9.tar.bz2
cd MonetDB-11.27.9
# configure with performance flags enabled
./configure --prefix=<DIR_EXP>/monetdb17 --enable-debug=no --enable-assert=no --enable-optimize=yes
make
make install
# set alias
ln -s <DIR_EXP>/monetdb17/bin/mclient <DIR_BIN>/mclient
ln -s <DIR_EXP>/monetdb17/bin/monetdb <DIR_BIN>/monetdb
ln -s <DIR_EXP>/monetdb17/bin/monetdbd <DIR_BIN>/monetdbd
ln -s <DIR_EXP>/monetdb17/bin/mserver5 <DIR_BIN>/mserver5
The directory DIR_BIN
is the directory which has been included in PATH
.
Create a configuration file ~/.monetdb
with the following content:
user=monetdb
password=monetdb
language=sql
Then, type mclient
to login MonetDB. Please check if you see the version
number Jul2017-SP2
on the top of the welcome message. Moreover, try the
following command to see if the database has been installed successfully.
sql>SELECT 'Hello world';
+-------------+
| L2 |
+=============+
| Hello world |
+-------------+
1 tuple (1.328ms)
Reference: How to install MonetDB and the instroduction of server and client programs.
Download HorseIR project from GitHub
git clone [email protected]:Sable/HorsePower.git
cd HorsePower
git checkout 34ca12ac4e725e8e1c8e9d5061904c752130bd76
Download and install libraries (about 7 minutes)
(cd libs && sh deploy_linux.sh)
Setup experiments
# build HorseIR system
(cd exp-dls18 && ./setup)
# setup and generate HorseIR programs from execution plans
(cd exp-dls18 && ./setup_translator)
Troubleshooting
- Remove an empty import statement in
genIR.py
- Line
from analysis import *
- Line
External links
- A LL-based grammar for the HorseIR translator: click here
We need to create
- 1 data farm,
TPCH
; and - 5 TPC-H related databases,
tpch1/2/4/8/16
Note: In each TPC-H related database, there are 8 tables: part
, supplier
,
partsupp
, lineitem
, customer
, nation
, region
, and orders
.
Create a datafarm
monetdbd create TPCH
monetdbd start TPCH
Create and release database 'tpch1' (similar for database tpch2/4/8/16)
monetdb create tpch1
monetdb release tpch1
monetdb start tpch1
Login a database
mclient -d tpch1
Load data to TPCH (inside mclient)
# create empty tables
\< <DIR_EXP>/tpch_2_17_0/dbgen/dss.ddl
# import data (should replace /mnt/local with <DIR_EXP> in initTPCH1.txt
\< <DIR_EXP>/script-tpch/initTPCH1.txt
Remember to set soft aliases for your data created in TPC-H, so that data copies can be avoided.
cd src/HorseIR/data
ln -s <DIR_EXP>/tpch-tbl tpch
The overview of the machine used in our local test is
- Server named,
Sable-Intel
- 4 Intel Xeon E7-4850 2.00 GHz
- Total 40 cores with 80 threads
- 128GB RAM
- Ubuntu 16.04.02 LTS
Note: by default, MonetDB uses all 80 threads with mclient
. Therefore,
we need to use mserver5
instead to set the number of threads to, for example,
40.
# SF 1
mserver5 --dbpath=<DIR_EXP>/datafarm/TPCHDB/tpch1 --set monet_vault_key=<DIR_EXP>/datafarm/TPCHDB/tpch1/.vaultkey --set gdk_nr_threads=40
Then, a new terminal should be opened and run MonetDB with the following command:
# SF 1
(cd <DIR_EXP>/script-tpch && time ./runtest | mclient -d tpch1) &> summary1.log
Finally, exit mserver5 by typing \q
or CTRL+D
.
Repeat the process for SF 2, 4, 8, and 16.
In each log file (e.g. <DIR_EXP>/script-tpch/sf1/\*.log
), search for the
keyword avg_query
to identify the average of the last 10 runs. The
details of the 10 runs can be found just above it with a single-column table
(i.e. query_time).
Run HorseIR with 8 queries and 15 times on SF 1/2/4/8/16, and report the average execution time (ms)
# base dir is 'HorsePower'
(cd exp-dls18 && time ./run_all.sh)
The script run_all.sh
sets different scales and invokes the script
test-thread.sh
for each scale. The script test-thread.sh
computes each
query 15 times and returns the average. As a result, all output information is
saved into a log file, for example, sf1/log_thread_1.log
contains the
information of all queries for SF 1.
Fetch brief information from a log file
cat sf1/log_thread_1.log | grep -E 'Run with 15 times|HorseIR Optimization Level'
All log files can be found in the folder exp/script-tpch
Note: You can click a figure on the left side to see its original figure.
Table overview | Description |
---|---|
An example HorseIR module | |
See details in data and scripts for Figure 7. | |
See details in data and scripts for Figure 8. |
Figures | Description |
---|---|
See details in the selected 8 queries. | |
See details in MonetDB-thread-1-log, MonetDB-thread-2-log, MonetDB-thread-4-log, MonetDB-thread-8-log, and MonetDB-thread-16-log, |
If you have any questions regarding MonetDB or HorseIR, such as database configuration problems, please contact Hanfeng Chen ([email protected]).