
Quick Start: running OLTP-Bench for Splice

Welcome! This quick start is designed to get you up and running with CH-benCHmark and OLTP-Bench on Splice Machine.

It assumes you're going to start with a local copy of Splice Machine to get OLTP-Bench set up and working. Once that's in place you can review the scripts and parameters and begin exploring the opportunities for HTAP on Splice Machine either in a cluster or on the Splice Machine cloud.

Get started with a standalone copy of Splice Machine

To get started, register and download your standalone copy of Splice Machine here and follow the setup process here.

A convenient location for installation of Splice Machine is /usr/local/splicemachine. For example:

$ mkdir /usr/local/splicemachine
$ tar -xf SPLICEMACHINE-2.7.0.1815.standalone.tar -C /usr/local/splicemachine --strip-components=1

The remainder of this quick start assumes you'll be using this location. You'll need to adjust some of the paths used in the scripts and instructions that follow if you choose a different location.

Be sure to go all the way through the steps in the setup process, including starting the Splice Machine CLI, sqlshell.sh. You'll use the CLI to create the user and database and to load the benchmark data. But don't worry - sample scripts are included in this quickstart directory to get you up and running fast.
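
For example, assuming sqlshell.sh sits at the top of the install directory (check your unpacked files if the location differs):

$ cd /usr/local/splicemachine
$ ./sqlshell.sh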

Once your installation of Splice Machine is complete, leave it up and running for the steps that follow.

Note: Java 8 is required for both Splice Machine and OLTP-Bench. The instructions for installing Splice Machine include installation of Java 8. To check your Java version use java -version. You should have a minimum of 16GB of memory to run this benchmark locally.

Build OLTP-Bench

The modified code and configuration files supporting Splice Machine are in this repository. Included is an ant build script that will build the oltpbench executable.

Get the source code:

$ mkdir ~/htap-benchmark
$ cd ~/htap-benchmark
$ git clone https://github.com/splicemachine/htap-benchmark

Build OLTP-Bench:

$ ant
Buildfile: /Users/rdb/dev/htap-benchmark/build.xml

build:
    [mkdir] Created dir: /Users/rdb/dev/htap-benchmark/build/META-INF
     [copy] Copying 1 file to /Users/rdb/dev/htap-benchmark/build/META-INF
    [javac] Compiling 419 source files to /Users/rdb/dev/htap-benchmark/build
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.
     [copy] Copying 132 files to /Users/rdb/dev/htap-benchmark/build
    [mkdir] Created dir: /Users/rdb/dev/htap-benchmark/build/tests
    [javac] Compiling 54 source files to /Users/rdb/dev/htap-benchmark/build/tests
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.

BUILD SUCCESSFUL
Total time: 7 seconds

Download the HTAP-25 data set

Start small - this is local on your laptop. The scripts included in the repository are for loading and executing CH-benCHmark on a 25 warehouse data set. To speed up the process, the data set is available on AWS S3. Download the data set using the AWS CLI. (If you changed the location of your Splice Machine installation from /usr/local/splicemachine, you'll need to adjust the instructions and scripts that follow.)

$ # Download the htap-25 data set
$ mkdir /usr/local/splicemachine/demodata/htap-25 
$ aws s3 cp s3://splice-benchmark-data/flat/HTAP/htap-25 /usr/local/splicemachine/demodata/htap-25 --recursive

Now you're ready to go.

Try it out

Create the user, database and load the data:

$ cd ~/htap-benchmark/quickstart
$ ./load-htap-25.sh

This script uses the default username and password set up by the local Splice Machine install: splice and admin. It loads the 25-warehouse data set that you downloaded from AWS S3. Check out the SQL command file, load-htap-25.sql, to see the details.
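
As a rough illustration of the kind of statement load-htap-25.sql issues, Splice Machine's SYSCS_UTIL.IMPORT_DATA procedure loads a table directly from a flat file. The schema, table, and file names below are placeholders; consult the actual script for the real calls and arguments.

-- hypothetical example only; see load-htap-25.sql for the real statements
CALL SYSCS_UTIL.IMPORT_DATA('SPLICE', 'WAREHOUSE', null,
    '/usr/local/splicemachine/demodata/htap-25/warehouse.csv',
    null, null, null, null, null, 0, null, null, null);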

Run the htap-25 benchmark on your local Splice Machine:

$ ./exec-htap-25.sh

This script runs CH-benCHmark through four scenarios for 0, 1, 2, and 4 analytic workers with a constant 25 transactional process workers on the data store. Logging is directed to the terminal so you can see the action.
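
If you're curious what the script does under the hood, here is a minimal sketch of the kind of loop involved. The config-file-to-scenario mapping, paths, and run length are assumptions on our part; see exec-htap-25.sh itself for the real commands.

# sketch only - run from the repository root; see exec-htap-25.sh
workers=(0 1 2 4)
for i in 0 1 2 3; do
    ./oltpbenchmark -b 'tpcc,chbenchmark' \
        -c config/htap_config_splicemachine_$((i + 1)).xml \
        --execute=true -s 300 -ss -o htap-25_${workers[i]}
done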

Review the results

The results files appear in the subdirectory ~/htap-benchmark/results in .csv format and include:

  • htap-25_0-res.csv: this file contains the throughput results for the run with 25 transactional process workers and no analytic workers. Likewise, htap-25_[1-4]-res.csv contain the results for 25 transactional process workers with 1, 2, and 4 analytic workers, respectively.

  • htap-25_0_NewOrder-res.csv: this file contains the throughput for the New Order transactional process. The column "throughput(req/sec)" multiplied by 60 gives the aggregate number of new order transactions per minute (tpmC) quoted in TPC-C results; for example, 5 req/sec corresponds to 300 tpmC. (A quick way to estimate this from the file is sketched after this list.) This number will be less than the throughput you will see with a larger cluster and larger data set - remember we're starting off small and manageable. Likewise, the files htap-25_[1-4]_NewOrder-res.csv represent the results with additional analytic workers on the system.

  • htap-25_[0-4]_*-res.csv: these are the results for the other queries in the benchmark.

  • htap-25_[0-4]-raw.csv: in case you are interested, these are the raw results from the run.
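
As promised above, here is one quick way to estimate tpmC from a New Order results file. It assumes the "throughput(req/sec)" column is the second field, which you should verify against the CSV header first:

$ awk -F, 'NR > 1 { sum += $2; n++ } END { print "approx tpmC:", (sum / n) * 60 }' \
      ~/htap-benchmark/results/htap-25_0_NewOrder-res.csv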

Try it again for more results

To compare runs, particularly at larger scale, it's best to start out with the same initial state of the data set. Included is a script that will clear the database before a subsequent run:

$ ./drop-htap.sh

Go through the load and execute processes again for more results.
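
Putting it all together, a full reset-and-rerun cycle looks like this:

$ cd ~/htap-benchmark/quickstart
$ ./drop-htap.sh
$ ./load-htap-25.sh
$ ./exec-htap-25.sh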

Try different configurations

We've included data sets in the S3 bucket for 2, 25, 250, 1,000 and 10,000 warehouses.

$ aws s3 ls s3://splice-benchmark-data/flat/HTAP/
                           PRE htap-1000/
                           PRE htap-10k/
                           PRE htap-2/
                           PRE htap-25/
                           PRE htap-250/

It won't be practical to run the data sets larger than htap-25 on a single machine; for those, a 4-node cluster or the Splice Machine cloud is recommended. Also, check out the configuration files, ~/htap-benchmark/config/htap_config_splicemachine_[1-4].xml. These control the parameters used in the benchmark run, including the number of workers, the duration of the run, and the mix of the workload. For more information on additional command options and configuration file parameters in OLTP-Bench, visit the OLTP-Bench Quick Start.

Some additional command line options and configuration file parameters we found useful:

  • Command line options:
    • -b: specifies multiple benchmarks in a run. Here we are using this to run both tpcc, the transactional portion of CH-benCHmark, and chbenchmark, the analytic portion for the full CH-benCHmark.
    • -s: this is the running time of the benchmark in seconds.
    • -ss: provides results files for each specific query.

Example:

$ ./oltpbenchmark -b 'tpcc,chbenchmark' -c config/htap_config_splicemachine_1.xml --execute=true -s 300 -ss -o htap-25_1

  • Configuration file parameters:
    • <active_terminals bench="chbenchmark">1</active_terminals>: determines the number of workers for a particular workload in the benchmark. This lets us adjust the ratio of workers for each workload in a run (see the illustrative fragment after this list).
    • <rate bench="chbenchmark">unlimited</rate>: specifies an open system model for workload generation. Each worker starts a new transaction as soon as the previous one is complete.
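
For orientation, here is how these parameters might sit together inside one of the config files. The element placement and surrounding structure are assumptions on our part; consult the shipped htap_config_splicemachine_[1-4].xml files for the authoritative layout.

<!-- illustrative fragment only; element placement is an assumption -->
<works>
    <work>
        <time>300</time>
        <rate bench="tpcc">unlimited</rate>
        <rate bench="chbenchmark">unlimited</rate>
        <active_terminals bench="tpcc">25</active_terminals>
        <active_terminals bench="chbenchmark">1</active_terminals>
    </work>
</works>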

Changes and enhancements in this fork of OLTP-Bench

In the future we will look for opportunities to provide our enhancements back to the original project, but for now, here is a quick summary of some of the changes:

  • Database creation and loading from scripts: OLTP-Bench includes options to create and load the database through the framework. Because generating the data is time-consuming, we generated sample data sets for you and saved them on AWS S3. Most big data stores like Splice Machine have high-throughput pathways for loading data, including direct data load from flat files (.csv) as well as parallel loading from bzip2-compressed files or HDFS. If you are planning to run benchmarks based on the htap-1000 or htap-10k data sets, we suggest that you first import from S3 to your local file system or HDFS and modify the load script to read from this location. Splice Machine includes a rich set of import system utilities, and the import_data routine can read CSV files, bzip2-compressed CSV files, or an HDFS store directly. In the gen-htap-10k folder you will find example scripts that generate CSV data sets, which you can save to the file system or HDFS and then import into the database or mount as an external table.

  • Stored foreign keys between the H and C tables: the CH-benCHmark implementation in OLTP-Bench uses a computed foreign key for the nation key and supplier key from the C tables to the H tables. While this was done to keep the TPC-C implementation in OLTP-Bench pure, we don't feel it's representative of how foreign keys are stored in real applications, and it can't take advantage of the performance enhancements and parallelism possible with stored foreign keys. We used the same key formula as the OLTP-Bench CH-benCHmark implementation and added the requisite fields to the tpcc tables. We have not yet updated the other database DDLs and dialect files for the built-in create or loader functions to support these changes.

The Splice Machine team has implemented other enhancements to OLTP-Bench that we plan to include here in the future. Some of these include wait time for a closed system model of workload generation, and thread and connection pooling for workers to support greater benchmark scale.

Acknowledgements

Thanks much to the OLTP-Bench contributors for creating this framework and making it available to the community!