
Commit

doc: docs for cardinality benchmarking (#183)
**Summary**: Updated `README.md` and `SUMMARY.md`, and added a new file `cost_model_benchmarking.md`, to document cardinality benchmarking.

**Details**:
* `README.md` contains a quickstart command.
* `cost_model_benchmarking.md` contains conceptual info and notes about
operating and extending the system.
* I call it "benchmarking" instead of "testing" in the docs to distinguish it from functional testing, and renamed `perftest` and `cardtest` to `perfbench` and `cardbench` to match.
wangpatrick57 authored May 28, 2024
1 parent 5255d8d commit f8f714c
Showing 19 changed files with 101 additions and 54 deletions.
2 changes: 1 addition & 1 deletion Cargo.lock


2 changes: 1 addition & 1 deletion Cargo.toml
@@ -7,6 +7,6 @@ members = [
"optd-sqlplannertest",
"optd-adaptive-demo",
"optd-gungnir",
"optd-perftest",
"optd-perfbench",
]
resolver = "2"
11 changes: 9 additions & 2 deletions README.md
@@ -12,7 +12,7 @@ optd is a research project and is still evolving. It should not be used in produ

## Get Started

There are two demos you can run with optd. More information available in the [docs](docs/).
There are three demos you can run with optd. More information available in the [docs](docs/).

```
cargo run --release --bin optd-adaptive-tpch-q8
@@ -25,6 +25,13 @@ You can also run the Datafusion cli to interactively experiment with optd.
cargo run --bin datafusion-optd-cli
```

You can also test the performance of the cost model with the "cardinality benchmarking" feature (more info in the [docs](docs/)).
Before running this, you will need to manually run Postgres on your machine.
Note that there is a CI script which tests this command (TPC-H with scale factor 0.01) before every merge into main, so it should be very reliable.
```
cargo run --release --bin optd-perfbench cardbench tpch --scale-factor 0.01
```

## Documentation

The documentation is available in the mdbook format in the [docs](docs) directory.
@@ -38,7 +45,7 @@ The documentation is available in the mdbook format in the [docs](docs) director
* `optd-adaptive-demo`: Demo of adaptive optimization capabilities of optd. More information available in the [docs](docs/).
* `optd-sqlplannertest`: Planner test of optd based on [risinglightdb/sqlplannertest-rs](https://github.com/risinglightdb/sqlplannertest-rs).
* `optd-gungnir`: Scalable, memory-efficient, and parallelizable statistical methods for cardinality estimation (e.g. TDigest, HyperLogLog).
* `optd-perftest`: A CLI program for testing performance (cardinality, throughput, etc.) against other databases.
* `optd-perfbench`: A CLI program for benchmarking performance (cardinality, throughput, etc.) against other databases.


# Related Works
2 changes: 1 addition & 1 deletion dev_scripts/which_queries_work.sh
@@ -24,7 +24,7 @@ fi
successful_ids=()
IFS=','
for id in $all_ids; do
cargo run --release --bin optd-perftest cardtest $benchmark_name --query-ids $id &>/dev/null
cargo run --release --bin optd-perfbench cardbench $benchmark_name --query-ids $id &>/dev/null

if [ $? -eq 0 ]; then
echo >&2 $id succeeded
5 changes: 4 additions & 1 deletion docs/src/SUMMARY.md
@@ -22,7 +22,10 @@
- [Three Join Demo](./demo_three_join.md)
- [TPC-H Q8 Demo](./demo_tpch_q8.md)

# Testing
# Performance Benchmarking
- [Cost Model Cardinality Benchmarking](./cost_model_benchmarking.md)

# Functional Testing

- [SQLPlannerTest](./sqlplannertest.md)
- [Datafusion CLI](./datafusion_cli.md)
37 changes: 37 additions & 0 deletions docs/src/cost_model_benchmarking.md
@@ -0,0 +1,37 @@
# Cost Model Cardinality Benchmarking

## Overview
You can benchmark the cardinality estimates of optd's cost model against other DBMSs using the `optd-perfbench` module.

All aspects of benchmarking (except for setting up comparison DBMSs) are handled automatically. This includes loading workload data, building statistics, gathering the true cardinality of workload queries, running explains on workload queries, and aggregating cardinality estimation results.

We elected not to automate the installation and setup of the DBMSs in order to accommodate the needs of all users. For instance, some users prefer installing Postgres via Homebrew, others choose to install the Mac application, while others wish to create a Postgres Docker container. However, it could be feasible in the future to standardize on Docker and automatically start a container. The only difficult part in that scenario is tuning Postgres/other DBMSs to the machine being run on, as this is currently done manually using PGTune.

Additionally, our system provides **fine-grained, robust caching** for every single step of the process. After the first run of a workload, all subsequent runs will *only require running explains*, which takes a matter of seconds for all workloads. We use "acknowledgement files" to make the caching robust: we never cache incomplete results.
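
As a rough sketch of the acknowledgement-file idea (a simplified illustration, not the actual caching code in `optd-perfbench`; the helper name and signature are made up), a cached artifact is trusted only if a companion acknowledgement file exists, and that file is written only after the expensive step completes:
```
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: rebuild the artifact unless a previous run completed.
/// The acknowledgement file is written only after `build` succeeds, so an
/// interrupted run can never leave a half-built artifact that gets reused.
fn cached_build(
    artifact_path: &Path,
    ack_path: &Path,
    build: impl FnOnce(&Path) -> io::Result<()>,
) -> io::Result<()> {
    if ack_path.exists() {
        // A complete artifact from an earlier run exists; reuse it.
        return Ok(());
    }
    // No acknowledgement: either this is the first run or a previous run was interrupted.
    build(artifact_path)?;
    fs::write(ack_path, b"ok")?;
    Ok(())
}
```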

## Basic Operation
First, you need to manually install, configure, and start the DBMS(s) being compared against. Currently, only Postgres is supported. To see an example of how Postgres is installed, configured, and started on a Mac, check the `patrick/` folder in the [gungnir-experiments](https://github.com/wangpatrick57/gungnir-experiments) repository.

Once the DBMS(s) being compared against are set up, run the command below to get started quickly. It should take a few minutes on the first run and a few seconds on subsequent runs. This specific command, which benchmarks TPC-H with scale factor 0.01, is **run in a CI script** before every merge to main, so it should be very reliable.
```
cargo run --release --bin optd-perfbench cardbench tpch --scale-factor 0.01
```

After this, you can try out different workloads and scale factors based on the CLI options.

Roughly speaking, there are two main ways the benchmarking system is used: (a) to compare the cardinality estimates of optd against another system *in aggregate*, or (b) to investigate the cardinality estimates of a small subset of queries. The command above is for use case (a). The system automatically outputs a variety of *aggregate* information about the Q-error, including the median, p95, max, and more. Additionally, the system outputs *comparative* information showing the number of queries for which a given DBMS performs best or ties for best.
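
For reference, the Q-error of a single query is conventionally the larger of the two ratios between the estimated and true cardinalities, so 1.0 is a perfect estimate. Below is a minimal sketch of how the per-query values and the aggregates could be computed (the real `calc_qerror` and `percentile` code in `optd-perfbench` may handle edge cases such as zero cardinalities differently):
```
/// Q-error = max(est/true, true/est); 1.0 means a perfect estimate.
fn calc_qerror(estcard: usize, truecard: usize) -> f64 {
    let (est, tru) = (estcard as f64, truecard as f64);
    f64::max(est / tru, tru / est)
}

/// Aggregate per-query Q-errors into (median, p95, max). Panics on empty input.
fn aggregate(mut qerrors: Vec<f64>) -> (f64, f64, f64) {
    qerrors.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let pct = |p: f64| qerrors[((qerrors.len() - 1) as f64 * p).round() as usize];
    (pct(0.50), pct(0.95), *qerrors.last().unwrap())
}
```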

For use case (b), you will want to set the `RUST_LOG` environment variable to `info` and use the `--query-ids` parameter. Setting `RUST_LOG` to `info` will show the results of the explain commands on all DBMSs and `--query-ids` will let you only run specific queries to avoid cluttering the output.
```
RUST_LOG=info cargo run --release --bin optd-perfbench cardbench tpch --scale-factor 0.01 --query-ids 2
```

## Supporting More Queries
Currently, we are missing support for a few queries in TPC-H, JOB, and JOB-light. An *approximate* list of supported queries can be found in the `[workload].rs` files (e.g. `tpch.rs` and `job.rs`). If `--query-ids` is omitted from the command, we use the list of supported queries defined in the `[workload].rs` file by default. Some of these queries are unsupported by DataFusion, some are unsupported by optd, and some are excluded because we run into an OOM error when trying to execute them on Postgres. Because of the last point, the set of supported queries may differ between machines. The list of queries in `[workload].rs` (at least the one in `tpch.rs`) is tested to work on the CI machine.

The *definitive* list of supported queries on your machine can be found by running `dev_scripts/which_queries_work.sh`, which simply runs the benchmarking system for each query individually. While this script takes a long time to complete when first run, it has the nice side effect of warming up all your caches so that subsequent runs are fast. The script outputs a string to replace the `WORKING_*QUERY_IDS` variable in `[workload].rs` as well as another string to use as the `--query-ids` argument (see the illustration below). If you are using `which_queries_work.sh` to figure out which queries work on your machine, you probably want to use `--query-ids` instead of setting `WORKING_*QUERY_IDS`.
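
For illustration only (the actual constant names and query lists live in `tpch.rs`/`job.rs` and will differ), the first output string replaces a constant of roughly this shape, while the second is passed directly as `--query-ids`:
```
// Hypothetical shape of the supported-query list in a [workload].rs file.
// The IDs below are placeholders, not the real supported set.
pub const WORKING_QUERY_IDS: &[&str] = &["2", "3", "5", "7"];
```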

If you add support for more queries, you will want to rerun `dev_scripts/which_queries_work.sh`. Since you are permanently adding support for more queries, you will want to update `WORKING_*QUERY_IDS`.

## Adding More DBMSs
Currently, only Postgres is supported. Additional DBMSs can be easily added using the `CardbenchRunnerDBMSHelper` trait and optionally the `TruecardGetter` trait. `CardbenchRunnerDBMSHelper` must be implemented by every supported DBMS because it provides the functions for gathering estimated cardinalities. `TruecardGetter` only needs to be implemented for at least one DBMS: the true cardinality should be the same across all DBMSs, so we only execute the queries for real on a single DBMS to drastically reduce benchmarking runtime. `TruecardGetter` is currently implemented for Postgres, so it is unnecessary to implement it for any other DBMS unless one wishes to improve the runtime of benchmarking (e.g. by gathering true cardinalities using an OLAP DBMS for OLAP workloads). Do keep in mind that true cardinalities are cached after the first run of a workload and can be shared across users (in the future, perhaps we'll even put the cached true cardinalities in the GitHub repository itself), so this optimization is not terribly important.
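
As a rough sketch of what adding a new DBMS involves (the trait's cardinality-gathering method is shown with an assumed name, argument, and return type; check `cardbench.rs` for the real definitions, and the `Benchmark` stand-in below is likewise simplified):
```
use async_trait::async_trait;

// Simplified stand-ins; the real Benchmark type and trait live in optd-perfbench.
pub struct Benchmark;

#[async_trait]
pub trait CardbenchRunnerDBMSHelper {
    fn get_name(&self) -> &str;
    // Assumed method shape: return one estimated cardinality per workload query, in order.
    async fn eval_benchmark_estcards(&mut self, benchmark: &Benchmark) -> anyhow::Result<Vec<usize>>;
}

pub struct MyDbms;

#[async_trait]
impl CardbenchRunnerDBMSHelper for MyDbms {
    fn get_name(&self) -> &str {
        "MyDbms"
    }

    async fn eval_benchmark_estcards(&mut self, _benchmark: &Benchmark) -> anyhow::Result<Vec<usize>> {
        // Run EXPLAIN for each workload query against MyDbms, parse the estimated
        // row count from the plan, and return the estimates in query order.
        Ok(vec![])
    }
}
```
Wiring in the new implementation then amounts to constructing it in `cardbench_core` and pushing it into the `dbmss` vector alongside the Postgres and DataFusion helpers.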
2 changes: 1 addition & 1 deletion optd-perftest/Cargo.toml → optd-perfbench/Cargo.toml
@@ -1,5 +1,5 @@
[package]
name = "optd-perftest"
name = "optd-perfbench"
version = "0.1.0"
edition = "2021"

File renamed without changes.
30 changes: 15 additions & 15 deletions optd-perftest/src/cardtest.rs → optd-perfbench/src/cardbench.rs
@@ -9,12 +9,12 @@ use anyhow::{self};
use async_trait::async_trait;

/// This struct performs cardinality testing across one or more DBMSs.
/// Another design would be for the CardtestRunnerDBMSHelper trait to expose a function
/// Another design would be for the CardbenchRunnerDBMSHelper trait to expose a function
/// to evaluate the Q-error. However, I chose not to do this design for reasons
/// described in the comments of the CardtestRunnerDBMSHelper trait. This is why
/// you would use CardtestRunner even for computing the Q-error of a single DBMS.
pub struct CardtestRunner {
pub dbmss: Vec<Box<dyn CardtestRunnerDBMSHelper>>,
/// described in the comments of the CardbenchRunnerDBMSHelper trait. This is why
/// you would use CardbenchRunner even for computing the Q-error of a single DBMS.
pub struct CardbenchRunner {
pub dbmss: Vec<Box<dyn CardbenchRunnerDBMSHelper>>,
truecard_getter: Box<dyn TruecardGetter>,
}

@@ -25,12 +25,12 @@ pub struct Cardinfo {
pub truecard: usize,
}

impl CardtestRunner {
impl CardbenchRunner {
pub async fn new(
dbmss: Vec<Box<dyn CardtestRunnerDBMSHelper>>,
dbmss: Vec<Box<dyn CardbenchRunnerDBMSHelper>>,
truecard_getter: Box<dyn TruecardGetter>,
) -> anyhow::Result<Self> {
Ok(CardtestRunner {
Ok(CardbenchRunner {
dbmss,
truecard_getter,
})
@@ -57,7 +57,7 @@ impl CardtestRunner {
.into_iter()
.zip(truecards.iter())
.map(|(estcard, &truecard)| Cardinfo {
qerror: CardtestRunner::calc_qerror(estcard, truecard),
qerror: CardbenchRunner::calc_qerror(estcard, truecard),
estcard,
truecard,
})
@@ -90,8 +90,8 @@ impl CardtestRunner {
/// When more performance tests are implemented, you would probably want to extract
/// get_name() into a generic "DBMS" trait.
#[async_trait]
pub trait CardtestRunnerDBMSHelper {
// get_name() has &self so that we're able to do Box<dyn CardtestRunnerDBMSHelper>
pub trait CardbenchRunnerDBMSHelper {
// get_name() has &self so that we're able to do Box<dyn CardbenchRunnerDBMSHelper>
fn get_name(&self) -> &str;

// The order of queries in the returned vector has to be the same between all databases,
@@ -103,7 +103,7 @@ pub trait CardtestRunnerDBMSHelper {
}

/// The core logic of cardinality testing.
pub async fn cardtest_core<P: AsRef<Path>>(
pub async fn cardbench_core<P: AsRef<Path>>(
workspace_dpath: P,
rebuild_cached_optd_stats: bool,
pguser: &str,
@@ -115,10 +115,10 @@ pub async fn cardtest_core<P: AsRef<Path>>(
let truecard_getter = pg_dbms.clone();
let df_dbms =
Box::new(DatafusionDBMS::new(&workspace_dpath, rebuild_cached_optd_stats, adaptive).await?);
let dbmss: Vec<Box<dyn CardtestRunnerDBMSHelper>> = vec![pg_dbms, df_dbms];
let dbmss: Vec<Box<dyn CardbenchRunnerDBMSHelper>> = vec![pg_dbms, df_dbms];

let mut cardtest_runner = CardtestRunner::new(dbmss, truecard_getter).await?;
let cardinfos_alldbs = cardtest_runner
let mut cardbench_runner = CardbenchRunner::new(dbmss, truecard_getter).await?;
let cardinfos_alldbs = cardbench_runner
.eval_benchmark_cardinfos_alldbs(&benchmark)
.await?;
Ok(cardinfos_alldbs)
@@ -7,7 +7,7 @@ use std::{

use crate::{
benchmark::Benchmark,
cardtest::CardtestRunnerDBMSHelper,
cardbench::CardbenchRunnerDBMSHelper,
job::{JobKit, JobKitConfig},
tpch::{TpchKit, TpchKitConfig},
};
Expand Down Expand Up @@ -47,7 +47,7 @@ const WITH_LOGICAL_FOR_TPCH: bool = true;
const WITH_LOGICAL_FOR_JOB: bool = true;

#[async_trait]
impl CardtestRunnerDBMSHelper for DatafusionDBMS {
impl CardbenchRunnerDBMSHelper for DatafusionDBMS {
fn get_name(&self) -> &str {
"DataFusion"
}
2 changes: 1 addition & 1 deletion optd-perftest/src/job.rs → optd-perfbench/src/job.rs
@@ -192,7 +192,7 @@ impl JobKit {
}

/// Get an iterator through all generated .sql files _in order_ of a given config
/// It's important to iterate _in order_ due to the interface of CardtestRunnerDBMSHelper
/// It's important to iterate _in order_ due to the interface of CardbenchRunnerDBMSHelper
pub fn get_sql_fpath_ordered_iter(
&self,
job_kit_config: &JobKitConfig,
2 changes: 1 addition & 1 deletion optd-perftest/src/lib.rs → optd-perfbench/src/lib.rs
@@ -1,5 +1,5 @@
pub mod benchmark;
pub mod cardtest;
pub mod cardbench;
mod datafusion_dbms;
pub mod job;
mod postgres_dbms;
30 changes: 15 additions & 15 deletions optd-perftest/src/main.rs → optd-perfbench/src/main.rs
@@ -1,18 +1,18 @@
use clap::{Parser, Subcommand, ValueEnum};
use optd_perftest::benchmark::Benchmark;
use optd_perftest::cardtest::Cardinfo;
use optd_perftest::job::JobKitConfig;
use optd_perftest::shell;
use optd_perftest::tpch::{TpchKitConfig, TPCH_KIT_POSTGRES};
use optd_perftest::{cardtest, job, tpch};
use optd_perfbench::benchmark::Benchmark;
use optd_perfbench::cardbench::Cardinfo;
use optd_perfbench::job::JobKitConfig;
use optd_perfbench::shell;
use optd_perfbench::tpch::{TpchKitConfig, TPCH_KIT_POSTGRES};
use optd_perfbench::{cardbench, job, tpch};
use prettytable::{format, Table};
use std::fs;
use std::path::Path;

#[derive(Parser)]
struct Cli {
#[clap(long)]
#[clap(default_value = "optd_perftest_workspace")]
#[clap(default_value = "optd_perfbench_workspace")]
#[clap(
help = "The directory where artifacts required for performance testing (such as pgdata or TPC-H queries) are generated. See comment of parse_pathstr() to see what paths are allowed (TLDR: absolute and relative both ok)."
)]
@@ -31,7 +31,7 @@ enum BenchmarkName {

#[derive(Subcommand)]
enum Commands {
Cardtest {
Cardbench {
#[clap(value_enum)]
#[clap(default_value = "tpch")]
benchmark_name: BenchmarkName,
@@ -46,7 +46,7 @@

#[clap(long)]
#[clap(value_delimiter = ',', num_args = 1..)]
// This is the current list of all queries that work in perftest
// This is the current list of all queries that work in perfbench
#[clap(default_value = None)]
#[clap(help = "The queries to get the Q-error of")]
query_ids: Vec<String>,
@@ -87,10 +87,10 @@ fn percentile(sorted_v: &[f64], percentile: f64) -> f64 {
sorted_v[idx]
}

/// cardtest::cardtest_core() expects sanitized inputs and returns outputs in their simplest form.
/// This function wraps around cardtest::cardtest_core() to sanitize the inputs and print the outputs nicely.
/// cardbench::cardbench_core() expects sanitized inputs and returns outputs in their simplest form.
/// This function wraps around cardbench::cardbench_core() to sanitize the inputs and print the outputs nicely.
#[allow(clippy::too_many_arguments)]
async fn cardtest<P: AsRef<Path>>(
async fn cardbench<P: AsRef<Path>>(
workspace_dpath: P,
benchmark_name: BenchmarkName,
scale_factor: f64,
@@ -131,7 +131,7 @@ async fn cardtest<P: AsRef<Path>>(
}),
};

let cardinfo_alldbs = cardtest::cardtest_core(
let cardinfo_alldbs = cardbench::cardbench_core(
&workspace_dpath,
rebuild_cached_optd_stats,
&pguser,
@@ -268,7 +268,7 @@ async fn main() -> anyhow::Result<()> {
}

match cli.command {
Commands::Cardtest {
Commands::Cardbench {
benchmark_name,
scale_factor,
seed,
@@ -278,7 +278,7 @@
pgpassword,
adaptive,
} => {
cardtest(
cardbench(
workspace_dpath,
benchmark_name,
scale_factor,
@@ -1,6 +1,6 @@
use crate::{
benchmark::Benchmark,
cardtest::CardtestRunnerDBMSHelper,
cardbench::CardbenchRunnerDBMSHelper,
job::{JobKit, JobKitConfig},
tpch::{TpchKit, TpchKitConfig},
truecard::{TruecardCache, TruecardGetter},
@@ -404,7 +404,7 @@ impl PostgresDBMS {
}

#[async_trait]
impl CardtestRunnerDBMSHelper for PostgresDBMS {
impl CardbenchRunnerDBMSHelper for PostgresDBMS {
fn get_name(&self) -> &str {
POSTGRES_DBMS_NAME
}
File renamed without changes.
2 changes: 1 addition & 1 deletion optd-perftest/src/tpch.rs → optd-perfbench/src/tpch.rs
@@ -264,7 +264,7 @@ impl TpchKit {
}

/// Get an iterator through all generated .sql files _in order_ of a given config
/// It's important to iterate _in order_ due to the interface of CardtestRunnerDBMSHelper
/// It's important to iterate _in order_ due to the interface of CardbenchRunnerDBMSHelper
pub fn get_sql_fpath_ordered_iter(
&self,
tpch_kit_config: &TpchKitConfig,
File renamed without changes.
@@ -1,13 +1,13 @@
#[cfg(test)]
mod tests {
use assert_cmd::prelude::CommandCargoExt;
use optd_perftest::shell;
use optd_perfbench::shell;
use std::{
fs,
process::{Command, Stdio},
};

const WORKSPACE: &str = "optd_perftest_integration_workspace";
const WORKSPACE: &str = "optd_perfbench_integration_workspace";

/// Make sure Postgres is running before this test is run
/// The reason I don't start Postgres automatically is because everyone has a different
@@ -17,18 +17,18 @@ mod tests {
/// While it'd be nice to test JOB, JOB only has one scale factor and that scale factor
/// takes 30 minutes to build stats as of 4/15/24, so we don't test it right now.
#[test_case::test_case("tpch")]
fn cli_run_cardtest_twice(benchmark_name: &str) {
fn cli_run_cardbench_twice(benchmark_name: &str) {
// perform cleanup (clear workspace)
let workspace_dpath = shell::parse_pathstr(WORKSPACE).unwrap();
shell::make_into_empty_dir(&workspace_dpath).unwrap();

// run command twice
for i in 1..=2 {
let mut cmd = create_cardtest_run_cmd(benchmark_name, false);
let mut cmd = create_cardbench_run_cmd(benchmark_name, false);
let output = cmd.output().unwrap();
assert!(
output.status.success(),
"cardtest run #{} failed with ```{}```",
"cardbench run #{} failed with ```{}```",
i,
String::from_utf8_lossy(&output.stderr)
);
@@ -38,13 +38,13 @@ mod tests {
fs::remove_dir_all(&workspace_dpath).unwrap();
}

fn create_cardtest_run_cmd(benchmark_name: &str, debug_print: bool) -> Command {
let mut cmd = Command::cargo_bin("optd-perftest").unwrap();
fn create_cardbench_run_cmd(benchmark_name: &str, debug_print: bool) -> Command {
let mut cmd = Command::cargo_bin("optd-perfbench").unwrap();
cmd.current_dir("..");
cmd.args([
"--workspace",
WORKSPACE,
"cardtest",
"cardbench",
benchmark_name,
// make sure scale factor is low so the test runs fast
"--scale-factor",