V0.5.7 (#86)
* Prepare next release

* TPC-H: OracleDB

* Ignore more package files

* Tools as scripts in setup

* More DBMS (OracleDB and OmniSci-CPU)

* Docs: Entry page setup

* Docs: Entry page setup

* Docs: Entry page setup

* More DBMS (MonetDB current version)

* More DBMS (MonetDB current version)

* Docs: Entry page setup

* TPC-H: Some DBMS for tests

* Tools as scripts in setup changed

* TPC-H: Citus joins

* Tools as scripts in setup changed

* Tool: Dashboard includes Jupyter notebook

* Config: Changed format

* Docs: Entry page setup - quick start

* TPC-H: Example script

* Docs: Entry page setup - reference

* Docs: Entry page setup - quick start data

* Images referring to the Dockerhub repository

* Docs: Entry page setup - quick start

* Images referring to the Dockerhub repository updated

* Images referring to the Dockerhub repository updated

* Images referring to the Dockerhub repository updated

* Images referring to the Dockerhub repository updated

* Configuration: Fetch missing info about CPU cores
perdelt authored Aug 23, 2021
1 parent efa7e03 commit 03342e7
Showing 37 changed files with 1,884 additions and 238 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -6,4 +6,5 @@ tmp/*
1*/
jars/*
build/*
bexhoma.egg-info/*
bexhoma.egg-info/*
dist/*
52 changes: 36 additions & 16 deletions README.md
@@ -1,5 +1,5 @@
# Benchmark Experiment Host Manager
This Python tool helps **manage benchmark experiments of Database Management Systems (DBMS) in a High-Performance-Computing (HPC) cluster environment**.
This Python tool helps **manage benchmark experiments of Database Management Systems (DBMS) in a Kubernetes-based High-Performance-Computing (HPC) cluster environment**.
It enables users to configure hardware / software setups for easily repeating tests over varying configurations.

It serves as the **orchestrator** [2] for distributed parallel benchmarking experiments in a Kubernetes Cloud.
@@ -8,35 +8,55 @@ It serves as the **orchestrator** [2] for distributed parallel benchmarking expe
<img src="https://raw.githubusercontent.com/Beuth-Erdelt/Benchmark-Experiment-Host-Manager/v0.5.6/docs/experiment-with-orchestrator.png" width="800">
</p>

The basic workflow is [1]: start a virtual machine, install monitoring software and a database management system, import data, run benchmarks (external tool) and shut down everything with a single command.
The basic workflow is [1,2]: start a virtual machine, install monitoring software and a database management system, import data, run benchmarks (external tool) and shut down everything with a single command.
A more advanced workflow is: plan a sequence of such experiments, run the plan as a batch and join the results for comparison.

## Installation

1. Download this repository
1. Download the repository: https://github.com/Beuth-Erdelt/Benchmark-Experiment-Host-Manager
1. Install the package `pip install bexhoma`
1. Make sure you have a working `kubectl` installed
1. Adjust configuration [tbd in detail]
1. Rename `k8s-cluster.config` to `cluster.config`
1. Set name of context, namespace and name of cluster in that file
1. Install data [tbd in detail]
Example for TPC-H SF=1:
* Run `kubectl create -f k8s/job-data-tpch-1.yml`
* When job is done, clean up with
`kubectl delete job -l app=bexhoma -l component=data-source` and
`kubectl delete deployment -l app=bexhoma -l component=data-source`.
1. Install result folder
Run `kubectl create -f k8s/pvc-bexhoma-results.yml`

2. Run `pip install -r requirements.txt`

3. Adjust configuration [tbd]

4. Install data [tbd]

## Quickstart

The repository contains a tool for running TPC-H (reading) queries against MonetDB and PostgreSQL.

Run `python tpch.py run`
1. Run `tpch run`.
This is equivalent to `python tpch.py run`.
1. You can watch the status using `bexperiments status` while the benchmark is running.
This is equivalent to `python cluster.py status`.
1. After benchmarking has finished, run `bexperiments dashboard` to connect to a dashboard. You can open the dashboard in a browser at `http://localhost:8050`.
This is equivalent to `python cluster.py dashboard`.
Alternatively, you can open a Jupyter notebook at `http://localhost:8888`.

## More Information

For full power, use this tool as an orchestrator as in [2]. It then starts a monitoring container using [Prometheus](https://prometheus.io/) and a metrics collector container using [cAdvisor](https://github.com/google/cadvisor), and it uses the Python package [dbmsbenchmarker](https://github.com/Beuth-Erdelt/DBMS-Benchmarker) as query executor [2] and evaluator [1].

This module has been tested with Brytlyt, Citus, Clickhouse, DB2, Exasol, Kinetica, MariaDB, MariaDB Columnstore, MemSQL, MonetDB, MySQL, OmniSci, Oracle DB, PostgreSQL, SingleStore, SQL Server and SAP HANA.

## References

[1] [A Framework for Supporting Repetition and Evaluation in the Process of Cloud-Based DBMS Performance Benchmarking](https://doi.org/10.1007/978-3-030-84924-5_6)
```
Erdelt P.K. (2021)
A Framework for Supporting Repetition and Evaluation in the Process of Cloud-Based DBMS Performance Benchmarking.
In: Nambiar R., Poess M. (eds) Performance Evaluation and Benchmarking. TPCTC 2020.
Lecture Notes in Computer Science, vol 12752. Springer, Cham.
https://doi.org/10.1007/978-3-030-84924-5_6
```

> Erdelt P.K. (2021)
> A Framework for Supporting Repetition and Evaluation in the Process of Cloud-Based DBMS Performance Benchmarking.
> In: Nambiar R., Poess M. (eds) Performance Evaluation and Benchmarking. TPCTC 2020.
> Lecture Notes in Computer Science, vol 12752. Springer, Cham.
> https://doi.org/10.1007/978-3-030-84924-5_6

[2] [Orchestrating DBMS Benchmarking in the Cloud with Kubernetes](https://www.researchgate.net/publication/353236865_Orchestrating_DBMS_Benchmarking_in_the_Cloud_with_Kubernetes)

5 changes: 4 additions & 1 deletion bexhoma/configurations.py
@@ -816,7 +816,10 @@ def getCores(self):
    #cores = os.popen(fullcommand).read()
    stdin, stdout, stderr = self.experiment.cluster.executeCTL(command=command, pod=self.pod_sut, container='dbms')
    cores = stdout#os.popen(fullcommand).read()
    return int(cores)
    if len(cores)>0:
        return int(cores)
    else:
        return 0
def getHostsystem(self):
    print("getHostsystem")
    cmd = {}
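The change above makes `getCores` tolerate empty output from the in-pod command (for example when the container is not ready yet) instead of failing on `int('')`. A minimal standalone sketch of the same guard, with an illustrative helper name that is not part of bexhoma:

```python
def parse_core_count(stdout: str) -> int:
    """Parse the core count reported by a pod command (e.g. nproc).

    Illustrative helper only: empty output is treated as "unknown"
    and reported as 0 instead of raising a ValueError.
    """
    text = stdout.strip()
    if len(text) > 0:
        return int(text)
    return 0

# int('') would raise ValueError, so the guard matters:
assert parse_core_count("8\n") == 8
assert parse_core_count("") == 0
```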
4 changes: 4 additions & 0 deletions bexhoma/scripts/__init__.py
@@ -0,0 +1,4 @@
"""
The clustermanager module
"""
__all__ = ["experimentsmanager","tpch"]
212 changes: 212 additions & 0 deletions bexhoma/scripts/experimentsmanager.py
@@ -0,0 +1,212 @@
"""
This script contains some code snippets for testing the detached mode in Kubernetes
Copyright (C) 2021 Patrick Erdelt
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as
published by the Free Software Foundation, either version 3 of the
License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
"""
from bexhoma import *
from dbmsbenchmarker import *
import logging
import urllib3
import logging
import argparse
import time
import pandas as pd
from tabulate import tabulate
from datetime import datetime
urllib3.disable_warnings()
logging.basicConfig(level=logging.ERROR)


def manage():
    description = """This tool helps managing running Bexhoma experiments in a Kubernetes cluster.
    """
    print(description)
    # argparse
    parser = argparse.ArgumentParser(description=description)
    parser.add_argument('mode', help='manage experiments: stop, get status, connect to dbms or connect to dashboard', choices=['stop','status','dashboard', 'master'])
    parser.add_argument('-e', '--experiment', help='code of experiment', default=None)
    parser.add_argument('-c', '--connection', help='name of DBMS', default=None)
    parser.add_argument('-v', '--verbose', help='gives more details about Kubernetes objects', action='store_true')
    parser.add_argument('-cx', '--context', help='context of Kubernetes (for a multi cluster environment), default is current context', default=None)
    clusterconfig = 'cluster.config'
    args = parser.parse_args()
    if args.mode == 'stop':
        cluster = clusters.kubernetes(clusterconfig, context=args.context)
        if args.experiment is None:
            experiment = experiments.default(cluster=cluster, code=cluster.code)
            cluster.stop_sut()
            cluster.stop_monitoring()
            cluster.stop_benchmarker()
        else:
            experiment = experiments.default(cluster=cluster, code=args.experiment)
            experiment.stop_sut()
            cluster.stop_monitoring()
            cluster.stop_benchmarker()
    elif args.mode == 'dashboard':
        cluster = clusters.kubernetes(clusterconfig, context=args.context)
        cluster.connect_dashboard()
    elif args.mode == 'master':
        cluster = clusters.kubernetes(clusterconfig, context=args.context)
        cluster.connect_master(experiment=args.experiment, configuration=args.connection)
    elif args.mode == 'status':
        cluster = clusters.kubernetes(clusterconfig, context=args.context)
        app = cluster.appname
        # get all volumes
        pvcs = cluster.getPVCs(app=app, component='storage', experiment='', configuration='')
        #print("PVCs", pvcs)
        volumes = {}
        for pvc in pvcs:
            volumes[pvc] = {}
            pvcs_labels = cluster.getPVCsLabels(app=app, component='storage', experiment='', configuration='', pvc=pvc)
            #print("PVCsLabels", pvcs_labels)
            pvc_labels = pvcs_labels[0]
            volumes[pvc]['configuration'] = pvc_labels['configuration']
            volumes[pvc]['experiment'] = pvc_labels['experiment']
            volumes[pvc]['loaded [s]'] = pvc_labels['loaded']
            if 'timeLoading' in pvc_labels:
                volumes[pvc]['timeLoading [s]'] = pvc_labels['timeLoading']
            else:
                volumes[pvc]['timeLoading [s]'] = ""
            volumes[pvc]['dbms'] = pvc_labels['dbms']
            #volumes[pvc]['labels'] = pvcs_label
            pvcs_specs = cluster.getPVCsSpecs(app=app, component='storage', experiment='', configuration='', pvc=pvc)
            pvc_specs = pvcs_specs[0]
            #print("PVCsSpecs", pvcs_specs)
            #volumes[pvc]['specs'] = pvc_specs
            volumes[pvc]['storage_class_name'] = pvc_specs.storage_class_name
            volumes[pvc]['storage'] = pvc_specs.resources.requests['storage']
            pvcs_status = cluster.getPVCsStatus(app=app, component='storage', experiment='', configuration='', pvc=pvc)
            #print("PVCsStatus", pvcs_status)
            volumes[pvc]['status'] = pvcs_status[0].phase
        #print(volumes)
        if len(volumes) > 0:
            df = pd.DataFrame(volumes).T
            #print(df)
            h = ['Volumes'] + list(df.columns)
            print(tabulate(df, headers=h, tablefmt="grid", floatfmt=".2f", showindex="always"))
        # get all pods
        pod_labels = cluster.getPodsLabels(app=app)
        #print("Pod Labels", pod_labels)
        experiment_set = set()
        for pod, labels in pod_labels.items():
            if 'experiment' in labels:
                experiment_set.add(labels['experiment'])
        #print(experiment_set)
        for experiment in experiment_set:
            if args.verbose:
                print(experiment)
            apps = {}
            pod_labels = cluster.getPodsLabels(app=app, experiment=experiment)
            configurations = set()
            for pod, labels in pod_labels.items():
                if 'configuration' in labels:
                    configurations.add(labels['configuration'])
            for configuration in configurations:
                logging.debug(configuration)
                apps[configuration] = {}
                component = 'sut'
                apps[configuration][component] = ''
                apps[configuration]['loaded [s]'] = ''
                if args.verbose:
                    deployments = cluster.getDeployments(app=app, component=component, experiment=experiment, configuration=configuration)
                    print("Deployments", deployments)
                    services = cluster.getServices(app=app, component=component, experiment=experiment, configuration=configuration)
                    print("SUT Services", services)
                pods = cluster.getPods(app=app, component=component, experiment=experiment, configuration=configuration)
                if args.verbose:
                    print("SUT Pods", pods)
                for pod in pods:
                    status = cluster.getPodStatus(pod)
                    #print(status)
                    if pod in pod_labels and 'experimentRun' in pod_labels[pod]:
                        experimentRun = '{}. '.format(pod_labels[pod]['experimentRun'])
                    else:
                        experimentRun = ''
                    apps[configuration][component] = "{pod} ({experimentRun}{status})".format(pod='', experimentRun=experimentRun, status=status)
                    if pod in pod_labels and 'loaded' in pod_labels[pod]:
                        if pod_labels[pod]['loaded'] == 'True':
                            #apps[configuration]['loaded'] += "True"
                            apps[configuration]['loaded [s]'] = pod_labels[pod]['timeLoading']#+' [s]'
                        elif 'timeLoadingStart' in pod_labels[pod]:
                            #apps[configuration]['loaded'] = 'Started at '+pod_labels[pod]['timeLoadingStart']
                            dt_object = datetime.fromtimestamp(int(pod_labels[pod]['timeLoadingStart']))
                            t = dt_object.strftime('%Y-%m-%d %H:%M:%S')
                            apps[configuration]['loaded [s]'] = 'Started at '+t
                    #if 'timeLoadingStart' in pod_labels[pod]:
                    #    apps[configuration]['loaded'] += ' '+pod_labels[pod]['timeLoadingStart']
                    #if 'timeLoadingEnd' in pod_labels[pod]:
                    #    apps[configuration]['loaded'] += '-'+pod_labels[pod]['timeLoadingEnd']
                    #if 'timeLoading' in pod_labels[pod]:
                    #    apps[configuration]['loaded'] += '='+pod_labels[pod]['timeLoading']+'s'
                ############
                component = 'worker'
                apps[configuration][component] = ''
                if args.verbose:
                    stateful_sets = cluster.getStatefulSets(app=app, component=component, experiment=experiment, configuration=configuration)
                    print("Stateful Sets", stateful_sets)
                    services = cluster.getServices(app=app, component=component, experiment=experiment, configuration=configuration)
                    print("Worker Services", services)
                pods = cluster.getPods(app=app, component=component, experiment=experiment, configuration=configuration)
                if args.verbose:
                    print("Worker Pods", pods)
                for pod in pods:
                    status = cluster.getPodStatus(pod)
                    #print(status)
                    apps[configuration][component] += "{pod} ({status})".format(pod='', status=status)
                ############
                component = 'monitoring'
                apps[configuration][component] = ''
                if args.verbose:
                    stateful_sets = cluster.getStatefulSets(app=app, component=component, experiment=experiment, configuration=configuration)
                    print("Stateful Sets", stateful_sets)
                    services = cluster.getServices(app=app, component=component, experiment=experiment, configuration=configuration)
                    print("Monitoring Services", services)
                pods = cluster.getPods(app=app, component=component, experiment=experiment, configuration=configuration)
                if args.verbose:
                    print("Monitoring Pods", pods)
                for pod in pods:
                    status = cluster.getPodStatus(pod)
                    #print(status)
                    apps[configuration][component] += "{pod} ({status})".format(pod='', status=status)
                ############
                component = 'benchmarker'
                apps[configuration][component] = ''
                if args.verbose:
                    jobs = cluster.getJobs(app=app, component=component, experiment=experiment, configuration=configuration)
                    # status per job
                    for job in jobs:
                        success = cluster.getJobStatus(job)
                        print(job, success)
                # all pods to these jobs
                pods = cluster.getJobPods(app=app, component=component, experiment=experiment, configuration=configuration)
                if args.verbose:
                    print("Benchmarker Pods", pods)
                for pod in pods:
                    status = cluster.getPodStatus(pod)
                    #print(status)
                    if pod in pod_labels and 'client' in pod_labels[pod]:
                        experimentRun = '{}. '.format(pod_labels[pod]['client'])
                    else:
                        experimentRun = ''
                    apps[configuration][component] += "{pod} ({experimentRun}{status})".format(pod='', experimentRun=experimentRun, status=status)
            #print(apps)
            df = pd.DataFrame(apps)
            df = df.T
            df.sort_index(inplace=True)
            df.index.name = experiment
            #print(df)
            h = [df.index.name] + list(df.columns)
            print(tabulate(df, headers=h, tablefmt="grid", floatfmt=".2f", showindex="always"))
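
Several of the commit messages above ("Tools as scripts in setup") and the README Quickstart commands (`tpch`, `bexperiments`) indicate that these scripts are exposed as console commands via the package setup. The actual setup file is not part of the diff shown here, so the following is only a hedged sketch of how such console entry points are commonly declared with setuptools; the target function for the `tpch` command is an assumption:

```python
# Sketch of console-script wiring (assumed; the real setup.py of bexhoma
# is not shown in this excerpt of the commit).
from setuptools import setup, find_packages

setup(
    name="bexhoma",
    packages=find_packages(),
    entry_points={
        "console_scripts": [
            # 'bexperiments status' would then call manage() from the script above
            "bexperiments=bexhoma.scripts.experimentsmanager:manage",
            # assumed function name for the TPC-H runner
            "tpch=bexhoma.scripts.tpch:do_benchmark",
        ],
    },
)
```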
