V0.5.7 (#86)
* Prepare next release

* TPC-H: OracleDB

* Ignore more package files

* Tools as scripts in setup

* More DBMS (OracleDB and OmniSci-CPU)

* Docs: Entry page setup

* Docs: Entry page setup

* Docs: Entry page setup

* More DBMS (MonetDB current version)

* More DBMS (MonetDB current version)

* Docs: Entry page setup

* TPC-H: Some DBMS for tests

* Tools as scripts in setup changed

* TPC-H: Citus joins

* Tools as scripts in setup changed

* Tool: Dashboard includes Jupyter notebook

* Config: Changed format

* Docs: Entry page setup - quick start

* TPC-H: Example script

* Docs: Entry page setup - reference

* Docs: Entry page setup - quick start data

* Images referring to the Dockerhub repository

* Docs: Entry page setup - quick start

* Images referring to the Dockerhub repository updated

* Images referring to the Dockerhub repository updated

* Images referring to the Dockerhub repository updated

* Images referring to the Dockerhub repository updated

* Configuration: Fetch missing info about CPU cores
perdelt authored Aug 23, 2021
1 parent efa7e03 commit 03342e7
Showing 37 changed files with 1,884 additions and 238 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -6,4 +6,5 @@ tmp/*
1*/
jars/*
build/*
bexhoma.egg-info/*
bexhoma.egg-info/*
dist/*
52 changes: 36 additions & 16 deletions README.md
@@ -1,5 +1,5 @@
# Benchmark Experiment Host Manager
This Python tool helps **manage benchmark experiments of Database Management Systems (DBMS) in a High-Performance-Computing (HPC) cluster environment**.
This Python tool helps **manage benchmark experiments of Database Management Systems (DBMS) in a Kubernetes-based High-Performance-Computing (HPC) cluster environment**.
It enables users to configure hardware / software setups for easily repeating tests over varying configurations.

It serves as the **orchestrator** [2] for distributed parallel benchmarking experiments in a Kubernetes Cloud.
@@ -8,35 +8,55 @@ It serves as the **orchestrator** [2] for distributed parallel benchmarking expe
<img src="https://raw.githubusercontent.com/Beuth-Erdelt/Benchmark-Experiment-Host-Manager/v0.5.6/docs/experiment-with-orchestrator.png" width="800">
</p>

The basic workflow is [1]: start a virtual machine, install monitoring software and a database management system, import data, run benchmarks (external tool) and shut down everything with a single command.
The basic workflow is [1,2]: start a virtual machine, install monitoring software and a database management system, import data, run benchmarks (external tool) and shut down everything with a single command.
A more advanced workflow is: plan a sequence of such experiments, run the plan as a batch and join the results for comparison.

## Installation

1. Download this repository
1. Download the repository: https://github.com/Beuth-Erdelt/Benchmark-Experiment-Host-Manager
1. Install the package `pip install bexhoma`
1. Make sure you have a working `kubectl` installed
1. Adjust configuration [tbd in detail]
1. Rename `k8s-cluster.config` to `cluster.config`
1. Set name of context, namespace and name of cluster in that file
1. Install data [tbd in detail]
Example for TPC-H SF=1:
* Run `kubectl create -f k8s/job-data-tpch-1.yml`
* When job is done, clean up with
`kubectl delete job -l app=bexhoma -l component=data-source` and
`kubectl delete deployment -l app=bexhoma -l component=data-source`.
1. Install result folder
Run `kubectl create -f k8s/pvc-bexhoma-results.yml`

2. Run `pip install -r requirements.txt`

3. Adjust configuration [tbd]

4. Install data [tbd]

## Quickstart

The repository contains a tool for running TPC-H (reading) queries against MonetDB and PostgreSQL.

Run `python tpch.py run`
1. Run `tpch run`.
This is equivalent to `python tpch.py run`.
1. You can watch the status using `bexperiments status` while the benchmark is running.
This is equivalent to `python cluster.py status`.
1. After benchmarking has finished, run `bexperiments dashboard` to connect to a dashboard. You can open the dashboard in a browser at `http://localhost:8050`.
This is equivalent to `python cluster.py dashboard`.
Alternatively, you can open a Jupyter notebook at `http://localhost:8888`.

## More Information

For full power, use this tool as an orchestrator as in [2]. It then starts a monitoring container using [Prometheus](https://prometheus.io/) and a metrics collector container using [cAdvisor](https://github.com/google/cadvisor), and it uses the Python package [dbmsbenchmarker](https://github.com/Beuth-Erdelt/DBMS-Benchmarker) as query executor [2] and evaluator [1].

This module has been tested with Brytlyt, Citus, Clickhouse, DB2, Exasol, Kinetica, MariaDB, MariaDB Columnstore, MemSQL, MonetDB, MySQL, OmniSci, Oracle DB, PostgreSQL, SingleStore, SQL Server and SAP HANA.

## References

[1] [A Framework for Supporting Repetition and Evaluation in the Process of Cloud-Based DBMS Performance Benchmarking](https://doi.org/10.1007/978-3-030-84924-5_6)
```
Erdelt P.K. (2021)
A Framework for Supporting Repetition and Evaluation in the Process of Cloud-Based DBMS Performance Benchmarking.
In: Nambiar R., Poess M. (eds) Performance Evaluation and Benchmarking. TPCTC 2020.
Lecture Notes in Computer Science, vol 12752. Springer, Cham.
https://doi.org/10.1007/978-3-030-84924-5_6
```

> Erdelt P.K. (2021)
> A Framework for Supporting Repetition and Evaluation in the Process of Cloud-Based DBMS Performance Benchmarking.
> In: Nambiar R., Poess M. (eds) Performance Evaluation and Benchmarking. TPCTC 2020.
> Lecture Notes in Computer Science, vol 12752. Springer, Cham.
> https://doi.org/10.1007/978-3-030-84924-5_6

[2] [Orchestrating DBMS Benchmarking in the Cloud with Kubernetes](https://www.researchgate.net/publication/353236865_Orchestrating_DBMS_Benchmarking_in_the_Cloud_with_Kubernetes)

5 changes: 4 additions & 1 deletion bexhoma/configurations.py
@@ -816,7 +816,10 @@ def getCores(self):
    #cores = os.popen(fullcommand).read()
    stdin, stdout, stderr = self.experiment.cluster.executeCTL(command=command, pod=self.pod_sut, container='dbms')
    cores = stdout#os.popen(fullcommand).read()
    return int(cores)
    if len(cores)>0:
        return int(cores)
    else:
        return 0
def getHostsystem(self):
    print("getHostsystem")
    cmd = {}
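The change above makes `getCores` tolerate empty output from the in-pod command (for example when the container is not ready yet) instead of failing on `int('')`. A minimal standalone sketch of the same guard, with an illustrative helper name that is not part of bexhoma:

```python
def parse_core_count(stdout: str) -> int:
    """Parse the core count reported by a pod command (e.g. nproc).

    Illustrative helper only: empty output is treated as "unknown"
    and reported as 0 instead of raising a ValueError.
    """
    text = stdout.strip()
    if len(text) > 0:
        return int(text)
    return 0

# int('') would raise ValueError, so the guard matters:
assert parse_core_count("8\n") == 8
assert parse_core_count("") == 0
```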
4 changes: 4 additions & 0 deletions bexhoma/scripts/__init__.py
@@ -0,0 +1,4 @@
"""
The clustermanager module
"""
__all__ = ["experimentsmanager","tpch"]
212 changes: 212 additions & 0 deletions bexhoma/scripts/experimentsmanager.py
@@ -0,0 +1,212 @@
"""
This script contains some code snippets for testing the detached mode in Kubernetes
Copyright (C) 2021 Patrick Erdelt
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as
published by the Free Software Foundation, either version 3 of the
License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
"""
from bexhoma import *
from dbmsbenchmarker import *
import logging
import urllib3
import logging
import argparse
import time
import pandas as pd
from tabulate import tabulate
from datetime import datetime
urllib3.disable_warnings()
logging.basicConfig(level=logging.ERROR)


def manage():
    description = """This tool helps managing running Bexhoma experiments in a Kubernetes cluster.
    """
    print(description)
    # argparse
    parser = argparse.ArgumentParser(description=description)
    parser.add_argument('mode', help='manage experiments: stop, get status, connect to dbms or connect to dashboard', choices=['stop','status','dashboard', 'master'])
    parser.add_argument('-e', '--experiment', help='code of experiment', default=None)
    parser.add_argument('-c', '--connection', help='name of DBMS', default=None)
    parser.add_argument('-v', '--verbose', help='gives more details about Kubernetes objects', action='store_true')
    parser.add_argument('-cx', '--context', help='context of Kubernetes (for a multi cluster environment), default is current context', default=None)
    clusterconfig = 'cluster.config'
    args = parser.parse_args()
    if args.mode == 'stop':
        cluster = clusters.kubernetes(clusterconfig, context=args.context)
        if args.experiment is None:
            experiment = experiments.default(cluster=cluster, code=cluster.code)
            cluster.stop_sut()
            cluster.stop_monitoring()
            cluster.stop_benchmarker()
        else:
            experiment = experiments.default(cluster=cluster, code=args.experiment)
            experiment.stop_sut()
            cluster.stop_monitoring()
            cluster.stop_benchmarker()
    elif args.mode == 'dashboard':
        cluster = clusters.kubernetes(clusterconfig, context=args.context)
        cluster.connect_dashboard()
    elif args.mode == 'master':
        cluster = clusters.kubernetes(clusterconfig, context=args.context)
        cluster.connect_master(experiment=args.experiment, configuration=args.connection)
    elif args.mode == 'status':
        cluster = clusters.kubernetes(clusterconfig, context=args.context)
        app = cluster.appname
        # get all volumes
        pvcs = cluster.getPVCs(app=app, component='storage', experiment='', configuration='')
        #print("PVCs", pvcs)
        volumes = {}
        for pvc in pvcs:
            volumes[pvc] = {}
            pvcs_labels = cluster.getPVCsLabels(app=app, component='storage', experiment='', configuration='', pvc=pvc)
            #print("PVCsLabels", pvcs_labels)
            pvc_labels = pvcs_labels[0]
            volumes[pvc]['configuration'] = pvc_labels['configuration']
            volumes[pvc]['experiment'] = pvc_labels['experiment']
            volumes[pvc]['loaded [s]'] = pvc_labels['loaded']
            if 'timeLoading' in pvc_labels:
                volumes[pvc]['timeLoading [s]'] = pvc_labels['timeLoading']
            else:
                volumes[pvc]['timeLoading [s]'] = ""
            volumes[pvc]['dbms'] = pvc_labels['dbms']
            #volumes[pvc]['labels'] = pvcs_label
            pvcs_specs = cluster.getPVCsSpecs(app=app, component='storage', experiment='', configuration='', pvc=pvc)
            pvc_specs = pvcs_specs[0]
            #print("PVCsSpecs", pvcs_specs)
            #volumes[pvc]['specs'] = pvc_specs
            volumes[pvc]['storage_class_name'] = pvc_specs.storage_class_name
            volumes[pvc]['storage'] = pvc_specs.resources.requests['storage']
            pvcs_status = cluster.getPVCsStatus(app=app, component='storage', experiment='', configuration='', pvc=pvc)
            #print("PVCsStatus", pvcs_status)
            volumes[pvc]['status'] = pvcs_status[0].phase
        #print(volumes)
        if len(volumes) > 0:
            df = pd.DataFrame(volumes).T
            #print(df)
            h = ['Volumes'] + list(df.columns)
            print(tabulate(df, headers=h, tablefmt="grid", floatfmt=".2f", showindex="always"))
        # get all pods
        pod_labels = cluster.getPodsLabels(app=app)
        #print("Pod Labels", pod_labels)
        experiment_set = set()
        for pod, labels in pod_labels.items():
            if 'experiment' in labels:
                experiment_set.add(labels['experiment'])
        #print(experiment_set)
        for experiment in experiment_set:
            if args.verbose:
                print(experiment)
            apps = {}
            pod_labels = cluster.getPodsLabels(app=app, experiment=experiment)
            configurations = set()
            for pod, labels in pod_labels.items():
                if 'configuration' in labels:
                    configurations.add(labels['configuration'])
            for configuration in configurations:
                logging.debug(configuration)
                apps[configuration] = {}
                component = 'sut'
                apps[configuration][component] = ''
                apps[configuration]['loaded [s]'] = ''
                if args.verbose:
                    deployments = cluster.getDeployments(app=app, component=component, experiment=experiment, configuration=configuration)
                    print("Deployments", deployments)
                    services = cluster.getServices(app=app, component=component, experiment=experiment, configuration=configuration)
                    print("SUT Services", services)
                pods = cluster.getPods(app=app, component=component, experiment=experiment, configuration=configuration)
                if args.verbose:
                    print("SUT Pods", pods)
                for pod in pods:
                    status = cluster.getPodStatus(pod)
                    #print(status)
                    if pod in pod_labels and 'experimentRun' in pod_labels[pod]:
                        experimentRun = '{}. '.format(pod_labels[pod]['experimentRun'])
                    else:
                        experimentRun = ''
                    apps[configuration][component] = "{pod} ({experimentRun}{status})".format(pod='', experimentRun=experimentRun, status=status)
                    if pod in pod_labels and 'loaded' in pod_labels[pod]:
                        if pod_labels[pod]['loaded'] == 'True':
                            #apps[configuration]['loaded'] += "True"
                            apps[configuration]['loaded [s]'] = pod_labels[pod]['timeLoading']#+' [s]'
                        elif 'timeLoadingStart' in pod_labels[pod]:
                            #apps[configuration]['loaded'] = 'Started at '+pod_labels[pod]['timeLoadingStart']
                            dt_object = datetime.fromtimestamp(int(pod_labels[pod]['timeLoadingStart']))
                            t = dt_object.strftime('%Y-%m-%d %H:%M:%S')
                            apps[configuration]['loaded [s]'] = 'Started at '+t
                    #if 'timeLoadingStart' in pod_labels[pod]:
                    #    apps[configuration]['loaded'] += ' '+pod_labels[pod]['timeLoadingStart']
                    #if 'timeLoadingEnd' in pod_labels[pod]:
                    #    apps[configuration]['loaded'] += '-'+pod_labels[pod]['timeLoadingEnd']
                    #if 'timeLoading' in pod_labels[pod]:
                    #    apps[configuration]['loaded'] += '='+pod_labels[pod]['timeLoading']+'s'
                ############
                component = 'worker'
                apps[configuration][component] = ''
                if args.verbose:
                    stateful_sets = cluster.getStatefulSets(app=app, component=component, experiment=experiment, configuration=configuration)
                    print("Stateful Sets", stateful_sets)
                    services = cluster.getServices(app=app, component=component, experiment=experiment, configuration=configuration)
                    print("Worker Services", services)
                pods = cluster.getPods(app=app, component=component, experiment=experiment, configuration=configuration)
                if args.verbose:
                    print("Worker Pods", pods)
                for pod in pods:
                    status = cluster.getPodStatus(pod)
                    #print(status)
                    apps[configuration][component] += "{pod} ({status})".format(pod='', status=status)
                ############
                component = 'monitoring'
                apps[configuration][component] = ''
                if args.verbose:
                    stateful_sets = cluster.getStatefulSets(app=app, component=component, experiment=experiment, configuration=configuration)
                    print("Stateful Sets", stateful_sets)
                    services = cluster.getServices(app=app, component=component, experiment=experiment, configuration=configuration)
                    print("Monitoring Services", services)
                pods = cluster.getPods(app=app, component=component, experiment=experiment, configuration=configuration)
                if args.verbose:
                    print("Monitoring Pods", pods)
                for pod in pods:
                    status = cluster.getPodStatus(pod)
                    #print(status)
                    apps[configuration][component] += "{pod} ({status})".format(pod='', status=status)
                ############
                component = 'benchmarker'
                apps[configuration][component] = ''
                if args.verbose:
                    jobs = cluster.getJobs(app=app, component=component, experiment=experiment, configuration=configuration)
                    # status per job
                    for job in jobs:
                        success = cluster.getJobStatus(job)
                        print(job, success)
                # all pods to these jobs
                pods = cluster.getJobPods(app=app, component=component, experiment=experiment, configuration=configuration)
                if args.verbose:
                    print("Benchmarker Pods", pods)
                for pod in pods:
                    status = cluster.getPodStatus(pod)
                    #print(status)
                    if pod in pod_labels and 'client' in pod_labels[pod]:
                        experimentRun = '{}. '.format(pod_labels[pod]['client'])
                    else:
                        experimentRun = ''
                    apps[configuration][component] += "{pod} ({experimentRun}{status})".format(pod='', experimentRun=experimentRun, status=status)
            #print(apps)
            df = pd.DataFrame(apps)
            df = df.T
            df.sort_index(inplace=True)
            df.index.name = experiment
            #print(df)
            h = [df.index.name] + list(df.columns)
            print(tabulate(df, headers=h, tablefmt="grid", floatfmt=".2f", showindex="always"))
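
Several of the commit messages above ("Tools as scripts in setup") and the README Quickstart commands (`tpch`, `bexperiments`) indicate that these scripts are exposed as console commands via the package setup. The actual setup file is not part of the diff shown here, so the following is only a hedged sketch of how such console entry points are commonly declared with setuptools; the target function for the `tpch` command is an assumption:

```python
# Sketch of console-script wiring (assumed; the real setup.py of bexhoma
# is not shown in this excerpt of the commit).
from setuptools import setup, find_packages

setup(
    name="bexhoma",
    packages=find_packages(),
    entry_points={
        "console_scripts": [
            # 'bexperiments status' would then call manage() from the script above
            "bexperiments=bexhoma.scripts.experimentsmanager:manage",
            # assumed function name for the TPC-H runner
            "tpch=bexhoma.scripts.tpch:do_benchmark",
        ],
    },
)
```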
