* **Ingestion**: *job* of pods for data generation and for ingestion of data into the DBMS, synchronized using a Redis queue
* **Benchmarking**: *job* of pods for running the driver, synchronized using a Redis queue

## Installation

1. Download the repository: https://github.com/Beuth-Erdelt/Benchmark-Experiment-Host-Manager
1. Install the package `pip install bexhoma`
1. Make sure you have a working `kubectl` installed.
    * (Also make sure you have access to a running Kubernetes cluster - for example [Minikube](https://minikube.sigs.k8s.io/docs/start/))
    * (Also make sure you can create persistent volumes (PV) via persistent volume claims (PVC) and dynamic provisioning)
1. Adjust the [configuration](https://bexhoma.readthedocs.io/en/latest/Config.html) - see the sketch after this list
    1. Copy `k8s-cluster.config` to `cluster.config`
    1. Set the name of the context, the namespace and the name of the cluster in that file
    2. Make sure the `resultfolder` is set to a folder that exists on your local filesystem
1. Other components like the shared data and result directories, the message queue and the evaluator are installed automatically when you start an experiment. Before that, you might want to adjust:
    * Result directory: https://github.com/Beuth-Erdelt/Benchmark-Experiment-Host-Manager/blob/master/k8s/pvc-bexhoma-results.yml
      * `storageClassName`: must be an available storage class of type `ReadWriteMany` in your cluster
      * `storage`: size of the directory
    * Data directory: https://github.com/Beuth-Erdelt/Benchmark-Experiment-Host-Manager/blob/master/k8s/pvc-bexhoma-data.yml
      * `storageClassName`: must be an available storage class of type `ReadWriteMany` in your cluster
      * `storage`: size of the directory
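
As a first orientation, the following is a minimal sketch of the `cluster.config` entries that typically need adjustment during installation; the values are placeholders and the full structure is explained in the Configuration section below:

```
'benchmarker': {
    'resultfolder': '/path/to/results/',       # existing local folder bexhoma can write to
},
'credentials': {
    'k8s': {
        'context': {
            'my_context': {                    # name of your kubectl context
                'namespace': 'my_namespace',   # namespace you have access to
                'clustername': 'My Cluster',   # cluster name, for your convenience
            },
        },
    },
},
```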

Bexhoma is now ready to use.

## Configuration

Here we give more details about the configuration and the files included in bexhoma.

### Cluster-Config

The configuration of the cluster, that is the possible host and experiment settings, is stored in a file `cluster.config` and consists of the following parts (see also the [example](https://github.com/Beuth-Erdelt/Benchmark-Experiment-Host-Manager/blob/master/k8s-cluster.config) config file):


### Basic settings

```
'benchmarker': {
    'resultfolder': './',               # Local path to results folder of benchmark tool
    'jarfolder': './jars/'              # Optional: Local path to JDBC drivers
},
```

* `resultfolder`: Where the benchmarker puts its result folders. Make sure this is an existing folder that bexhoma can write to.
* `jarfolder`: Where the benchmarker expects the JDBC jar files. You can probably leave this as is.

### Credentials of the Cluster

You will have to adjust the name of the namespace `my_namespace`.
The rest can probably stay as is.

```
'credentials': {
    'k8s': {
        'appname': 'bexhoma',
        'context': {
            'my_context': {
                'namespace': 'my_namespace',
                'clustername': 'My Cluster',
                'service_sut': '{service}.{namespace}.svc.cluster.local',
                'port': 9091,       # K8s: Local port for connecting via JDBC after port forwarding
            },
```
* `my_context`: Context (name) of the cluster. Repeat this section for every K8s cluster you want to use. This also allows you to use and compare several clouds.
* `my_namespace`: Namespace in the cluster. Make sure you have access to that namespace.
* `clustername`: Customize the cluster name for your convenience.


### (Hardware) Monitoring

Next follows a dict of hardware metrics that should be collected per DBMS.
This can probably stay as is.
The attributes are set by bexhoma automatically so that the corresponding pods can be identified.
The host is found using the service of the DBMS.
See [monitoring section](https://bexhoma.readthedocs.io/en/latest/Monitoring.html) for more details.
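
For orientation only, a hedged sketch of the shape such a metrics definition might take; the key names, the metric name and the PromQL query below are illustrative assumptions, and the authoritative definitions are in the [example](https://github.com/Beuth-Erdelt/Benchmark-Experiment-Host-Manager/blob/master/k8s-cluster.config) config file:

```
'monitor': {                                               # assumption: monitoring section of cluster.config
    'metrics': {
        'total_cpu_memory': {                              # illustrative metric name
            'title': 'CPU Memory [MiB]',                   # title used in the evaluation
            'query': 'container_memory_working_set_bytes'  # illustrative PromQL query
        },
    },
},
```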

### Data Sources

Data sources and imports can be addressed using a key.
This can probably stay as is.
It is organized as follows:

```
'volumes': {
    'tpch': {
        'initscripts': {
            'Schema': [
                'initschema-tpch.sql',
            ],
            'Schema_dummy': [
                'initschemadummy-tpch.sql',
            ],
            'Index': [
                'initindexes-tpch.sql',
            ],
            'Index_and_Constraints': [
                'initindexes-tpch.sql',
                'initconstraints-tpch.sql',
            ],
            'Index_and_Constraints_and_Statistics': [
                'initindexes-tpch.sql',
                'initconstraints-tpch.sql',
                'initstatistics-tpch.sql',
            ],
            'SF1': [
                'initschema-tpch.sql',
                'initdata-tpch-SF1.sql',
                'initdata-tpch-SF1.sh'
            ],
        }
    }
},
```

* `tpch`: Name of the data source (addressed by the corresponding experiments)
* `initscripts`: Dict of scripts that prepare the database, ingest data, create indexes etc.
  Each entry consists of
  * a name, for example `Index_and_Constraints`, and
  * a list of script names.
  The `.sql` scripts are sent to the command line tool of the DBMS (the `loadData` parameter in the DBMS configuration) and the `.sh` files are executed as shell scripts.
  The scripts must be present in a [config folder](https://github.com/Beuth-Erdelt/Benchmark-Experiment-Host-Manager/tree/master/experiments/tpch), say `experiments/tpch/`.

Example: For TPC-H the script `tpch.py` may run (depending on the CLI parameters)
* `Schema` before ingestion - this runs the script `initschema-tpch.sql`
* `Index_and_Constraints` after ingestion - this runs the scripts `initindexes-tpch.sql` and `initconstraints-tpch.sql`

The data itself is expected to be stored on a shared disk, which will be mounted into the DBMS container as `/data/`.
The example scripts above (like `initdata-tpch-SF1.sql`) refer to `/data/tpch/SF1/`.

### DBMS

Database systems are described in the `docker` section.
Please see the [DBMS section](https://bexhoma.readthedocs.io/en/latest/DBMS.html) of the documentation for more information.
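
For orientation only, a hedged sketch of the shape such a DBMS entry might take; the key names and values below are illustrative assumptions, and the authoritative structure is described in the linked documentation:

```
'dockers': {                                             # assumption: DBMS section of cluster.config
    'PostgreSQL': {                                      # illustrative DBMS name
        'loadData': 'psql -U postgres < {scriptname}',   # assumption: CLI that receives the .sql init scripts
        'template': {                                    # assumption: connection info handed to the benchmarker
            'JDBC': {
                'driver': 'org.postgresql.Driver',
                'url': 'jdbc:postgresql://{serverip}:9091/postgres',
            },
        },
    },
},
```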



# A Basic Example

The [documentation](https://bexhoma.readthedocs.io/en/latest/) contains a lot of examples.