Getting started

ZKNDB framework

zkndb is a simple storage benchmarking application. So far it has implementations for HDFS, ZooKeeper and NDB MySQL.

It is composed of three packages: benchmark, metrics and storage. Below is the architecture of the system:

[Figure: zkndb architecture]

If you would rather skip the architecture details, go directly to the How To Build and How To Execute pages to start running the existing benchmarks. Otherwise, continue reading this page for more about the architecture.

Packages

benchmark package:
  Contains Benchmark.
  This package contains the different benchmark applications (BenchmarkImpl).
  It is responsible for creating and running the metrics and storage components.
  A BenchmarkImpl expects the following arguments:
    argv[0] : Number of StorageImpl threads to run
    argv[1] : Period between metric logging (ms)
    argv[2] : Execution time (ms)
    argv[3] : Time per cycle (ms)
  Optional arguments:
    argv[4] : Number of writes per cycle
    argv[5] : Number of reads per cycle
metrics package:
  Contains Metric and MetricsEngine.
  This package contains the different metrics (MetricImpl) and the metric logging logic (EngineImpl).
  Metric contains the attributes that are changed during the benchmark and accessed by the MetricsEngine and Storage.
  MetricsEngine determines when and where to log the metrics.
storage package:
  Contains Storage.
  Contains the database-dependent load generators (StorageImpl).
  They perform the reads and writes to the database and update the Metrics.
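The three roles above could be expressed as Java interfaces roughly like the following. This is a minimal sketch: the interface names mirror the packages, but the method names (incrementRequests, getRequests, getMetric, setMetrics) and the CounterMetric class are hypothetical and are not taken from the zkndb sources.

```java
import java.util.List;

// Hypothetical sketch of the three roles; method names and signatures are
// illustrative assumptions, not the actual zkndb API.

/** metrics package: attributes changed during the benchmark. */
interface Metric {
    void incrementRequests();   // called by the Storage thread that owns this metric
    long getRequests();         // read by the MetricsEngine
}

/** storage package: database-dependent load generator, one instance per thread. */
interface Storage extends Runnable {
    Metric getMetric();         // the metric this thread updates
}

/** metrics package: decides when and where to log the metrics. */
interface MetricsEngine extends Runnable {
    void setMetrics(List<Metric> metrics);  // one Metric per Storage thread
}

/** Minimal Metric meant to be written by a single storage thread. */
class CounterMetric implements Metric {
    private volatile long requests;  // volatile so the engine sees recent values

    @Override
    public void incrementRequests() {
        requests++;  // safe only because exactly one thread writes this field
    }

    @Override
    public long getRequests() {
        return requests;
    }
}
```

Giving each storage thread its own Metric is what allows the engine to read without locking: no counter has more than one writer.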

Application Example

The class BenchmarkUtils implements most of the mechanisms needed to run a benchmark. Its methods setMetric, setEngine and setStorage define, respectively, the classes to use for the metrics, the metrics engine and the storage. They are implemented using Java reflection and refer to classes such as ThroughputMetricImpl.java, ThroughputEngineImpl.java and DummyStorageImpl.java.
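As a rough illustration of the reflection step, a setter such as setStorage could turn a class name into an instance as sketched below. ReflectiveFactory and newInstance are hypothetical names. Note that Class.forName expects a fully qualified class name, so the real BenchmarkUtils presumably adds a package prefix to names like "DummyStorageImpl" (an assumption, not confirmed here).

```java
// Hypothetical helper: turns a class name into an instance via reflection.
// This is not the actual BenchmarkUtils code.
final class ReflectiveFactory {
    private ReflectiveFactory() {}

    /** Loads className, checks that it is an expectedType and calls its no-arg constructor. */
    static <T> T newInstance(String className, Class<T> expectedType) {
        final Class<?> clazz;
        try {
            clazz = Class.forName(className);  // needs a fully qualified name
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Unknown class " + className, e);
        }
        // Fail early with a clear message instead of a ClassCastException later.
        if (!expectedType.isAssignableFrom(clazz)) {
            throw new IllegalArgumentException(className + " is not a " + expectedType.getName());
        }
        try {
            return expectedType.cast(clazz.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }
}
```

For example, ReflectiveFactory.newInstance("java.lang.StringBuilder", CharSequence.class) returns an empty StringBuilder, while asking for a Runnable from the same class name is rejected.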

The readInput method parses the application arguments described above.
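A minimal sketch of such argument parsing follows, assuming the argument order listed in the Packages section. BenchmarkArgs is a hypothetical class, and defaulting the optional write and read counts to 0 is an assumption.

```java
// Hypothetical sketch of parsing argv[0]..argv[5]; the field names and the
// default of 0 for the optional arguments are assumptions, not the real readInput.
final class BenchmarkArgs {
    final int threads;          // argv[0]: number of StorageImpl threads
    final long metricPeriodMs;  // argv[1]: period between metric logging (ms)
    final long executionMs;     // argv[2]: execution time (ms)
    final long cycleMs;         // argv[3]: time per cycle (ms)
    final int writesPerCycle;   // argv[4]: optional, defaults to 0 here
    final int readsPerCycle;    // argv[5]: optional, defaults to 0 here

    private BenchmarkArgs(String[] argv) {
        if (argv.length < 4) {
            throw new IllegalArgumentException(
                "usage: <threads> <metricPeriodMs> <executionMs> <cycleMs> [writesPerCycle] [readsPerCycle]");
        }
        threads = Integer.parseInt(argv[0]);
        metricPeriodMs = Long.parseLong(argv[1]);
        executionMs = Long.parseLong(argv[2]);
        cycleMs = Long.parseLong(argv[3]);
        writesPerCycle = argv.length > 4 ? Integer.parseInt(argv[4]) : 0;
        readsPerCycle = argv.length > 5 ? Integer.parseInt(argv[5]) : 0;
    }

    static BenchmarkArgs parse(String[] argv) {
        return new BenchmarkArgs(argv);
    }
}
```

For example, the arguments 4 1000 60000 100 10 10 would describe four storage threads, metrics logged every second, a one-minute run, 100 ms cycles, and 10 writes and 10 reads per cycle.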

The run method handles the creation, initialization and execution of all the threads and shared data.
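The thread handling attributed to run can be sketched with a fixed thread pool: one counter per storage thread, a stop after the execution time, and the counters returned for reporting. BenchmarkRunner and its Workload interface are hypothetical, and AtomicLong counters stand in for the real Metric classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a run() method; the real BenchmarkUtils.run() may differ.
final class BenchmarkRunner {
    private BenchmarkRunner() {}

    /** Stand-in for a StorageImpl: performs one request per call. */
    interface Workload {
        void doRequest() throws Exception;
    }

    static List<AtomicLong> run(int threads, long executionMs, Workload workload)
            throws InterruptedException {
        List<AtomicLong> metrics = new ArrayList<>();  // one counter per thread
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            AtomicLong metric = new AtomicLong();
            metrics.add(metric);
            pool.execute(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    try {
                        workload.doRequest();
                        metric.incrementAndGet();  // only this thread writes this counter
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();  // stop was requested
                    } catch (Exception e) {
                        // a failed request is simply not counted
                    }
                }
            });
        }
        Thread.sleep(executionMs);  // let the benchmark run for argv[2] ms
        pool.shutdownNow();         // interrupts every storage thread
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return metrics;
    }
}
```

Stopping through shutdownNow() relies on interruption, so a real workload that blocks inside a database call should either respond to interrupts or use client-side timeouts.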

The following example shows how the metrics and the storage system are selected independently, so each can be implemented on its own.

```java
/* main method of DummyBenchmarkImpl.java */
public static void main(String[] args) {

    /* Reads the inputs (argv[0]..argv[5]) */
    BenchmarkUtils.readInput(args);

    /* Sets the wanted metric */
    BenchmarkUtils.setMetric("ThroughputMetricImpl");

    /* Engine that aggregates the results in periods of argv[1] (ms) */
    BenchmarkUtils.setEngine("ThroughputEngineImpl");

    /* Database-specific implementation */
    BenchmarkUtils.setStorage("DummyStorageImpl");

    /* Runs the StorageImpl in as many threads as specified in argv[0] */
    BenchmarkUtils.run();
}
```

To create a new storage implementation, make a copy of DummyStorageImpl and rename it.
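A new storage implementation mostly has to perform the configured writes and reads per cycle and update its own metric. The skeleton below is a hypothetical example of that shape: InMemoryStorageImpl, its constructor, and the interpretation of argv[3] as a fixed cycle length (sleeping for whatever time is left) are assumptions, and the HashMap stands in for a real database client.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical skeleton in the style of DummyStorageImpl; not the real class.
class InMemoryStorageImpl implements Runnable {
    private final Map<String, String> db = new HashMap<>();  // stand-in database, used by one thread only
    private final String threadId;
    private final AtomicLong requests;   // this thread's metric
    private final long cycleMs;          // argv[3]
    private final int writesPerCycle;    // argv[4]
    private final int readsPerCycle;     // argv[5]
    private long lastKey = -1;           // last key written by this thread

    InMemoryStorageImpl(String threadId, AtomicLong requests,
                        long cycleMs, int writesPerCycle, int readsPerCycle) {
        this.threadId = threadId;
        this.requests = requests;
        this.cycleMs = cycleMs;
        this.writesPerCycle = writesPerCycle;
        this.readsPerCycle = readsPerCycle;
    }

    /** One cycle: the configured writes and reads, then sleep for the rest of the cycle. */
    void runCycle() throws InterruptedException {
        long start = System.currentTimeMillis();
        for (int i = 0; i < writesPerCycle; i++) {
            lastKey++;
            db.put(threadId + ":" + lastKey, "value-" + lastKey);  // replace with a real write
            requests.incrementAndGet();
        }
        for (int i = 0; i < readsPerCycle && lastKey >= 0; i++) {
            db.get(threadId + ":" + lastKey);  // "read my last write"; replace with a real read
            requests.incrementAndGet();
        }
        long remaining = cycleMs - (System.currentTimeMillis() - start);
        if (remaining > 0) {
            Thread.sleep(remaining);
        }
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                runCycle();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // benchmark stopped
        }
    }
}
```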

Running the experiment

To run the experiment, you must first build all the projects and deploy the storage systems (see [How To Build](https://github.com/4knahs/zkndb/wiki/How-to-build) for details on building and deployment). Execution is explained in [How To Execute](https://github.com/4knahs/zkndb/wiki/How-to-execute).

Synchronization?

The benchmark keeps a list of Metrics, with a separate Metric for each Storage thread.

There is no locking between threads. The benchmark performs best when the number of threads is equal to or slightly larger than the number of hardware threads supported by the machine's CPU. The MetricsEngine only performs non-blocking reads, once every argv[1] ms, while each StorageImpl reads and writes (increments) its own counters.
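The non-blocking engine side can be sketched as follows: the engine never locks, it just sums the per-thread counters every period and divides the increase by the elapsed time. ThroughputSampler is a hypothetical class, not the actual ThroughputEngineImpl, and AtomicLong counters are assumed in place of the real Metrics.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical lock-free throughput sampler; names are assumptions.
final class ThroughputSampler {
    private final List<AtomicLong> metrics;  // one counter per storage thread
    private long lastTotal;
    private long lastNanos;

    ThroughputSampler(List<AtomicLong> metrics) {
        this.metrics = metrics;
        this.lastTotal = total();
        this.lastNanos = System.nanoTime();
    }

    /** Sum of all thread counters; each get() is a plain, non-blocking read. */
    long total() {
        long sum = 0;
        for (AtomicLong m : metrics) {
            sum += m.get();
        }
        return sum;
    }

    /** Requests per second since the previous call; meant to run every argv[1] ms. */
    double sample() {
        long now = System.nanoTime();
        long current = total();
        double seconds = (now - lastNanos) / 1e9;
        double throughput = seconds > 0 ? (current - lastTotal) / seconds : 0.0;
        lastTotal = current;
        lastNanos = now;
        return throughput;
    }
}
```

It could be driven by a ScheduledExecutorService, e.g. scheduleAtFixedRate(() -> System.out.println(sampler.sample()), period, period, TimeUnit.MILLISECONDS) with period = argv[1]. Because the counters only grow, a sum read while threads are still incrementing is merely slightly stale; the missing requests show up in the next period.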

There is also commented-out code that enables synchronization. It fixes the metric overflow by resetting the metrics, but it also increases the overhead. This synchronization uses fine-grained locks, since it is done at the level of each Metric.
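Per-Metric (fine-grained) locking with a reset could look like the sketch below: each metric is its own monitor, so resetting one thread's counter never blocks the other threads. ResettableMetric is a hypothetical name, and this is not the commented-out zkndb code.

```java
// Hypothetical per-Metric locking sketch; each instance guards only itself.
final class ResettableMetric {
    private long requests;  // guarded by "this"

    /** Called by the owning storage thread on every request (takes the lock each time). */
    synchronized void increment() {
        requests++;
    }

    /** Called by the engine: returns the count since the last reset and sets it to zero. */
    synchronized long getAndReset() {
        long value = requests;
        requests = 0;
        return value;
    }
}
```

The overhead mentioned above comes from increment() acquiring the lock on every request. For comparison, AtomicLong.getAndSet(0) provides the same read-and-reset without a lock.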

Possible future work

Calculate the deviation of throughput between different threads instead of calculating a single average value.
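For this item, the per-thread throughput values could be summarized with their mean and population standard deviation, sigma = sqrt(sum((x_i - mean)^2) / n). ThroughputStats is a hypothetical helper, not part of zkndb.

```java
// Hypothetical helper: mean and population standard deviation of per-thread throughput.
final class ThroughputStats {
    private ThroughputStats() {}

    static double mean(double[] perThread) {
        if (perThread.length == 0) {
            throw new IllegalArgumentException("no values");
        }
        double sum = 0;
        for (double v : perThread) {
            sum += v;
        }
        return sum / perThread.length;
    }

    /** Population standard deviation (divides by n, not n - 1). */
    static double stdDev(double[] perThread) {
        double m = mean(perThread);
        double squares = 0;
        for (double v : perThread) {
            squares += (v - m) * (v - m);
        }
        return Math.sqrt(squares / perThread.length);
    }
}
```

For per-thread throughputs {2, 4, 4, 4, 5, 5, 7, 9} this gives a mean of 5 and a standard deviation of 2.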

Since reads are implemented as "read my last write", it would be useful to add a mechanism that bypasses read caching in read-intensive benchmarks.
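One way to reduce the effect of caching would be to read a random previously written key instead of the last one. The sketch assumes, hypothetically, that each storage thread numbers its keys 0..lastWritten; ReadKeyChooser is not part of zkndb.

```java
import java.util.Random;

// Hypothetical key chooser: picks a random key among those written so far,
// instead of always the last one, so consecutive reads rarely hit the same key.
final class ReadKeyChooser {
    private final Random random;

    ReadKeyChooser(long seed) {
        this.random = new Random(seed);  // fixed seed keeps runs reproducible
    }

    /** Returns a key in [0, lastWritten], or -1 if nothing has been written yet. */
    long nextKey(long lastWritten) {
        if (lastWritten < 0) {
            return -1;
        }
        // floorMod keeps the result non-negative even for negative random values.
        return Math.floorMod(random.nextLong(), lastWritten + 1);
    }
}
```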

Add support for setting the output filename in BenchmarkUtils.

Add an option to turn the locking mechanism on or off for long experiments. It should read the counters periodically and only take a lock when a counter needs to be reset.
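The idea of only synchronizing when a reset is actually needed can be approximated without an explicit lock: the engine reads the counter with a plain atomic get and, only when it passes a threshold, drains it with an atomic subtraction and moves the amount into an overflow-free BigInteger total. This swaps the lock described above for an atomic operation. DrainingCounter and the threshold are hypothetical, and total() is assumed to be called from a single engine thread.

```java
import java.math.BigInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: drain the counter only when it grows past a threshold.
final class DrainingCounter {
    private final AtomicLong counter = new AtomicLong();  // incremented by a storage thread
    private final long threshold;
    private BigInteger drained = BigInteger.ZERO;         // touched only by the engine thread

    DrainingCounter(long threshold) {
        this.threshold = threshold;
    }

    void increment() {
        counter.incrementAndGet();
    }

    /** Called periodically by the engine; returns the total number of requests so far. */
    BigInteger total() {
        long current = counter.get();  // non-blocking read
        if (current > threshold) {
            // Subtract what was read instead of setting 0, so increments made
            // after get() are kept in the counter.
            counter.addAndGet(-current);
            drained = drained.add(BigInteger.valueOf(current));
        }
        return drained.add(BigInteger.valueOf(counter.get()));
    }
}
```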

Add a mechanism to generate graphs from the Java code.

Current issues

For long executions, the throughput metric may overflow, since it can grow beyond the maximum value of a long (9,223,372,036,854,775,807 requests over the whole execution).
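The limit above is Java's Long.MAX_VALUE. A plain increment past it silently wraps around to Long.MIN_VALUE (a negative count), while Math.addExact throws an ArithmeticException, which at least makes the overflow visible. OverflowDemo is a hypothetical helper for illustration.

```java
// Hypothetical illustration of long overflow behavior in Java.
final class OverflowDemo {
    private OverflowDemo() {}

    /** Plain addition: wraps to Long.MIN_VALUE after Long.MAX_VALUE. */
    static long wrappingIncrement(long counter) {
        return counter + 1;
    }

    /** Checked addition: throws ArithmeticException on overflow. */
    static long checkedIncrement(long counter) {
        return Math.addExact(counter, 1L);
    }
}
```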

The ZooKeeper implementation seems to limit the throughput (this is the YARN implementation).
