4knahs edited this page Dec 20, 2012 · 31 revisions

ZKNDB framework

zkndb is a simple storage benchmarking application. So far it has implementations for HDFS, ZooKeeper and NDB MySQL. It is composed of three packages: benchmark, metrics and storage.


Packages

benchmark package: 
  Contains Benchmark.
  This package contains the different benchmark applications (BenchmarkImpl). 
  It is responsible for creating and running the metrics and storage.
  A BenchmarkImpl should receive the following arguments:
    argv[0] : Number of StorageImpl threads to run
    argv[1] : Interval between metric log entries (ms)
    argv[2] : Execution time (ms)
    argv[3] : Time per cycle (ms)
  Optional arguments:
    argv[4] : Number of writes per cycle
    argv[5] : Number of reads per cycle
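The argument layout above could be validated with something like the following minimal sketch. BenchmarkArgs is a hypothetical class introduced only for illustration; it is not part of zkndb, and the real BenchmarkUtils.readInput may parse its input differently.

```java
import java.util.OptionalInt;

/** Hypothetical holder for the benchmark arguments; not a zkndb class. */
final class BenchmarkArgs {
    final int threads;                // argv[0]
    final long metricPeriodMs;        // argv[1]
    final long executionTimeMs;       // argv[2]
    final long cycleTimeMs;           // argv[3]
    final OptionalInt writesPerCycle; // argv[4], optional
    final OptionalInt readsPerCycle;  // argv[5], optional

    private BenchmarkArgs(int threads, long metricPeriodMs, long executionTimeMs,
                          long cycleTimeMs, OptionalInt writesPerCycle,
                          OptionalInt readsPerCycle) {
        this.threads = threads;
        this.metricPeriodMs = metricPeriodMs;
        this.executionTimeMs = executionTimeMs;
        this.cycleTimeMs = cycleTimeMs;
        this.writesPerCycle = writesPerCycle;
        this.readsPerCycle = readsPerCycle;
    }

    /** Parses the four mandatory and up to two optional arguments. */
    static BenchmarkArgs parse(String[] argv) {
        if (argv.length < 4 || argv.length > 6) {
            throw new IllegalArgumentException(
                "expected 4 to 6 arguments, got " + argv.length);
        }
        return new BenchmarkArgs(
            Integer.parseInt(argv[0]),
            Long.parseLong(argv[1]),
            Long.parseLong(argv[2]),
            Long.parseLong(argv[3]),
            argv.length > 4 ? OptionalInt.of(Integer.parseInt(argv[4])) : OptionalInt.empty(),
            argv.length > 5 ? OptionalInt.of(Integer.parseInt(argv[5])) : OptionalInt.empty());
    }
}
```

Keeping the optional values as OptionalInt leaves the choice of defaults to the caller, since the page does not state what the defaults are.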

metrics package: 
  Contains Metric and MetricsEngine.
  This package contains the different metrics (MetricImpl) and the metric logging logic (EngineImpl).
  Metric holds the attributes that change during the benchmark; they are accessed by both the MetricsEngine and the Storage.
  MetricsEngine determines when and where the metrics are logged.

storage package: 
  Contains Storage.
  This package contains the database-dependent load generators (StorageImpl).
  They perform the reads and writes against the database and update the Metrics.
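To make the relationship between the packages concrete, here is a rough sketch of a load generator that updates its own metric while an engine step reads all metrics. Every name in it (SketchMetric, SketchStorage, SketchEngine) is hypothetical, an in-memory map stands in for the database, and a fixed number of cycles replaces the time-based cycles of the real benchmark.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-thread metric: a plain request counter without locking. */
final class SketchMetric {
    long requests;
}

/** Hypothetical load generator; an in-memory map stands in for the database. */
final class SketchStorage implements Runnable {
    private final SketchMetric metric;
    private final int cycles;
    private final int writesPerCycle;
    private final int readsPerCycle;
    private final Map<Integer, Integer> store = new HashMap<>();

    SketchStorage(SketchMetric metric, int cycles, int writesPerCycle, int readsPerCycle) {
        this.metric = metric;
        this.cycles = cycles;
        this.writesPerCycle = writesPerCycle;
        this.readsPerCycle = readsPerCycle;
    }

    @Override
    public void run() {
        for (int cycle = 0; cycle < cycles; cycle++) {
            for (int w = 0; w < writesPerCycle; w++) {
                store.put(w, cycle); // write request
                metric.requests++;   // the storage updates its own metric
            }
            for (int r = 0; r < readsPerCycle; r++) {
                store.get(r);        // read request
                metric.requests++;
            }
        }
    }
}

/** Hypothetical engine step: reads every per-thread metric and sums them. */
final class SketchEngine {
    static long totalRequests(List<SketchMetric> metrics) {
        long total = 0;
        for (SketchMetric m : metrics) {
            total += m.requests;
        }
        return total;
    }
}
```

Giving each storage thread its own metric object means the load generators never write to shared state, which is what lets the benchmark run without locks.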

Application Example

The following example shows how a benchmark is assembled. The metric, engine and storage implementations are selected by name, so each can be implemented independently of the others.
/*Main of DummyBenchmarkImpl.java*/
public static void main(String[] args){
    /*Reads the inputs*/
    BenchmarkUtils.readInput(args);

    /*Sets the wanted metrics*/
    BenchmarkUtils.setMetric("ThroughputMetricImpl");

    /*Engine that aggregates the results in periods of argv[1] (ms)*/    
    BenchmarkUtils.setEngine("ThroughputEngineImpl");

    /*Database specific implementation*/
    BenchmarkUtils.setStorage("DummyStorageImpl");
    
    /*Runs the StorageImpl in as many threads as specified in argv[0]*/
    BenchmarkUtils.run();
}

Synchronization

There is no locking between threads. The benchmark runs best when the number of threads is equal to, or slightly larger than, the number of hardware threads supported by the machine's CPU. The MetricsEngine only reads the metrics, once per period of argv[1] ms.
The code contains commented-out synchronization that fixes the metric overflow but also increases the overhead. This synchronization uses fine-grained locks, since it is done at the Metric level:
  The benchmark keeps a list of Metrics, with a separate Metric for each Storage thread.
  The MetricsEngine accesses this list at fixed intervals (argv[1]).
  For now, the only reason for the synchronization is that the MetricsEngine resets the metrics after logging so that they do not overflow.
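The per-Metric locking described above might look roughly like the sketch below. It is not the commented-out zkndb code: LockedMetric and LockedMetricDemo are hypothetical names, and the demo reads the metrics once after the workers finish instead of on a timer.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical metric guarded by its own monitor (one lock per metric). */
final class LockedMetric {
    private long requests;

    /** Called by the storage thread that owns this metric. */
    synchronized void increment() {
        requests++;
    }

    /** Called by the engine: returns the count and resets it in one step. */
    synchronized long readAndReset() {
        long value = requests;
        requests = 0;
        return value;
    }
}

final class LockedMetricDemo {
    /** Starts one worker per metric, then sums and resets all metrics. */
    static long runOnce(int threads, int incrementsPerThread) {
        List<LockedMetric> metrics = new ArrayList<>();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            LockedMetric metric = new LockedMetric();
            metrics.add(metric);
            Thread worker = new Thread(() -> {
                for (int n = 0; n < incrementsPerThread; n++) {
                    metric.increment();
                }
            });
            workers.add(worker);
            worker.start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        long total = 0;
        for (LockedMetric metric : metrics) {
            total += metric.readAndReset();
        }
        return total;
    }
}
```

Because each metric has its own lock, a storage thread only ever contends with the engine and never with another storage thread. The overhead mentioned above comes from acquiring that lock on every request.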

Possible future work

  Calculate the deviation of throughput between different threads instead of calculating a single average value.
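As a starting point for that work, the spread between threads could be computed as a population standard deviation over the per-thread request counts. The sketch below is only an illustration; ThroughputStats is a hypothetical helper, and the page does not say which deviation measure is intended.

```java
/** Hypothetical helper for per-thread throughput statistics. */
final class ThroughputStats {
    /** Arithmetic mean of the per-thread request counts. */
    static double mean(long[] perThread) {
        if (perThread.length == 0) {
            throw new IllegalArgumentException("no threads");
        }
        double sum = 0;
        for (long value : perThread) {
            sum += value;
        }
        return sum / perThread.length;
    }

    /** Population standard deviation of the per-thread request counts. */
    static double stdDev(long[] perThread) {
        double mean = mean(perThread);
        double squaredDiffs = 0;
        for (long value : perThread) {
            double diff = value - mean;
            squaredDiffs += diff * diff;
        }
        return Math.sqrt(squaredDiffs / perThread.length);
    }
}
```

A large deviation relative to the mean would indicate that some storage threads are starved, which a single average hides.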

Current issues

  For long executions the throughput metric may overflow, since the number of requests accumulated over the whole execution can exceed the maximum value of a long (9,223,372,036,854,775,807).
  The ZooKeeper implementation seems to limit the throughput (this is the YARN implementation).
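Regarding the overflow issue: in Java a long counter does not fail when it exceeds its maximum value, it silently wraps around to a negative number. The sketch below contrasts plain addition with Math.addExact, which throws instead. OverflowSketch is a hypothetical class, and whether zkndb should detect overflow this way is an open design choice.

```java
/** Hypothetical illustration of long overflow behaviour in a counter. */
final class OverflowSketch {
    /** Plain addition: wraps around silently on overflow. */
    static long wrappingAdd(long total, long delta) {
        return total + delta;
    }

    /** Math.addExact: throws ArithmeticException instead of wrapping. */
    static long checkedAdd(long total, long delta) {
        return Math.addExact(total, delta);
    }
}
```

With checked addition, a very long run would stop with an explicit error rather than log a negative throughput.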