Getting started
4knahs edited this page Dec 20, 2012
zkndb is a simple storage benchmarking application. So far it has implementations for HDFS,
Zookeeper and NDB MySQL.
It is composed of three packages: benchmark, metrics and storage.
benchmark package: Contains Benchmark and the different benchmark applications (BenchmarkImpl). It is responsible for creating and running the metrics and the storage. A BenchmarkImpl should receive the following arguments:
    argv[0] : Number of StorageImpl threads to run
    argv[1] : Period between metric logging (ms)
    argv[2] : Execution time (ms)
    argv[3] : Time per cycle (ms)
Optional arguments:
    argv[4] : Number of writes per cycle
    argv[5] : Number of reads per cycle
As the DummyBenchmarkImpl example on this page shows, the metrics and the storage system can be implemented independently of each other. The class BenchmarkUtils implements most of the mechanisms needed to run a benchmark. Its methods setMetric, setEngine and setStorage define, respectively, the classes to use for the metrics, the metrics engine and the storage. They are implemented using Java reflection and refer to classes such as ThroughputMetricImpl.java, ThroughputEngineImpl.java and DummyStorageImpl.java. The method readInput reads the application arguments listed above. The method run handles the creation, initialization and execution of all the threads and shared data.

metrics package: Contains Metric and MetricsEngine, together with the different metrics (MetricImpl) and the metric logging logic (EngineImpl). Metric contains the attributes that change during the benchmark and that are accessed by the MetricsEngine and the Storage. MetricsEngine determines when and where to log the metrics.
storage package: Contains Storage and the database-dependent load generators (StorageImpl). They perform the reads and writes against the database and update the Metrics.
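The argument list described for BenchmarkImpl can be sketched as a small parser. This is only an illustration, not the project's readInput: the class name BenchmarkArgsSketch, its field names, and the default of 1 for the two optional arguments are assumptions, since the page does not state what happens when argv[4] and argv[5] are omitted.

```java
/** Hypothetical holder for the six BenchmarkImpl arguments (not part of zkndb). */
class BenchmarkArgsSketch {
    // Assumed default for the optional arguments; the page does not state one.
    static final int DEFAULT_OPS_PER_CYCLE = 1;

    final int threads;          // argv[0]: number of StorageImpl threads
    final long logPeriodMs;     // argv[1]: period between metric logging
    final long executionTimeMs; // argv[2]: execution time
    final long cycleTimeMs;     // argv[3]: time per cycle
    final int writesPerCycle;   // argv[4]: optional
    final int readsPerCycle;    // argv[5]: optional

    private BenchmarkArgsSketch(int threads, long logPeriodMs, long executionTimeMs,
                                long cycleTimeMs, int writesPerCycle, int readsPerCycle) {
        this.threads = threads;
        this.logPeriodMs = logPeriodMs;
        this.executionTimeMs = executionTimeMs;
        this.cycleTimeMs = cycleTimeMs;
        this.writesPerCycle = writesPerCycle;
        this.readsPerCycle = readsPerCycle;
    }

    /** Parses the four mandatory and two optional arguments. */
    static BenchmarkArgsSketch parse(String[] argv) {
        if (argv.length < 4) {
            throw new IllegalArgumentException(
                    "expected at least 4 arguments, got " + argv.length);
        }
        int writes = argv.length > 4 ? Integer.parseInt(argv[4]) : DEFAULT_OPS_PER_CYCLE;
        int reads = argv.length > 5 ? Integer.parseInt(argv[5]) : DEFAULT_OPS_PER_CYCLE;
        return new BenchmarkArgsSketch(
                Integer.parseInt(argv[0]),
                Long.parseLong(argv[1]),
                Long.parseLong(argv[2]),
                Long.parseLong(argv[3]),
                writes,
                reads);
    }

    public static void main(String[] args) {
        // Example: 4 threads, log every 1000 ms, run for 60000 ms, 100 ms per cycle.
        BenchmarkArgsSketch a = parse(new String[] {"4", "1000", "60000", "100"});
        System.out.println(a.threads + " threads, log every " + a.logPeriodMs + " ms");
    }
}
```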
    /* Main of DummyBenchmarkImpl.java */
    public static void main(String[] args) {
        /* Reads the inputs */
        BenchmarkUtils.readInput(args);
        /* Sets the wanted metrics */
        BenchmarkUtils.setMetric("ThroughputMetricImpl");
        /* Engine that aggregates the results in periods of argv[1] (ms) */
        BenchmarkUtils.setEngine("ThroughputEngineImpl");
        /* Database-specific implementation */
        BenchmarkUtils.setStorage("DummyStorageImpl");
        /* Runs the StorageImpl in as many threads as specified in argv[0] */
        BenchmarkUtils.run();
    }
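The page says that setMetric, setEngine and setStorage are implemented with Java reflection. A minimal sketch of that general technique, turning a class name into an instance, might look like the following. It is not the BenchmarkUtils code: ReflectionSketch, its nested Storage interface and DummyStorage class are hypothetical stand-ins, and the real methods may resolve names and packages differently.

```java
/** Hypothetical illustration of loading an implementation class by name. */
class ReflectionSketch {

    /** Stand-in for the project's Storage type (assumption). */
    public interface Storage {
        String name();
    }

    /** Stand-in implementation that is loaded by name below (assumption). */
    public static class DummyStorage implements Storage {
        public DummyStorage() {}

        @Override
        public String name() {
            return "dummy";
        }
    }

    /** Loads className, checks that it implements Storage, and calls its no-arg constructor. */
    static Storage loadStorage(String className) {
        try {
            Class<? extends Storage> cls =
                    Class.forName(className).asSubclass(Storage.class);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("cannot load storage class " + className, e);
        }
    }

    public static void main(String[] args) {
        // getName() yields the binary name, which is what Class.forName expects.
        Storage s = loadStorage(DummyStorage.class.getName());
        System.out.println(s.name());
    }
}
```

Loading by name keeps the benchmark driver free of compile-time dependencies on any particular metric or storage implementation, at the cost of errors surfacing only at run time.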
There is no locking between threads, and the benchmark runs best when the number of threads is equal to or slightly larger than the number of hardware threads supported by the machine's CPU. The MetricsEngine only reads the metrics, once per period of argv[1] ms. The code contains commented-out synchronization that fixes the metric overflow by resetting the metrics, but it also increases the overhead. This synchronization uses fine-grained locks, since it is done at the Metric level: the benchmark keeps a list of Metrics, one per Storage thread, and the MetricsEngine accesses this list at fixed intervals (argv[1]). For now, the only reason to enable the synchronization would be to let the MetricsEngine reset the metrics after logging so that they do not overflow.
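The per-metric synchronization described above can be sketched as follows. This is an assumption-laden illustration rather than the commented-out zkndb code: MetricLockSketch, CounterMetric, drain and runOnce are hypothetical names. Each worker thread owns one metric, and the logging pass locks one metric at a time to read and reset it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of fine-grained, per-metric locking. */
class MetricLockSketch {

    /** One metric per worker thread; each instance serves as its own lock. */
    static class CounterMetric {
        private long requests;

        synchronized void increment() {
            requests++;
        }

        /** Reads the counter and resets it inside one critical section. */
        synchronized long getAndReset() {
            long value = requests;
            requests = 0;
            return value;
        }
    }

    /** What a logging pass could do: read and reset every metric in turn. */
    static long drain(List<CounterMetric> metrics) {
        long total = 0;
        for (CounterMetric m : metrics) {
            total += m.getAndReset();
        }
        return total;
    }

    /** Starts the workers, waits for them, then drains all metrics once. */
    static long runOnce(int threads, int opsPerThread) {
        List<CounterMetric> metrics = new ArrayList<>();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            CounterMetric metric = new CounterMetric();
            metrics.add(metric);
            Thread t = new Thread(() -> {
                for (int n = 0; n < opsPerThread; n++) {
                    metric.increment();
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return drain(metrics);
    }

    public static void main(String[] args) {
        System.out.println(runOnce(4, 1000));
    }
}
```

Because the lock is held per metric, a worker only ever contends with the logging pass, never with other workers, which is what keeps the locking fine-grained. The lock acquisition on every increment is the overhead the page refers to.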
Calculate the deviation of throughput between different threads instead of calculating a single average value.
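One way to compute such a deviation is the population standard deviation over the per-thread request counts. The sketch below assumes the counts are already available as a long array; ThroughputDeviationSketch and its method names are hypothetical, and the page does not say which deviation measure is intended.

```java
/** Hypothetical illustration: spread of throughput across Storage threads. */
class ThroughputDeviationSketch {

    /** Arithmetic mean of the per-thread request counts. */
    static double mean(long[] perThread) {
        if (perThread.length == 0) {
            throw new IllegalArgumentException("need at least one thread");
        }
        double sum = 0;
        for (long v : perThread) {
            sum += v;
        }
        return sum / perThread.length;
    }

    /** Population standard deviation of the per-thread request counts. */
    static double stdDev(long[] perThread) {
        double mean = mean(perThread);
        double squaredDiffs = 0;
        for (long v : perThread) {
            double d = v - mean;
            squaredDiffs += d * d;
        }
        return Math.sqrt(squaredDiffs / perThread.length);
    }

    public static void main(String[] args) {
        long[] perThread = {2, 4, 4, 4, 5, 5, 7, 9};
        System.out.println("mean=" + mean(perThread) + " stdDev=" + stdDev(perThread));
    }
}
```

A large deviation relative to the mean would indicate that some threads are starved or throttled, which a single average hides.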
For long executions the throughput metric may overflow, since it can grow beyond the maximum value of a Java long (9,223,372,036,854,775,807 requests over the whole execution). The Zookeeper implementation seems to limit the throughput (this is the YARN implementation).
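The overflow itself is easy to demonstrate: Java long arithmetic wraps around silently, while Math.addExact throws instead. The sketch below uses hypothetical names (OverflowSketch, wrappingAdd, wouldOverflow) and is not taken from the zkndb sources.

```java
/** Hypothetical illustration of long overflow in a request counter. */
class OverflowSketch {

    /** Plain addition: wraps around silently when the result exceeds Long.MAX_VALUE. */
    static long wrappingAdd(long counter, long delta) {
        return counter + delta;
    }

    /** Uses Math.addExact to detect that adding delta would overflow. */
    static boolean wouldOverflow(long counter, long delta) {
        try {
            Math.addExact(counter, delta);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // The wrapped value is negative, which is how an overflowed metric would show up.
        System.out.println(wrappingAdd(Long.MAX_VALUE, 1));
        System.out.println(wouldOverflow(Long.MAX_VALUE, 1));
    }
}
```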