BigFrame User Guide
BigFrame is a benchmark generator that captures your requirements through the 3V's emphasized in Big Data environments: Volume, Variety, and Velocity. Given the benchmark specification you provide, it generates:
- The set of data for initial load (with data loading utility)
- The refresh pattern for each data set (with refresh driver)
- The query stream (with query implementation and driver to run on different systems)
- The benchmark metrics
Since BigFrame relies on Hadoop for parallel data generation, Hadoop must be installed beforehand.
BigFrame requires:
- JDK 1.6 (JDK 1.7 recommended)
- Hadoop 1.0.4 (other versions have not been tested)
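Before building, you can quickly confirm that both prerequisites are available on your PATH (a minimal check, assuming standard installations):
# Verify the JDK version (should report 1.6 or 1.7)
java -version
# Verify the Hadoop version (should report 1.0.4)
hadoop version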
To build BigFrame from source, execute the command:
sbt/sbt assembly
You can tailor the specification to meet your needs by modifying the file
conf/bigframe-core.xml
For example, to select an application domain to benchmark, specify the corresponding domain name:
<property>
  <name>bigframe.application.domain</name>
  <value>BI</value>
  <description>
    Choose the application domain you want to benchmark on.
    Currently, supported applications are: BI
  </description>
</property>
To specify the data volume, choose from five candidate sizes:
<property>
  <name>bigframe.datavolume</name>
  <value>tiny</value>
  <description>
    tiny:        around 10GB
    small:       around 100GB
    medium:      around 1TB
    large:       around 10TB
    extra large: around 100TB
  </description>
</property>
Besides the 3V's, you can also specify which engine the queries will run on. Furthermore, if a query involves several data types, you can tell BigFrame which engine to use for each specific data type. For example:
<property>
  <name>bigframe.queryengine.relational</name>
  <value>hadoop</value>
</property>
<property>
  <name>bigframe.queryengine.graph</name>
  <value>spark</value>
</property>
<property>
  <name>bigframe.queryengine.nested</name>
  <value>spark</value>
</property>
<property>
  <name>bigframe.queryengine.text</name>
  <value>hadoop</value>
</property>
Of course, you need to install and set up the corresponding systems before actually running the queries; BigFrame will not do this for you.
Before running BigFrame, you need to edit conf/config.sh to set the following variables:
- HADOOP_HOME: By default, BigFrame tries to read it from the environment variables.
- TPCDS_LOCAL: A temp directory to store the intermediate data for the TPC-DS generator.
There are other variables related to the drivers. For example, if you want to run the benchmark on Spark, you need to tell BigFrame where to find it. This is done by setting the SPARK_HOME parameter as follows:
SPARK_HOME=/path/to/spark_home
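Putting these together, a minimal conf/config.sh might look like the sketch below; the paths are illustrative placeholders and must be adapted to your own installation:
# Location of the Hadoop 1.0.4 installation
HADOOP_HOME=/path/to/hadoop
# Local temp directory for intermediate TPC-DS data
TPCDS_LOCAL=/tmp/tpcds_tmp
# Only needed if queries will run on Spark
SPARK_HOME=/path/to/spark_home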
After finishing all the setup above, you can run BigFrame. The first program you need to run is the data generator. To start it, type the following command in BigFrame's root directory:
bin/datagen -mode datagen
It will then generate the data sets you specified. Be sure that HDFS and the MapReduce engine have been started beforehand.
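If they are not yet running, Hadoop 1.x ships with start scripts for both sets of daemons (shown here assuming HADOOP_HOME points at your Hadoop installation):
# Start the HDFS daemons (NameNode, DataNodes)
$HADOOP_HOME/bin/start-dfs.sh
# Start the MapReduce daemons (JobTracker, TaskTrackers)
$HADOOP_HOME/bin/start-mapred.sh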
After the data generation finishes, you can run the benchmark queries with this command:
bin/qgen -mode runqueries
It will prepare a set of queries based on your benchmark specification, and then run the queries on the system you specified.