Building Shark Master Branch

michaeljones46 edited this page May 2, 2014 · 5 revisions

Shark's latest master branch depends on Spark's master branch, which is usually not yet published to Maven. We can, however, build Spark ourselves and publish it to the local Ivy repository.

git clone git@github.com:apache/spark.git
cd spark
sbt/sbt package publish-local
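
To confirm that the publish step worked, one option is to look for Spark's organization directory in the local Ivy repository. The sketch below assumes sbt's default location, ~/.ivy2/local, and matches any top-level directory whose name contains "spark". The function name list_published_spark is made up for illustration and is not part of Spark or sbt.

```shell
# List Spark-related organization directories in a local Ivy repository.
# Assumption: the repository is at ~/.ivy2/local unless a path is passed in.
list_published_spark() {
  local ivy_local="${1:-$HOME/.ivy2/local}"
  local found

  if [ ! -d "$ivy_local" ]; then
    echo "no local Ivy repository at $ivy_local" >&2
    return 1
  fi

  # Organization directories sit directly under the repository root.
  found=$(find "$ivy_local" -mindepth 1 -maxdepth 1 -type d -name '*spark*' | sort)

  if [ -z "$found" ]; then
    echo "no Spark artifacts under $ivy_local" >&2
    return 1
  fi
  printf '%s\n' "$found"
}
```

Running list_published_spark with no arguments checks the default location. A non-zero exit status suggests that publish-local did not complete or that the repository lives somewhere else.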

Then check out the AMPLab distribution of Apache Hive and build it.

git clone https://github.com/amplab/hive.git -b shark-0.11
cd hive
ant package

ant package builds all Hive jars and puts them into the build/dist directory. On the EC2 AMI, you may first have to install ant-antlr.noarch and ant-contrib.noarch:

yum install ant-antlr.noarch
yum install ant-contrib.noarch
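
Once ant package has finished, a quick way to check that the Hive build produced output is to count the jars under build/dist. This is only a sketch: count_hive_jars is a hypothetical helper, and it searches the whole directory tree rather than assuming a particular layout inside build/dist.

```shell
# Print the number of .jar files found under a Hive dist directory.
# Assumption: run from the hive checkout, so build/dist is the default path.
count_hive_jars() {
  local dist_dir="${1:-build/dist}"

  if [ ! -d "$dist_dir" ]; then
    echo "missing directory: $dist_dir" >&2
    return 1
  fi

  # tr strips the padding that some wc implementations add.
  find "$dist_dir" -type f -name '*.jar' | wc -l | tr -d ' '
}
```

A count of 0, or a "missing directory" message, suggests the build did not complete.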

Now check out Shark:

git clone git@github.com:amplab/shark.git
cd shark

Edit the configuration file conf/shark-env.sh, replacing the example paths below with the locations on your own machine:

#!/usr/bin/env bash

export SHARK_MASTER_MEM=1g

export HIVE_DEV_HOME="/scratch/rxin/hive"
export HIVE_HOME="$HIVE_DEV_HOME/build/dist"

SPARK_JAVA_OPTS="-Dspark.local.dir=/tmp "
SPARK_JAVA_OPTS+="-Dspark.kryoserializer.buffer.mb=10 "
SPARK_JAVA_OPTS+="-verbose:gc -XX:-PrintGCDetails -XX:+PrintGCTimeStamps "
export SPARK_JAVA_OPTS

export SCALA_VERSION=2.9.3
export SCALA_HOME="/scratch/rxin/scala-2.9.3"
export SPARK_HOME="/scratch/rxin/spark"
export HADOOP_HOME="/scratch/rxin/hadoop-0.20.205.0"
export JAVA_HOME="/usr/lib/jvm/java-6-openjdk/jre"
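
Because every path in this file is specific to one machine, it can help to check that each directory variable points somewhere real before building. The following is a minimal sketch: check_shark_env is a hypothetical helper, not something shipped with Shark, and it only tests that the five *_HOME variables from the file above are set and name existing directories.

```shell
# Report directory variables from conf/shark-env.sh that are unset
# or that do not point at an existing directory.
check_shark_env() {
  local var value status=0

  for var in HIVE_HOME SCALA_HOME SPARK_HOME HADOOP_HOME JAVA_HOME; do
    # Look up the variable named by $var; the names are fixed literals,
    # so eval is safe here.
    eval "value=\${$var:-}"

    if [ -z "$value" ]; then
      echo "$var is not set"
      status=1
    elif [ ! -d "$value" ]; then
      echo "$var points at a missing directory: $value"
      status=1
    fi
  done

  return "$status"
}
```

One way to use it is to source conf/shark-env.sh in a shell, define the function, and then run check_shark_env. It prints one line per problem and returns a non-zero status if any were found.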

Finally, build Shark:

sbt/sbt package