SparkOnKudu

Overview

This is a simple reusable lib for working with Kudu with Spark

##Functionality Current functionality supports the following functions

RDD foreachPartition with iterator and kudu client
RDD mapPartition with iterator and kudu client
DStream foreachPartition with iterator and kudu client
DStream mapPartition with iterator and kudu client
Spark SQL integration with Kudu (Basic no filter push down yet)

##Examples

Basic example ** Creating Kudu tables ** Connecting with SparkSQL ** Converting values from Kudu to SparkSQL to MlLib
Gamer example ** Creating Kudu Gamer table ** Generating Gamer data and pushing it to Kafka ** Reading Gamer data from Kafka with Spark Streaming ** Aggregating Gamer data in Spark Streaming then pushing mutations to Kudu ** Running Impala SQL on Kudu Gamer table ** Running SparkSQL on Kudu Gamer table ** Converting SparkSQL results to Vectors so we can do KMeans

##Near Future

Key SQL predict push down
Need to update POM file with public repo
Need to work with Kudu project to integrate into Kudu

##Build mvn clean package

##Setup for Gamer Example

###Setting up Kafka kafka-topics --zookeeper ZooKeeperNode:2181 --create --topic gamer --partitions 1 --replication-factor 1 kafka-topics --zookeeper ZooKeeperNode:2181 --list

###Basic Testing with Kafka kafka-console-producer --broker-list BrokerNode:9092 --topic test kafka-cocsole-consumer --zookeeper ZooKeeperNode:2181 --topic gamer --from-beginning

###Populating Kafka java -cp KuduSpark.jar org.kududb.spark.demo.gamer.KafkaProducerGenerator mriggs-strata-1.vpc.cloudera.com:9092 gamer 10000 300 1000

###create Table java -cp KuduSpark.jar org.kududb.spark.demo.gamer.CreateKuduTable mriggs-strata-1.vpc.cloudera.com gamer

###Run Spark Streaming spark-submit
--master yarn --deploy-mode client
--executor-memory 2G
--num-executors 2
--jars kudu-mapreduce-0.1.0-20150903.033037-21-jar-with-dependencies.jar
--class org.kududb.spark.demo.gamer.GamerAggergatesSparkStreaming KuduSpark.jar
BrokerNode:9092 gamer KuduMaster gamer C

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
src/main/scala/org		src/main/scala/org
.gitignore		.gitignore
GamerSetup.txt		GamerSetup.txt
LICENSE		LICENSE
README.md		README.md
notes.txt		notes.txt
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SparkOnKudu

Overview

About

Releases

Packages

Languages

License

scottsappen/SparkOnKudu

Folders and files

Latest commit

History

Repository files navigation

SparkOnKudu

Overview

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages