This project provides a simple example of how to write map-only jobs that export data from Hadoop into Cassandra, using a Cassandra-driver-based insert from within the map task.
This example was created because users were known to be using, or interested in, this approach, yet little documentation for it was available.
The problem that this example solves is as follows:
- how to load large volumes of data into Cassandra from Hadoop in parallel, without using a reducer.
The solution taken in this example is a map-only MapReduce job that leverages the DataStax Java Driver 2.0 to insert data directly into Cassandra from within the map tasks.
This is a deliberately simplified example that inserts key/value pairs (the full split is inserted as-is). The approach can be expanded upon by leveraging a different InputFormat to create different splits, or by transforming the data within the map task.
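As a rough illustration of the idea, a map task along these lines could open a driver session in `setup()`, insert each record in `map()`, and close the connection in `cleanup()`. This is a sketch, not the repository's actual `MRExample` code: the class name, keyspace, and table (`mrexample.kv(key text PRIMARY KEY, value text)`) are assumptions made for the example.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

// Hypothetical map-only export task: no reducer, each mapper writes
// its records straight into Cassandra through the DataStax driver.
public class CassandraExportMapper
        extends Mapper<LongWritable, Text, Text, Text> {

    private Cluster cluster;
    private Session session;
    private PreparedStatement insert;

    @Override
    protected void setup(Context context) {
        // One driver connection per map task; in the real job the
        // contact points come from the NODES constant in MRExample.java.
        cluster = Cluster.builder().addContactPoints("127.0.0.1").build();
        session = cluster.connect("mrexample");
        insert = session.prepare("INSERT INTO kv (key, value) VALUES (?, ?)");
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Use the record's byte offset as the row key and the raw
        // line as the value; nothing is emitted to the MR framework.
        session.execute(insert.bind(key.toString(), value.toString()));
    }

    @Override
    protected void cleanup(Context context) {
        // close() tears down the session and cluster (driver 2.0+).
        cluster.close();
    }
}
```

Doing the connect in `setup()` rather than `map()` matters: the driver's `Cluster` object is expensive to build, so each map task should create exactly one and reuse its prepared statement for every insert.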
Follow these steps to execute the example:
- Set up your environment
- We used Hadoop 2.2 with YARN
- We used DataStax Enterprise 4.0.1 (open-source Cassandra 2.0 could be used as well)
- Create the schema in Cassandra using the schema.ddl file
- ./cqlsh -f schema.ddl
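The contents of schema.ddl are not reproduced in this README; a minimal schema consistent with the key/value inserts described above could look like the following (the keyspace name, table name, and replication settings are illustrative assumptions, not the repository's actual DDL):

```sql
CREATE KEYSPACE mrexample
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

CREATE TABLE mrexample.kv (
  key   text PRIMARY KEY,
  value text
);
```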
- Copy the pom.xml and src directory into a local directory, or clone the repository
- Use Maven to build the project
- In the MRExample.java file, change the following line to include your node IP addresses: private static final String NODES = "ENTER YOUR NODES LIST HERE";
- Build a jar containing the MRExample class
- Explicitly download the DataStax Java Driver 2.0.1 jar and its dependencies so they can be passed to Hadoop
- The required jars are the ones listed after -libjars in the command below
- Execute the following hadoop command
- hadoop jar {yourjar.jar} com.datastax.mrexample.MRExample -libjars cassandra-driver-core-2.0.1.jar,guava-16.0.1.jar,metrics-core-3.0.2.jar,netty-3.9.0.Final.jar,lz4-1.2.0.jar,testng-6.8.8.jar,snappy-java-1.0.4.1.jar {input path on hadoop} {output path on hadoop}