
Connector Setup

Ronen Botzer edited this page Jan 5, 2020 · 2 revisions

Building the Aerospike Hadoop Connector requires Maven and Git.
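Before following the steps below, you can check whether Maven and Git are already installed with the shell builtin command -v. This is a minimal sketch. require_tools is a hypothetical helper name of our own and is not part of the connector or of either tool.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which of the named commands are missing from PATH.
# Returns 0 if every command is found, 1 if at least one is missing.
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   require_tools mvn git && echo "Maven and Git found"
```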

Install Maven

Note: Do not use sudo apt-get install maven! The packaged Maven may not build the connector packages properly. Instead, install Maven from the Apache binary distribution as follows. As hduser on ztg-client, in /home/hduser:

hduser@ztg-client:~$ wget http://mirror.cc.columbia.edu/pub/software/apache/maven/maven-3/3.0.5/binaries/apache-maven-3.0.5-bin.tar.gz
... ‘apache-maven-3.0.5-bin.tar.gz’ saved 
hduser@ztg-client:~$ sudo tar xzf apache-maven-3.0.5-bin.tar.gz -C /usr/local
[sudo] password for hduser: 
hduser@ztg-client:~$ cd /usr/local
hduser@ztg-client:/usr/local$ sudo ln -s apache-maven-3.0.5 maven
hduser@ztg-client:/usr/local$ ls
apache-maven-3.0.5  bin  etc  games  include  lib  man  maven  sbin  share  src

Next, set up the Maven path system-wide by adding the two lines below to /etc/profile.d/maven.sh:

hduser@ztg-client:/usr/local$ sudo vi /etc/profile.d/maven.sh
export M2_HOME=/usr/local/maven
export PATH=${M2_HOME}/bin:${PATH}
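If you would rather not edit the file in vi, you can write the same two lines non-interactively. This is a minimal sketch that assumes bash. maven_profile is a hypothetical helper name. The quoted heredoc delimiter is the important detail: it keeps ${M2_HOME} and ${PATH} literal, so they are expanded at login rather than when the file is written.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the contents of /etc/profile.d/maven.sh.
# The quoted 'EOF' delimiter prevents expansion of ${M2_HOME} and ${PATH}
# here, so both are written literally and evaluated at login.
maven_profile() {
  cat <<'EOF'
export M2_HOME=/usr/local/maven
export PATH=${M2_HOME}/bin:${PATH}
EOF
}

# Usage: "sudo cmd > file" would fail because the redirection runs as the
# calling user. Piping into "sudo tee" performs the write as root instead:
#   maven_profile | sudo tee /etc/profile.d/maven.sh > /dev/null
```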

Log in again, or run:

hduser@ztg-client:/usr/local$ source /etc/profile
hduser@ztg-client:/usr/local$ cd ~

Check the Maven installation:

hduser@ztg-client:~$ mvn -version
Apache Maven 3.0.5 (r01de14724cdef164cd33c7c8c2fe155faf9602da; 2013-02-19 05:51:28-0800)
Maven home: /usr/local/maven
Java version: 1.7.0_79, vendor: Oracle Corporation
Java home: /usr/lib/jvm/java-7-openjdk-amd64/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.13.0-45-generic", arch: "amd64", family: "unix"
hduser@ztg-client:~$
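The note above warns against a packaged Maven, so it is worth confirming that mvn resolves to the /usr/local/maven installation and not to another copy earlier on PATH, such as one installed with apt-get. This is a minimal sketch. check_maven_home is a hypothetical helper that inspects only the "Maven home:" line of mvn -version, as shown in the output above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "mvn -version" output on stdin and succeed only
# if "Maven home:" matches the expected directory (default /usr/local/maven).
check_maven_home() {
  local expected=${1:-/usr/local/maven}
  local home
  home=$(sed -n 's/^Maven home: //p')
  if [ "$home" != "$expected" ]; then
    echo "mvn resolves to Maven home '${home:-unknown}', expected '$expected'" >&2
    return 1
  fi
}

# Usage:
#   mvn -version | check_maven_home && echo "using /usr/local/maven"
```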

Install Git

Install Git, which is needed to download the Aerospike Hadoop Connector. As hduser on ztg-client, in /home/hduser:

hduser@ztg-client:~$ pwd
/home/hduser
hduser@ztg-client:~$ sudo apt-get install git

Check that hdclient can also run Git. Switch to hdclient with su, then run:

hdclient@ztg-client:~$ git --version
git version 1.9.1

Install Aerospike Hadoop Connector

Note: We install the Aerospike Hadoop Connector on ztg-client only, because hdclient compiles the connector jars there and submits them to the Apache Hadoop cluster to run jobs. Start in hdclient’s home directory. As hdclient on ztg-client, in /home/hdclient:

hdclient@ztg-client:~$ cd ~
hdclient@ztg-client:~$

Download the Aerospike Hadoop Connector with Git

Aerospike provides the Aerospike Hadoop Connector in a public repository on GitHub. First fork this repository into your GitHub account, then clone your fork. This example uses the organization acmegroup and the user johndoe on github.com.

(Hint: To search Aerospike’s public repositories on GitHub, enter “user:aerospike” in the Repositories search box at github.com.)

johndoe has forked this repository to acmegroup/aerospike-hadoop and has access to the acmegroup organization. In johndoe’s account settings, we can add as many SSH public keys (id_rsa.pub) as needed. Anyone holding a matching private key then authenticates as johndoe and can clone the repositories johndoe can access in acmegroup. We therefore added hdclient’s public key to johndoe’s account and then cloned the fork. First, display hdclient’s public key:

hdclient@ztg-client:~$ more ~/.ssh/id_rsa.pub
ssh-rsa AAAAB3Nza[ . . . . . . . ]u2qD hdclient@ztg-client

Copy this key and paste it into johndoe -> Account -> Settings -> Add SSH Keys in GitHub’s web interface.
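The more command above assumes that hdclient already has an RSA key pair. If it does not, you can create one with ssh-keygen. This is a minimal sketch. ensure_ssh_key is a hypothetical helper, and the empty passphrase (-N '') is a convenience assumption you may want to drop. The helper refuses to overwrite an existing key, because that key may already be registered on GitHub.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create an RSA key pair only if none exists at the path
# (default ~/.ssh/id_rsa), so an existing key is never overwritten.
ensure_ssh_key() {
  local key=${1:-"$HOME/.ssh/id_rsa"}
  if [ -f "$key" ]; then
    echo "keeping existing key $key"
    return 0
  fi
  mkdir -p "$(dirname "$key")"
  chmod 700 "$(dirname "$key")"
  # -N '' sets an empty passphrase; remove it to be prompted for one.
  ssh-keygen -t rsa -b 4096 -N '' -f "$key"
}

# Usage, as hdclient:
#   ensure_ssh_key && cat ~/.ssh/id_rsa.pub
# After adding the key on GitHub, this should report that you
# authenticated as johndoe (GitHub provides no shell, so it then exits):
#   ssh -T git@github.com
```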

Now clone the fork. This pulls Aerospike’s Hadoop Connector code and examples into hdclient’s home directory on ztg-client:

hdclient@ztg-client:~$ git clone git@github.com:acmegroup/aerospike-hadoop.git
Cloning into 'aerospike-hadoop'...
Warning: Permanently added the RSA host key for IP address '192.30.252.129' to the list of known hosts.
remote: Counting objects: 1291, done.
remote: Total 1291 (delta 0), reused 0 (delta 0), pack-reused 1291
Receiving objects: 100% (1291/1291), 206.54 KiB | 0 bytes/s, done.
Resolving deltas: 100% (356/356), done.
Checking connectivity... done.
hdclient@ztg-client:~$

hdclient on ztg-client now has the connector code in ~/aerospike-hadoop.
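Because the clone points at the fork, pulling later updates from Aerospike’s original repository requires a second remote. This is a minimal sketch that assumes two things: the upstream repository is aerospike/aerospike-hadoop on GitHub (confirm this with the search hint above), and its default branch is master. add_upstream is a hypothetical helper. git -C needs Git 1.8.5 or later, and the 1.9.1 shown above meets that requirement.

```bash
#!/usr/bin/env bash
# Hypothetical helper: add Aerospike's original repository as remote
# "upstream" in a clone of the fork, unless an upstream remote already exists.
add_upstream() {
  local repo=${1:-.}
  local url=${2:-https://github.com/aerospike/aerospike-hadoop.git}  # assumed URL
  if git -C "$repo" remote | grep -qx upstream; then
    echo "upstream already set: $(git -C "$repo" config --get remote.upstream.url)"
    return 0
  fi
  git -C "$repo" remote add upstream "$url"
}

# Usage, as hdclient:
#   add_upstream ~/aerospike-hadoop
#   cd ~/aerospike-hadoop && git fetch upstream
#   git merge upstream/master   # assumes the default branch is named master
```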

Build using Maven

hdclient@ztg-client:~$ cd ~/aerospike-hadoop
hdclient@ztg-client:~/aerospike-hadoop$ mvn clean package
[INFO] Reactor Summary:
[INFO] 
[INFO] aerospike-mapreduce ............................... SUCCESS [9.316s]
[INFO] sampledata ........................................ SUCCESS [8.747s]
[INFO] word_count_input .................................. SUCCESS [9.411s]
[INFO] aggregate_int_input ............................... SUCCESS [7.161s]
[INFO] word_count_output ................................. SUCCESS [7.741s]
[INFO] session_rollup .................................... SUCCESS [7.102s]
[INFO] generate_profiles ................................. SUCCESS [7.740s]
[INFO] external_join ..................................... SUCCESS [7.260s]
[INFO] spark_session_rollup .............................. SUCCESS [20.529s]
[INFO] aerospike-hadoop-examples ......................... SUCCESS [0.001s]
[INFO] aerospike-hadoop-parent ........................... SUCCESS [0.001s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1:25.184s
[INFO] Finished at: Mon Jun 29 04:51:09 PDT 2015
[INFO] Final Memory: 148M/502M

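Alternatively, you can build individual modules with Gradle, using the Gradle wrapper (gradlew) included in the repository. Gradle is a separate build tool and does not run on top of Maven, although it downloads dependencies from Maven repositories, as the log below shows.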

hdclient@ztg-client:~/aerospike-hadoop$ ls
build.gradle  gradlew      mapreduce  sampledata       WORLDCUP_FILELIST
examples      gradlew.bat  pom.xml    settings.gradle
gradle        LICENSE      README.md  TODO.md

hdclient@ztg-client:~/aerospike-hadoop$ ./gradlew :mapreduce:jar

Downloading http://services.gradle.org/distributions/gradle-1.11-bin.zip
.....
Unzipping /home/hdclient/.gradle/wrapper/dists/gradle-1.11-bin/4h5v8877arc3jhuqbm3osbr7o7/gradle-1.11-bin.zip to /home/hdclient/.gradle/wrapper/dists/gradle-1.11-bin/4h5v8877arc3jhuqbm3osbr7o7
Set executable permissions for: /home/hdclient/.gradle/wrapper/dists/gradle-1.11-bin/4h5v8877arc3jhuqbm3osbr7o7/gradle-1.11/bin/gradle
Download http://repo1.maven.org/maven2/org/sonatype/sisu/inject/cglib/2.2.1-v20090111/cglib-2.2.1-v20090111.pom
Download http://repo1.maven.org/maven2/org/sonatype/sisu/inject/cglib/2.2.1-v20090111/cglib-2.2.1-v20090111.jar
:mapreduce:compileJava
:mapreduce:processResources UP-TO-DATE
:mapreduce:classes
:mapreduce:jar
BUILD SUCCESSFUL
Total time: 2 mins 51.503 secs
hdclient@ztg-client:~/aerospike-hadoop$
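After either build, the jars to ship to the cluster are in each module’s output directory. This minimal sketch lists them, assuming the default output locations: target/ for Maven and build/libs/ for Gradle’s jar task. list_built_jars is a hypothetical helper. Restricting the search to those directories keeps unrelated jars out of the list, such as the wrapper jar that a Gradle wrapper setup normally keeps under gradle/wrapper/.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list jars under Maven's default output (*/target/)
# and Gradle's default jar output (*/build/libs/), sorted by path.
list_built_jars() {
  local root=${1:-.}
  find "$root" -type f -name '*.jar' \
    \( -path '*/target/*' -o -path '*/build/libs/*' \) | sort
}

# Usage:
#   list_built_jars ~/aerospike-hadoop
```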

We are now ready to test the examples on our Apache Hadoop cluster.