Event Information
These instructions are meant to be used on the day of the HackReduce event. The servers will not be accessible except at the venue.
{CLUSTER NUMBER}: Will be assigned to your team at the event
git clone https://github.com/hoppertravel/HackReduce.git
- cd ~/.ssh
- Obtain the key:
  - OSX:
    curl -O http://hackreduce-manager.hopper.to/hackreduce.tar
  - Linux:
    wget http://hackreduce-manager.hopper.to/hackreduce.tar
- tar xvf hackreduce.tar
- chmod 700 hackreduce.pem
The team folders will be used for storing your code and data on the cluster's master node.
- ssh -i ~/.ssh/hackreduce.pem hadoop@hackreduce-cluster-{CLUSTER NUMBER}.hopper.to
- Create the code folder:
  mkdir -p ~/users/{team name}
  This is where you will store all of your code (small file storage).
- Create the data folder:
  mkdir -p /mnt/users/{team name}
  If you will be downloading large data files to the cluster or saving the output of your jobs, store them here (otherwise you might run out of disk space!).
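The two-folder layout above can be sketched as a small script. This version uses a scratch root so it can be tried off-cluster; TEAM=hopper and the $ROOT prefix are assumptions for illustration (on the master node the real paths are ~/users/{team name} and /mnt/users/{team name}):

```shell
# Sketch: per-team folder layout. ROOT is a scratch directory so this can
# be tried anywhere; on the cluster the real roots are ~ and /mnt.
ROOT=$(mktemp -d)
TEAM=hopper                         # example team name (assumption)
mkdir -p "$ROOT/users/$TEAM"        # code folder: small files only
mkdir -p "$ROOT/mnt/users/$TEAM"    # data folder: large inputs and job output
ls -d "$ROOT/users/$TEAM" "$ROOT/mnt/users/$TEAM"
```

Keeping code and data separated this way matters because the home partition on the master node is small; large job output belongs under /mnt.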
Starting on your local system:
- cd {HackReduce project}
- Compile your code with one of the following commands, depending on whether you're using Gradle or Ant:
  - Gradle:
    gradle
  - Ant:
    ant
- Copy your jar to the cluster's master node:
  scp -i ~/.ssh/hackreduce.pem build/libs/{HackReduce custom}.jar hadoop@hackreduce-cluster-{CLUSTER NUMBER}.hopper.to:~/users/{team name}
- Log onto the cluster:
  ssh -i ~/.ssh/hackreduce.pem hadoop@hackreduce-cluster-{CLUSTER NUMBER}.hopper.to
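The copy-and-login pair above can be scripted so each placeholder is filled in once. CLUSTER=3, TEAM=hopper, and myjob.jar are made-up example values, and the commands are echoed rather than executed, so this sketch is safe to run anywhere:

```shell
# Build the scp/ssh command lines from one set of variables (dry run:
# commands are printed, not executed). CLUSTER, TEAM, and JAR are examples.
CLUSTER=3
TEAM=hopper
JAR=build/libs/myjob.jar
KEY=$HOME/.ssh/hackreduce.pem
HOST=hackreduce-cluster-$CLUSTER.hopper.to
SCP_CMD="scp -i $KEY $JAR hadoop@$HOST:~/users/$TEAM"
SSH_CMD="ssh -i $KEY hadoop@$HOST"
echo "$SCP_CMD"
echo "$SSH_CMD"
```

Drop the echo step (run the commands directly) once the printed lines look right for your team.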
- Launch your job:
  hadoop/bin/hadoop jar ~/users/{team name}/{HackReduce custom}.jar {Java job class} /datasets/{dataset chosen} /users/{team name}/job/
  e.g.
  hadoop/bin/hadoop jar ~/users/hopper/myjar.jar org.hackreduce.examples.bixi.RecordCounter /datasets/bixi /users/hopper/bixi_recordcounts
- Track the progress of your job at:
  http://hackreduce-cluster-{CLUSTER NUMBER}.hopper.to:50030
- When the job is finished, you can download the output from HDFS to the local file system:
  hadoop/bin/hadoop dfs -copyToLocal /users/{team name}/job /mnt/users/{team name}/
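Launching a job and fetching its output can be chained in one small script. The values below mirror the bixi example above (TEAM, job class, and output path are examples), and the hadoop commands are echoed rather than executed so the sketch can be tested off-cluster; remove the echo prefixes to run it for real on the master node:

```shell
# Dry-run sketch: launch a job, then copy its HDFS output to local disk.
TEAM=hopper
CLASS=org.hackreduce.examples.bixi.RecordCounter   # example job class
OUT=/users/$TEAM/bixi_recordcounts                 # HDFS output path
echo hadoop/bin/hadoop jar ~/users/$TEAM/myjar.jar "$CLASS" /datasets/bixi "$OUT"
echo hadoop/bin/hadoop dfs -copyToLocal "$OUT" /mnt/users/$TEAM/
```

Note that the copy lands under /mnt/users/{team name}, the large-file area created earlier, not the small home partition.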
- Visit http://hackreduce-cluster-{CLUSTER NUMBER}.hopper.to:50070
- Click on "Live Nodes"
- Select any node in the list, and you'll see the contents of HDFS
- Log onto your namenode (hackreduce-cluster-{CLUSTER NUMBER}.hopper.to)
- Run the command
  hadoop/bin/hadoop dfs
  with no arguments to print the list of available HDFS file system commands
- The number of reducers used by a job needs to be set manually by one of the following methods:
  - Java: http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapreduce/Job.html#setNumReduceTasks(int)
  - Streaming: http://hadoop.apache.org/common/docs/current/streaming.html#Specifying+the+Number+of+Reducers
  - More information can be found at http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#Reducer (see especially the "How Many Reduces?" section)
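As a sketch, the reducer count can also be passed on the command line, assuming your job's main class parses generic options (e.g. via ToolRunner); that is an assumption about your driver class, and if it doesn't hold, call setNumReduceTasks in the Java driver instead. The command is echoed here, not executed:

```shell
# Dry run: set the reducer count via the generic -D option. This only
# takes effect if the main class parses generic options (e.g. ToolRunner);
# otherwise call job.setNumReduceTasks(n) in your Java driver.
REDUCERS=4
CMD="hadoop/bin/hadoop jar ~/users/hopper/myjar.jar org.hackreduce.examples.bixi.RecordCounter -D mapred.reduce.tasks=$REDUCERS /datasets/bixi /users/hopper/out"
echo "$CMD"
```

Picking a sensible count matters: too few reducers leaves cluster nodes idle, too many wastes time on task startup overhead (see the "How Many Reduces?" section linked above).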