
Event Information

greglu edited this page Jun 25, 2011 · 7 revisions

These instructions are meant for the day of the HackReduce event; the servers are only accessible at the venue.

{CLUSTER NUMBER}: Will be assigned to your team at the event

Getting Started

Download the project

git clone https://github.com/hoppertravel/HackReduce.git

Setup SSH key for accessing the cluster

  1. cd ~/.ssh

  2. Obtain the key:

  • OSX: curl -O http://hackreduce-manager.hopper.to/hackreduce.tar
  • Linux: wget http://hackreduce-manager.hopper.to/hackreduce.tar
  3. tar xvf hackreduce.tar

  4. chmod 700 hackreduce.pem
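The key-setup steps above can be combined into one copy-pasteable sketch. Every command is echoed rather than executed, so it is safe to review anywhere; remove the `echo`s at the venue to run it for real.

```shell
# Dry-run sketch of the key setup above. The tarball URL is the one
# given in the steps; curl/wget are picked by availability, mirroring
# the OSX/Linux bullets.
URL="http://hackreduce-manager.hopper.to/hackreduce.tar"
echo 'cd ~/.ssh'
if command -v curl >/dev/null 2>&1; then
  echo "curl -O $URL"             # OSX usually ships curl
else
  echo "wget $URL"                # Linux usually ships wget
fi
echo 'tar xvf hackreduce.tar'
echo 'chmod 700 hackreduce.pem'   # ssh refuses keys that others can read
```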

Create your team folders

The team folders will be used for storing your code and data on the cluster's master node.

  1. ssh -i ~/.ssh/hackreduce.pem hadoop@hackreduce-cluster-{CLUSTER NUMBER}.hopper.to

  2. Create the code folder: mkdir -p ~/users/{team name}. This is where you will store all your code (small file storage).

  3. Create the data folder: mkdir -p /mnt/users/{team name}. If there are large data files that you will be downloading to the cluster or saving the output of your jobs to, you need to store them here (otherwise you might run out of disk space!).
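The folder-creation steps above can be sketched with placeholders. TEAM and CLUSTER below are made-up values (your real ones come from the event); the commands are printed for review rather than executed, since they only make sense on the master node.

```shell
# Sketch: the team-folder commands above, parameterized with
# placeholder values -- substitute your own.
TEAM="myteam"    # placeholder team name
CLUSTER="1"      # placeholder; assigned at the event
MASTER="hackreduce-cluster-${CLUSTER}.hopper.to"
echo "ssh -i ~/.ssh/hackreduce.pem hadoop@${MASTER}"
echo "mkdir -p ~/users/${TEAM}       # code (small files)"
echo "mkdir -p /mnt/users/${TEAM}    # data: large downloads and job output"
```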

Executing jobs on the Amazon clusters

Compile your jar (for Java participants)

Starting on your local system:

  1. cd {HackReduce project}

  2. Compile your code with the following commands depending on whether you're using Gradle or Ant:

  • Gradle: gradle
  • Ant: ant
  3. Copy your jar to the cluster's master node:

    scp -i ~/.ssh/hackreduce.pem build/libs/{HackReduce custom}.jar hadoop@hackreduce-cluster-{CLUSTER NUMBER}.hopper.to:~/users/{team name}

  4. Log onto the cluster:

    ssh -i ~/.ssh/hackreduce.pem hadoop@hackreduce-cluster-{CLUSTER NUMBER}.hopper.to

  5. Launch your job:

    hadoop/bin/hadoop jar ~/users/{team name}/{HackReduce custom}.jar {Java job class} /datasets/{dataset chosen} /users/{team name}/job/

    e.g. hadoop/bin/hadoop jar ~/users/hopper/myjar.jar org.hackreduce.examples.bixi.RecordCounter /datasets/bixi /users/hopper/bixi_recordcounts

  6. Track the progress of your job at:

    http://hackreduce-cluster-{CLUSTER NUMBER}.hopper.to:50030

  7. When the job is finished, you can download the output from HDFS to the local file system:

    hadoop/bin/hadoop dfs -copyToLocal /users/{team name}/job /mnt/users/{team name}/
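The whole job workflow above can be strung together as one dry-run sketch. It fills the placeholders with the values from the RecordCounter example (all of them assumptions — swap in your own team, cluster, jar, and job class) and echoes each command so you can review the pipeline; run each line on the machine noted in its comment.

```shell
# Dry-run of the job workflow above, using the RecordCounter example
# values as placeholders. Echoes each command instead of running it.
TEAM="hopper"                                   # placeholder team name
CLUSTER="1"                                     # placeholder cluster number
MASTER="hackreduce-cluster-${CLUSTER}.hopper.to"
KEY='~/.ssh/hackreduce.pem'
JAR="myjar.jar"                                 # built by gradle/ant
CLASS="org.hackreduce.examples.bixi.RecordCounter"

echo "scp -i $KEY build/libs/$JAR hadoop@$MASTER:~/users/$TEAM"   # local machine
echo "ssh -i $KEY hadoop@$MASTER"                                 # local machine
echo "hadoop/bin/hadoop jar ~/users/$TEAM/$JAR $CLASS /datasets/bixi /users/$TEAM/bixi_recordcounts"   # master node
echo "hadoop/bin/hadoop dfs -copyToLocal /users/$TEAM/bixi_recordcounts /mnt/users/$TEAM/"             # master node
```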

Browsing HDFS data

Web GUI

  1. Visit http://hackreduce-cluster-{CLUSTER NUMBER}.hopper.to:50070

  2. Click on "Live Nodes"

  3. Select any node in the list, and you'll see the contents of HDFS

Command line

  1. Log onto your namenode (hackreduce-cluster-{CLUSTER NUMBER}.hopper.to)

  2. Run hadoop/bin/hadoop dfs with no arguments to see the list of available commands
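A few dfs subcommands you are likely to want are sketched below, shown as echoes with the paths used elsewhere on this page (`myteam` is a made-up placeholder); run the real commands on the namenode.

```shell
# Common `dfs` subcommands from this era of Hadoop, printed for review.
DFS="hadoop/bin/hadoop dfs"
echo "$DFS -ls /datasets                          # list the shared datasets"
echo "$DFS -cat /users/myteam/job/part-00000      # peek at one output file"
echo "$DFS -copyToLocal /users/myteam/job /mnt/users/myteam/"
```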

Important notes