If you'd rather build the image yourself instead of pulling it from Docker Hub, use one of the following:
- With the build cache:
docker build -t cnp2 .
- Without the build cache:
To refresh the config files in your image, build it with the command below. It bypasses the cache for every line after
ARG RECONFIG=1
in the Dockerfile and re-executes those lines, which is useful after you change config files.
docker build -t cnp2 --build-arg RECONFIG=$(date +%s) .
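The refresh build works because $(date +%s) expands to the current Unix time in seconds, so RECONFIG receives a new value on every build. A minimal sketch that only prints the command for each mode instead of running it; the helper name build_cmd is hypothetical and not part of this image:

```shell
#!/bin/sh
# Hypothetical helper: print the docker build command for a given mode.
# "refresh" passes a fresh RECONFIG value; anything else uses the cache.
build_cmd() {
    case "$1" in
        refresh) echo "docker build -t cnp2 --build-arg RECONFIG=$(date +%s) ." ;;
        *)       echo "docker build -t cnp2 ." ;;
    esac
}

build_cmd cached
build_cmd refresh
```

To run the printed command rather than just display it, pass the output to the shell, for example: build_cmd refresh | sh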
Hadoop can run in two modes: single-node and multi-node. The image defaults to single-node; you can switch it to multi-node by setting environment variables.
For single-node mode, just run:
docker run -it cnp2
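To open the web interfaces listed later in this document (ports 8088, 50070, 50075 and 19888) from a browser on the host, the container ports usually have to be published with docker's -p flag. A sketch, assuming the image does not already make them reachable in your setup; the helper name publish_flags is hypothetical, and the script only prints the resulting command:

```shell
#!/bin/sh
# Hypothetical helper: turn a list of ports into docker -p flags (host:container).
publish_flags() {
    flags=""
    for port in "$@"; do
        flags="$flags -p $port:$port"
    done
    # Strip the leading space left by the loop.
    echo "${flags# }"
}

# Ports taken from the monitoring notes in this document.
echo "docker run -it $(publish_flags 8088 50070 50075 19888) cnp2"
```

Publishing only the ports you actually need keeps the host's port space free for other containers.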
For multi-node mode, first create the network:
docker network create --subnet=172.20.0.0/16 hadoop-cluster
Then run a container attached to the new network:
docker run --net hadoop-cluster --ip 172.20.0.22 -it ubuntu bash
Finally, start the slave and master nodes:
docker run --net hadoop-cluster --ip 172.20.0.11 -it -e HADOOP_HOSTS="172.20.0.10 master,172.20.0.11 slave1" cnp2
docker run --net hadoop-cluster --ip 172.20.0.10 -it -e HADOOP_HOSTS="172.20.0.10 master,172.20.0.11 slave1" -e MY_ROLE="master" cnp2
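The two commands above share the same HADOOP_HOSTS value and differ only in IP address and MY_ROLE. The sketch below generates that value and the matching commands for one master and N slaves. It rests on an assumption not confirmed by this document: that the image accepts additional "IP name" pairs (slave2, slave3, ...) in the same comma-separated format. The function names are hypothetical, and the commands are printed, not executed:

```shell
#!/bin/sh
# Build a HADOOP_HOSTS value for one master and N slaves on 172.20.0.0/16.
# Assumption: extra "IP name" pairs follow the format of the two-node example.
hadoop_hosts() {
    n="$1"
    hosts="172.20.0.10 master"
    i=1
    while [ "$i" -le "$n" ]; do
        hosts="$hosts,172.20.0.$((10 + i)) slave$i"
        i=$((i + 1))
    done
    echo "$hosts"
}

# Print the docker commands: one per slave, then the master.
print_cluster_cmds() {
    count="$1"
    all_hosts=$(hadoop_hosts "$count")
    k=1
    while [ "$k" -le "$count" ]; do
        echo "docker run --net hadoop-cluster --ip 172.20.0.$((10 + k)) -it -e HADOOP_HOSTS=\"$all_hosts\" cnp2"
        k=$((k + 1))
    done
    echo "docker run --net hadoop-cluster --ip 172.20.0.10 -it -e HADOOP_HOSTS=\"$all_hosts\" -e MY_ROLE=\"master\" cnp2"
}

print_cluster_cmds 2
```

Because each command uses -it, every printed line is meant to be run in its own terminal. Keep N small so the generated addresses do not collide with other containers on the network, such as 172.20.0.22 used earlier.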
- Compile
hadoop com.sun.tools.javac.Main WordCount.java
- Create the JAR file
jar cf wc.jar WordCount*.class
- Run
hadoop jar wc.jar WordCount /user/sina/data /user/sina/output
You can monitor progress in the MapReduce job monitoring UI (port 8088) and the HDFS monitoring UI (port 50070). The DataNode (port 50075) and MapReduce JobHistory Server (port 19888) interfaces are also available.
- See Output
hdfs dfs -ls /user/sina/output
For a list of available File System Shell commands, see the Hadoop File System Shell documentation.
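The Run step expects input files to exist under /user/sina/data in HDFS, and a MapReduce job using the standard file output format normally fails if the output directory already exists. The sketch below wraps the job with those surrounding steps. The local file name input.txt and the output file name part-r-00000 (typical for a single-reducer job) are assumptions, and the run and wordcount_job helpers are hypothetical. With DRY_RUN=1, the default here, commands are only printed:

```shell
#!/bin/sh
# Print each command when DRY_RUN=1 (the default here); execute it otherwise.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

IN_DIR=/user/sina/data
OUT_DIR=/user/sina/output

wordcount_job() {
    local_file="$1"                               # assumed local input file
    run hdfs dfs -mkdir -p "$IN_DIR"              # create the input directory
    run hdfs dfs -put -f "$local_file" "$IN_DIR"  # upload, overwriting if present
    run hdfs dfs -rm -r -f "$OUT_DIR"             # clear output from an earlier run
    run hadoop jar wc.jar WordCount "$IN_DIR" "$OUT_DIR"
    run hdfs dfs -cat "$OUT_DIR/part-r-00000"     # assumed reducer output file name
}

wordcount_job input.txt
```

Inside the container, setting DRY_RUN=0 before running the script executes the commands instead of printing them. Note that the -rm step deletes any previous results in /user/sina/output.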